author Dave Woodfall <dave@slackbuilds.org> 2021-05-03 06:50:17 +0100
committer Robby Workman <rworkman@slackbuilds.org> 2021-05-03 01:49:58 -0500
commit 919b3a5df1c54b0e212e787d184efbbc2982b238
tree 84c51c407a304cb1ab44c80f3e61011f9d17197c /python/python2-pdfminer/README
parent c226d01a8d8150d3b441cad73fffd66bf6aba5cf
python/python-pdfminer: Renamed python2-pdfminer.
Signed-off-by: Dave Woodfall <dave@slackbuilds.org>
Diffstat (limited to 'python/python2-pdfminer/README')
-rw-r--r-- python/python2-pdfminer/README 23
1 files changed, 23 insertions, 0 deletions
diff --git a/python/python2-pdfminer/README b/python/python2-pdfminer/README
new file mode 100644
index 0000000000000..64ca2affa2ffd
--- /dev/null
+++ b/python/python2-pdfminer/README
@@ -0,0 +1,23 @@
+PDFMiner is a tool for extracting information from PDF documents. Unlike
+other PDF-related tools, it focuses entirely on getting and analyzing
+text data. PDFMiner allows one to obtain the exact location of text on a
+page, as well as other information such as fonts or lines. It includes a
+PDF converter that can transform PDF files into other text formats (such
+as HTML). It has an extensible PDF parser that can be used for purposes
+other than text analysis.
+
+PDFMiner comes with two handy tools: pdf2txt.py and dumppdf.py.
+
+pdf2txt.py
+
+pdf2txt.py extracts text content from a PDF file. It cannot recognize
+text drawn as images. It also extracts text locations, font names and
+sizes, and writing direction. Password-protected PDF documents can only
+be processed if the password is supplied. No text can be extracted from
+a PDF document that does not grant extraction permission.
+
+dumppdf.py
+
+dumppdf.py dumps the internal contents of a PDF file in pseudo-XML
+format. It is intended primarily for debugging, but it can also be
+used to extract some meaningful content (e.g. images).
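The README says PDFMiner can report the exact location of text on a page. As a rough illustration of what positional data makes possible, the Python sketch below groups positioned characters into lines in reading order. It is a hypothetical example, not PDFMiner's own layout analysis (which uses its own, more elaborate heuristics); the `Char` record, the `group_into_lines` function, and the 2-point `tolerance` are assumptions, and it assumes horizontal, left-to-right text.

```python
from collections import namedtuple

# Hypothetical record for one positioned character. By default, PDF
# coordinates put the origin at the bottom-left of the page, so a larger
# y value is higher up on the page.
Char = namedtuple("Char", ["x", "y", "text"])


def group_into_lines(chars, tolerance=2.0):
    """Group characters whose y is within `tolerance` of a line's first
    character; return lines top-to-bottom, each read left-to-right
    (an assumption: horizontal, left-to-right writing direction)."""
    lines = []  # each entry: [y of the line's first character, [chars]]
    for ch in sorted(chars, key=lambda c: -c.y):  # top of the page first
        if lines and abs(lines[-1][0] - ch.y) <= tolerance:
            lines[-1][1].append(ch)
        else:
            lines.append([ch.y, [ch]])
    # Within each line, order characters by horizontal position.
    return ["".join(c.text for c in sorted(group, key=lambda c: c.x))
            for _, group in lines]
```

For example, `group_into_lines([Char(80, 700.5, "i"), Char(72, 700, "H"), Char(79, 680, "o"), Char(72, 680, "y")])` returns `["Hi", "yo"]`.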
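pdf2txt.py will not extract text from a document that lacks extraction permission. In an encrypted PDF that uses the standard security handler, permissions are stored as bit flags in the signed 32-bit /P entry of the encryption dictionary, with bits numbered from 1 at the low-order end (see the PDF specification, ISO 32000-1). The sketch below decodes those flags; `PERMISSION_BITS` and `decode_permissions` are hypothetical names, and this is not PDFMiner's own code.

```python
# Permission bits of the standard security handler's /P value,
# numbered from 1 at the low-order end. The short names are hypothetical.
PERMISSION_BITS = {
    3: "print",
    4: "modify",
    5: "extract",  # copy or otherwise extract text and graphics
    6: "annotate",
    9: "fill_forms",
    10: "extract_for_accessibility",
    11: "assemble",
    12: "print_high_quality",
}


def decode_permissions(p_value):
    """Map each named permission to True/False for a /P integer.

    /P is usually negative; Python's & treats negative ints as two's
    complement, so the bit tests below work without extra masking.
    """
    return {name: bool(p_value & (1 << (bit - 1)))
            for bit, name in PERMISSION_BITS.items()}
```

For instance, a /P value of -3904 (every optional permission cleared) yields `"extract": False`, while -44 yields `"extract": True`.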
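PDF files are built from objects such as dictionaries, arrays, strings, numbers, and references to other objects, and dumppdf.py writes these out in pseudo-XML. A minimal sketch of that kind of recursive dump, using plain Python values to stand in for PDF objects, is shown below. The tag names, the `Ref` record for indirect references, and `dump_pseudo_xml` itself are assumptions for illustration and are not guaranteed to match dumppdf.py's actual output.

```python
from collections import namedtuple
from xml.sax.saxutils import escape

# Hypothetical stand-in for an indirect reference such as "5 0 R".
Ref = namedtuple("Ref", ["objid"])


def dump_pseudo_xml(obj):
    """Recursively render a PDF-like object as a pseudo-XML string."""
    if isinstance(obj, Ref):
        return '<ref id="%d" />' % obj.objid
    if isinstance(obj, dict):
        # Sort keys so the output is deterministic.
        parts = ["<key>%s</key><value>%s</value>"
                 % (escape(str(k)), dump_pseudo_xml(v))
                 for k, v in sorted(obj.items())]
        return '<dict size="%d">%s</dict>' % (len(obj), "".join(parts))
    if isinstance(obj, list):
        return '<list size="%d">%s</list>' % (
            len(obj), "".join(dump_pseudo_xml(v) for v in obj))
    # bool is a subclass of int, so test it before numbers.
    if isinstance(obj, bool):
        return "<bool>%s</bool>" % ("true" if obj else "false")
    if isinstance(obj, (int, float)):
        return "<number>%s</number>" % obj
    if isinstance(obj, str):
        # Escape &, < and > so the text cannot break the markup.
        return '<string size="%d">%s</string>' % (len(obj), escape(obj))
    raise TypeError("unsupported object: %r" % (obj,))
```

For example, `dump_pseudo_xml({"Type": "Page", "Parent": Ref(3)})` returns `'<dict size="2"><key>Parent</key><value><ref id="3" /></value><key>Type</key><value><string size="4">Page</string></value></dict>'`.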