author | Alan Aversa <alan.aveNOrsaSP@AMcox.net (remove NO and SPAM)> | 2020-12-11 19:49:34 +0000 |
---|---|---|
committer | Willy Sudiarto Raharjo <willysr@slackbuilds.org> | 2020-12-12 07:09:21 +0700 |
commit | 9db551e4a505b719249014f0acb1cc79edffcd9d (patch) | |
tree | 8e61a95a36028a9a5a27b1e215260ff8ad2a6f5c /graphics/img2pdf | |
parent | aa12b989c12d4709ab6ff7771731f6fc76e99074 (diff) |
graphics/img2pdf: Added (conversion of raster images to PDF)
Signed-off-by: Dave Woodfall <dave@slackbuilds.org>
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
Diffstat (limited to 'graphics/img2pdf')
-rw-r--r-- | graphics/img2pdf/README | 234 |
-rw-r--r-- | graphics/img2pdf/img2pdf.SlackBuild | 88 |
-rw-r--r-- | graphics/img2pdf/img2pdf.info | 10 |
-rw-r--r-- | graphics/img2pdf/slack-desc | 19 |
4 files changed, 351 insertions, 0 deletions
diff --git a/graphics/img2pdf/README b/graphics/img2pdf/README
new file mode 100644
index 0000000000000..7da803e3acbda
--- /dev/null
+++ b/graphics/img2pdf/README
@@ -0,0 +1,234 @@
+img2pdf
+
+Lossless conversion of raster images to PDF. You should use img2pdf if
+your priorities are (in this order):
+
+ always lossless: the image embedded in the PDF will always have the
+exact same color information for every pixel as the input small: if
+possible, the difference in filesize between the input image and the
+output PDF will only be the overhead of the PDF container itself fast:
+if possible, the input image is just pasted into the PDF document as-is
+without any CPU hungry re-encoding of the pixel data
+
+Conventional conversion software (like ImageMagick) would either:
+
+ not be lossless because lossy re-encoding to JPEG not be small
+because using wasteful flate encoding of raw pixel data not be fast
+because input data gets re-encoded
+
+Another advantage of not having to re-encode the input (in most common
+situations) is, that img2pdf is able to handle much larger input than
+other software, because the raw pixel data never has to be loaded into
+memory.
+
+The following table shows how img2pdf handles different input depending
+on the input file format and image color space. Format
+Colorspace Result JPEG any direct JPEG2000 any
+direct PNG (non-interlaced) any direct TIFF (CCITT Group 4)
+monochrome direct any any except CMYK and monochrome PNG
+Paeth any monochrome CCITT Group 4 any CMYK flate
+
+For JPEG, JPEG2000, non-interlaced PNG and TIFF images with CCITT Group
+4 encoded data, img2pdf directly embeds the image data into the PDF
+without re-encoding it. It thus treats the PDF format merely as a
+container format for the image data. In these cases, img2pdf only
+increases the filesize by the size of the PDF container (typically
+around 500 to 700 bytes). Since data is only copied and not re-encoded,
+img2pdf is also typically faster than other solutions for these input
+formats.
+
+For all other input types, img2pdf first has to transform the pixel data
+to make it compatible with PDF. In most cases, the PNG Paeth filter is
+applied to the pixel data. For monochrome input, CCITT Group 4 is used
+instead. Only for CMYK input no filter is applied before finally
+applying flate compression. Usage
+
+The images must be provided as files because img2pdf needs to seek in
+the file descriptor.
+
+If no output file is specified with the -o/--output option, output will
+be done to stdout. A typical invocation is:
+
+$ img2pdf img1.png img2.jpg -o out.pdf
+
+The detailed documentation can be accessed by running:
+
+$ img2pdf --help
+
+Bugs
+
+ If you find a JPEG, JPEG2000, PNG or CCITT Group 4 encoded TIFF file
+that, when embedded into the PDF cannot be read by the Adobe Acrobat
+Reader, please contact me.
+
+ I have not yet figured out how to determine the colorspace of
+JPEG2000 files. Therefore JPEG2000 files use DeviceRGB by default. For
+JPEG2000 files with other colorspaces, you must explicitly specify it
+using the --colorspace option.
+
+ Input images with alpha channels are not allowed. PDF only supports
+transparency using binary masks but is unable to store 8-bit
+transparency information as part of the image itself. But img2pdf will
+always be lossless and thus, input images must not carry transparency
+information.
+
+ img2pdf uses PIL (or Pillow) to obtain image meta data and to
+convert the input if necessary. To prevent decompression bomb denial of
+service attacks, Pillow limits the maximum number of pixels an input
+image is allowed to have. If you are sure that you know what you are
+doing, then you can disable this safeguard by passing the
+--pillow-limit-break option to img2pdf. This allows one to process even
+very large input images.
+
+Installation
+
+On a Debian- and Ubuntu-based systems, img2pdf can be installed from the
+official repositories:
+
+$ apt install img2pdf
+
+If you want to install it using pip, you can run:
+
+$ pip3 install img2pdf
+
+If you prefer to install from source code use:
+
+$ cd img2pdf/ $ pip3 install .
+
+To test the console script without installing the package on your
+system, use virtualenv:
+
+$ cd img2pdf/ $ virtualenv ve $ ve/bin/pip3 install .
+
+You can then test the converter using:
+
+$ ve/bin/img2pdf -o test.pdf src/tests/test.jpg
+
+For Microsoft Windows users, PyInstaller based .exe files are produced
+by appveyor. If you don't want to install Python before using img2pdf
+you can head to appveyor and click on "Artifacts" to download the latest
+version: https://ci.appveyor.com/project/josch/img2pdf GUI
+
+There exists an experimental GUI with all settings currently disabled.
+You can directly convert images to PDF but you cannot set any options
+via the GUI yet. If you are interested in adding more features to the
+PDF, please submit a merge request. The GUI is based on tkinter and
+works on Linux, Windows and MacOS.
+
+Library
+
+The package can also be used as a library:
+
+import img2pdf
+
+# opening from filename with open("name.pdf","wb") as f:
+f.write(img2pdf.convert('test.jpg'))
+
+# opening from file handle with open("name.pdf","wb") as f1,
+open("test.jpg") as f2: f1.write(img2pdf.convert(f2))
+
+# using in-memory image data with open("name.pdf","wb") as f:
+f.write(img2pdf.convert("\x89PNG...")
+
+# multiple inputs (variant 1) with open("name.pdf","wb") as f:
+f.write(img2pdf.convert("test1.jpg", "test2.png"))
+
+# multiple inputs (variant 2) with open("name.pdf","wb") as f:
+f.write(img2pdf.convert(["test1.jpg", "test2.png"]))
+
+# convert all files ending in .jpg inside a directory dirname =
+"/path/to/images" with open("name.pdf","wb") as f: imgs = [] for fname
+in os.listdir(dirname): if not fname.endswith(".jpg"): continue path =
+os.path.join(dirname, fname) if os.path.isdir(path): continue
+imgs.append(path) f.write(img2pdf.convert(imgs))
+
+# convert all files ending in .jpg in a directory and its subdirectories
+dirname = "/path/to/images" with open("name.pdf","wb") as f: imgs = []
+for r, _, f in os.walk(dirname): for fname in f: if not
+fname.endswith(".jpg"): continue imgs.append(os.path.join(r, fname))
+f.write(img2pdf.convert(imgs))
+
+
+# convert all files matching a glob import glob with
+open("name.pdf","wb") as f:
+f.write(img2pdf.convert(glob.glob("/path/to/*.jpg")))
+
+# writing to file descriptor with open("name.pdf","wb") as f1,
+open("test.jpg") as f2: img2pdf.convert(f2, outputstream=f1)
+
+# specify paper size (A4) a4inpt =
+(img2pdf.mm_to_pt(210),img2pdf.mm_to_pt(297)) layout_fun =
+img2pdf.get_layout_fun(a4inpt) with open("name.pdf","wb") as f:
+f.write(img2pdf.convert('test.jpg', layout_fun=layout_fun))
+
+Comparison to ImageMagick
+
+Create a large test image:
+
+$ convert logo: -resize 8000x original.jpg
+
+Convert it into PDF using ImageMagick and img2pdf:
+
+$ time img2pdf original.jpg -o img2pdf.pdf $ time convert original.jpg
+imagemagick.pdf
+
+Notice how ImageMagick took an order of magnitude longer to do the
+conversion than img2pdf. It also used twice the memory.
+
+Now extract the image data from both PDF documents and compare it to the
+original:
+
+$ pdfimages -all img2pdf.pdf tmp $ compare -metric AE original.jpg
+tmp-000.jpg null: 0 $ pdfimages -all imagemagick.pdf tmp $ compare
+-metric AE original.jpg tmp-000.jpg null: 118716
+
+To get lossless output with ImageMagick we can use Zip compression but
+that unnecessarily increases the size of the output:
+
+$ convert original.jpg -compress Zip imagemagick.pdf $ pdfimages -all
+imagemagick.pdf tmp $ compare -metric AE original.jpg tmp-000.png null:
+0 $ stat --format="%s %n" original.jpg img2pdf.pdf imagemagick.pdf
+1535837 original.jpg 1536683 img2pdf.pdf 9397809 imagemagick.pdf
+
+Comparison to pdfLaTeX
+
+pdfLaTeX performs a lossless conversion from included images to PDF by
+default. If the input is a JPEG, then it simply embeds the JPEG into the
+PDF in the same way as img2pdf does it. But for other image formats it
+uses flate compression of the plain pixel data and thus needlessly
+increases the output file size:
+
+$ convert logo: -resize 8000x original.png $ cat << END > pdflatex.tex
+\documentclass{article} \usepackage{graphicx} \begin{document}
+\includegraphics{original.png} \end{document} END $ pdflatex
+pdflatex.tex $ stat --format="%s %n" original.png pdflatex.pdf 4500182
+original.png 9318120 pdflatex.pdf
+
+Comparison to podofoimg2pdf
+
+Like pdfLaTeX, podofoimg2pdf is able to perform a lossless conversion
+from JPEG to PDF by plainly embedding the JPEG data into the pdf
+container. But just like pdfLaTeX it uses flate compression for all
+other file formats, thus sometimes resulting in larger files than
+necessary.
+
+$ convert logo: -resize 8000x original.png $ podofoimg2pdf out.pdf
+original.png stat --format="%s %n" original.png out.pdf 4500181
+original.png 9335629 out.pdf
+
+It also only supports JPEG, PNG and TIF as input and lacks many of the
+convenience features of img2pdf like page sizes, borders, rotation and
+metadata. Comparison to Tesseract OCR
+
+Tesseract OCR comes closest to the functionality img2pdf provides. It is
+able to convert JPEG and PNG input to PDF without needlessly increasing
+the filesize and is at the same time lossless. So if your input is JPEG
+and PNG images, then you should safely be able to use Tesseract instead
+of img2pdf. For other input, Tesseract might not do a lossless
+conversion. For example it converts CMYK input to RGB and removes the
+alpha channel from images with transparency. For multipage TIFF or
+animated GIF, it will only convert the first frame.
+
+OPTIONAL:
+
+python3
diff --git a/graphics/img2pdf/img2pdf.SlackBuild b/graphics/img2pdf/img2pdf.SlackBuild
new file mode 100644
index 0000000000000..87a3ae33eb255
--- /dev/null
+++ b/graphics/img2pdf/img2pdf.SlackBuild
@@ -0,0 +1,88 @@
+#!/bin/sh
+
+# Slackware build script for img2pdf
+
+# Copyright 2020 Alan Aversa
+# All rights reserved.
+#
+# Redistribution and use of this script, with or without modification, is
+# permitted provided that the following conditions are met:
+#
+# 1. Redistributions of this script must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED
+# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
+# EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+PRGNAM=img2pdf
+VERSION=${VERSION:-0.4.0}
+BUILD=${BUILD:-1}
+TAG=${TAG:-_SBo}
+if [ -z "$ARCH" ]; then
+  case "$( uname -m )" in
+    i?86) ARCH=i586 ;;
+    arm*) ARCH=arm ;;
+    *) ARCH=$( uname -m ) ;;
+  esac
+fi
+
+CWD=$(pwd)
+TMP=${TMP:-/tmp/SBo}
+PKG=$TMP/package-$PRGNAM
+OUTPUT=${OUTPUT:-/tmp}
+if [ "$ARCH" = "i586" ]; then
+  SLKCFLAGS="-O2 -march=i586 -mtune=i686"
+  LIBDIRSUFFIX=""
+elif [ "$ARCH" = "i686" ]; then
+  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
+  LIBDIRSUFFIX=""
+elif [ "$ARCH" = "x86_64" ]; then
+  SLKCFLAGS="-O2 -fPIC"
+  LIBDIRSUFFIX="64"
+else
+  SLKCFLAGS="-O2"
+  LIBDIRSUFFIX=""
+fi
+
+set -e
+
+rm -rf $PKG
+mkdir -p $TMP $PKG $OUTPUT
+cd $TMP
+rm -rf $PRGNAM-$VERSION
+tar xvf $CWD/$PRGNAM-$VERSION.tar.gz
+cd $PRGNAM-$VERSION
+chown -R root:root .
+find -L . \
+ \( -perm 777 -o -perm 775 -o -perm 750 -o -perm 711 -o -perm 555 \
+  -o -perm 511 \) -exec chmod 755 {} \; -o \
+ \( -perm 666 -o -perm 664 -o -perm 640 -o -perm 600 -o -perm 444 \
+  -o -perm 440 -o -perm 400 \) -exec chmod 644 {} \;
+
+sed -i "s/self.qmake_bin = 'qmake'/self.qmake_bin = 'qmake-qt5'/" setup.py
+
+if $(python3 -c 'import sys' 2>/dev/null); then
+  python3 setup.py install --root=$PKG
+else
+  python setup.py install --root=$PKG
+fi
+
+find $PKG -print0 | xargs -0 file | grep -e "executable" -e "shared object" | grep ELF \
+  | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
+
+mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION
+cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild
+
+mkdir -p $PKG/install
+cat $CWD/slack-desc > $PKG/install/slack-desc
+
+cd $PKG
+/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.${PKGTYPE:-tgz}
diff --git a/graphics/img2pdf/img2pdf.info b/graphics/img2pdf/img2pdf.info
new file mode 100644
index 0000000000000..757c4f4abb20a
--- /dev/null
+++ b/graphics/img2pdf/img2pdf.info
@@ -0,0 +1,10 @@
+PRGNAM="img2pdf"
+VERSION="0.4.0"
+HOMEPAGE="https://gitlab.mister-muffin.de/josch/img2pdf"
+DOWNLOAD="https://files.pythonhosted.org/packages/80/ed/5167992abaf268f5a5867e974d9d36a8fa4802800898ec711f4e1942b4f5/img2pdf-0.4.0.tar.gz"
+MD5SUM="e4e3510dd301e50a5d03739bf9991a86"
+DOWNLOAD_x86_64=""
+MD5SUM_x86_64=""
+REQUIRES=""
+MAINTAINER="Alan Aversa"
+EMAIL="alan.aveNOrsaSP@AMcox.net (remove NO and SPAM)"
diff --git a/graphics/img2pdf/slack-desc b/graphics/img2pdf/slack-desc
new file mode 100644
index 0000000000000..de4242d2bb427
--- /dev/null
+++ b/graphics/img2pdf/slack-desc
@@ -0,0 +1,19 @@
+# HOW TO EDIT THIS FILE:
+# The "handy ruler" below makes it easier to edit a package description.
+# Line up the first '|' above the ':' following the base package name, and
+# the '|' on the right side marks the last column you can put a character in.
+# You must make exactly 11 lines for the formatting to be correct. It's also
+# customary to leave one space after the ':' except on otherwise blank lines.
+
+       |-----handy-ruler------------------------------------------------------|
+img2pdf: img2pdf (Lossless conversion of raster images to PDF.)
+img2pdf:
+img2pdf: A Python package to losslessly convert raster images to PDF.
+img2pdf:
+img2pdf: Created and currently maintained by josch
+img2pdf: https://pypi.org/user/josch/
+img2pdf:
+img2pdf: Homepage: https://gitlab.mister-muffin.de/josch/img2pdf
+img2pdf:
+img2pdf:
+img2pdf:
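
The `img2pdf.info` file added by this commit is a list of shell-style `KEY="value"` assignments. As a rough sketch only (it is not part of the commit and not an official SlackBuilds.org tool), the Python below parses those fields, checks downloaded bytes against `MD5SUM`, and assembles the package file name in the same way as the final `makepkg` line of `img2pdf.SlackBuild`. The helper names `parse_info`, `md5_matches` and `package_name` are hypothetical, and the parser assumes one assignment per line with no backslash continuations.

```python
import hashlib
import shlex

# The ten lines of img2pdf.info exactly as added by this commit.
INFO_TEXT = '''\
PRGNAM="img2pdf"
VERSION="0.4.0"
HOMEPAGE="https://gitlab.mister-muffin.de/josch/img2pdf"
DOWNLOAD="https://files.pythonhosted.org/packages/80/ed/5167992abaf268f5a5867e974d9d36a8fa4802800898ec711f4e1942b4f5/img2pdf-0.4.0.tar.gz"
MD5SUM="e4e3510dd301e50a5d03739bf9991a86"
DOWNLOAD_x86_64=""
MD5SUM_x86_64=""
REQUIRES=""
MAINTAINER="Alan Aversa"
EMAIL="alan.aveNOrsaSP@AMcox.net (remove NO and SPAM)"
'''


def parse_info(text):
    """Parse KEY="value" lines into a dict (hypothetical helper).

    Assumption: one assignment per line, no backslash continuations.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the surrounding quotes like a POSIX shell.
        values = shlex.split(raw)
        fields[key] = values[0] if values else ""
    return fields


def md5_matches(data, expected):
    """Return True if the MD5 hex digest of `data` equals `expected`."""
    return hashlib.md5(data).hexdigest() == expected.lower()


def package_name(info, arch, build="1", tag="_SBo", pkgtype="tgz"):
    """Mirror $PRGNAM-$VERSION-$ARCH-$BUILD$TAG.${PKGTYPE:-tgz}."""
    return f"{info['PRGNAM']}-{info['VERSION']}-{arch}-{build}{tag}.{pkgtype}"


info = parse_info(INFO_TEXT)
print(package_name(info, "x86_64"))
```

To check a real download, read the tarball in binary mode and pass its bytes to `md5_matches` together with `info["MD5SUM"]`.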