Leptonica 1.85.0
Image processing and image analysis suite
#include "allheaders.h"
Functions
| l_ok | l_pdfRenderFile (const char *filename, l_int32 res, SARRAY **psaout) |
| l_ok | l_pdfRenderFiles (const char *dir, SARRAY *sain, l_int32 res, SARRAY **psaout) |
Rendering pdf files using an external library
l_ok l_pdfRenderFile()
l_ok l_pdfRenderFiles()
Utility for rendering a set of pdf files as page images.
Each page is rendered as a full-page image at a specified
resolution between 50 and 300 ppi, and written to the directory
/tmp/lept/renderpdf/
An application like cleanpdf performs a sequence of:
(1) rendering the pdfs into a set of images,
(2) doing image processing on each image to generate new images, and
(3) wrapping the new images up in a single pdf file.
Typically, the processed images made by step (2) are stored compressed
in memory in a PixaComp, before wrapping them up in step (3).
This requires the Poppler package of pdf utilities, in particular
the program pdftoppm. For non-unix systems, this requires
installation of the cygwin Poppler package:
https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86/poppler/poppler-0.26.5-1
For the rasterizer, use pdftoppm:
pdftoppm -r res fname outroot ('-r res' renders output at res ppi)
This works on all pdf pages, both wrapped images and pages that
were made orthographically. The default output resolution for
pdftoppm is 150 ppi, but we typically use 300 ppi. This produces large
uncompressed RGB image files (e.g., a letter-size RGB page image
at 300 ppi is about 25 MB), but rendering is very fast.
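As an illustration of the rasterizer invocation described above, the command line could be assembled as in the following sketch. The helper name build_pdftoppm_cmd is hypothetical and is not part of Leptonica; only the "pdftoppm -r res fname outroot" form comes from the text.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Leptonica): formats the rasterizer
 * command "pdftoppm -r res fname outroot" into buf.
 * Returns 0 on success, 1 if an argument is invalid or buf is too small.
 * The paths are inserted verbatim, so names containing spaces or shell
 * metacharacters would need quoting before being passed to a shell. */
int build_pdftoppm_cmd(char *buf, size_t size, int res,
                       const char *fname, const char *outroot)
{
    if (!buf || size == 0 || !fname || !outroot || res <= 0)
        return 1;
    int n = snprintf(buf, size, "pdftoppm -r %d %s %s", res, fname, outroot);
    /* snprintf reports the length it wanted; reject truncated output */
    return (n < 0 || (size_t)n >= size) ? 1 : 0;
}
```

Rejecting a truncated command, rather than running it, avoids invoking the rasterizer on a partial filename.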
The size of the resulting images does not depend on the resolution
of the images stored in the input pdf. We compute the value of the
resolution parameter (render_res) that, when input to pdftoppm,
generates a page-size image (612 x 792 pts) at the requested
output resolution.
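The relation between page size in points, output resolution, and raster size can be checked with a little arithmetic. The sketch below assumes 1 pt = 1/72 inch and 3 bytes per pixel for uncompressed RGB; the function names are illustrative only.

```c
/* Pixel extent of a length given in points (1/72 inch) at res ppi,
 * rounded to the nearest pixel. */
long points_to_pixels(long pts, long res)
{
    return (pts * res + 36) / 72;
}

/* Number of bytes in an uncompressed 24-bit RGB raster of w x h pixels
 * (file header not included). */
long rgb_bytes(long w, long h)
{
    return w * h * 3;
}
```

For a 612 x 792 pt page at 300 ppi, points_to_pixels gives 2550 x 3300 pixels, and rgb_bytes(2550, 3300) gives 25,245,000 bytes, consistent with the 25 MB figure quoted for a 300 ppi page.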
We do NOT use pdfimages:
pdfimages -j fname outroot (-j outputs jpeg if input is dct)
pdfimages only works when all pages are pdf wrappers around images.
Further, in some cases, it scrambles the order of the output pages
and inserts extra images.
By default, this function will not run, because it makes a call
to system(3). To render pdfs as a set of images in a directory,
three things are required:
(1) To have poppler installed.
(2) To enable debug operations using setLeptDebugOK(1).
(3) To link the functions that generate pdf files in the library
(in pdfio1.c, pdfio2.c).
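A minimal usage sketch that follows the three requirements above is given here. It assumes Leptonica is installed with its headers under leptonica/, that the library was built with the pdf functions linked in, and that pdftoppm is on the path; the program itself is an example and not part of the library.

```c
#include <stdio.h>
#include <leptonica/allheaders.h>  /* header location is an assumption */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.pdf\n", argv[0]);
        return 1;
    }

    /* Requirement (2): permit the library to call system() */
    setLeptDebugOK(1);

    /* res = 0 selects the default of 300 ppi */
    SARRAY *sa = NULL;
    if (l_pdfRenderFile(argv[1], 0, &sa) != 0 || !sa) {
        fprintf(stderr, "rendering failed; is pdftoppm installed?\n");
        return 1;
    }

    /* Print the path of each rendered page image */
    l_int32 n = sarrayGetCount(sa);
    for (l_int32 i = 0; i < n; i++)
        printf("%s\n", sarrayGetString(sa, i, L_NOCOPY));

    sarrayDestroy(&sa);
    return 0;
}
```

The strings are fetched with L_NOCOPY, so they remain owned by the sarray and are released by sarrayDestroy().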
Definition in file renderpdf.c.
l_ok l_pdfRenderFile (const char *filename, l_int32 res, SARRAY **psaout)
| [in] | filename | input pdf file |
| [in] | res | output resolution (0, [50 ... 300]) ppi |
| [out] | psaout | sarray of filenames of rasterized images |
Notes:
(1) Wrapper to l_pdfRenderFiles() for a single input pdf file.
Definition at line 110 of file renderpdf.c.
References L_COPY and l_pdfRenderFiles().
l_ok l_pdfRenderFiles (const char *dir, SARRAY *sain, l_int32 res, SARRAY **psaout)
| [in] | dir | directory of input pdf files |
| [in] | sain | sarray of input pdf filenames |
| [in] | res | output resolution (0, [50 ... 300]) ppi |
| [out] | psaout | sarray of output filenames of rendered images |
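The notation (0, [50 ... 300]) for res means either 0 or a value from 50 to 300, where 0 stands for the 300 ppi default. The sketch below shows one way a caller might apply that convention before rendering. The helper normalize_render_res is hypothetical; in particular, rejecting out-of-range values is a choice made for this example and is not a statement about how the library responds to them.

```c
/* Hypothetical helper (not Leptonica's own code).
 * Maps the documented input convention for 'res' to a rendering
 * resolution:
 *   0          -> 300 (the default)
 *   50 ... 300 -> used as given
 * Any other value is rejected by this sketch.
 * Returns 0 on success, 1 on error; *pres is set only on success. */
int normalize_render_res(int res, int *pres)
{
    if (!pres)
        return 1;
    if (res == 0) {
        *pres = 300;
        return 0;
    }
    if (res < 50 || res > 300)
        return 1;
    *pres = res;
    return 0;
}
```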
Notes:
(1) Because this uses the "system" call, it is disabled by default
on all platforms. It is not supported, and therefore disabled,
on iOS 11.
(2) Input pdf file(s) are specified either by an input directory
or an sarray with the paths. Use the sarray if it is given;
otherwise, use all files in the directory with extension "pdf",
and name the rendered images in lexical order of the filenames.
(3) The allowed output rendering resolutions are between 50 ppi
and 300 ppi. Typical resolutions are 150 and 300 ppi.
An input value of 0 selects the default resolution of 300 ppi.
(4) Images are rendered in ppm format in directory /tmp/lept/renderpdf
and named in lexical order of the input filenames. On invocation,
any existing files in this directory are removed.
(5) This requires pdftoppm from the Poppler package of pdf utilities.
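The directory-based selection described in note (2) amounts to keeping the names that end in "pdf" and ordering them lexically. The following sketch shows that idea on a plain array of strings. It is not the library's implementation: the function names are hypothetical, and the case-sensitive match on ".pdf" is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if name ends with ".pdf", 0 otherwise.
 * The comparison is case-sensitive in this sketch. */
int has_pdf_extension(const char *name)
{
    size_t len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".pdf") == 0;
}

/* qsort comparator for an array of C strings */
static int compare_names(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Keeps only the ".pdf" entries of names[0..n-1], packing them at the
 * front of the array, and sorts them in lexical (strcmp) order.
 * Returns the number of entries kept; entries beyond that count
 * should be ignored by the caller. */
size_t select_pdfs_sorted(const char **names, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (has_pdf_extension(names[i]))
            names[kept++] = names[i];
    }
    qsort(names, kept, sizeof(names[0]), compare_names);
    return kept;
}
```

Sorting the inputs first is what lets the rendered images be named in the same lexical order as the pdf files they came from.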
Definition at line 159 of file renderpdf.c.
References L_NOCOPY.
Referenced by l_pdfRenderFile().