Leptonica 1.85.0
Image processing and image analysis suite
renderpdf.c File Reference
#include "allheaders.h"

Functions

l_ok l_pdfRenderFile (const char *filename, l_int32 res, SARRAY **psaout)
 
l_ok l_pdfRenderFiles (const char *dir, SARRAY *sain, l_int32 res, SARRAY **psaout)
 

Detailed Description


  Rendering pdf files using an external library
       l_ok        l_pdfRenderFile()
       l_ok        l_pdfRenderFiles()

  Utility for rendering a set of pdf files as page images.
  Each page is rendered as a full-page image at a specified
  resolution between 50 and 300 ppi, and the images are written
  to the directory
      /tmp/lept/renderpdf/

  An application like cleanpdf performs a sequence of:
  (1) rendering the pdfs into a set of images,
  (2) doing image processing on each image to generate new images, and
  (3) wrapping the new images up in a single pdf file.
  Typically, the processed images made by step (2) are stored compressed
  in memory in a PixaComp, before wrapping them up in step (3).

  This requires the Poppler package of pdf utilities, in particular
  the program pdftoppm.  For non-unix systems, this requires
  installation of the cygwin Poppler package:
     https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86/poppler/poppler-0.26.5-1

  For the rasterizer, use pdftoppm:
     pdftoppm -r res fname outroot  ('-r res' renders output at res ppi)
  This works on all pdf pages, both wrapped images and pages that
  were made orthographically.  The default output resolution for
  pdftoppm is 150 ppi, but we typically use 300 ppi.  This produces large
  uncompressed RGB image files (e.g., a letter-size RGB page image
  at 300 ppi is about 25 MB), but rendering is very fast.
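
  The rasterizer command line shown above can be assembled with snprintf().
  The following is a minimal sketch, not the code used in renderpdf.c: the
  helper name build_pdftoppm_cmd() and its parameters are hypothetical, and
  no shell quoting is attempted.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of Leptonica): format the command
 *     pdftoppm -r res fname outroot
 * into buf.  Returns 0 if OK, 1 on error (bad arguments or output
 * that does not fit in buf).
 * Assumption: fname and outroot contain no spaces or shell
 * metacharacters; this sketch does no quoting or escaping. */
int build_pdftoppm_cmd(char *buf, size_t size, int res,
                       const char *fname, const char *outroot)
{
    int n;

    if (!buf || size == 0 || !fname || !outroot)
        return 1;
    n = snprintf(buf, size, "pdftoppm -r %d %s %s", res, fname, outroot);
    if (n < 0 || (size_t)n >= size)  /* encoding error or truncation */
        return 1;
    return 0;
}
```

  Treating truncation as an error avoids handing a partial command to the
  shell.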

  The size of the resulting images does not depend on the resolution
  of the images stored in the input pdf.  We compute the value of the
  resolution parameter (render_res) that when input to pdftoppm
  will generate a page-size image (612 x 792 pts) at the requested
  output resolution.
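
  As a worked example of the target image size, a 612 x 792 pt page at a
  given output resolution has pixel dimensions pts * res / 72, since one
  point is 1/72 inch.  The sketch below uses hypothetical helper names and
  only illustrates this arithmetic; it is not the render_res computation
  itself, and it ignores the small ppm file header.

```c
#include <stdint.h>

/* Hypothetical helper: convert a length in points to pixels at
 * 'res' ppi (1 pt = 1/72 inch), rounding to the nearest pixel. */
int64_t pts_to_pixels(int64_t pts, int64_t res)
{
    return (pts * res + 36) / 72;
}

/* Hypothetical helper: raw size in bytes of an uncompressed RGB
 * raster (3 bytes per pixel) for a page of wpts x hpts points. */
int64_t rgb_page_bytes(int64_t wpts, int64_t hpts, int64_t res)
{
    return pts_to_pixels(wpts, res) * pts_to_pixels(hpts, res) * 3;
}
```

  At 300 ppi a 612 x 792 pt page is 2550 x 3300 pixels, or 25,245,000 bytes
  of raw RGB data, which matches the 25 MB figure quoted for a standard page.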

  We do NOT use pdfimages:
     pdfimages -j fname outroot   (-j outputs jpeg if input is dct)
  pdfimages only works when all pages are pdf wrappers around images.
  Further, in some cases, it scrambles the order of the output pages
  and inserts extra images.

  By default, these functions will not run, because they make a call
  to system(3).  To render pdfs as a set of images in a directory,
  three things are required:
  (1) Poppler must be installed.
  (2) Debug operations must be enabled using setLeptDebugOK(1).
  (3) The functions that generate pdf files (in pdfio1.c, pdfio2.c)
      must be linked into the library.

Definition in file renderpdf.c.

Function Documentation

◆ l_pdfRenderFile()

l_ok l_pdfRenderFile ( const char * filename,
l_int32 res,
SARRAY ** psaout )

l_pdfRenderFile()

Parameters
[in]   filename  input pdf file
[in]   res       output resolution (0, [50 ... 300]) ppi
[out]  psaout    sarray of filenames of rasterized images
Returns
0 if OK, 1 on error
Notes:
     (1) Wrapper to l_pdfRenderFiles() for a single input pdf file.
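
The res parameter accepts either 0 or a value in [50 ... 300]. One way to read that convention is sketched below. The helper normalize_render_res() is hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption, not something this page states.

```c
#include <stddef.h>

/* Hypothetical helper: map the 'res' argument to the resolution
 * actually used for rendering.
 *   0           selects the default of 300 ppi
 *   [50 .. 300] is used unchanged
 * Returns 0 if OK, 1 on error.
 * Assumption: any other value is an error; *pres is left untouched. */
int normalize_render_res(int res, int *pres)
{
    if (pres == NULL)
        return 1;
    if (res == 0)
        res = 300;
    if (res < 50 || res > 300)
        return 1;
    *pres = res;
    return 0;
}
```

The return convention (0 if OK, 1 on error) follows the one used by the functions documented here.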

Definition at line 110 of file renderpdf.c.

References L_COPY, and l_pdfRenderFiles().

◆ l_pdfRenderFiles()

l_ok l_pdfRenderFiles ( const char * dir,
SARRAY * sain,
l_int32 res,
SARRAY ** psaout )

l_pdfRenderFiles()

Parameters
[in]   dir     directory of input pdf files
[in]   sain    sarray of input pdf filenames
[in]   res     output resolution (0, [50 ... 300]) ppi
[out]  psaout  sarray of output filenames of rendered images
Returns
0 if OK, 1 on error
Notes:
     (1) Because this uses the "system" call, it is disabled by default
         on all platforms.  It is not supported and therefore disabled
         on iOS 11.
     (2) Input pdf file(s) are specified either by an input directory
         or an sarray with the paths.  Use the sarray if it is given;
         otherwise, use all files in the directory with extension "pdf",
         and name the rendered images in lexical order of the filenames.
     (3) The allowed output rendering resolutions are between 50 ppi
         and 300 ppi.  Typical resolutions are 150 and 300 ppi.
         An input value of 0 selects the default resolution of 300 ppi.
     (4) Images are rendered in ppm format in directory /tmp/lept/renderpdf
         and named in lexical order of the input filenames.  On invocation,
         any existing files in this directory are removed.
     (5) This requires pdftoppm from the Poppler package of pdf utilities.
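
The directory case in note (2), selecting files with extension "pdf" and ordering them lexically, can be sketched with the standard C library alone. The helpers below are hypothetical and do not use the Leptonica sarray functions. Matching the extension case-sensitively and sorting in byte order with strcmp() are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if name ends in ".pdf" and has a
 * non-empty base name, 0 otherwise.
 * Assumption: the match is case-sensitive. */
int has_pdf_extension(const char *name)
{
    size_t len;

    if (!name)
        return 0;
    len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".pdf") == 0;
}

/* qsort() comparator for an array of C strings (byte order). */
static int compare_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: keep only the ".pdf" entries of
 * names[0..n-1], move them to the front of the array, sort them
 * in lexical order, and return how many were kept. */
size_t select_pdfs_sorted(const char **names, size_t n)
{
    size_t i, count = 0;

    if (!names)
        return 0;
    for (i = 0; i < n; i++) {
        if (has_pdf_extension(names[i]))
            names[count++] = names[i];
    }
    qsort(names, count, sizeof(names[0]), compare_names);
    return count;
}
```

Given the names "b.pdf", "notes.txt", "a.pdf" and "c.PDF", this keeps two entries, ordered "a.pdf" then "b.pdf".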

Definition at line 159 of file renderpdf.c.

References L_NOCOPY.

Referenced by l_pdfRenderFile().