Leptonica 1.85.0
Image processing and image analysis suite
pdfapp.c File Reference
#include <string.h>
#include "allheaders.h"


Functions

l_ok compressFilesToPdf (SARRAY *sa, l_int32 onebit, l_int32 savecolor, l_float32 scalefactor, l_int32 quality, const char *title, const char *fileout)
 
l_ok cropFilesToPdf (SARRAY *sa, l_int32 lr_clear, l_int32 tb_clear, l_int32 edgeclean, l_int32 lr_border, l_int32 tb_border, l_float32 maxwiden, l_int32 printwiden, const char *title, const char *fileout)
 
l_ok cleanTo1bppFilesToPdf (SARRAY *sa, l_int32 res, l_int32 contrast, l_int32 rotation, l_int32 opensize, const char *title, const char *fileout)
 

Detailed Description


   Image processing operations on multiple images followed by wrapping
   them into a pdf.

   There are two possible ways to specify the set of images:
   (1) an array of pathnames
   (2) a directory, typically with an additional pattern for selection.
   We use (1) because it is both simpler and more general.

   Corresponding to each function here is:
   (1) the image processing function that is carried out on each image
   (2) a program in prog that extracts images from a pdf and calls this
       function with an array of their pathnames.

   |=============================================================|
   |                        Important notes                      |
   |=============================================================|
   | Some of these functions require I/O libraries such as       |
   | libtiff, libjpeg, libpng and libz.  If you do not have      |
   | these libraries, some calls will fail.  For example,        |
   | if you do not have libtiff, you cannot write a pdf that     |
   | uses libtiff to encode bilevel images in tiffg4.            |
   |                                                             |
   | You can manually deactivate all pdf writing by setting      |
   | this in environ.h:                                          |
   |                                                             |
   |     #define USE_PDFIO 0                                     |
   |                                                             |
   | This will link the stub file pdfappstub.c.                  |
   |=============================================================|

   The images in the pdf file can be rendered using a pdf viewer,
   such as evince, gv, xpdf or acroread.

   Compression of images for prog/compresspdf
       l_int32          compressFilesToPdf()

   Crop images for prog/croppdf
       l_int32          cropFilesToPdf()

   Cleanup and binarization of images for prog/cleanpdf
       l_int32          cleanTo1bppFilesToPdf()
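
The USE_PDFIO note describes a compile-time switch between the real functions and a stub file. A minimal sketch of that pattern follows. It is an illustration only: `DEMO_USE_PDFIO` and `demo_files_to_pdf` are hypothetical names, not Leptonica identifiers, and the stub returning 1 is an assumption based on the "0 if OK, 1 on error" convention used by the functions in this file.

```c
#include <stdio.h>

/* Hypothetical flag standing in for USE_PDFIO; define as 0 to select the stub. */
#ifndef DEMO_USE_PDFIO
#define DEMO_USE_PDFIO 1
#endif

#if DEMO_USE_PDFIO
/* Enabled variant: pretends to do the work and returns 0 on success. */
static int demo_files_to_pdf(const char *fileout)
{
    if (!fileout) return 1;              /* missing output name is an error */
    printf("would write %s\n", fileout);
    return 0;
}
#else
/* Stub variant: same signature, always reports an error (assumed behavior). */
static int demo_files_to_pdf(const char *fileout)
{
    (void)fileout;
    fprintf(stderr, "pdf writing not compiled in\n");
    return 1;
}
#endif
```

Because both variants share one signature, callers compile unchanged either way and only need to check the return value.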

Definition in file pdfapp.c.

Function Documentation

◆ cleanTo1bppFilesToPdf()

l_ok cleanTo1bppFilesToPdf ( SARRAY * sa,
l_int32 res,
l_int32 contrast,
l_int32 rotation,
l_int32 opensize,
const char * title,
const char * fileout )

cleanTo1bppFilesToPdf()

Parameters
[in]  sa        sorted full pathnames of images
[in]  res       either 300 or 600 ppi for output
[in]  contrast  vary contrast: 1 = lightest; 10 = darkest; suggest 1 unless light features are being lost
[in]  rotation  cw by 90 degrees: {0,1,2,3} represent 0, 90, 180 and 270 degree cw rotations
[in]  opensize  opening size of structuring element for noise removal: {0 or 1 to skip; 2, 3 for opening}
[in]  title     [optional] pdf title; can be null
[in]  fileout   pdf file of all images
Returns
0 if OK, 1 on error
Notes:
   (1) This deskews, optionally rotates and darkens, cleans background
       to white, binarizes and optionally removes small noise, and
       puts the images into the pdf in the order given in sa.
   (2) All images in the pdf are tiffg4 encoded.
   (3) For color and grayscale input, local background normalization is
       done to 200, and a threshold of 180 sets the maximum foreground
       value in the normalized image.
   (4) The res parameter can be either 300 or 600 ppi.  If the input
       is gray or color and res = 600, this does an interpolated 2x
       expansion before binarizing.
   (5) The contrast parameter adjusts the binarization to avoid losing
       lighter input pixels.  Contrast is increased as the contrast
       parameter goes from 1 to 10.
   (6) The opensize parameter is the size of a square SEL used with
       opening to remove small speckle noise.  Allowed opening sizes are
       2 and 3.  If opening is to be used, try 2 before 3.
   (7) If there are more than 200 images, store the images after processing
       as an array of compressed images (a Pixac); otherwise, use a Pixa.
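
The documented parameter ranges for cleanTo1bppFilesToPdf() can be checked on the caller's side before the call. The sketch below is one way to do that. The helper names are hypothetical, and it does not claim to reproduce how the library itself reacts to out-of-range values.

```c
/* Hypothetical caller-side check; returns 0 if OK, 1 on error,
 * following the return convention documented for this file. */
static int check_clean_params(int res, int contrast, int rotation, int opensize)
{
    if (res != 300 && res != 600) return 1;       /* output ppi */
    if (contrast < 1 || contrast > 10) return 1;  /* 1 = lightest, 10 = darkest */
    if (rotation < 0 || rotation > 3) return 1;   /* multiples of 90 degrees cw */
    if (opensize < 0 || opensize > 3) return 1;   /* 0 or 1 skip; 2 or 3 open */
    return 0;
}

/* Clockwise rotation in degrees for a rotation code in {0,1,2,3}. */
static int rotation_degrees(int rotation)
{
    return 90 * rotation;
}

/* Nonzero if the noise-removal opening would run for this opensize. */
static int uses_opening(int opensize)
{
    return opensize == 2 || opensize == 3;
}
```

For example, `check_clean_params(600, 1, 0, 2)` accepts a 600 ppi run with the lightest contrast, no rotation, and a size 2 opening.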

Definition at line 384 of file pdfapp.c.

References L_CLONE, L_G4_ENCODE, L_INSERT, and L_NOCOPY.

◆ compressFilesToPdf()

l_ok compressFilesToPdf ( SARRAY * sa,
l_int32 onebit,
l_int32 savecolor,
l_float32 scalefactor,
l_int32 quality,
const char * title,
const char * fileout )

compressFilesToPdf()

Parameters
[in]  sa           sorted full pathnames of images
[in]  onebit       set to 1 to enforce 1 bpp tiffg4 encoding
[in]  savecolor    if onebit == 1, set to 1 to save color
[in]  scalefactor  scaling factor applied to each image; > 0.0
[in]  quality      for jpeg: 0 for default (50); otherwise 25 - 95
[in]  title        [optional] pdf title; can be null
[in]  fileout      pdf file of all images
Returns
0 if OK, 1 on error
Notes:
   (1) This function is designed to optionally scale and compress a set of
       images, wrapping them in a pdf in the order given in the input sa.
   (2) It does the image processing for prog/compresspdf.c.
   (3) Images in the output pdf are encoded with either tiffg4 or jpeg (DCT),
       or a mixture of them depending on parameters onebit and savecolor.
   (4) Parameters onebit and savecolor work as follows:
       onebit = 0: no depth conversion, default encoding depends on depth
       onebit = 1, savecolor = 0: all images converted to 1 bpp
       onebit = 1, savecolor = 1: images without color are converted
          to 1 bpp; images with color have the color preserved.
   (5) In use, if most of the pages are 1 bpp but some have color that needs
       to be preserved, onebit and savecolor should both be 1.  This
       causes DCT compression of color images and tiffg4 compression
       of monochrome images.
   (6) The images will be concatenated in the order given in sa.
   (7) Typically, scalefactor <= 1.0.  It is applied to each image
       before encoding.  If you enter a value <= 0.0, it will be set to 1.0.
       The maximum allowed value is 2.0.
   (8) Default jpeg quality is 50; otherwise, quality factors between
       25 and 95 are enforced.
   (9) Page images at 300 ppi are about 8 Mpixels.  RGB(A) rasters are
       then about 32 MB (1 bpp images are about 1 MB).  If there are
       more than 25 images, store the images after processing as an
       array of compressed images (a Pixac); otherwise, use a Pixa.
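
Notes (4), (7) and (8) amount to a small set of rules, sketched below. All names are hypothetical and none are Leptonica identifiers. Capping scalefactor at 2.0 and clamping out-of-range quality values are assumptions about what "maximum allowed" and "enforced" mean here. The notes do not say what the library actually does with such values.

```c
/* Hypothetical names; these are not Leptonica identifiers. */
typedef enum { ENC_BY_DEPTH, ENC_TIFFG4, ENC_JPEG } demo_encoding;

/* Scale factor: <= 0.0 becomes 1.0; values above 2.0 are capped (assumption). */
static float normalize_scalefactor(float scalefactor)
{
    if (scalefactor <= 0.0f) return 1.0f;
    if (scalefactor > 2.0f) return 2.0f;
    return scalefactor;
}

/* JPEG quality: 0 selects the default of 50; other values are
 * clamped to [25, 95] (clamping is an assumption). */
static int normalize_quality(int quality)
{
    if (quality == 0) return 50;
    if (quality < 25) return 25;
    if (quality > 95) return 95;
    return quality;
}

/* Encoding for one image, following the onebit/savecolor table in note (4). */
static demo_encoding choose_encoding(int onebit, int savecolor, int has_color)
{
    if (!onebit) return ENC_BY_DEPTH;            /* no depth conversion */
    if (savecolor && has_color) return ENC_JPEG; /* keep color, use DCT */
    return ENC_TIFFG4;                           /* convert to 1 bpp */
}
```

With `onebit = 1` and `savecolor = 1`, a mostly monochrome document with a few color pages yields tiffg4 for the former and jpeg for the latter, which is the use case in note (5).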

Definition at line 131 of file pdfapp.c.

References L_CLONE, L_DEFAULT_ENCODE, L_INSERT, and L_NOCOPY.

◆ cropFilesToPdf()

l_ok cropFilesToPdf ( SARRAY * sa,
l_int32 lr_clear,
l_int32 tb_clear,
l_int32 edgeclean,
l_int32 lr_border,
l_int32 tb_border,
l_float32 maxwiden,
l_int32 printwiden,
const char * title,
const char * fileout )

cropFilesToPdf()

Parameters
[in]  sa          sorted full pathnames of images
[in]  lr_clear    full res pixels cleared at left and right sides
[in]  tb_clear    full res pixels cleared at top and bottom sides
[in]  edgeclean   parameter for removing edge noise (-1 to 15); default = 0 (no removal); 15 is maximally aggressive for random noise; -1 for aggressively removing side noise; -2 to extract page embedded in black background
[in]  lr_border   full res final "added" pixels on left and right
[in]  tb_border   full res final "added" pixels on top and bottom
[in]  maxwiden    max fractional horizontal stretch allowed
[in]  printwiden  0 to skip, 1 for 8.5x11, 2 for A4
[in]  title       [optional] pdf title; can be null
[in]  fileout     pdf file of all images
Returns
0 if OK, 1 on error
Notes:
   (1) This function is designed to optionally remove white space from
       around the page images, and generate a pdf that prints with
       foreground occupying much of the full page.
   (2) It does the image processing for prog/croppdf.c.
   (3) Images in the output pdf are 1 bpp and encoded with tiffg4.
   (4) See documentation in pixCropImage() for details on the processing.
   (5) The images will be concatenated in the order given in sa.
   (6) Output page images are at 300 ppi and are stored in memory.
       They are about 1 MB when uncompressed.  For up to 200 pages,
       the images are stored uncompressed; otherwise, the stored
       images are compressed with tiffg4.
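
The memory figures behind the storage choice in note (6) can be worked out directly. The sketch below assumes an 8.5 x 11 inch page and raster rows padded to 32-bit words. The helper names are hypothetical. Under those assumptions a 300 ppi page is 2550 x 3300 pixels, about 8.4 Mpixels, which takes about 1 MB at 1 bpp and about 34 MB at 32 bpp.

```c
/* Pixels along one side of a page: inches times ppi, rounded to nearest. */
static long side_pixels(double inches, int ppi)
{
    return (long)(inches * ppi + 0.5);
}

/* Bytes for an uncompressed raster whose rows are padded to 32-bit words.
 * bpp is bits per pixel (1 for binary, 32 for RGBA). */
static long raster_bytes(long w, long h, int bpp)
{
    long words_per_line = (w * bpp + 31) / 32;
    return 4 * words_per_line * h;
}

/* Nonzero if pages should be held compressed, per note (6):
 * up to 200 pages stay uncompressed. */
static int store_compressed(int npages)
{
    return npages > 200;
}
```

At roughly 1 MB per page, 200 uncompressed 1 bpp pages occupy on the order of 200 MB, which gives a sense of why larger sets are held compressed instead.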

Definition at line 270 of file pdfapp.c.

References L_CLONE, L_G4_ENCODE, L_INSERT, and L_NOCOPY.