![]() |
Leptonica 1.85.0
Image processing and image analysis suite
|
#include "allheaders.h"Go to the source code of this file.
Data Structures | |
| struct | PartitionElement |
Macros | |
| #define | OUTPUT_HEAP_STATS 0 |
Typedefs | |
| typedef struct PartitionElement | PARTEL |
Functions | |
| static PARTEL * | partelCreate (BOX *box) |
| static void | partelDestroy (PARTEL **ppartel) |
| static l_int32 | partelSetSize (PARTEL *partel, l_int32 sortflag) |
| static BOXA * | boxaGenerateSubboxes (BOX *box, BOXA *boxa, l_int32 maxperim, l_float32 fract) |
| static BOX * | boxaSelectPivotBox (BOX *box, BOXA *boxa, l_int32 maxperim, l_float32 fract) |
| static l_int32 | boxCheckIfOverlapIsBig (BOX *box, BOXA *boxa, l_float32 maxoverlap) |
| BOXA * | boxaGetWhiteblocks (BOXA *boxas, BOX *box, l_int32 sortflag, l_int32 maxboxes, l_float32 maxoverlap, l_int32 maxperim, l_float32 fract, l_int32 maxpops) |
| BOXA * | boxaPruneSortedOnOverlap (BOXA *boxas, l_float32 maxoverlap) |
Variables | |
| static const l_int32 | DefaultMaxPops = 20000 |
Whitespace block extraction
BOXA *boxaGetWhiteblocks()
Helpers
static PARTEL *partelCreate()
static void partelDestroy()
static l_int32 partelSetSize()
static BOXA *boxaGenerateSubboxes()
static BOX *boxaSelectPivotBox()
static l_int32 boxaCheckIfOverlapIsSmall()
BOXA *boxaPruneSortedOnOverlap()
Definition in file partition.c.
| #define OUTPUT_HEAP_STATS 0 |
Definition at line 73 of file partition.c.
| typedef struct PartitionElement PARTEL |
Definition at line 57 of file partition.c.
|
static |
| [in] | box | region to be split into up to four overlapping subregions |
| [in] | boxa | boxes of rectangles intersecting the box |
| [in] | maxperim | maximum half-perimeter for which pivot is selected by proximity to box centroid |
| [in] | fract | fraction of box diagonal that is an acceptable distance from the box centroid to select the pivot |
Definition at line 402 of file partition.c.
References boxaSelectPivotBox(), and L_INSERT.
Referenced by boxaGetWhiteblocks().
| BOXA * boxaGetWhiteblocks | ( | BOXA * | boxas, |
| BOX * | box, | ||
| l_int32 | sortflag, | ||
| l_int32 | maxboxes, | ||
| l_float32 | maxoverlap, | ||
| l_int32 | maxperim, | ||
| l_float32 | fract, | ||
| l_int32 | maxpops ) |
| [in] | boxas | typ. a set of bounding boxes of fg components |
| [in] | box | initial region; typically including all boxes in boxas; if null, it computes the region to include all boxes in boxas |
| [in] | sortflag | L_SORT_BY_WIDTH, L_SORT_BY_HEIGHT, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_PERIMETER, L_SORT_BY_AREA |
| [in] | maxboxes | max number of output whitespace boxes; e.g., 100 |
| [in] | maxoverlap | maximum fractional overlap of a box by any of the larger boxes; e.g., 0.2 |
| [in] | maxperim | maximum half-perimeter, in pixels, for which pivot is selected by proximity to box centroid; e.g., 200 |
| [in] | fract | fraction of box diagonal that is an acceptable distance from the box centroid to select the pivot; e.g., 0.2 |
| [in] | maxpops | max number of pops from the heap; use 0 as default |
Notes:
(1) This uses the elegant Breuel algorithm, found in "Two
Geometric Algorithms for Layout Analysis", 2002,
url: "citeseer.ist.psu.edu/breuel02two.html".
It starts with the bounding boxes (b.b.) of the connected
components (c.c.) in a region, along with the rectangle
representing that region. It repeatedly divides the
rectangle into four maximal rectangles that exclude a
pivot rectangle, sorting them in a priority queue
according to one of the six sort flags. It returns a boxa
of the "largest" set that have no intersection with boxes
from the input boxas.
(2) If box == NULL, the initial region is the minimal region
that includes the origin and every box in boxas.
(3) maxboxes is the maximum number of whitespace boxes that will
be returned. The actual number will depend on the image
and the values chosen for maxoverlap and maxpops. In many
cases, the actual number will be 'maxboxes'.
(4) maxoverlap allows pruning of whitespace boxes depending on
the overlap. To avoid all pruning, use maxoverlap = 1.0.
To select only boxes that have no overlap with each other
(maximal pruning), choose maxoverlap = 0.0.
Otherwise, no box can have more than the 'maxoverlap' fraction
of its area overlapped by any larger (in the sense of the
sortflag) box.
(5) Choose maxperim (actually, maximum half-perimeter) to
represent a c.c. that is small enough so that you don't care
about the white space that could be inside of it. For all such
c.c., the pivot for 'quadfurcation' of a rectangle is selected
as having a reasonable proximity to the rectangle centroid.
(6) Use fract in the range [0.0 ... 1.0]. Set fract = 0.0
to choose the small box nearest the centroid as the pivot.
If you choose fract > 0.0, it is suggested that you call
boxaPermuteRandom() first, to permute the boxes (see usage below).
This should reduce the search time for each of the pivot boxes.
(7) Choose maxpops to be the maximum number of rectangles that
are popped from the heap. This is an indirect way to limit the
execution time. Use 0 for default (a fairly large number).
At any time, you can expect the heap to contain about
2.5 times as many boxes as have been popped off.
(8) The output result is a sorted set of overlapping
boxes, constrained by 'maxboxes', 'maxoverlap' and 'maxpops'.
(9) The main defect of the method is that it abstracts out the
actual components, retaining only the b.b. for analysis.
Consider a component with a large b.b. If this is chosen
as a pivot, all white space inside is immediately taken
out of consideration. Furthermore, even if it is never chosen
as a pivot, as the partitioning continues, at no time will
any of the whitespace inside this component be part of a
rectangle with zero overlapping boxes. Thus, the interiors
of all boxes are necessarily excluded from the union of
the returned whitespace boxes.
(10) It should be noted that the algorithm puts a large number
of partels on the queue. Setting a limit of X partels to
remove from the queue, one typically finds that there will be
several times that number (say, 2X - 3X) left on the queue.
For an efficient algorithm to find the largest white or
or black rectangles, without permitting them to overlap,
see pixFindLargeRectangles().
(11) USAGE: One way to accommodate to this weakness is to remove such
large b.b. before starting the computation. For example,
if 'box' is an input image region containing 'boxa' b.b. of c.c.:
// Faster pivot choosing
boxaPermuteRandom(boxa, boxa);
// Remove anything either large width or height
boxat = boxaSelectBySize(boxa, maxwidth, maxheight,
L_SELECT_IF_BOTH, L_SELECT_IF_LT,
NULL);
boxad = boxaGetWhiteblocks(boxat, box, type, maxboxes,
maxoverlap, maxperim, fract,
maxpops);
The result will be rectangular regions of "white space" that
extend into (and often through) the excluded components.
(11) As a simple example, suppose you wish to find the columns on a page.
First exclude large c.c. that may block the columns, and then call:
boxad = boxaGetWhiteblocks(boxa, box, L_SORT_BY_HEIGHT,
20, 0.15, 200, 0.2, 2000);
to get the 20 tallest boxes with no more than 0.15 overlap
between a box and any of the taller ones, and avoiding the
use of any c.c. with a b.b. half perimeter greater than 200
as a pivot.
Definition at line 192 of file partition.c.
References boxaGenerateSubboxes(), boxCheckIfOverlapIsBig(), L_CLONE, L_INSERT, L_SORT_BY_AREA, L_SORT_BY_HEIGHT, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_PERIMETER, L_SORT_BY_WIDTH, L_SORT_DECREASING, partelCreate(), partelDestroy(), and partelSetSize().
| [in] | boxas | sorted by size in decreasing order |
| [in] | maxoverlap | maximum fractional overlap of a box by any of the larger boxes |
Notes:
(1) This selectively removes smaller boxes when they are overlapped
by any larger box by more than the input 'maxoverlap' fraction.
(2) To avoid all pruning, use maxoverlap = 1.0. To select only
boxes that have no overlap with each other (maximal pruning),
set maxoverlap = 0.0.
(3) If there are no boxes in boxas, returns an empty boxa.
Definition at line 609 of file partition.c.
|
static |
| [in] | box | containing box; to be split by the pivot box |
| [in] | boxa | boxes of rectangles, from which 1 is to be chosen |
| [in] | maxperim | maximum half-perimeter for which pivot is selected by proximity to box centroid |
| [in] | fract | fraction of box diagonal that is an acceptable distance from the box centroid to select the pivot |
Notes:
(1) This is a tricky piece that wasn't discussed in the
Breuel's 2002 paper.
(2) Selects a box from boxa whose centroid is reasonably close to
the centroid of the containing box (xc, yc) and whose
half-perimeter does not exceed the maxperim value.
(3) If there are no boxes in the boxa that are small enough,
then it selects the smallest of the larger boxes,
without reference to its location in the containing box.
(4) If a small box has a centroid at a distance from the
centroid of the containing box that is not more than
the fraction 'fract' of the diagonal of the containing
box, that box is chosen as the pivot, terminating the
search for the nearest small box.
(5) Use fract in the range [0.0 ... 1.0]. Set fract = 0.0
to choose the small box nearest the centroid.
(6) Choose maxperim to represent a connected component that is
small enough so that you don't care about the white space
that could be inside of it.
Definition at line 478 of file partition.c.
References L_CLONE, and L_COPY.
Referenced by boxaGenerateSubboxes().
| [in] | box | to be tested |
| [in] | boxa | of boxes already stored |
| [in] | maxoverlap | maximum fractional overlap of the input box by any of the boxes in boxa |
Definition at line 556 of file partition.c.
References L_CLONE.
Referenced by boxaGetWhiteblocks().
| [in] | box | region; inserts a copy |
Definition at line 314 of file partition.c.
Referenced by boxaGetWhiteblocks().
|
static |
| [in,out] | ppartel | contents will be set to null before returning |
Definition at line 331 of file partition.c.
Referenced by boxaGetWhiteblocks().
|
static |
| [in] | partel | |
| [in] | sortflag | L_SORT_BY_WIDTH, L_SORT_BY_HEIGHT, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_PERIMETER, L_SORT_BY_AREA |
Definition at line 361 of file partition.c.
References L_SORT_BY_AREA, L_SORT_BY_HEIGHT, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_PERIMETER, and L_SORT_BY_WIDTH.
Referenced by boxaGetWhiteblocks().
|
static |
Definition at line 69 of file partition.c.