com.extractpdf4j.parsers.LatticeParser

public class LatticeParser extends BaseParser

LatticeParser

Detects table structure by rasterizing pages and finding horizontal/vertical ruling lines with OpenCV. Reconstructs a cell grid, maps PDF text into cells, and optionally runs OCR for sparsely filled cells.

Pipeline

- Render the page to an image at renderDpi.
- Binarize the image for line detection (adaptive threshold).
- Extract horizontal and vertical lines via morphology; project them to get line positions.
- Build the grid from line intersections; map PDF glyphs to cell coordinates.
- Fall back to OCR for cells whose text coverage is low.
- Emit a Table with the grid plus row/column boundaries.
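
The projection and cell-mapping steps above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: LatticeParser works on OpenCV images, whereas this sketch assumes the line mask is already available as a boolean array. The class name LineProjectionSketch, its method names, and the minFill threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and thresholds are assumptions, not library API. */
final class LineProjectionSketch {

    /**
     * Projects a binary mask onto the y axis. A row counts as part of a ruling
     * line when at least minFill of its pixels are set; each run of such rows
     * is collapsed to its centre row.
     */
    static int[] horizontalLinePositions(boolean[][] mask, double minFill) {
        List<Integer> centres = new ArrayList<>();
        int runStart = -1;
        // Iterate one step past the last row so a run touching the bottom edge is closed.
        for (int y = 0; y <= mask.length; y++) {
            boolean isLine = false;
            if (y < mask.length && mask[y].length > 0) {
                int on = 0;
                for (boolean px : mask[y]) {
                    if (px) on++;
                }
                isLine = on >= minFill * mask[y].length;
            }
            if (isLine && runStart < 0) {
                runStart = y;                          // a (possibly thick) line begins
            } else if (!isLine && runStart >= 0) {
                centres.add((runStart + y - 1) / 2);   // rows runStart..y-1 form one line
                runStart = -1;
            }
        }
        int[] out = new int[centres.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = centres.get(i);
        }
        return out;
    }

    /**
     * Returns i such that boundaries[i] <= coord < boundaries[i + 1], or -1 when
     * coord falls outside the grid. boundaries must be sorted in ascending order.
     */
    static int cellIndex(int[] boundaries, double coord) {
        for (int i = 0; i + 1 < boundaries.length; i++) {
            if (coord >= boundaries[i] && coord < boundaries[i + 1]) {
                return i;
            }
        }
        return -1;
    }
}
```

Running the same projection on the transposed mask would give column boundaries; a glyph's centre point could then be mapped to a (row, column) pair with two cellIndex calls.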

Page indexing follows the BaseParser convention: this class expects parsePage(1) for the first page; parsePage(-1) means “all pages”.
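
Because PDFBox addresses pages with zero-based indices (for example in PDDocument.getPage(int)), a 1-based page argument has to be translated somewhere. The sketch below shows one way that translation could look. PageIndexSketch and resolve are hypothetical names, and the out-of-range behaviour is an assumption; the documentation above does not say how LatticeParser reacts to an invalid page number.

```java
/** Illustrative sketch only; not part of the extractpdf4j API. */
final class PageIndexSketch {

    /** Sentinel meaning "all pages", as described for parsePage. */
    static final int ALL_PAGES = -1;

    /**
     * Expands a 1-based page argument into zero-based page indices.
     * ALL_PAGES yields every index from 0 to pageCount - 1.
     */
    static int[] resolve(int page, int pageCount) {
        if (page == ALL_PAGES) {
            int[] all = new int[pageCount];
            for (int i = 0; i < pageCount; i++) {
                all[i] = i;
            }
            return all;
        }
        if (page < 1 || page > pageCount) {
            // Assumed behaviour: reject anything that is neither -1 nor a valid 1-based page.
            throw new IllegalArgumentException(
                    "page must be -1 or in 1.." + pageCount + ", got " + page);
        }
        return new int[] { page - 1 };
    }
}
```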

Field Summary

Fields inherited from class com.extractpdf4j.parsers.BaseParser
filepath, pages, stripText

Constructor Summary

- LatticeParser(): Creates a LatticeParser for in-memory processing.
- LatticeParser(String filepath)

Method Summary

- LatticeParser debug(boolean on): Toggle debug overlays/artifacts.
- LatticeParser debugDir(File dir): Set debug artifact directory.
- LatticeParser dpi(float dpi): Set rasterization DPI.
- LatticeParser keepCells(boolean on): Keep empty cells in the final grid (useful for fixed layouts).
- List<Table> parse(org.apache.pdfbox.pdmodel.PDDocument document): Parses a previously loaded PDF document.
- protected List<Table> parsePage(int page): Deprecated. This method loads the document from disk on every call.

Methods inherited from class com.extractpdf4j.parsers.BaseParser
finalizeResults, pages, parse, stripText

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- LatticeParser
  
  public LatticeParser(String filepath)
- LatticeParser
  
  public LatticeParser()
  
  Creates a LatticeParser for in-memory processing. The PDF document must be passed to the parse() method.
Method Details
- debug
  
  public LatticeParser debug(boolean on)
  
  Toggle debug overlays/artifacts.
- keepCells
  
  public LatticeParser keepCells(boolean on)
  
  Keep empty cells in the final grid (useful for fixed layouts).
- dpi
  
  public LatticeParser dpi(float dpi)
  
  Set rasterization DPI.
- debugDir
  
  public LatticeParser debugDir(File dir)
  
  Set debug artifact directory.
- parsePage
  
  @Deprecated protected List<Table> parsePage(int page) throws IOException
  
  Deprecated.
  This method loads the document from disk on every call. Prefer loading the PDDocument once and using parse(PDDocument).
  
  Description copied from class: BaseParser
  
  Parses a single page or the entire document.
  Contract: If page == -1, the implementation must parse the entire document. For any non-negative value, the implementation must parse only the specified page index (1-based or 0-based is implementation-defined, but should be consistent across the codebase and documented in concrete classes).
  
  Specified by:
  
  parsePage in class BaseParser
  
  Parameters:
  
  page - page index to parse (1-based in this class), or -1 to parse all pages
  
  Returns:
  
  a list of Table objects extracted from the requested page(s) (possibly empty)
  
  Throws:
  
  IOException - if an error occurs while parsing
- parse
  
  public List<Table> parse(org.apache.pdfbox.pdmodel.PDDocument document) throws IOException
  
  Description copied from class: BaseParser
  
  Parses a previously loaded PDF document. This is the preferred method for in-memory processing.
  
  Specified by:
  
  parse in class BaseParser
  
  Parameters:
  
  document - The PDDocument to parse.
  
  Returns:
  
  A list of extracted tables.
  
  Throws:
  
  IOException - for I/O issues during parsing.
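
Each configuration method above returns LatticeParser, which suggests the setters are meant to be chained before calling parse, as in new LatticeParser().dpi(300f).keepCells(true).parse(document) (300f is an arbitrary value chosen for illustration). The stand-in class below mirrors that pattern so it can run without the library on the classpath. FluentParserSketch, its default values, and describe() are hypothetical and say nothing about LatticeParser's actual defaults or internals.

```java
import java.io.File;

/** Stand-in that mirrors the documented fluent setters; not the real LatticeParser. */
final class FluentParserSketch {

    // Hypothetical defaults, chosen only so the sketch has a defined starting state.
    private float dpi = 300f;
    private boolean keepCells = false;
    private boolean debug = false;
    private File debugDir = null;

    // Each setter stores its argument and returns this, which is what enables chaining.
    FluentParserSketch dpi(float dpi) {
        this.dpi = dpi;
        return this;
    }

    FluentParserSketch keepCells(boolean on) {
        this.keepCells = on;
        return this;
    }

    FluentParserSketch debug(boolean on) {
        this.debug = on;
        return this;
    }

    FluentParserSketch debugDir(File dir) {
        this.debugDir = dir;
        return this;
    }

    /** Summarizes the current configuration as a single line of text. */
    String describe() {
        String dir = (debugDir == null) ? "none" : debugDir.getPath();
        return "dpi=" + dpi + " keepCells=" + keepCells
                + " debug=" + debug + " debugDir=" + dir;
    }

    public static void main(String[] args) {
        FluentParserSketch parser = new FluentParserSketch()
                .dpi(200f)
                .keepCells(true)
                .debug(true)
                .debugDir(new File("debug-out"));
        System.out.println(parser.describe());
    }
}
```

Returning this from every setter keeps configuration on one expression and lets callers set only the options they care about, in any order.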
