Index
All Classes and Interfaces|All Packages
A
- asList() - Method in class com.extractpdf4j.helpers.Table
-
Returns an unmodifiable deep copy view of the cells.
B
- BaseParser - Class in com.extractpdf4j.parsers
-
BaseParser
- BaseParser() - Constructor for class com.extractpdf4j.parsers.BaseParser
-
Constructs a parser for in-memory processing.
- BaseParser(String) - Constructor for class com.extractpdf4j.parsers.BaseParser
-
Constructs a parser for the given PDF file.
- binarizeForLines(Mat) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils
-
Binarizes a grayscale image for line detection using adaptive thresholding.
- block - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
- bufferedToMat(BufferedImage) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils
-
Converts a BufferedImage to an OpenCV Mat (grayscale) via a temporary PNG encode/decode round‑trip.
C
- cell(int, int) - Method in class com.extractpdf4j.helpers.Table
-
Returns the cell value at row r, column c.
- com.extractpdf4j - package com.extractpdf4j
-
Core public APIs and foundational types for configuring and executing PDF extraction with ExtractPDF4J.
- com.extractpdf4j.annotations - package com.extractpdf4j.annotations
-
Defines annotations used throughout ExtractPDF4J for configuration, metadata declaration, and extension points.
- com.extractpdf4j.cli - package com.extractpdf4j.cli
-
Implements the command-line interface for executing PDF extraction workflows and interacting with ExtractPDF4J from shell environments.
- com.extractpdf4j.helpers - package com.extractpdf4j.helpers
-
Provides reusable helper utilities supporting parsing, content normalization, validation, and shared internal operations.
- com.extractpdf4j.parsers - package com.extractpdf4j.parsers
-
Implements the primary PDF parsing strategies and extraction components used to convert document content into structured tabular output.
- com.microservice.extractpdf4j - package com.microservice.extractpdf4j
-
Contains the root configuration and application entry point for the ExtractPDF4J microservice runtime.
- com.microservice.extractpdf4j.controller - package com.microservice.extractpdf4j.controller
-
Exposes HTTP endpoints for PDF upload, extraction execution, and API-based access to ExtractPDF4J capabilities.
- com.microservice.extractpdf4j.service - package com.microservice.extractpdf4j.service
-
Implements service-layer orchestration and business logic for document processing and extraction operations in the microservice layer.
- conf - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
D
- debug() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Enables debug artifact output for lattice/ocr/hybrid.
- debug(boolean) - Method in class com.extractpdf4j.parsers.HybridParser
-
Enables or disables debug outputs for lattice/OCR strategies.
- debug(boolean) - Method in class com.extractpdf4j.parsers.LatticeParser
-
Toggle debug overlays/artifacts.
- debug(boolean) - Method in class com.extractpdf4j.parsers.OcrStreamParser
- debugDir() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Directory where debug artifacts should be written.
- debugDir(File) - Method in class com.extractpdf4j.parsers.HybridParser
-
Directory where debug artifacts should be written (lattice + OCR).
- debugDir(File) - Method in class com.extractpdf4j.parsers.LatticeParser
-
Set debug artifact directory.
- debugDir(File) - Method in class com.extractpdf4j.parsers.OcrStreamParser
- dpi() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
DPI for image-based parsing (lattice/ocr/hybrid).
- dpi(float) - Method in class com.extractpdf4j.parsers.HybridParser
-
Sets DPI for image-based parsing (used by lattice + OCR strategies).
- dpi(float) - Method in class com.extractpdf4j.parsers.LatticeParser
-
Set rasterization DPI.
- dpi(float) - Method in class com.extractpdf4j.parsers.OcrStreamParser
E
- extractPdf(MultipartFile) - Method in class com.microservice.extractpdf4j.controller.PdfExtractController
- ExtractPdfAnnotations - Class in com.extractpdf4j.annotations
-
Factory methods for creating configured parsers from ExtractPdfConfig annotations.
- ExtractPdfConfig - Annotation Interface in com.extractpdf4j.annotations
-
Annotation-based configuration for ExtractPDF4J parsers.
- extractTablesAsCsv(MultipartFile) - Method in class com.microservice.extractpdf4j.service.PdfExtractService
-
Asynchronously extracts tables from a given PDF file.
F
- filepath - Variable in class com.extractpdf4j.parsers.BaseParser
-
Absolute or relative path to the PDF file being parsed.
- finalizeResults(List<Table>, String) - Method in class com.extractpdf4j.parsers.BaseParser
-
Normalizes parser output for "no tables" situations.
G
- getColBoundaries() - Method in class com.extractpdf4j.helpers.Table
- getRowBoundaries() - Method in class com.extractpdf4j.helpers.Table
H
- height - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
- HYBRID - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
- HybridParser - Class in com.extractpdf4j.parsers
-
HybridParser
- HybridParser() - Constructor for class com.extractpdf4j.parsers.HybridParser
-
Creates a HybridParser for in-memory processing.
- HybridParser(String) - Constructor for class com.extractpdf4j.parsers.HybridParser
-
Creates a HybridParser for the given PDF file path.
I
- ImagePdfUtils - Class in com.extractpdf4j.helpers
-
ImagePdfUtils
K
- keepCells() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Whether to keep empty cells in lattice parsing.
- keepCells(boolean) - Method in class com.extractpdf4j.parsers.HybridParser
-
Whether to preserve empty cells when reconstructing grids (lattice only).
- keepCells(boolean) - Method in class com.extractpdf4j.parsers.LatticeParser
-
Keep empty cells in the final grid (useful for fixed layouts).
L
- LATTICE - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
- LatticeParser - Class in com.extractpdf4j.parsers
-
LatticeParser
- LatticeParser() - Constructor for class com.extractpdf4j.parsers.LatticeParser
-
Creates a LatticeParser for in-memory processing.
- LatticeParser(String) - Constructor for class com.extractpdf4j.parsers.LatticeParser
- left - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
- line - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
M
- main(String[]) - Static method in class com.extractpdf4j.cli.Main
-
Program entry point.
- main(String[]) - Static method in class com.microservice.extractpdf4j.Main
- Main - Class in com.extractpdf4j.cli
-
Main
- Main - Class in com.microservice.extractpdf4j
- Main() - Constructor for class com.extractpdf4j.cli.Main
- Main() - Constructor for class com.microservice.extractpdf4j.Main
- minScore() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Minimum average score for hybrid parser selection.
- minScore(double) - Method in class com.extractpdf4j.parsers.HybridParser
-
Sets the minimum allowed average score across a list of tables.
N
- ncols() - Method in class com.extractpdf4j.helpers.Table
-
Number of columns in the table (0 if there are no rows).
- nrows() - Method in class com.extractpdf4j.helpers.Table
-
Number of rows in the table.
O
- Ocr - Class in com.extractpdf4j.helpers
-
OCR helper utilities.
- Ocr.OcrWord - Class in com.extractpdf4j.helpers
- ocrPng(String) - Static method in class com.extractpdf4j.helpers.Ocr
-
Runs OCR on a PNG file and returns plain text.
- OCRSTREAM - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
- OcrStreamParser - Class in com.extractpdf4j.parsers
-
OcrStreamParser (header-aware): removes horizontal and vertical rules before OCR.
- OcrStreamParser() - Constructor for class com.extractpdf4j.parsers.OcrStreamParser
-
Creates an OcrStreamParser for in-memory processing.
- OcrStreamParser(String) - Constructor for class com.extractpdf4j.parsers.OcrStreamParser
- ocrTsv(String) - Static method in class com.extractpdf4j.helpers.Ocr
- ocrTsv(String, String, String) - Static method in class com.extractpdf4j.helpers.Ocr
- ocrTsvHeuristically(String, String) - Static method in class com.extractpdf4j.helpers.Ocr
-
Runs OCR on a PNG image using a heuristic to find the best Page Segmentation Mode (PSM).
- OcrWord(int, int, int, int, int, String, int, int, int, int) - Constructor for class com.extractpdf4j.helpers.Ocr.OcrWord
P
- PageRange - Class in com.extractpdf4j.helpers
-
PageRange
- pages - Variable in class com.extractpdf4j.parsers.BaseParser
-
Page selection string, defaulting to "1".
- pages() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Page selection string (e.g., "all", "1", "2-5", "1,3-4").
- pages(String) - Method in class com.extractpdf4j.parsers.BaseParser
-
Sets the pages to parse.
- pages(String) - Method in class com.extractpdf4j.parsers.HybridParser
-
Sets the page selection for this parser and propagates the same selection to all underlying strategies.
- par - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
- parse() - Method in class com.extractpdf4j.parsers.BaseParser
-
Parses the configured pages from the PDF file.
- parse(String) - Static method in class com.extractpdf4j.helpers.PageRange
- parse(PDDocument) - Method in class com.extractpdf4j.parsers.BaseParser
-
Parses a previously loaded PDF document.
- parse(PDDocument) - Method in class com.extractpdf4j.parsers.HybridParser
- parse(PDDocument) - Method in class com.extractpdf4j.parsers.LatticeParser
- parse(PDDocument) - Method in class com.extractpdf4j.parsers.OcrStreamParser
- parse(PDDocument) - Method in class com.extractpdf4j.parsers.StreamParser
- parsePage(int) - Method in class com.extractpdf4j.parsers.BaseParser
-
Parses a single page or the entire document.
- parsePage(int) - Method in class com.extractpdf4j.parsers.HybridParser
-
Runs stream, lattice, and OCR-backed stream for the requested page(s) and returns the best-scoring set of tables.
- parsePage(int) - Method in class com.extractpdf4j.parsers.LatticeParser
-
Deprecated. Prefer LatticeParser.parse(PDDocument).
- parsePage(int) - Method in class com.extractpdf4j.parsers.OcrStreamParser
-
Deprecated. This method loads the document from disk on every call. Prefer loading the PDDocument once and using OcrStreamParser.parse(PDDocument).
- parsePage(int) - Method in class com.extractpdf4j.parsers.StreamParser
-
Deprecated. This method loads the document from disk on every call. Prefer loading the PDDocument once and using StreamParser.parse(PDDocument).
- parser() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Parser strategy to use when materializing a parser.
- parserFrom(Class<?>) - Static method in class com.extractpdf4j.annotations.ExtractPdfAnnotations
-
Builds a parser instance (no filepath) from the ExtractPdfConfig annotation on a class.
- parserFrom(Class<?>, String) - Static method in class com.extractpdf4j.annotations.ExtractPdfAnnotations
-
Builds a parser instance from the ExtractPdfConfig annotation on a class.
- ParserMode - Enum Class in com.extractpdf4j.annotations
-
Parser modes supported by ExtractPDF4J.
- PdfExtractController - Class in com.microservice.extractpdf4j.controller
- PdfExtractController(PdfExtractService) - Constructor for class com.microservice.extractpdf4j.controller.PdfExtractController
- PdfExtractService - Class in com.microservice.extractpdf4j.service
- PdfExtractService() - Constructor for class com.microservice.extractpdf4j.service.PdfExtractService
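The page selection strings described under pages() above ("all", "1", "2-5", "1,3-4") can be illustrated with a small standalone sketch. The class PageSelectionSketch and its expand method are hypothetical names introduced here for illustration; they are not part of ExtractPDF4J, and PageRange.parse(String) may differ in return type, validation, and error handling, none of which this index shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not the ExtractPDF4J implementation.
public class PageSelectionSketch {

    /**
     * Expands a selection such as "all", "1", "2-5" or "1,3-4" into sorted,
     * de-duplicated 1-based page numbers. pageCount is the document length.
     */
    public static List<Integer> expand(String spec, int pageCount) {
        TreeSet<Integer> pages = new TreeSet<>();
        String trimmed = spec.trim();
        if (trimmed.equalsIgnoreCase("all")) {
            for (int p = 1; p <= pageCount; p++) {
                pages.add(p);
            }
            return new ArrayList<>(pages);
        }
        for (String part : trimmed.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A token is either a single page ("3") or an inclusive range ("3-4").
            int start = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int end = dash < 0 ? start : Integer.parseInt(token.substring(dash + 1).trim());
            if (start < 1 || end < start || end > pageCount) {
                throw new IllegalArgumentException("Invalid page range: " + token);
            }
            for (int p = start; p <= end; p++) {
                pages.add(p);
            }
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(expand("1,3-4", 10)); // prints [1, 3, 4]
        System.out.println(expand("all", 3));    // prints [1, 2, 3]
    }
}
```

The sketch rejects out-of-bounds ranges with an exception; whether the library clamps, skips, or rejects such input is an open assumption.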
R
- renderPage(PDDocument, int, float) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils
-
Renders a single PDF page to a BufferedImage at the requested DPI.
- requiredHeaders() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Required OCR headers to look for before returning results.
- requiredHeaders(List<String>) - Method in class com.extractpdf4j.parsers.OcrStreamParser
- run() - Method in class com.extractpdf4j.cli.Main
S
- setCell(int, int, String) - Method in class com.extractpdf4j.helpers.Table
-
Mutates the cell at row r, column c with value v.
- STREAM - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
- StreamParser - Class in com.extractpdf4j.parsers
-
StreamParser
- StreamParser() - Constructor for class com.extractpdf4j.parsers.StreamParser
-
Creates a StreamParser for in-memory processing.
- StreamParser(String) - Constructor for class com.extractpdf4j.parsers.StreamParser
- stripText - Variable in class com.extractpdf4j.parsers.BaseParser
-
Whether to normalize/strip text (e.g., trim, collapse whitespace) in stream-based extraction.
- stripText() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
-
Whether to strip/normalize text for stream-based extraction.
- stripText(boolean) - Method in class com.extractpdf4j.parsers.BaseParser
-
Enables or disables text normalization for stream-style extraction.
- stripText(boolean) - Method in class com.extractpdf4j.parsers.HybridParser
-
Enables or disables text normalization for stream-style extraction across all underlying strategies.
T
- Table - Class in com.extractpdf4j.helpers
-
Table
- Table(List<List<String>>, List<Double>, List<Double>) - Constructor for class com.extractpdf4j.helpers.Table
- text - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
- toCSV(char) - Method in class com.extractpdf4j.helpers.Table
-
Serializes the table to CSV using the given separator.
- top - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
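As a rough illustration of what "serializes the table to CSV using the given separator" can involve, the sketch below joins rows of cells with a caller-supplied separator and quotes fields in the common RFC 4180 style. CsvSketch, escape, and toCsv are hypothetical names; the quoting rules and line terminator used by Table.toCSV(char) are not documented in this index and may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the ExtractPDF4J Table implementation.
public class CsvSketch {

    /** Quotes a field only when it contains the separator, a quote, or a line break. */
    public static String escape(String field, char sep) {
        String value = field == null ? "" : field;
        boolean needsQuotes = value.indexOf(sep) >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Joins the cells of each row with the separator and the rows with "\n". */
    public static String toCsv(List<List<String>> rows, char sep) {
        return rows.stream()
                .map(row -> row.stream()
                        .map(cell -> escape(cell, sep))
                        .collect(Collectors.joining(String.valueOf(sep))))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("Item", "Price"),
                List.of("Widget, large", "9.50"));
        // Prints:
        // Item,Price
        // "Widget, large",9.50
        System.out.println(toCsv(rows, ','));
    }
}
```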
V
- valueOf(String) - Static method in enum class com.extractpdf4j.annotations.ParserMode
-
Returns the enum constant of this class with the specified name.
- values() - Static method in enum class com.extractpdf4j.annotations.ParserMode
-
Returns an array containing the constants of this enum class, in the order they are declared.
W
- width - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
- word - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord