Index (ExtractPDF4J 2.1.0 API)

A B C D E F G H I K L M N O P R S T V W
All Classes and Interfaces|All Packages

A

asList() - Method in class com.extractpdf4j.helpers.Table: Returns an unmodifiable deep copy of the cells.
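
The asList() entry above describes an unmodifiable deep copy of the cells. As a rough illustration of what that contract usually means in Java, the sketch below copies each row and wraps both levels as unmodifiable. The class and method names (CellCopySketch, unmodifiableDeepCopy) are hypothetical and this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not part of ExtractPDF4J.
final class CellCopySketch {

    /** Copies every row, then wraps the rows and the outer list as unmodifiable. */
    static List<List<String>> unmodifiableDeepCopy(List<List<String>> cells) {
        List<List<String>> copy = new ArrayList<>(cells.size());
        for (List<String> row : cells) {
            copy.add(Collections.unmodifiableList(new ArrayList<>(row)));
        }
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<List<String>> cells = new ArrayList<>();
        cells.add(new ArrayList<>(List.of("a", "b")));
        List<List<String>> snapshot = unmodifiableDeepCopy(cells);
        cells.get(0).set(0, "changed");             // later edits to the source...
        System.out.println(snapshot.get(0).get(0)); // ...do not reach the copy: prints "a"
    }
}
```

With a copy of this kind, callers can read cell values freely, while attempts to modify the returned lists throw UnsupportedOperationException.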

B

BaseParser - Class in com.extractpdf4j.parsers: BaseParser
BaseParser() - Constructor for class com.extractpdf4j.parsers.BaseParser: Constructs a parser for in-memory processing.
BaseParser(String) - Constructor for class com.extractpdf4j.parsers.BaseParser: Constructs a parser for the given PDF file.
binarizeForLines(Mat) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils: Binarizes a grayscale image for line detection using adaptive thresholding.
block - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
bufferedToMat(BufferedImage) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils: Converts a BufferedImage to an OpenCV Mat (grayscale) via a temporary PNG encode/decode round‑trip.
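
The binarizeForLines(Mat) entry above mentions adaptive thresholding. The library method works on an OpenCV Mat; the sketch below is only a plain-Java illustration of mean adaptive thresholding on an int[][] grayscale grid, so the idea can be followed without OpenCV. The class name, the radius and offset parameters, and the "darker than the local mean" rule are assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical sketch of mean adaptive thresholding: not the library's code.
final class AdaptiveThresholdSketch {

    /**
     * Marks a pixel as ink when it is darker than the mean of its
     * (2 * radius + 1)-wide neighbourhood by more than offset.
     */
    static boolean[][] binarize(int[][] gray, int radius, int offset) {
        int h = gray.length;
        int w = h == 0 ? 0 : gray[0].length;
        boolean[][] ink = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int count = 0;
                // The window is clipped at the image border.
                for (int yy = Math.max(0, y - radius); yy <= Math.min(h - 1, y + radius); yy++) {
                    for (int xx = Math.max(0, x - radius); xx <= Math.min(w - 1, x + radius); xx++) {
                        sum += gray[yy][xx];
                        count++;
                    }
                }
                double mean = (double) sum / count;
                ink[y][x] = gray[y][x] < mean - offset;
            }
        }
        return ink;
    }

    public static void main(String[] args) {
        int[][] gray = new int[5][5];
        for (int[] row : gray) Arrays.fill(row, 200); // light background
        Arrays.fill(gray[2], 20);                     // one dark horizontal rule
        boolean[][] ink = binarize(gray, 1, 10);
        System.out.println(ink[2][2] + " " + ink[0][0]); // prints "true false"
    }
}
```

Because the threshold follows the local mean rather than a single global value, thin ruling lines tend to survive uneven page brightness, which is why this family of techniques is common ahead of line detection.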

C

cell(int, int) - Method in class com.extractpdf4j.helpers.Table: Returns the cell value at row r, column c.
com.extractpdf4j - package com.extractpdf4j: Core public APIs and foundational types for configuring and executing PDF extraction with ExtractPDF4J.
com.extractpdf4j.annotations - package com.extractpdf4j.annotations: Defines annotations used throughout ExtractPDF4J for configuration, metadata declaration, and extension points.
com.extractpdf4j.cli - package com.extractpdf4j.cli: Implements the command-line interface for executing PDF extraction workflows and interacting with ExtractPDF4J from shell environments.
com.extractpdf4j.helpers - package com.extractpdf4j.helpers: Provides reusable helper utilities supporting parsing, content normalization, validation, and shared internal operations.
com.extractpdf4j.parsers - package com.extractpdf4j.parsers: Implements the primary PDF parsing strategies and extraction components used to convert document content into structured tabular output.
com.microservice.extractpdf4j - package com.microservice.extractpdf4j: Contains the root configuration and application entry point for the ExtractPDF4J microservice runtime.
com.microservice.extractpdf4j.controller - package com.microservice.extractpdf4j.controller: Exposes HTTP endpoints for PDF upload, extraction execution, and API-based access to ExtractPDF4J capabilities.
com.microservice.extractpdf4j.service - package com.microservice.extractpdf4j.service: Implements service-layer orchestration and business logic for document processing and extraction operations in the microservice layer.
conf - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
CsvExporter - Class in com.extractpdf4j.helpers: CsvExporter
CsvExporter() - Constructor for class com.extractpdf4j.helpers.CsvExporter

D

debug() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Enables debug artifact output for lattice/ocr/hybrid.
debug(boolean) - Method in class com.extractpdf4j.parsers.HybridParser: Enables or disables debug outputs for lattice/OCR strategies.
debug(boolean) - Method in class com.extractpdf4j.parsers.LatticeParser: Toggle debug overlays/artifacts.
debug(boolean) - Method in class com.extractpdf4j.parsers.OcrStreamParser
debugDir() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Directory where debug artifacts should be written.
debugDir(File) - Method in class com.extractpdf4j.parsers.HybridParser: Directory where debug artifacts should be written (lattice + OCR).
debugDir(File) - Method in class com.extractpdf4j.parsers.LatticeParser: Set debug artifact directory.
debugDir(File) - Method in class com.extractpdf4j.parsers.OcrStreamParser
dpi() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: DPI for image-based parsing (lattice/ocr/hybrid).
dpi(float) - Method in class com.extractpdf4j.parsers.HybridParser: Sets DPI for image-based parsing (used by lattice + OCR strategies).
dpi(float) - Method in class com.extractpdf4j.parsers.LatticeParser: Set rasterization DPI.
dpi(float) - Method in class com.extractpdf4j.parsers.OcrStreamParser

E

export(List<Table>) - Method in class com.extractpdf4j.helpers.CsvExporter
extractPdf(MultipartFile) - Method in class com.microservice.extractpdf4j.controller.PdfExtractController
ExtractPdfAnnotations - Class in com.extractpdf4j.annotations: Factory methods for creating configured parsers from ExtractPdfConfig annotations.
ExtractPdfConfig - Annotation Interface in com.extractpdf4j.annotations: Annotation-based configuration for ExtractPDF4J parsers.
extractTablesAsCsv(MultipartFile) - Method in class com.microservice.extractpdf4j.service.PdfExtractService: Asynchronously extracts tables from a given PDF file.

F

filepath - Variable in class com.extractpdf4j.parsers.BaseParser: Absolute or relative path to the PDF file being parsed.
finalizeResults(List<Table>, String) - Method in class com.extractpdf4j.parsers.BaseParser: Normalizes parser output for "no tables" situations.

G

getColBoundaries() - Method in class com.extractpdf4j.helpers.Table
getRowBoundaries() - Method in class com.extractpdf4j.helpers.Table

H

height - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
HYBRID - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
HybridParser - Class in com.extractpdf4j.parsers: HybridParser
HybridParser() - Constructor for class com.extractpdf4j.parsers.HybridParser: Creates a HybridParser for in-memory processing.
HybridParser(String) - Constructor for class com.extractpdf4j.parsers.HybridParser: Creates a HybridParser for the given PDF file path.

I

ImagePdfUtils - Class in com.extractpdf4j.helpers: ImagePdfUtils

K

keepCells() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Whether to keep empty cells in lattice parsing.
keepCells(boolean) - Method in class com.extractpdf4j.parsers.HybridParser: Whether to preserve empty cells when reconstructing grids (lattice only).
keepCells(boolean) - Method in class com.extractpdf4j.parsers.LatticeParser: Keep empty cells in the final grid (useful for fixed layouts).

L

LATTICE - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
LatticeParser - Class in com.extractpdf4j.parsers: LatticeParser
LatticeParser() - Constructor for class com.extractpdf4j.parsers.LatticeParser: Creates a LatticeParser for in-memory processing.
LatticeParser(String) - Constructor for class com.extractpdf4j.parsers.LatticeParser
left - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
line - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord

M

main(String[]) - Static method in class com.extractpdf4j.cli.Main: Program entry point.
main(String[]) - Static method in class com.microservice.extractpdf4j.Main
Main - Class in com.extractpdf4j.cli: Main
Main - Class in com.microservice.extractpdf4j
Main() - Constructor for class com.extractpdf4j.cli.Main
Main() - Constructor for class com.microservice.extractpdf4j.Main
minScore() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Minimum average score for hybrid parser selection.
minScore(double) - Method in class com.extractpdf4j.parsers.HybridParser: Sets the minimum allowed average score across a list of tables.
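
The minScore entries above describe a minimum average score used when the hybrid parser selects between strategies. The index does not say how scores are computed, so the sketch below only illustrates the selection step under stated assumptions: each strategy has already produced one score per table, the average is a plain arithmetic mean, and a strategy is eligible only if its average reaches minScore. All names here (HybridSelectionSketch, pickBest) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of threshold-gated best-average selection: not the library's code.
final class HybridSelectionSketch {

    /** Arithmetic mean of the per-table scores, or 0.0 for an empty list. */
    static double average(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Returns the strategy with the highest average score that is at least minScore. */
    static Optional<String> pickBest(Map<String, List<Double>> scoresByStrategy, double minScore) {
        String best = null;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<Double>> entry : scoresByStrategy.entrySet()) {
            double avg = average(entry.getValue());
            if (avg >= minScore && avg > bestAvg) {
                best = entry.getKey();
                bestAvg = avg;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Map<String, List<Double>> scores = Map.of(
                "stream", List.of(0.4, 0.6),
                "lattice", List.of(0.9, 0.7),
                "ocrstream", List.of(0.2));
        System.out.println(pickBest(scores, 0.5));  // prints "Optional[lattice]"
        System.out.println(pickBest(scores, 0.95)); // prints "Optional.empty"
    }
}
```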

N

ncols() - Method in class com.extractpdf4j.helpers.Table: Number of columns in the table (0 if there are no rows).
nrows() - Method in class com.extractpdf4j.helpers.Table: Number of rows in the table.

O

Ocr - Class in com.extractpdf4j.helpers: OCR helper utilities.
Ocr.OcrWord - Class in com.extractpdf4j.helpers
ocrPng(String) - Static method in class com.extractpdf4j.helpers.Ocr: Runs OCR on a PNG file and returns plain text.
OCRSTREAM - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
OcrStreamParser - Class in com.extractpdf4j.parsers: Header-aware OCR stream parser. Removes horizontal and vertical rules before OCR.
OcrStreamParser() - Constructor for class com.extractpdf4j.parsers.OcrStreamParser: Creates an OcrStreamParser for in-memory processing.
OcrStreamParser(String) - Constructor for class com.extractpdf4j.parsers.OcrStreamParser
ocrTsv(String) - Static method in class com.extractpdf4j.helpers.Ocr
ocrTsv(String, String, String) - Static method in class com.extractpdf4j.helpers.Ocr
ocrTsvHeuristically(String, String) - Static method in class com.extractpdf4j.helpers.Ocr: Runs OCR on a PNG image using a heuristic to find the best Page Segmentation Mode (PSM).
OcrWord(int, int, int, int, int, String, int, int, int, int) - Constructor for class com.extractpdf4j.helpers.Ocr.OcrWord
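
The Ocr.OcrWord fields listed in this index (left, top, width, height, conf, text, block, par, line, word) resemble the columns of Tesseract's TSV output. Assuming the ocrTsv methods consume TSV in that layout, which the index does not confirm, a word row could be parsed roughly as follows. TsvWordSketch is a hypothetical stand-in and does not mirror the real OcrWord constructor order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a stand-in for a parsed OCR word, not Ocr.OcrWord itself.
final class TsvWordSketch {
    final int left, top, width, height, conf;
    final String text;

    TsvWordSketch(int left, int top, int width, int height, int conf, String text) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
        this.conf = conf;
        this.text = text;
    }

    /** Parses word-level rows from Tesseract-style TSV text. */
    static List<TsvWordSketch> parse(String tsv) {
        List<TsvWordSketch> words = new ArrayList<>();
        for (String line : tsv.split("\\R")) {
            String[] f = line.split("\t", -1);
            // Tesseract TSV columns: level, page_num, block_num, par_num, line_num,
            // word_num, left, top, width, height, conf, text. Level 5 rows are words.
            // The header row and blank-text rows are skipped by this check.
            if (f.length < 12 || !f[0].equals("5") || f[11].trim().isEmpty()) {
                continue;
            }
            // Some Tesseract versions emit fractional confidences, so parse as double.
            int conf = (int) Math.round(Double.parseDouble(f[10]));
            words.add(new TsvWordSketch(Integer.parseInt(f[6]), Integer.parseInt(f[7]),
                    Integer.parseInt(f[8]), Integer.parseInt(f[9]), conf, f[11]));
        }
        return words;
    }

    public static void main(String[] args) {
        String tsv = "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
                + "\tleft\ttop\twidth\theight\tconf\ttext\n"
                + "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tInvoice\n";
        TsvWordSketch w = parse(tsv).get(0);
        System.out.println(w.text + " @ " + w.left + "," + w.top); // prints "Invoice @ 36,92"
    }
}
```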

P

PageRange - Class in com.extractpdf4j.helpers: PageRange
pages - Variable in class com.extractpdf4j.parsers.BaseParser: Page selection string, defaulting to "1".
pages() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Page selection string (e.g., "all", "1", "2-5", "1,3-4").
pages(String) - Method in class com.extractpdf4j.parsers.BaseParser: Sets the pages to parse.
pages(String) - Method in class com.extractpdf4j.parsers.HybridParser: Sets the page selection for this parser and propagates the same selection to all underlying strategies.
par - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
parse() - Method in class com.extractpdf4j.parsers.BaseParser: Parses the configured pages from the PDF file.
parse(String) - Static method in class com.extractpdf4j.helpers.PageRange
parse(PDDocument) - Method in class com.extractpdf4j.parsers.BaseParser: Parses a previously loaded PDF document.
parse(PDDocument) - Method in class com.extractpdf4j.parsers.HybridParser
parse(PDDocument) - Method in class com.extractpdf4j.parsers.LatticeParser
parse(PDDocument) - Method in class com.extractpdf4j.parsers.OcrStreamParser
parse(PDDocument) - Method in class com.extractpdf4j.parsers.StreamParser
parsePage(int) - Method in class com.extractpdf4j.parsers.BaseParser: Parses a single page or the entire document.
parsePage(int) - Method in class com.extractpdf4j.parsers.HybridParser: Runs stream, lattice, and OCR-backed stream for the requested page(s) and returns the best-scoring set of tables.
parsePage(int) - Method in class com.extractpdf4j.parsers.LatticeParser: Deprecated. This method loads the document from disk on every call. Prefer loading the PDDocument once and using LatticeParser.parse(PDDocument).
parsePage(int) - Method in class com.extractpdf4j.parsers.OcrStreamParser: Deprecated. This method loads the document from disk on every call. Prefer loading the PDDocument once and using OcrStreamParser.parse(PDDocument).
parsePage(int) - Method in class com.extractpdf4j.parsers.StreamParser: Deprecated. This method loads the document from disk on every call. Prefer loading the PDDocument once and using StreamParser.parse(PDDocument).
parser() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Parser strategy to use when materializing a parser.
parserFrom(Class<?>) - Static method in class com.extractpdf4j.annotations.ExtractPdfAnnotations: Builds a parser instance (no filepath) from the ExtractPdfConfig annotation on a class.
parserFrom(Class<?>, String) - Static method in class com.extractpdf4j.annotations.ExtractPdfAnnotations: Builds a parser instance from the ExtractPdfConfig annotation on a class.
ParserMode - Enum Class in com.extractpdf4j.annotations: Parser modes supported by ExtractPDF4J.
PdfExtractController - Class in com.microservice.extractpdf4j.controller
PdfExtractController(PdfExtractService) - Constructor for class com.microservice.extractpdf4j.controller.PdfExtractController
PdfExtractService - Class in com.microservice.extractpdf4j.service
PdfExtractService() - Constructor for class com.microservice.extractpdf4j.service.PdfExtractService
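
The pages() entry above gives "all", "1", "2-5" and "1,3-4" as examples of page selection strings. The sketch below shows one way such a string could be expanded into 1-based page numbers. It is not PageRange.parse(String), whose return type and edge-case behaviour are not shown in this index. The class name, the pageCount parameter, and the choice to clamp ranges to the document length are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of page-selection parsing: not the library's PageRange.
final class PageSelectionSketch {

    /** Expands "all", "1", "2-5" or "1,3-4" into sorted, de-duplicated 1-based page numbers. */
    static List<Integer> expand(String spec, int pageCount) {
        TreeSet<Integer> pages = new TreeSet<>();
        String trimmed = spec.trim();
        if (trimmed.equalsIgnoreCase("all")) {
            for (int p = 1; p <= pageCount; p++) {
                pages.add(p);
            }
            return new ArrayList<>(pages);
        }
        for (String part : trimmed.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            int first = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int last = dash < 0 ? first : Integer.parseInt(token.substring(dash + 1).trim());
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Bad page range: " + token);
            }
            // Assumption: pages beyond the end of the document are silently dropped.
            for (int p = first; p <= Math.min(last, pageCount); p++) {
                pages.add(p);
            }
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(expand("1,3-4", 10)); // prints "[1, 3, 4]"
        System.out.println(expand("all", 3));    // prints "[1, 2, 3]"
    }
}
```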

R

renderPage(PDDocument, int, float) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils: Renders a single PDF page to a BufferedImage at the requested DPI.
requiredHeaders() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Required OCR headers to look for before returning results.
requiredHeaders(List<String>) - Method in class com.extractpdf4j.parsers.OcrStreamParser
run() - Method in class com.extractpdf4j.cli.Main

S

setCell(int, int, String) - Method in class com.extractpdf4j.helpers.Table: Mutates the cell at row r, column c with value v.
setDelimiter(String) - Method in class com.extractpdf4j.helpers.CsvExporter
STREAM - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
StreamParser - Class in com.extractpdf4j.parsers: StreamParser
StreamParser() - Constructor for class com.extractpdf4j.parsers.StreamParser: Creates a StreamParser for in-memory processing.
StreamParser(String) - Constructor for class com.extractpdf4j.parsers.StreamParser
stripText - Variable in class com.extractpdf4j.parsers.BaseParser: Whether to normalize/strip text (e.g., trim, collapse whitespace) in stream-based extraction.
stripText() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig: Whether to strip/normalize text for stream-based extraction.
stripText(boolean) - Method in class com.extractpdf4j.parsers.BaseParser: Enables or disables text normalization for stream-style extraction.
stripText(boolean) - Method in class com.extractpdf4j.parsers.HybridParser: Enables or disables text normalization for stream-style extraction across all underlying strategies.

T

Table - Class in com.extractpdf4j.helpers: Table
Table(List<List<String>>, List<Double>, List<Double>) - Constructor for class com.extractpdf4j.helpers.Table
text - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
toCSV(char) - Method in class com.extractpdf4j.helpers.Table: Serializes the table to CSV using the given separator.
top - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
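
The toCSV(char) entry above serializes a table with a caller-supplied separator. The index does not describe the quoting rules, so the sketch below assumes RFC 4180-style quoting (quote a field when it contains the separator, a double quote, or a line break, and double any embedded quotes) and a trailing newline after each row. CsvSketch and its methods are hypothetical and operate on plain lists rather than the library's Table class.

```java
import java.util.List;

// Hypothetical sketch of separator-aware CSV output: not Table.toCSV itself.
final class CsvSketch {

    /** Quotes a field when it contains the separator, a double quote, or a line break. */
    static String escape(String field, char sep) {
        String value = field == null ? "" : field;
        boolean needsQuotes = value.indexOf(sep) >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Joins each row with the separator and ends every row with a newline. */
    static String toCsv(List<List<String>> rows, char sep) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                if (c > 0) {
                    out.append(sep);
                }
                out.append(escape(row.get(c), sep));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(List.of("item", "note"), List.of("1", "b,c"));
        System.out.print(toCsv(rows, ',')); // second row prints as: 1,"b,c"
        System.out.print(toCsv(rows, ';')); // second row prints as: 1;b,c
    }
}
```

Passing the separator as a parameter means a field is quoted only when it actually contains that separator, so the same data stays unquoted under ';' but is quoted under ','.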

V

valueOf(String) - Static method in enum class com.extractpdf4j.annotations.ParserMode: Returns the enum constant of this class with the specified name.
values() - Static method in enum class com.extractpdf4j.annotations.ParserMode: Returns an array containing the constants of this enum class, in the order they are declared.

W

width - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
word - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
