Index

A B C D E F G H I K L M N O P R S T V W 
All Classes and Interfaces|All Packages

A

asList() - Method in class com.extractpdf4j.helpers.Table
Returns an unmodifiable deep copy of the cells.
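The asList() contract can be illustrated with a JDK-only sketch. Because String cells are immutable, copying each row list and wrapping it with Collections.unmodifiableList is enough to make the result independent of the source grid. CellGridCopy and unmodifiableDeepCopy are hypothetical names; this is not the Table implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating an unmodifiable deep copy of a cell grid;
// not the ExtractPDF4J Table implementation.
final class CellGridCopy {
    private CellGridCopy() {}

    /** Copies every row, then wraps each row and the outer list as unmodifiable. */
    static List<List<String>> unmodifiableDeepCopy(List<List<String>> cells) {
        List<List<String>> rows = new ArrayList<>(cells.size());
        for (List<String> row : cells) {
            // Copy the row so later changes to the source grid are not visible.
            // ArrayList (unlike List.copyOf) tolerates null cells.
            rows.add(Collections.unmodifiableList(new ArrayList<>(row)));
        }
        return Collections.unmodifiableList(rows);
    }
}
```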

B

BaseParser - Class in com.extractpdf4j.parsers
BaseParser
BaseParser() - Constructor for class com.extractpdf4j.parsers.BaseParser
Constructs a parser for in-memory processing.
BaseParser(String) - Constructor for class com.extractpdf4j.parsers.BaseParser
Constructs a parser for the given PDF file.
binarizeForLines(Mat) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils
Binarizes a grayscale image for line detection using adaptive thresholding.
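A minimal sketch of mean-based adaptive thresholding, assuming a grayscale image held as int[rows][cols] with values 0..255. It follows the idea behind OpenCV's ADAPTIVE_THRESH_MEAN_C with an inverted binary output (dark table rules become white foreground), but the class name, the block size and constant C, and the clipped-window border handling are assumptions rather than what binarizeForLines actually uses.

```java
// Hypothetical sketch of mean-based adaptive thresholding on a grayscale image
// stored as int[rows][cols] with values 0..255. Not the ExtractPDF4J implementation.
final class AdaptiveThreshold {
    private AdaptiveThreshold() {}

    /**
     * Marks a pixel as foreground (255) when it is darker than the mean of its
     * blockSize x blockSize neighborhood minus c; otherwise 0. The window is
     * clipped at the image border.
     */
    static int[][] binarizeInv(int[][] gray, int blockSize, int c) {
        int h = gray.length, w = h == 0 ? 0 : gray[0].length;
        // Integral image with one extra row/column of zeros for easy box sums.
        long[][] sum = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                sum[y + 1][x + 1] = gray[y][x] + sum[y][x + 1] + sum[y + 1][x] - sum[y][x];
            }
        }
        int r = blockSize / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int y0 = Math.max(0, y - r), y1 = Math.min(h, y + r + 1);
                int x0 = Math.max(0, x - r), x1 = Math.min(w, x + r + 1);
                long box = sum[y1][x1] - sum[y0][x1] - sum[y1][x0] + sum[y0][x0];
                double mean = (double) box / ((y1 - y0) * (x1 - x0));
                // Inverted output: dark ink (table rules) becomes white foreground.
                out[y][x] = gray[y][x] < mean - c ? 255 : 0;
            }
        }
        return out;
    }
}
```

For example, AdaptiveThreshold.binarizeInv(gray, 15, 10) compares each pixel against the mean of its 15x15 neighborhood. The integral image keeps the cost per pixel constant regardless of block size.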
block - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 
bufferedToMat(BufferedImage) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils
Converts a BufferedImage to an OpenCV Mat (grayscale) via a temporary PNG encode/decode round-trip.

C

cell(int, int) - Method in class com.extractpdf4j.helpers.Table
Returns the cell value at row r, column c.
com.extractpdf4j - package com.extractpdf4j
Core public APIs and foundational types for configuring and executing PDF extraction with ExtractPDF4J.
com.extractpdf4j.annotations - package com.extractpdf4j.annotations
Defines annotations used throughout ExtractPDF4J for configuration, metadata declaration, and extension points.
com.extractpdf4j.cli - package com.extractpdf4j.cli
Implements the command-line interface for executing PDF extraction workflows and interacting with ExtractPDF4J from shell environments.
com.extractpdf4j.helpers - package com.extractpdf4j.helpers
Provides reusable helper utilities supporting parsing, content normalization, validation, and shared internal operations.
com.extractpdf4j.parsers - package com.extractpdf4j.parsers
Implements the primary PDF parsing strategies and extraction components used to convert document content into structured tabular output.
com.microservice.extractpdf4j - package com.microservice.extractpdf4j
Contains the root configuration and application entry point for the ExtractPDF4J microservice runtime.
com.microservice.extractpdf4j.controller - package com.microservice.extractpdf4j.controller
Exposes HTTP endpoints for PDF upload, extraction execution, and API-based access to ExtractPDF4J capabilities.
com.microservice.extractpdf4j.service - package com.microservice.extractpdf4j.service
Implements service-layer orchestration and business logic for document processing and extraction operations in the microservice layer.
conf - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 

D

debug() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Enables debug artifact output for the lattice, OCR, and hybrid parsers.
debug(boolean) - Method in class com.extractpdf4j.parsers.HybridParser
Enables or disables debug outputs for lattice/OCR strategies.
debug(boolean) - Method in class com.extractpdf4j.parsers.LatticeParser
Toggles debug overlays and artifacts.
debug(boolean) - Method in class com.extractpdf4j.parsers.OcrStreamParser
 
debugDir() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Directory where debug artifacts should be written.
debugDir(File) - Method in class com.extractpdf4j.parsers.HybridParser
Sets the directory where debug artifacts are written (lattice and OCR strategies).
debugDir(File) - Method in class com.extractpdf4j.parsers.LatticeParser
Sets the debug artifact directory.
debugDir(File) - Method in class com.extractpdf4j.parsers.OcrStreamParser
 
dpi() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
DPI for image-based parsing (lattice/OCR/hybrid).
dpi(float) - Method in class com.extractpdf4j.parsers.HybridParser
Sets the DPI for image-based parsing (used by the lattice and OCR strategies).
dpi(float) - Method in class com.extractpdf4j.parsers.LatticeParser
Sets the rasterization DPI.
dpi(float) - Method in class com.extractpdf4j.parsers.OcrStreamParser
 

E

extractPdf(MultipartFile) - Method in class com.microservice.extractpdf4j.controller.PdfExtractController
 
ExtractPdfAnnotations - Class in com.extractpdf4j.annotations
Factory methods for creating configured parsers from ExtractPdfConfig annotations.
ExtractPdfConfig - Annotation Interface in com.extractpdf4j.annotations
Annotation-based configuration for ExtractPDF4J parsers.
extractTablesAsCsv(MultipartFile) - Method in class com.microservice.extractpdf4j.service.PdfExtractService
Asynchronously extracts tables from a given PDF file.

F

filepath - Variable in class com.extractpdf4j.parsers.BaseParser
Absolute or relative path to the PDF file being parsed.
finalizeResults(List<Table>, String) - Method in class com.extractpdf4j.parsers.BaseParser
Normalizes parser output for "no tables" situations.

G

getColBoundaries() - Method in class com.extractpdf4j.helpers.Table
 
getRowBoundaries() - Method in class com.extractpdf4j.helpers.Table
 

H

height - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 
HYBRID - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
 
HybridParser - Class in com.extractpdf4j.parsers
HybridParser
HybridParser() - Constructor for class com.extractpdf4j.parsers.HybridParser
Creates a HybridParser for in-memory processing.
HybridParser(String) - Constructor for class com.extractpdf4j.parsers.HybridParser
Creates a HybridParser for the given PDF file path.

I

ImagePdfUtils - Class in com.extractpdf4j.helpers
ImagePdfUtils

K

keepCells() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Whether to keep empty cells in lattice parsing.
keepCells(boolean) - Method in class com.extractpdf4j.parsers.HybridParser
Whether to preserve empty cells when reconstructing grids (lattice only).
keepCells(boolean) - Method in class com.extractpdf4j.parsers.LatticeParser
Keeps empty cells in the final grid (useful for fixed layouts).

L

LATTICE - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
 
LatticeParser - Class in com.extractpdf4j.parsers
LatticeParser
LatticeParser() - Constructor for class com.extractpdf4j.parsers.LatticeParser
Creates a LatticeParser for in-memory processing.
LatticeParser(String) - Constructor for class com.extractpdf4j.parsers.LatticeParser
 
left - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 
line - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 

M

main(String[]) - Static method in class com.extractpdf4j.cli.Main
Program entry point.
main(String[]) - Static method in class com.microservice.extractpdf4j.Main
 
Main - Class in com.extractpdf4j.cli
Main
Main - Class in com.microservice.extractpdf4j
 
Main() - Constructor for class com.extractpdf4j.cli.Main
 
Main() - Constructor for class com.microservice.extractpdf4j.Main
 
minScore() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Minimum average score for hybrid parser selection.
minScore(double) - Method in class com.extractpdf4j.parsers.HybridParser
Sets the minimum allowed average score across a list of tables.

N

ncols() - Method in class com.extractpdf4j.helpers.Table
Number of columns in the table (0 if there are no rows).
nrows() - Method in class com.extractpdf4j.helpers.Table
Number of rows in the table.
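The ncols() rule (0 when there are no rows) suggests a column count read from the first row. The hypothetical GridShape helper below sketches that behavior, assuming a rectangular, row-major grid; it is not the Table source.

```java
import java.util.List;

// Hypothetical sketch of row/column counting for a row-major cell grid;
// not the ExtractPDF4J Table class. Assumes every row has the same length.
final class GridShape {
    private GridShape() {}

    static int nrows(List<List<String>> cells) {
        return cells.size();
    }

    /** Column count taken from the first row; 0 when the grid has no rows. */
    static int ncols(List<List<String>> cells) {
        return cells.isEmpty() ? 0 : cells.get(0).size();
    }
}
```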

O

Ocr - Class in com.extractpdf4j.helpers
OCR helper utilities.
Ocr.OcrWord - Class in com.extractpdf4j.helpers
 
ocrPng(String) - Static method in class com.extractpdf4j.helpers.Ocr
Runs OCR on a PNG file and returns plain text.
OCRSTREAM - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
 
OcrStreamParser - Class in com.extractpdf4j.parsers
OcrStreamParser (header-aware): removes horizontal and vertical rules before OCR.
OcrStreamParser() - Constructor for class com.extractpdf4j.parsers.OcrStreamParser
Creates an OcrStreamParser for in-memory processing.
OcrStreamParser(String) - Constructor for class com.extractpdf4j.parsers.OcrStreamParser
 
ocrTsv(String) - Static method in class com.extractpdf4j.helpers.Ocr
 
ocrTsv(String, String, String) - Static method in class com.extractpdf4j.helpers.Ocr
 
ocrTsvHeuristically(String, String) - Static method in class com.extractpdf4j.helpers.Ocr
Runs OCR on a PNG image using a heuristic to find the best Page Segmentation Mode (PSM).
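The OcrWord variables (block, par, line, word, left, top, width, height, conf, text) line up with the columns of Tesseract's TSV output. The sketch below parses word-level rows (level 5) into a hypothetical TsvWord record; the record, its field order, and rounding conf to an int are assumptions, and it is not how Ocr.ocrTsv is implemented. Some Tesseract versions print conf with a decimal part, so it is parsed as a double first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record whose fields mirror the OcrWord variables; the field
// order and the int conf are assumptions, not ExtractPDF4J code.
record TsvWord(int block, int par, int line, int word,
               int left, int top, int width, int height, int conf, String text) {}

final class TesseractTsv {
    private TesseractTsv() {}

    static List<TsvWord> parseWords(String tsv) {
        List<TsvWord> words = new ArrayList<>();
        for (String row : tsv.split("\\R")) {
            // Columns: level page_num block_num par_num line_num word_num
            //          left top width height conf text
            String[] f = row.split("\t", -1);
            if (f.length < 12 || !f[0].equals("5")) {
                continue; // skip the header and non-word levels (page/block/par/line)
            }
            String text = f[11].strip();
            if (text.isEmpty()) {
                continue;
            }
            words.add(new TsvWord(
                    Integer.parseInt(f[2]), Integer.parseInt(f[3]),
                    Integer.parseInt(f[4]), Integer.parseInt(f[5]),
                    Integer.parseInt(f[6]), Integer.parseInt(f[7]),
                    Integer.parseInt(f[8]), Integer.parseInt(f[9]),
                    (int) Math.round(Double.parseDouble(f[10])), // conf may be "96.5"
                    text));
        }
        return words;
    }
}
```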
OcrWord(int, int, int, int, int, String, int, int, int, int) - Constructor for class com.extractpdf4j.helpers.Ocr.OcrWord
 

P

PageRange - Class in com.extractpdf4j.helpers
PageRange
pages - Variable in class com.extractpdf4j.parsers.BaseParser
Page selection string, defaulting to "1".
pages() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Page selection string (e.g., "all", "1", "2-5", "1,3-4").
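The page selection syntax can be sketched as follows, assuming 1-based page numbers and a known page count for "all". PageSelection is a hypothetical class, not PageRange.parse; sorting and de-duplicating the result, and rejecting reversed ranges such as "5-2", are choices made for this sketch. Pages beyond the document length are not filtered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of page-selection parsing ("all", "1", "2-5", "1,3-4");
// not PageRange.parse. Pages are 1-based; pageCount is used only for "all".
final class PageSelection {
    private PageSelection() {}

    static List<Integer> parse(String spec, int pageCount) {
        String s = spec.strip();
        if (s.equalsIgnoreCase("all")) {
            List<Integer> all = new ArrayList<>();
            for (int p = 1; p <= pageCount; p++) all.add(p);
            return all;
        }
        TreeSet<Integer> pages = new TreeSet<>(); // sorted, duplicates removed
        for (String part : s.split(",")) {
            String t = part.strip();
            int dash = t.indexOf('-');
            if (dash < 0) {
                pages.add(Integer.parseInt(t));
            } else {
                int from = Integer.parseInt(t.substring(0, dash).strip());
                int to = Integer.parseInt(t.substring(dash + 1).strip());
                if (from > to) throw new IllegalArgumentException("Bad range: " + t);
                for (int p = from; p <= to; p++) pages.add(p);
            }
        }
        return new ArrayList<>(pages);
    }
}
```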
pages(String) - Method in class com.extractpdf4j.parsers.BaseParser
Sets the pages to parse.
pages(String) - Method in class com.extractpdf4j.parsers.HybridParser
Sets the page selection for this parser and propagates the same selection to all underlying strategies.
par - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 
parse() - Method in class com.extractpdf4j.parsers.BaseParser
Parses the configured pages from the PDF file.
parse(String) - Static method in class com.extractpdf4j.helpers.PageRange
 
parse(PDDocument) - Method in class com.extractpdf4j.parsers.BaseParser
Parses a previously loaded PDF document.
parse(PDDocument) - Method in class com.extractpdf4j.parsers.HybridParser
 
parse(PDDocument) - Method in class com.extractpdf4j.parsers.LatticeParser
 
parse(PDDocument) - Method in class com.extractpdf4j.parsers.OcrStreamParser
 
parse(PDDocument) - Method in class com.extractpdf4j.parsers.StreamParser
 
parsePage(int) - Method in class com.extractpdf4j.parsers.BaseParser
Parses a single page or the entire document.
parsePage(int) - Method in class com.extractpdf4j.parsers.HybridParser
Runs stream, lattice, and OCR-backed stream for the requested page(s) and returns the best-scoring set of tables.
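The hybrid selection step can be sketched generically: score each strategy's tables, average the scores per strategy, keep the highest average, and return nothing if that average is below minScore. The per-table score function, the BestSetSelector name, and the tie handling (the first highest average in map iteration order wins) are assumptions; this index does not say how HybridParser scores tables.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of "pick the best-scoring candidate set, subject to a
// minimum average score". The per-table score function is an assumption.
final class BestSetSelector {
    private BestSetSelector() {}

    static <T> Optional<List<T>> pick(Map<String, List<T>> candidatesByStrategy,
                                      ToDoubleFunction<T> score,
                                      double minScore) {
        List<T> best = null;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (List<T> tables : candidatesByStrategy.values()) {
            if (tables.isEmpty()) continue; // nothing to score
            double avg = tables.stream().mapToDouble(score).average().orElse(0);
            if (avg > bestAvg) {
                bestAvg = avg;
                best = tables;
            }
        }
        // Reject the winner if even its average falls below the threshold.
        return best != null && bestAvg >= minScore ? Optional.of(best) : Optional.empty();
    }
}
```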
parsePage(int) - Method in class com.extractpdf4j.parsers.LatticeParser
Deprecated.
This method loads the document from disk on every call. Prefer loading the PDDocument once and using LatticeParser.parse(PDDocument).
parsePage(int) - Method in class com.extractpdf4j.parsers.OcrStreamParser
Deprecated.
This method loads the document from disk on every call. Prefer loading the PDDocument once and using OcrStreamParser.parse(PDDocument).
parsePage(int) - Method in class com.extractpdf4j.parsers.StreamParser
Deprecated.
This method loads the document from disk on every call. Prefer loading the PDDocument once and using StreamParser.parse(PDDocument).
parser() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Parser strategy to use when materializing a parser.
parserFrom(Class<?>) - Static method in class com.extractpdf4j.annotations.ExtractPdfAnnotations
Builds a parser instance (no filepath) from the ExtractPdfConfig annotation on a class.
parserFrom(Class<?>, String) - Static method in class com.extractpdf4j.annotations.ExtractPdfAnnotations
Builds a parser instance from the ExtractPdfConfig annotation on a class.
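Annotation-driven construction relies on standard Java reflection: an annotation with RUNTIME retention can be read with Class.getAnnotation and its element values copied into a parser. The sketch uses a hypothetical DemoConfig annotation whose element names echo ExtractPdfConfig (parser, pages, dpi); the String-typed parser element, the defaults, and the DemoParser record are illustrative assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical mirror of a config annotation; element names echo the index
// (parser, pages, dpi) but the types and defaults here are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DemoConfig {
    String parser() default "STREAM";
    String pages() default "1";
    float dpi() default 300f;
}

// Minimal stand-in for a configured parser.
record DemoParser(String mode, String filepath, String pages, float dpi) {}

final class DemoAnnotations {
    private DemoAnnotations() {}

    static DemoParser parserFrom(Class<?> type, String filepath) {
        // RUNTIME retention is required for the annotation to be visible here.
        DemoConfig cfg = type.getAnnotation(DemoConfig.class);
        if (cfg == null) {
            throw new IllegalArgumentException(type.getName() + " is not annotated with @DemoConfig");
        }
        return new DemoParser(cfg.parser(), filepath, cfg.pages(), cfg.dpi());
    }
}

@DemoConfig(parser = "LATTICE", pages = "2-5", dpi = 200f)
class InvoiceTables {}
```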
ParserMode - Enum Class in com.extractpdf4j.annotations
Parser modes supported by ExtractPDF4J.
PdfExtractController - Class in com.microservice.extractpdf4j.controller
 
PdfExtractController(PdfExtractService) - Constructor for class com.microservice.extractpdf4j.controller.PdfExtractController
 
PdfExtractService - Class in com.microservice.extractpdf4j.service
 
PdfExtractService() - Constructor for class com.microservice.extractpdf4j.service.PdfExtractService
 

R

renderPage(PDDocument, int, float) - Static method in class com.extractpdf4j.helpers.ImagePdfUtils
Renders a single PDF page to a BufferedImage at the requested DPI.
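Rendering at a given DPI scales each page dimension by dpi / 72, since PDF user space units are 1/72 inch by default. The hypothetical RenderSize helper below does that arithmetic; rounding to the nearest pixel is an assumption. Because pixel count grows with the square of the DPI, doubling the DPI roughly quadruples the memory needed for each rendered page.

```java
// Hypothetical helper: pixel size of a page rendered at a given DPI.
// PDF user space is 72 units per inch, so pixels = points / 72 * dpi.
// Rounding to nearest is an assumption; a renderer may round differently.
final class RenderSize {
    private RenderSize() {}

    static int[] pixels(float widthPt, float heightPt, float dpi) {
        float scale = dpi / 72f;
        return new int[] { Math.round(widthPt * scale), Math.round(heightPt * scale) };
    }
}
```

For a US Letter page (612 x 792 points), RenderSize.pixels(612f, 792f, 300f) gives 2550 x 3300 pixels.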
requiredHeaders() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Required OCR headers to look for before returning results.
requiredHeaders(List<String>) - Method in class com.extractpdf4j.parsers.OcrStreamParser
 
run() - Method in class com.extractpdf4j.cli.Main
 

S

setCell(int, int, String) - Method in class com.extractpdf4j.helpers.Table
Sets the cell at row r, column c to value v (mutates the table in place).
STREAM - Enum constant in enum class com.extractpdf4j.annotations.ParserMode
 
StreamParser - Class in com.extractpdf4j.parsers
StreamParser
StreamParser() - Constructor for class com.extractpdf4j.parsers.StreamParser
Creates a StreamParser for in-memory processing.
StreamParser(String) - Constructor for class com.extractpdf4j.parsers.StreamParser
 
stripText - Variable in class com.extractpdf4j.parsers.BaseParser
Whether to normalize/strip text (e.g., trim, collapse whitespace) in stream-based extraction.
stripText() - Element in annotation interface com.extractpdf4j.annotations.ExtractPdfConfig
Whether to strip/normalize text for stream-based extraction.
stripText(boolean) - Method in class com.extractpdf4j.parsers.BaseParser
Enables or disables text normalization for stream-style extraction.
stripText(boolean) - Method in class com.extractpdf4j.parsers.HybridParser
Enables or disables text normalization for stream-style extraction across all underlying strategies.
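One plausible reading of "trim, collapse whitespace" is sketched below. TextNormalizer is hypothetical and the exact rules applied by stripText are not specified in this index; the explicit non-breaking-space replacement is an assumption, added because neither String.strip() nor the regex \s matches U+00A0 by default.

```java
// Hypothetical sketch of stripText-style normalization: trim the ends and
// collapse internal whitespace runs to a single space. Not ExtractPDF4J code.
final class TextNormalizer {
    private TextNormalizer() {}

    static String normalize(String s) {
        // Map non-breaking spaces to ordinary spaces first, then trim and collapse.
        return s.replace('\u00A0', ' ').strip().replaceAll("\\s+", " ");
    }
}
```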

T

Table - Class in com.extractpdf4j.helpers
Table
Table(List<List<String>>, List<Double>, List<Double>) - Constructor for class com.extractpdf4j.helpers.Table
 
text - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 
toCSV(char) - Method in class com.extractpdf4j.helpers.Table
Serializes the table to CSV using the given separator.
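A minimal CSV serializer with a configurable separator, using RFC 4180-style quoting: fields containing the separator, a double quote, or a line break are wrapped in quotes and inner quotes are doubled. CsvWriter is hypothetical; whether Table.toCSV quotes fields the same way is not stated here.

```java
import java.util.List;

// Hypothetical CSV writer for a row-major grid using RFC 4180-style quoting;
// not the Table.toCSV implementation.
final class CsvWriter {
    private CsvWriter() {}

    static String toCsv(List<List<String>> rows, char sep) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            for (int c = 0; c < row.size(); c++) {
                if (c > 0) out.append(sep);
                out.append(escape(row.get(c), sep));
            }
            out.append('\n'); // RFC 4180 specifies CRLF; '\n' is a simplification
        }
        return out.toString();
    }

    /** Quotes a field containing the separator, a quote, or a line break; null becomes empty. */
    static String escape(String v, char sep) {
        if (v == null) return "";
        boolean needsQuotes = v.indexOf(sep) >= 0 || v.indexOf('"') >= 0
                || v.indexOf('\n') >= 0 || v.indexOf('\r') >= 0;
        return needsQuotes ? '"' + v.replace("\"", "\"\"") + '"' : v;
    }
}
```

With ';' as the separator, a field such as "Widget, large" needs no quoting, which is one reason a configurable separator is useful for locales that use the comma as a decimal mark.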
top - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 

V

valueOf(String) - Static method in enum class com.extractpdf4j.annotations.ParserMode
Returns the enum constant of this class with the specified name.
values() - Static method in enum class com.extractpdf4j.annotations.ParserMode
Returns an array containing the constants of this enum class, in the order they are declared.
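Enum.valueOf is case-sensitive and throws IllegalArgumentException for an unknown name, so command-line or HTTP input is usually normalized first. The sketch mirrors the four constants listed in this index in a local Mode enum; the declaration order and the ModeArg helper are assumptions rather than the ParserMode source.

```java
import java.util.Arrays;
import java.util.Locale;

// Local mirror of the four ParserMode constants listed in this index; the
// declaration order here is an assumption, so values() order may differ.
enum Mode { STREAM, LATTICE, OCRSTREAM, HYBRID }

final class ModeArg {
    private ModeArg() {}

    /** Maps user input such as " hybrid " to a constant; valueOf itself is case-sensitive. */
    static Mode parse(String input) {
        try {
            return Mode.valueOf(input.strip().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown mode '" + input + "'; expected one of " + Arrays.toString(Mode.values()), e);
        }
    }
}
```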

W

width - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 
word - Variable in class com.extractpdf4j.helpers.Ocr.OcrWord
 