com.extractpdf4j.parsers (ExtractPDF4J 2.1.0 API)

package com.extractpdf4j.parsers

Implements the primary PDF parsing strategies and extraction components used to convert document content into structured tabular output.
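As a rough illustration of what converting document content into structured tabular output can involve, here is a minimal sketch. It is not ExtractPDF4J's implementation. It assumes the page text is already available as positioned words. The `Token` record, the `toRows` method and the `rowTolerance` parameter are hypothetical names chosen for this example.

The sketch groups words whose vertical positions fall within a tolerance into one row. It then orders each row left to right.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not ExtractPDF4J code): group positioned words into table rows. */
public class StreamGroupingSketch {

    /** A word with its page coordinates; a hypothetical type for illustration only. */
    record Token(String text, double x, double y) {}

    /**
     * Puts tokens whose y coordinate is within rowTolerance of a row's first token
     * into that row, then orders each row's cells left to right by x.
     */
    static List<List<String>> toRows(List<Token> tokens, double rowTolerance) {
        List<Token> sorted = new ArrayList<>(tokens);
        // Visit tokens top to bottom, breaking ties left to right.
        sorted.sort(Comparator.comparingDouble(Token::y).thenComparingDouble(Token::x));

        List<List<Token>> rows = new ArrayList<>();
        for (Token t : sorted) {
            List<Token> current = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            // Compare against the first token of the current row (the row's anchor).
            if (current != null && Math.abs(current.get(0).y() - t.y()) <= rowTolerance) {
                current.add(t);
            } else {
                List<Token> row = new ArrayList<>();
                row.add(t);
                rows.add(row);
            }
        }

        List<List<String>> table = new ArrayList<>();
        for (List<Token> row : rows) {
            row.sort(Comparator.comparingDouble(Token::x));
            List<String> cells = new ArrayList<>();
            for (Token t : row) {
                cells.add(t.text());
            }
            table.add(cells);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("Qty", 200, 10.0),
                new Token("Item", 50, 10.5),
                new Token("Apples", 50, 30.0),
                new Token("3", 200, 29.6));
        System.out.println(toRows(tokens, 2.0)); // prints [[Item, Qty], [Apples, 3]]
    }
}
```

A real parser would also need to infer column boundaries, for example from horizontal gaps or ruling lines. It would also have to handle cells that span several text lines. This sketch only orders cells within each row.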

Related Packages

com.extractpdf4j: Core public APIs and foundational types for configuring and executing PDF extraction with ExtractPDF4J.

com.extractpdf4j.annotations: Defines annotations used throughout ExtractPDF4J for configuration, metadata declaration, and extension points.

com.extractpdf4j.cli: Implements the command-line interface for executing PDF extraction workflows and interacting with ExtractPDF4J from shell environments.

com.extractpdf4j.helpers: Provides reusable helper utilities supporting parsing, content normalization, validation, and shared internal operations.

Classes

BaseParser

HybridParser

LatticeParser

OcrStreamParser: header-aware; removes horizontal and vertical rules before OCR.

StreamParser
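The OcrStreamParser summary says that horizontal and vertical rules are removed before OCR. The sketch below illustrates that idea under stated assumptions. It is not ExtractPDF4J's code, and the names `RuleRemovalSketch`, `removeRules` and `minRun` are hypothetical.

- **Input:** the page is assumed to be binarized into a non-empty rectangular `boolean[][]`, where `true` means a dark pixel.
- **Rule test:** any straight horizontal or vertical dark run of at least `minRun` pixels is treated as a table rule and erased.
- **What survives:** shorter runs, such as glyph strokes, are kept.
- **Order of detection:** runs are found on the original image before anything is cleared. As a result, erasing one direction does not break up lines in the other direction.

Production pipelines often use morphological opening (for example with OpenCV) instead of plain run lengths.

```java
/** Hypothetical sketch (not ExtractPDF4J code): erase long straight rules from a binary image. */
public class RuleRemovalSketch {

    /**
     * Returns a copy of img in which every horizontal or vertical run of at least
     * minRun dark pixels is cleared. Runs are detected on the original image.
     */
    static boolean[][] removeRules(boolean[][] img, int minRun) {
        int h = img.length, w = img[0].length;
        boolean[][] mask = new boolean[h][w]; // true where a pixel belongs to a rule

        // Horizontal runs, scanned row by row.
        for (int y = 0; y < h; y++) {
            int x = 0;
            while (x < w) {
                if (!img[y][x]) { x++; continue; }
                int start = x;
                while (x < w && img[y][x]) x++;
                if (x - start >= minRun) {
                    for (int i = start; i < x; i++) mask[y][i] = true;
                }
            }
        }
        // Vertical runs, scanned column by column.
        for (int x = 0; x < w; x++) {
            int y = 0;
            while (y < h) {
                if (!img[y][x]) { y++; continue; }
                int start = y;
                while (y < h && img[y][x]) y++;
                if (y - start >= minRun) {
                    for (int i = start; i < y; i++) mask[i][x] = true;
                }
            }
        }

        // Keep dark pixels that are not part of any detected rule.
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = img[y][x] && !mask[y][x];
        return out;
    }

    /** Builds an image from strings where '#' is a dark pixel. */
    static boolean[][] parse(String... rows) {
        boolean[][] img = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            img[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) img[y][x] = rows[y].charAt(x) == '#';
        }
        return img;
    }

    /** Renders an image back to '#' (dark) and '.' (light), one line per row. */
    static String render(boolean[][] img) {
        StringBuilder sb = new StringBuilder();
        for (boolean[] row : img) {
            for (boolean p : row) sb.append(p ? '#' : '.');
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A bordered cell containing a short two-pixel stroke.
        boolean[][] cell = parse(
                "#######",
                "#.....#",
                "#.##..#",
                "#.....#",
                "#######");
        // The border runs are 5 to 7 pixels long and get erased; the 2-pixel stroke survives.
        System.out.print(render(removeRules(cell, 4)));
    }
}
```

The threshold matters. If `minRun` is smaller than the tallest character strokes, letters such as "l" or "1" would be erased along with the rules. A practical value is therefore tied to the expected text height at the scanning resolution.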