Package com.extractpdf4j.annotations
Annotation Interface ExtractPdfConfig
Annotation-based configuration for ExtractPDF4J parsers.
Apply this annotation to a class to declare parser settings that can be
materialized via ExtractPdfAnnotations.
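Here is a minimal sketch of applying the annotation to a class and reading its values back with standard Java reflection. It does not use the real library. It declares a local mirror of ExtractPdfConfig with the element names, types, and defaults documented on this page, and it assumes RUNTIME retention and a TYPE target. ParserMode is a hypothetical stand-in that contains only HYBRID, the one constant this page confirms. InvoiceExtraction and ConfigDemo are illustrative names. The sketch does not show how ExtractPdfAnnotations turns these values into a parser, because this page does not describe its API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

// Hypothetical stand-in: only HYBRID is confirmed by the docs; other modes omitted.
enum ParserMode { HYBRID }

// Local mirror of ExtractPdfConfig (documented elements and defaults).
// RUNTIME retention and TYPE target are assumptions made for this sketch.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ExtractPdfConfig {
    ParserMode parser() default ParserMode.HYBRID;
    String pages() default "1";
    boolean stripText() default true;
    float dpi() default 450.0f;
    boolean debug() default false;
    boolean keepCells() default false;
    double minScore() default 0.0;
    String debugDir() default "";
    String[] requiredHeaders() default {};
}

// Illustrative class: overrides two elements and leaves the rest at their defaults.
@ExtractPdfConfig(pages = "1,3-4", requiredHeaders = {"Date", "Amount"})
class InvoiceExtraction {}

final class ConfigDemo {
    public static void main(String[] args) {
        ExtractPdfConfig cfg = InvoiceExtraction.class.getAnnotation(ExtractPdfConfig.class);
        System.out.println(cfg.parser());                            // HYBRID (default)
        System.out.println(cfg.pages());                             // 1,3-4 (overridden)
        System.out.println(cfg.dpi());                               // 450.0 (default)
        System.out.println(Arrays.toString(cfg.requiredHeaders()));  // [Date, Amount]
    }
}
```

With RUNTIME retention, an unset element returns its declared default. This is why the sketch only needs to spell out the values that differ from the defaults.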
Optional Element Summary

Modifier and Type  Optional Element  Description
boolean            debug             Enables debug artifact output for lattice/ocr/hybrid.
String             debugDir          Directory where debug artifacts should be written.
float              dpi               DPI for image-based parsing (lattice/ocr/hybrid).
boolean            keepCells         Whether to keep empty cells in lattice parsing.
double             minScore          Minimum average score for hybrid parser selection.
String             pages             Page selection string (e.g., "all", "1", "2-5", "1,3-4").
ParserMode         parser            Parser strategy to use when materializing a parser.
String[]           requiredHeaders   Required OCR headers to look for before returning results.
boolean            stripText         Whether to strip/normalize text for stream-based extraction.

Element Details
parser
ParserMode parser
Parser strategy to use when materializing a parser.
Default: HYBRID

pages
String pages
Page selection string (e.g., "all", "1", "2-5", "1,3-4").
Default: "1"

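A small parser can illustrate the page selection format. This is a hypothetical helper, not the library's implementation. It assumes that "all" selects every page, that commas separate items, and that "a-b" is an inclusive 1-based range. Duplicates collapse into a single sorted list. Rejecting reversed or out-of-range items with IllegalArgumentException is a design choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: expands a page selection string into sorted 1-based page numbers.
final class PageSelection {
    static List<Integer> parse(String spec, int pageCount) {
        TreeSet<Integer> pages = new TreeSet<>(); // keeps pages sorted and de-duplicated
        String trimmed = spec.trim();
        if (trimmed.equalsIgnoreCase("all")) {
            for (int p = 1; p <= pageCount; p++) pages.add(p);
            return new ArrayList<>(pages);
        }
        for (String part : trimmed.split(",")) {
            String item = part.trim();
            if (item.isEmpty()) continue; // tolerate stray commas
            int dash = item.indexOf('-');
            // A single number is treated as the range n-n.
            int start = Integer.parseInt(dash < 0 ? item : item.substring(0, dash).trim());
            int end = dash < 0 ? start : Integer.parseInt(item.substring(dash + 1).trim());
            if (start < 1 || end < start || end > pageCount) {
                throw new IllegalArgumentException("Bad page item: " + item);
            }
            for (int p = start; p <= end; p++) pages.add(p);
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(parse("all", 3));    // [1, 2, 3]
        System.out.println(parse("1", 10));     // [1]
        System.out.println(parse("2-5", 10));   // [2, 3, 4, 5]
        System.out.println(parse("1,3-4", 10)); // [1, 3, 4]
    }
}
```

Malformed numbers surface as NumberFormatException. That is a subclass of IllegalArgumentException, so callers can catch a single exception type.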
stripText
boolean stripText
Whether to strip/normalize text for stream-based extraction.
Default: true

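This page does not define exactly what stripping or normalizing involves. The sketch below shows one plausible reading, and it is purely an assumption. When stripText is true, it trims each extracted string and collapses runs of whitespace (including non-breaking spaces) into single spaces. When stripText is false, it returns the text unchanged. TextNormalizer is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper: one possible meaning of "strip/normalize" (an assumption,
// not the library's documented behavior).
final class TextNormalizer {
    // Runs of whitespace, plus the non-breaking space U+00A0, which \s does not match.
    private static final Pattern WHITESPACE = Pattern.compile("[\\s\\u00A0]+");

    static String normalize(String raw, boolean stripText) {
        if (!stripText || raw == null) {
            return raw; // stripping disabled: keep the text exactly as extracted
        }
        return WHITESPACE.matcher(raw).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  Total\u00A0 Amount \n", true) + "]"); // [Total Amount]
        System.out.println("[" + normalize("  Total  ", false) + "]");              // [  Total  ]
    }
}
```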
dpi
float dpi
DPI for image-based parsing (lattice/ocr/hybrid).
Default: 450.0f

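The dpi value sets the raster resolution, so it directly determines image size and memory use. PDF page sizes are given in points, which are 1/72 inch in the default user space, so the scale factor is dpi / 72. The sketch below uses a hypothetical helper, RenderSize, to compute the pixel dimensions of a US Letter page (612 x 792 points) at the default 450 DPI and at 150 DPI. It also reports the uncompressed size assuming 3 bytes per pixel (RGB). The library's actual rendering buffers may use a different pixel format.

```java
// Hypothetical helper: converts a PDF page size in points (1/72 inch) to pixels at a DPI.
final class RenderSize {
    static int[] pixels(float widthPt, float heightPt, float dpi) {
        float scale = dpi / 72f; // pixels per point
        return new int[] { Math.round(widthPt * scale), Math.round(heightPt * scale) };
    }

    // Uncompressed size assuming 3 bytes per pixel (an assumption, e.g. packed RGB).
    static long bytesRgb(int[] px) {
        return 3L * px[0] * px[1];
    }

    public static void main(String[] args) {
        int[] hi = pixels(612f, 792f, 450f);
        System.out.println(hi[0] + " x " + hi[1]);   // 3825 x 4950
        System.out.println(bytesRgb(hi));            // 56801250
        int[] lo = pixels(612f, 792f, 150f);
        System.out.println(lo[0] + " x " + lo[1]);   // 1275 x 1650
    }
}
```

Pixel count grows with the square of the DPI. The default 450 DPI therefore produces nine times as many pixels as 150 DPI, which matters when processing many pages.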
debug
boolean debug
Enables debug artifact output for lattice/ocr/hybrid.
Default: false

keepCells
boolean keepCells
Whether to keep empty cells in lattice parsing.
Default: false

minScore
double minScore
Minimum average score for hybrid parser selection.
Default: 0.0

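This page does not spell out how hybrid selection uses minScore. The sketch below rests on three assumptions. First, each candidate parse yields per-table scores. Second, candidates are compared by their average score. Third, a candidate whose average falls below minScore is disqualified. HybridSelection and the candidate labels are hypothetical. Under these assumptions, the default of 0.0 filters nothing out when scores are non-negative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of hybrid selection; not the library's algorithm.
final class HybridSelection {
    // Returns the label of the candidate with the highest average score,
    // or empty if no candidate's average reaches minScore.
    static Optional<String> select(Map<String, List<Double>> scoresByCandidate, double minScore) {
        String best = null;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<Double>> e : scoresByCandidate.entrySet()) {
            List<Double> scores = e.getValue();
            if (scores.isEmpty()) continue; // no scores, nothing to average
            double avg = scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            // Strict '>' keeps the earlier candidate (in map iteration order) on ties.
            if (avg >= minScore && avg > bestAvg) {
                best = e.getKey();
                bestAvg = avg;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Map<String, List<Double>> scores = new LinkedHashMap<>();
        scores.put("lattice", List.of(1.0, 0.5)); // average 0.75
        scores.put("ocr", List.of(0.5, 0.25));    // average 0.375
        System.out.println(select(scores, 0.0));  // Optional[lattice]
        System.out.println(select(scores, 0.8));  // Optional.empty
    }
}
```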
debugDir
String debugDir
Directory where debug artifacts should be written.
Default: ""

requiredHeaders
String[] requiredHeaders
Required OCR headers to look for before returning results.
Default: {}

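The sketch below is a hedged illustration of requiredHeaders. It treats the requirement as satisfied only when every required header matches some cell of an extracted header row after trimming and case-folding. The matching rule, the HeaderCheck name, and the idea of checking a single header row are all assumptions. The library may match differently, for example by substring. The default empty array always passes.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical check; the real matching rule used by the library is not documented here.
final class HeaderCheck {
    // True when every required header equals some header cell, ignoring case and
    // surrounding whitespace. An empty requirement array is trivially satisfied.
    static boolean hasRequiredHeaders(List<String> headerRow, String[] requiredHeaders) {
        for (String required : requiredHeaders) {
            String want = required.trim().toLowerCase(Locale.ROOT);
            boolean found = false;
            for (String cell : headerRow) {
                if (cell != null && cell.trim().toLowerCase(Locale.ROOT).equals(want)) {
                    found = true;
                    break;
                }
            }
            if (!found) return false; // one missing header rejects the result
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> row = List.of("Date", " Amount ", "Description");
        System.out.println(hasRequiredHeaders(row, new String[] {"date", "AMOUNT"})); // true
        System.out.println(hasRequiredHeaders(row, new String[] {"Total"}));          // false
        System.out.println(hasRequiredHeaders(row, new String[] {}));                 // true
    }
}
```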