ExtractPdfConfig (ExtractPDF4J 2.1.0 API)

@Retention(RUNTIME) @Target(TYPE) public @interface ExtractPdfConfig

Annotation-based configuration for ExtractPDF4J parsers.

Apply this annotation to a class to declare parser settings that can be materialized via ExtractPdfAnnotations.
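As an illustration, the sketch below applies the annotation to a class and reads it back with standard Java reflection (`Class.getAnnotation`).

It assumes ExtractPDF4J is not on the classpath, so it declares a hypothetical local stand-in, `ExtractPdfConfigMirror`. The stand-in copies the elements and defaults documented on this page. The `ParserMode` constants other than `HYBRID` are assumed from the parser names mentioned here. The sketch does not call `ExtractPdfAnnotations`, because that class's methods are not documented on this page.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ConfigDemo {
    // Hypothetical stand-in for ExtractPDF4J's ParserMode; only HYBRID is confirmed by the docs.
    enum ParserMode { STREAM, LATTICE, OCR, HYBRID }

    // Hypothetical local mirror of ExtractPdfConfig with the documented elements and defaults.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ExtractPdfConfigMirror {
        ParserMode parser() default ParserMode.HYBRID;
        String pages() default "1";
        boolean stripText() default true;
        float dpi() default 450.0f;
        boolean debug() default false;
        boolean keepCells() default false;
        double minScore() default 0.0;
        String debugDir() default "";
        String[] requiredHeaders() default {};
    }

    // Override only the elements that differ from the defaults.
    @ExtractPdfConfigMirror(parser = ParserMode.LATTICE, pages = "2-5", dpi = 300.0f, keepCells = true)
    static class InvoiceTables {}

    // RUNTIME retention makes the annotation visible to reflection; returns null if absent.
    static ExtractPdfConfigMirror configOf(Class<?> type) {
        return type.getAnnotation(ExtractPdfConfigMirror.class);
    }

    public static void main(String[] args) {
        ExtractPdfConfigMirror cfg = configOf(InvoiceTables.class);
        // prints: LATTICE pages=2-5 dpi=300.0 keepCells=true stripText=true
        System.out.println(cfg.parser() + " pages=" + cfg.pages() + " dpi=" + cfg.dpi()
                + " keepCells=" + cfg.keepCells() + " stripText=" + cfg.stripText());
    }
}
```

Elements left unset, such as `stripText`, fall back to their declared defaults. This is standard Java annotation behavior.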

Optional Element Summary

Modifier and Type  Optional Element  Description
boolean            debug             Enables debug artifact output for lattice/ocr/hybrid.
String             debugDir          Directory where debug artifacts should be written.
float              dpi               DPI for image-based parsing (lattice/ocr/hybrid).
boolean            keepCells         Whether to keep empty cells in lattice parsing.
double             minScore          Minimum average score for hybrid parser selection.
String             pages             Page selection string (e.g., "all", "1", "2-5", "1,3-4").
ParserMode         parser            Parser strategy to use when materializing a parser.
String[]           requiredHeaders   Required OCR headers to look for before returning results.
boolean            stripText         Whether to strip/normalize text for stream-based extraction.
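The element types in the summary, together with the defaults listed under Element Details, can be inspected reflectively. `Method.getDefaultValue()` returns an annotation element's declared default, or null if it has none.

This sketch runs against a hypothetical local mirror, `ExtractPdfConfigMirror`, that copies the documented declarations, since the real ExtractPDF4J jar is not assumed to be available. The `ParserMode` constants other than `HYBRID` are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class ConfigDefaults {
    // Hypothetical stand-in enum; only HYBRID is confirmed by the docs.
    enum ParserMode { STREAM, LATTICE, OCR, HYBRID }

    // Hypothetical mirror of ExtractPdfConfig's documented elements and defaults.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ExtractPdfConfigMirror {
        ParserMode parser() default ParserMode.HYBRID;
        String pages() default "1";
        boolean stripText() default true;
        float dpi() default 450.0f;
        boolean debug() default false;
        boolean keepCells() default false;
        double minScore() default 0.0;
        String debugDir() default "";
        String[] requiredHeaders() default {};
    }

    // Maps each element name to "Type = default", sorted by element name.
    static Map<String, String> describe(Class<?> annotationType) {
        Map<String, String> out = new TreeMap<>();
        for (Method m : annotationType.getDeclaredMethods()) {
            Object def = m.getDefaultValue(); // null when the element has no default
            String shown = (def instanceof Object[]) ? Arrays.toString((Object[]) def) : String.valueOf(def);
            out.put(m.getName(), m.getReturnType().getSimpleName() + " = " + shown);
        }
        return out;
    }

    public static void main(String[] args) {
        describe(ExtractPdfConfigMirror.class).forEach((name, info) -> System.out.println(name + ": " + info));
    }
}
```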

Element Details
- parser
  
  ParserMode parser
  
  Parser strategy to use when materializing a parser.
  
  Default: HYBRID
- pages
  
  String pages
  
  Page selection string (e.g., "all", "1", "2-5", "1,3-4").
  
  Default: "1"
- stripText
  
  boolean stripText
  
  Whether to strip/normalize text for stream-based extraction.
  
  Default: true
- dpi
  
  float dpi
  
  DPI for image-based parsing (lattice/ocr/hybrid).
  
  Default: 450.0f
- debug
  
  boolean debug
  
  Enables debug artifact output for lattice/ocr/hybrid.
  
  Default: false
- keepCells
  
  boolean keepCells
  
  Whether to keep empty cells in lattice parsing.
  
  Default: false
- minScore
  
  double minScore
  
  Minimum average score for hybrid parser selection.
  
  Default: 0.0
- debugDir
  
  String debugDir
  
  Directory where debug artifacts should be written.
  
  Default: ""
- requiredHeaders
  
  String[] requiredHeaders
  
  Required OCR headers to look for before returning results.
  
  Default: {}
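The `pages` element accepts selection strings such as "all", "1", "2-5", and "1,3-4". The sketch below shows one way such a string could be expanded into page numbers. It is a hypothetical helper (`PageSelection.parse`), not ExtractPDF4J's own parser, and it makes these assumptions:

* Pages are 1-based.
* Ranges are inclusive.
* "all" means every page.
* Out-of-range or reversed ranges are rejected.

The library's actual handling of these cases may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PageSelection {
    /**
     * Hypothetical parser for selection strings like "all", "1", "2-5", "1,3-4".
     * Returns sorted, de-duplicated 1-based page numbers within 1..pageCount.
     */
    static List<Integer> parse(String spec, int pageCount) {
        TreeSet<Integer> pages = new TreeSet<>();
        String trimmed = spec.trim();
        if (trimmed.equalsIgnoreCase("all")) {
            for (int p = 1; p <= pageCount; p++) pages.add(p);
            return new ArrayList<>(pages);
        }
        for (String part : trimmed.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A single page "n" is treated as the range n-n.
            int from = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int to = dash < 0 ? from : Integer.parseInt(token.substring(dash + 1).trim());
            if (from < 1 || to > pageCount || from > to) {
                throw new IllegalArgumentException("Invalid page range: " + token);
            }
            for (int p = from; p <= to; p++) pages.add(p);
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(parse("1,3-4", 10)); // [1, 3, 4]
        System.out.println(parse("all", 3));    // [1, 2, 3]
    }
}
```

The `TreeSet` sorts the result and removes duplicates, so overlapping selections such as "1-3,2" yield each page once. Whether ExtractPDF4J also normalizes selections this way is not stated here.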