Package com.extractpdf4j.cli
Class Main
java.lang.Object
com.extractpdf4j.cli.Main
- All Implemented Interfaces:
Runnable
CLI entry point for ExtractPDF4J. Parses command-line flags, constructs the appropriate parser (stream / lattice / ocrstream / hybrid), runs extraction, and writes CSV output either to STDOUT or to file(s).
Synopsis
java -jar extractpdf4j-cli-1.0.0.jar <pdf>
[--mode stream|lattice|ocrstream|hybrid]
[--pages 1|all|1,3-5]
[--sep ,]
[--out out.csv]
[--debug]
[--dpi 300]
[--ocr auto|cli|bytedeco]
[--keep-cells]
[--debug-dir <dir>]
[--min-score 0-1]
[--require-headers Date,Description,Balance]
Notes
- When --out is omitted, tables are printed to STDOUT in CSV form.
- When multiple tables are found and --out is provided, files are numbered by suffix (e.g., out-1.csv, out-2.csv).
- --pages accepts "1", "2-5", "1,3-4", or "all".
- --ocr sets a system property read by the OCR helpers; values: auto, cli, or bytedeco.
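The page specification formats listed for --pages can be expanded into concrete page numbers along the following lines. This is only a sketch: the class name PageSpec, the parse method, and the pageCount parameter are hypothetical and are not taken from ExtractPDF4J, whose actual parsing and validation rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of ExtractPDF4J.
public class PageSpec {

    /** Expands "1", "2-5", "1,3-4" or "all" into sorted, 1-based page numbers. */
    public static List<Integer> parse(String spec, int pageCount) {
        TreeSet<Integer> pages = new TreeSet<>();
        String s = spec.trim();
        if (s.equalsIgnoreCase("all")) {
            for (int p = 1; p <= pageCount; p++) {
                pages.add(p);
            }
            return new ArrayList<>(pages);
        }
        for (String part : s.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A token is either a single page ("3") or an inclusive range ("3-5").
            int first = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int last = dash < 0 ? first : Integer.parseInt(token.substring(dash + 1).trim());
            if (first < 1 || last < first || last > pageCount) {
                throw new IllegalArgumentException("Invalid page range: " + token);
            }
            for (int p = first; p <= last; p++) {
                pages.add(p);
            }
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        System.out.println(parse("1,3-4", 10)); // prints [1, 3, 4]
        System.out.println(parse("all", 3));    // prints [1, 2, 3]
    }
}
```

Rejecting out-of-range pages with an exception is a choice made for this sketch; the CLI itself is documented as printing usage and returning on invalid input.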
Exit behavior: main returns after printing errors or usage; it does not call System.exit.
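The numbered-suffix convention mentioned in the notes for --out (out-1.csv, out-2.csv) might be derived from the base file name roughly as follows. The class OutputNames and the method numbered are hypothetical names for illustration only. The documentation does not say how a single table or an extensionless file name is handled, so that part of the sketch is an assumption.

```java
// Hypothetical helper, not part of ExtractPDF4J.
public class OutputNames {

    /** Inserts "-<index>" before the file extension: ("out.csv", 1) -> "out-1.csv". */
    public static String numbered(String out, int index) {
        int slash = Math.max(out.lastIndexOf('/'), out.lastIndexOf('\\'));
        int dot = out.lastIndexOf('.');
        // Assumption: with no extension (or a leading-dot name), append the suffix at the end.
        if (dot <= slash + 1) {
            return out + "-" + index;
        }
        return out.substring(0, dot) + "-" + index + out.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(numbered("out.csv", 1)); // prints out-1.csv
        System.out.println(numbered("out.csv", 2)); // prints out-2.csv
    }
}
```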
- Since:
- 2025
- Author:
- Mehuli Mukherjee
Constructor Details
Main
public Main()
Method Details
main
Program entry point. Parses flags, constructs a BaseParser (or subclass), runs extraction, then writes or prints CSV results. Errors and invalid flags cause usage to be printed and the method to return.
- Parameters:
- args - command-line arguments (see usage() for details)
- Throws:
- Exception - if an unrecoverable I/O error occurs during parsing/writing
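The "print usage and return" behavior described for main can be illustrated with a small, self-contained flag parser. This is a sketch under stated assumptions, not the actual implementation: the class CliSketch, its parse method, and the usage text are hypothetical. Only the flag names come from the synopsis. The sketch also assumes the PDF path is the first argument and applies no default values, since the documentation does not state them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not the actual com.extractpdf4j.cli.Main.
public class CliSketch {
    // Flag names taken from the synopsis; flags in the first set expect a value.
    private static final Set<String> VALUE_FLAGS = Set.of(
            "--mode", "--pages", "--sep", "--out", "--dpi", "--ocr",
            "--debug-dir", "--min-score", "--require-headers");
    private static final Set<String> BOOLEAN_FLAGS = Set.of("--debug", "--keep-cells");

    /** Returns the parsed options, or an empty Optional when the arguments are invalid. */
    public static Optional<Map<String, String>> parse(String[] args) {
        if (args.length == 0 || args[0].startsWith("--")) {
            return Optional.empty(); // assumption: the PDF path comes first
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("pdf", args[0]);
        for (int i = 1; i < args.length; i++) {
            String flag = args[i];
            if (BOOLEAN_FLAGS.contains(flag)) {
                opts.put(flag, "true");
            } else if (VALUE_FLAGS.contains(flag) && i + 1 < args.length) {
                opts.put(flag, args[++i]); // consume the flag's value
            } else {
                return Optional.empty(); // unknown flag or missing value
            }
        }
        return Optional.of(opts);
    }

    public static void main(String[] args) {
        Optional<Map<String, String>> opts = parse(args);
        if (!opts.isPresent()) {
            System.err.println("usage: <pdf> [--mode stream|lattice|ocrstream|hybrid] [--pages 1|all|1,3-5] [--out out.csv] ...");
            return; // return to the caller instead of calling System.exit
        }
        System.out.println(opts.get());
    }
}
```

Returning rather than exiting keeps the entry point safe to call from tests or from another JVM application, because the host process is never terminated.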
run
public void run()