Class Main

java.lang.Object
com.extractpdf4j.cli.Main
All Implemented Interfaces:
Runnable

public class Main extends Object implements Runnable

CLI entry point for ExtractPDF4J. Parses command-line flags, constructs the appropriate parser (stream / lattice / ocrstream / hybrid), runs extraction, and writes CSV output either to STDOUT or to file(s).

Synopsis


 java -jar extractpdf4j-cli-1.0.0.jar <pdf>
      [--mode stream|lattice|ocrstream|hybrid]
      [--pages 1|all|1,3-5]
      [--sep ,]
      [--out out.csv]
      [--debug]
      [--dpi 300]
      [--ocr auto|cli|bytedeco]
      [--keep-cells]
      [--debug-dir <dir>]
      [--min-score 0-1]
      [--require-headers Date,Description,Balance]
 

Notes

  • When --out is omitted, tables are printed to STDOUT in CSV form.
  • When multiple tables are found and --out is provided, files are numbered by suffix (e.g., out-1.csv, out-2.csv).
  • --pages accepts a single page ("1"), a range ("2-5"), a comma-separated mix of both ("1,3-4"), or "all".
  • --ocr sets a system property read by the OCR helpers; values: auto, cli, or bytedeco.
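
The --pages and --out notes above can be illustrated with a small sketch. This is not the ExtractPDF4J implementation: the class NotesSketch and the helpers parsePages and numberedName are hypothetical names, and the choices to return an empty list for "all" and to reject reversed ranges are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from ExtractPDF4J.
class NotesSketch {

    /**
     * Expands a page spec such as "1", "2-5", "1,3-4" into sorted page numbers.
     * Assumption: "all" is represented by an empty list.
     */
    static List<Integer> parsePages(String spec) {
        String s = spec.trim();
        if (s.equalsIgnoreCase("all")) {
            return new ArrayList<>();
        }
        TreeSet<Integer> pages = new TreeSet<>();
        for (String part : s.split(",")) {
            String p = part.trim();
            int dash = p.indexOf('-');
            int start;
            int end;
            if (dash < 0) {
                start = Integer.parseInt(p);
                end = start;
            } else {
                start = Integer.parseInt(p.substring(0, dash).trim());
                end = Integer.parseInt(p.substring(dash + 1).trim());
            }
            // Assumption: pages start at 1 and reversed ranges are rejected.
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("Bad page range: " + p);
            }
            for (int i = start; i <= end; i++) {
                pages.add(i);
            }
        }
        return new ArrayList<>(pages);
    }

    /** Inserts "-<index>" before the file extension: out.csv becomes out-1.csv. */
    static String numberedName(String out, int index) {
        int dot = out.lastIndexOf('.');
        int sep = Math.max(out.lastIndexOf('/'), out.lastIndexOf('\\'));
        // No extension in the file name itself (or a leading-dot name): append.
        if (dot <= sep + 1) {
            return out + "-" + index;
        }
        return out.substring(0, dot) + "-" + index + out.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(parsePages("1,3-4"));        // [1, 3, 4]
        System.out.println(numberedName("out.csv", 2)); // out-2.csv
    }
}
```

A sorted set is used so that overlapping entries such as "1-3,2" collapse to unique pages in ascending order; whether the real CLI deduplicates this way is not stated in this documentation.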

Exit behavior: main returns normally after printing errors or usage; it does not call System.exit.

Since:
2025
Author:
Mehuli Mukherjee
  • Constructor Details

    • Main

      public Main()
  • Method Details

    • main

      public static void main(String[] args) throws Exception
      Program entry point.

      Parses flags, constructs a BaseParser (or subclass), runs extraction, then writes or prints CSV results. On an error or an invalid flag, usage is printed and the method returns without calling System.exit.

      Parameters:
      args - command-line arguments (see usage() for details)
      Throws:
      Exception - if an unrecoverable I/O error occurs during parsing/writing
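
The "print usage and return" behavior described for main can be sketched as follows. This is a minimal illustration, not the actual Main source: the class CliFlagSketch, the parse helper, and the map-based option storage are all hypothetical. Only the flag names are taken from the synopsis.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of a flag loop that returns instead of exiting.
class CliFlagSketch {

    // Flag names from the synopsis; grouping them this way is an assumption.
    static final Set<String> VALUE_FLAGS = Set.of(
            "--mode", "--pages", "--sep", "--out", "--dpi", "--ocr",
            "--debug-dir", "--min-score", "--require-headers");
    static final Set<String> BOOLEAN_FLAGS = Set.of("--debug", "--keep-cells");

    /** Returns the parsed options, or empty if the arguments are invalid. */
    static Optional<Map<String, String>> parse(String[] args) {
        // The first argument must be the PDF path, not a flag.
        if (args.length == 0 || args[0].startsWith("--")) {
            return Optional.empty();
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("pdf", args[0]);
        for (int i = 1; i < args.length; i++) {
            String flag = args[i];
            if (BOOLEAN_FLAGS.contains(flag)) {
                opts.put(flag, "true");
            } else if (VALUE_FLAGS.contains(flag) && i + 1 < args.length) {
                opts.put(flag, args[++i]); // consume the flag's value
            } else {
                return Optional.empty();   // unknown flag or missing value
            }
        }
        return Optional.of(opts);
    }

    public static void main(String[] args) {
        Optional<Map<String, String>> opts = parse(args);
        if (!opts.isPresent()) {
            System.err.println("Usage: java -jar extractpdf4j-cli-1.0.0.jar <pdf> [flags]");
            return; // return to the caller; no System.exit
        }
        System.out.println(opts.get());
    }
}
```

Returning rather than calling System.exit means code that invokes main from another Java program, or from a test, keeps control after a bad invocation.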
    • run

      public void run()
      Specified by:
      run in interface Runnable