Page Ranges¶
ExtractPDF4J supports flexible page selection so you can target only the pages that matter.
This is useful when:
- only part of the PDF contains tables
- cover pages should be skipped
- appendices add noise
- you want faster, more focused extraction
Supported formats¶
Single page¶
The value 1 extracts only page 1.
Page range¶
The value 1-3 extracts pages 1 through 3.
Mixed selection¶
The value 1,3-5 extracts page 1, then pages 3 through 5.
All pages¶
The keyword all extracts every page in the document.
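To make the four formats concrete, here is a minimal sketch of how such a selection string could be expanded into 1-based page numbers. The class PageSpecSketch and its expand method are hypothetical names used only for illustration. This is not ExtractPDF4J's own parser, and details such as case-insensitive matching of all are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative page-selection expander; NOT ExtractPDF4J's own parser. */
final class PageSpecSketch {

    /**
     * Expands "1", "1-3", "1,3-5" or "all" into 1-based page numbers.
     * pageCount is the total number of pages in the document.
     */
    static List<Integer> expand(String spec, int pageCount) {
        List<Integer> pages = new ArrayList<>();
        String trimmed = spec.trim();
        if (trimmed.equalsIgnoreCase("all")) { // assumption: keyword is case-insensitive
            for (int p = 1; p <= pageCount; p++) {
                pages.add(p);
            }
            return pages;
        }
        for (String part : trimmed.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            // A token is either a single page "N" or an inclusive range "N-M".
            int start = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int end = dash < 0 ? start : Integer.parseInt(token.substring(dash + 1).trim());
            if (start < 1 || end < start || end > pageCount) {
                throw new IllegalArgumentException("Invalid page selection: " + token);
            }
            for (int p = start; p <= end; p++) {
                pages.add(p);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(expand("1,3-5", 10)); // prints [1, 3, 4, 5]
    }
}
```

Malformed input such as a missing comma fails fast here, because Integer.parseInt throws NumberFormatException, which is a subclass of IllegalArgumentException.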
Example: Java API¶
import com.extractpdf4j.helpers.Table;
import com.extractpdf4j.parsers.HybridParser;
import java.util.List;

public class PageRangesExample {
    public static void main(String[] args) throws Exception {
        List<Table> tables = new HybridParser("statement.pdf")
                .pages("2-4")
                .dpi(300f)
                .parse();
        System.out.println("Tables found: " + tables.size());
    }
}
Example: CLI¶
java -jar extractpdf4j-parser-<version>.jar statement.pdf \
  --mode hybrid \
  --pages 2-4 \
  --out result.csv
When to use page ranges¶
Use page ranges when:
- the first page is just a cover or summary
- tables start from page 2 onward
- only selected sections contain tabular data
- you want to reduce OCR cost on long scans
Common patterns¶
Skip the first page¶
To skip the first page, start the selection at page 2. If your implementation does not support 2-all, use an explicit range instead, such as 2-10 for a ten-page document.
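When the page count is known at runtime, an explicit range that skips the first page can be built rather than hard-coded. The sketch below assumes the page count has already been obtained elsewhere (for example, from whatever PDF library you use). SkipFirstPageSketch and fromSecondPage are hypothetical names, not part of ExtractPDF4J.

```java
/** Illustrative helper; NOT part of ExtractPDF4J. */
final class SkipFirstPageSketch {

    /** Builds an explicit selection covering page 2 through the last page. */
    static String fromSecondPage(int pageCount) {
        if (pageCount < 2) {
            throw new IllegalArgumentException("Document has no page after the first");
        }
        // A two-page document needs only the single page "2".
        return pageCount == 2 ? "2" : "2-" + pageCount;
    }

    public static void main(String[] args) {
        System.out.println(fromSecondPage(10)); // prints 2-10
    }
}
```

The resulting string can then be passed to .pages(...) or --pages in the same way as the literal 2-4 in the examples above.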
Extract two separate regions of the document¶
A mixed selection such as 1-2,8-10 is useful when the middle pages are irrelevant.
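If the regions are computed in code, the selection string can be assembled from inclusive start and end pairs. This is a small sketch under that assumption. MixedSelectionSketch and join are hypothetical names, and the output simply follows the comma-and-hyphen format described on this page.

```java
import java.util.StringJoiner;

/** Illustrative helper; NOT part of ExtractPDF4J. */
final class MixedSelectionSketch {

    /** Joins inclusive {start, end} pairs into a comma-separated selection. */
    static String join(int[][] ranges) {
        if (ranges.length == 0) {
            throw new IllegalArgumentException("At least one range is required");
        }
        StringJoiner joiner = new StringJoiner(",");
        for (int[] r : ranges) {
            if (r.length != 2 || r[0] < 1 || r[1] < r[0]) {
                throw new IllegalArgumentException("Each range must be {start, end} with 1 <= start <= end");
            }
            // Collapse a one-page range such as {4, 4} to the single page "4".
            joiner.add(r[0] == r[1] ? String.valueOf(r[0]) : r[0] + "-" + r[1]);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new int[][] {{1, 2}, {8, 10}})); // prints 1-2,8-10
    }
}
```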
Best practices¶
- Narrow page ranges before enabling heavy OCR
- Keep extraction focused on known table-bearing pages
- Use page targeting before adding complex tuning
- Validate page assumptions when document templates change
Common mistakes¶
Using invalid syntax¶
Write ranges with a hyphen between the first and last page, such as 1-3.
Forgetting commas in mixed selections¶
Incorrect: 1 3-5
Correct: 1,3-5
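Both mistakes can be caught before the parser runs with a simple syntax check. The sketch below is deliberately strict: it accepts only all, single pages, hyphenated ranges, and comma-separated combinations of those, with no spaces. PageSpecSyntaxCheck and looksValid are hypothetical names, and the library itself may be more lenient, or may accept forms not covered here (2-all is rejected by this check).

```java
import java.util.regex.Pattern;

/** Illustrative pre-flight check; NOT ExtractPDF4J's own validation. */
final class PageSpecSyntaxCheck {

    // Either the keyword "all", or comma-separated items that are each N or N-M.
    private static final Pattern SPEC =
            Pattern.compile("all|\\d+(-\\d+)?(,\\d+(-\\d+)?)*");

    /** Returns true when the whole string matches the strict format above. */
    static boolean looksValid(String spec) {
        return spec != null && SPEC.matcher(spec).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("1,3-5")); // prints true
        System.out.println(looksValid("1 3-5")); // prints false
    }
}
```

This checks syntax only. It does not verify that the pages exist in the document or that a range runs from a lower to a higher page.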
Over-processing every page¶
Using all on large scanned PDFs can be slower than needed. Prefer specific ranges when you know where tables are.