Skip to content

Page Ranges

ExtractPDF4J supports flexible page selection so you can target only the pages that matter.

This is useful when:

  • only part of the PDF contains tables
  • cover pages should be skipped
  • appendices add noise
  • you want faster, more focused extraction

Supported formats

Single page

.pages("1")

Extracts only page 1.

Page range

.pages("1-3")

Extracts pages 1 through 3.

Mixed selection

.pages("1,3-5")

Extracts page 1, then pages 3 through 5.

All pages

.pages("all")

Extracts all pages in the document.

Example: Java API

import com.extractpdf4j.helpers.Table;
import com.extractpdf4j.parsers.HybridParser;

import java.util.List;

public class PageRangesExample {
    public static void main(String[] args) throws Exception {
        List<Table> tables = new HybridParser("statement.pdf")
                .pages("2-4")
                .dpi(300f)
                .parse();

        System.out.println("Tables found: " + tables.size());
    }
}

Example: CLI

java -jar extractpdf4j-parser-<version>.jar statement.pdf \
  --mode hybrid \
  --pages 2-4 \
  --out result.csv

When to use page ranges

Use page ranges when:

  • the first page is just a cover or summary
  • tables start from page 2 onward
  • only selected sections contain tabular data
  • you want to reduce OCR cost on long scans

Common patterns

Skip the first page

.pages("2-all")

If your implementation does not support 2-all, use explicit ranges instead, such as:

.pages("2-10")

Extract two separate regions of the document

.pages("1-2,5-6")

Useful when the middle pages are irrelevant.

Best practices

  • Narrow page ranges before enabling heavy OCR
  • Keep extraction focused on known table-bearing pages
  • Use page targeting before adding complex tuning
  • Validate page assumptions when document templates change

Common mistakes

Using invalid syntax

Incorrect:

.pages("page 1")

Correct:

.pages("1")

Forgetting commas in mixed selections

Incorrect:

.pages("1 3-5")

Correct:

.pages("1,3-5")

Over-processing every page

Using all on large scanned PDFs can be slower than needed. Prefer specific ranges when you know where tables are.