# PDF Range OCR Script

This project provides a command line script that recognizes text from a selected PDF page range.

## Requirements

1. Linux with Tesseract OCR installed:

       sudo apt-get update
       sudo apt-get install -y tesseract-ocr tesseract-ocr-rus tesseract-ocr-eng

2. Python dependencies:

       uv sync

## Usage

Run OCR for an inclusive 1-based page range and write to a text file:

    uv run python main.py --input "input.pdf" --start 5 --end 12 --output "result.txt"

If `--start` and `--end` are both omitted, OCR runs from the first page to the last page.

Optional flags:

- --lang (default: rus+eng)
- --dpi (default: 300)
- --rotate (default: 0, degrees before OCR)

Example:

    uv run python main.py \
      --input "Красавчикова. Личные права. 1994.pdf" \
      --start 1 \
      --end 3 \
      --output "ocr_output.txt" \
      --lang "rus+eng" \
      --dpi 300 \
      --rotate 90

The output file is UTF-8 text with page separators:

    === Page 1 ===
    === Page 2 ===
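
Because each page is introduced by a `=== Page N ===` line, the output file can be split back into per-page text for further processing. The following is a minimal sketch, not part of this project: `split_pages` is a hypothetical helper, and it assumes each separator sits on its own line exactly in the form shown above.

```python
import re

# Assumption: every separator is a full line of the form "=== Page N ===".
PAGE_HEADER = re.compile(r"^=== Page (\d+) ===[ \t]*$", re.MULTILINE)


def split_pages(text: str) -> dict[int, str]:
    """Map each page number to the text that follows its separator."""
    pages: dict[int, str] = {}
    matches = list(PAGE_HEADER.finditer(text))
    for i, match in enumerate(matches):
        # A page's text runs until the next separator or the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages[int(match.group(1))] = text[match.end():end].strip()
    return pages
```

To use it on a real result, read the file with `open("ocr_output.txt", encoding="utf-8").read()` and pass the string to `split_pages`. Text that happens to contain a line matching the separator pattern would be treated as a page boundary, so this is only suitable when the OCR output itself is unlikely to include such lines.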