Open-Source Python OCR Library

Stop Blindly Using OCR

PreOCR is an open-source Python OCR and document classification library that decides when you actually need vision OCR. Save up to 90% of GPU/CPU cycles by extracting native text with near-zero latency and running heavy OCR only when required.

  • Cut OCR costs by up to 90% by avoiding unnecessary vision runs
  • Keep text fidelity at 100% by using native PDF text whenever possible
  • Ship fast with a Python SDK and simple HTTP API for your backend
pip install preocr
[Diagram: multi_page_doc.pdf (0.2ms) is routed to either Library_01 NATIVE (Fast_Stream) or Library_02 VISION (VLM Fallback). Analysis Mode: Autonomous_Router_v5.2]

Document AI Routing Engine

Optimizing OCR & Native Text Extraction for RAG Pipelines

Effort: Minimal

Fast-Track

Direct extraction of native vector layers. Zero vision compute required. Ideal for digital-first documents and system-generated PDFs.

Compute Effort: 2%
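To make the Fast-Track idea concrete, here is a minimal sketch of how a router might judge whether a page's embedded text layer is good enough to skip vision OCR. It is not PreOCR's actual implementation or API: the function name `has_usable_text_layer` and both thresholds are hypothetical, and the page text is assumed to have been pulled out already by whatever PDF library you use.

```python
import string

# Hypothetical thresholds, chosen for illustration only.
MIN_CHARS = 50
MIN_READABLE_RATIO = 0.85


def has_usable_text_layer(
    page_text: str,
    min_chars: int = MIN_CHARS,
    min_readable_ratio: float = MIN_READABLE_RATIO,
) -> bool:
    """Return True when embedded text looks complete enough to skip OCR."""
    stripped = page_text.strip()
    if len(stripped) < min_chars:
        # Empty or near-empty layer: the page is probably a scanned image.
        return False
    # Count characters a reader would recognize as ordinary text.
    readable = sum(
        1
        for ch in stripped
        if ch.isalnum() or ch.isspace() or ch in string.punctuation
    )
    # A low ratio suggests a broken font mapping or encoding debris.
    return readable / len(stripped) >= min_readable_ratio


digital_page = (
    "Invoice 2024-001: total amount due is 1,250.00 USD, "
    "payable within 30 days."
)
scanned_page = ""                   # no text layer at all
garbled_page = "\ufffd" * 80 + "abc"  # replacement characters from a bad font map
```

A check like this runs on plain strings, so it costs far less than rendering a page and sending it to a vision model. Real documents may need extra signals (image coverage, fonts, page size) that this sketch leaves out.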
Effort: Full Cognitive

Cognitive Vision

Advanced fallback for scanned images, poor-quality photos, or handwriting. Triggers VLM analysis for complex layout recovery.

Compute Effort: 94%
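The two paths can be combined in a simple fallback pattern: try the cheap native extraction first, and call the expensive vision path only for pages where it comes back empty. The sketch below is an assumption-laden illustration, not PreOCR's API. `extract_with_fallback`, the `min_chars` cutoff, and the stub extractors are all hypothetical stand-ins for real backends.

```python
from typing import Callable, List, Tuple


def extract_with_fallback(
    pages: List[str],
    native_extract: Callable[[str], str],
    vision_extract: Callable[[str], str],
    min_chars: int = 20,  # hypothetical cutoff for "native text is present"
) -> List[Tuple[str, str]]:
    """Try the cheap native path first; use the vision path only on failure."""
    results = []
    for page in pages:
        text = native_extract(page)
        if len(text.strip()) >= min_chars:
            results.append(("native", text))
        else:
            # Only pages without usable native text pay the vision cost.
            results.append(("vision", vision_extract(page)))
    return results


# Stub extractors standing in for a PDF text reader and a VLM/OCR backend.
_NATIVE_LAYER = {
    "page-1": "Quarterly report generated by the billing system.",
    "page-2": "",  # scanned page: no embedded text
}
vision_calls: List[str] = []


def fake_native(page_id: str) -> str:
    return _NATIVE_LAYER.get(page_id, "")


def fake_vision(page_id: str) -> str:
    vision_calls.append(page_id)  # record how often the costly path runs
    return f"<recognized text for {page_id}>"


routed = extract_with_fallback(["page-1", "page-2"], fake_native, fake_vision)
```

Deciding per page rather than per document means a mostly digital PDF with one scanned attachment sends only that attachment to the vision model.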
Decision Latency: 0.02ms (per-page analysis)
CPU Effort Saved: 84% avg (vs. traditional OCR)
RAG Accuracy: 99.9% (native fidelity)
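As a rough way to reason about savings for your own workload, the sketch below blends the two compute-effort figures quoted above (2% for Fast-Track, 94% for Cognitive Vision) by the share of pages that take each path. The page does not say how its averages were measured, so treating the two figures as units on one common scale is an assumption, and so is the 90/10 page mix in the example. The function names are hypothetical.

```python
def blended_effort(
    native_fraction: float,
    native_effort: float = 0.02,  # Fast-Track figure quoted on this page
    vision_effort: float = 0.94,  # Cognitive Vision figure quoted on this page
) -> float:
    """Average compute effort per page for a given share of native pages."""
    if not 0.0 <= native_fraction <= 1.0:
        raise ValueError("native_fraction must be between 0 and 1")
    return (
        native_fraction * native_effort
        + (1.0 - native_fraction) * vision_effort
    )


def savings_vs_all_vision(
    native_fraction: float,
    native_effort: float = 0.02,
    vision_effort: float = 0.94,
) -> float:
    """Fraction of effort saved compared with sending every page to vision."""
    blended = blended_effort(native_fraction, native_effort, vision_effort)
    return 1.0 - blended / vision_effort


# Assumed mix for illustration: 90% of pages carry a usable text layer.
effort = blended_effort(0.9)          # 0.9 * 0.02 + 0.1 * 0.94, about 0.112
saved = savings_vs_all_vision(0.9)    # about 0.88
```

Under these assumptions the saving depends almost entirely on the share of native pages, so it is worth measuring that share on a sample of your own documents before estimating costs.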