Stop Blindly Using OCR
PreOCR is an open-source Python library for OCR and document classification that decides when you actually need vision OCR. Save up to 90% of GPU/CPU cycles by extracting native text directly, with no vision compute, and running heavy OCR only when required.
- Cut OCR costs by up to 90% by avoiding unnecessary vision runs
- Keep text fidelity at 100% by using native PDF text whenever possible
- Ship fast with a Python SDK and a simple HTTP API for your backend
Fast_Stream
VLM Fallback
Document AI Routing Engine
Optimizing OCR & Native Text Extraction for RAG Pipelines
Fast-Track
Direct extraction of native vector layers. Zero vision compute required. Ideal for digital-first documents and system-generated PDFs.
Cognitive Vision
Advanced fallback for scanned images, poor-quality photos, and handwriting. Triggers VLM analysis to recover complex layouts.
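To make the two routes concrete, here is a minimal Python sketch of how such a routing decision could work. It is not PreOCR's actual API: `route_document`, `RoutingDecision`, and both thresholds are hypothetical names and values chosen for illustration. It assumes you have already extracted the native text of each page with a PDF library of your choice, and the sample page texts are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoutingDecision:
    route: str              # "fast_track" or "cognitive_vision"
    text_page_ratio: float  # share of pages with usable native text


def route_document(
    page_texts: List[str],
    min_chars_per_page: int = 50,       # assumed threshold, tune per corpus
    min_text_page_ratio: float = 0.9,   # assumed threshold, tune per corpus
) -> RoutingDecision:
    """Pick a route from the native text already extracted for each page."""
    if not page_texts:
        # No pages with extractable text at all: treat the file as a scan.
        return RoutingDecision("cognitive_vision", 0.0)

    # Count pages whose native text is long enough to be usable.
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    ratio = pages_with_text / len(page_texts)

    # Mostly native text: skip vision. Otherwise fall back to OCR/VLM.
    route = "fast_track" if ratio >= min_text_page_ratio else "cognitive_vision"
    return RoutingDecision(route, ratio)


# Illustrative inputs: a system-generated PDF and a scan with no text layer.
digital_pdf = ["Invoice 1042. Total due: 318.00 EUR. Payment terms: 30 days net."] * 3
scanned_pdf = ["", "", ""]

print(route_document(digital_pdf).route)
print(route_document(scanned_pdf).route)
```

A character-count heuristic like this is cheap because it reuses text the PDF already contains. A real router would likely also look at image coverage or font encoding problems before choosing the vision path.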