PaddleOCR VL
paddleocr-vl
Last refreshed 2026-05-22. Next refresh: weekly.
PaddleOCR VL is worth evaluating for vision when its provider route and context window match the workload.
Decision context: Vision task fit, 1 tracked provider route, and research from 2026-05-22.
Use it for
- Teams evaluating vision
- Workloads that can use a 16K context window
- Buyers comparing 1 tracked provider route
Do not use it for
- Strict JSON or tool-calling flows
Cheapest output
$0.020
Novita AI per 1M tokens
Provider routes
1
Tracked API hosts
Quality / dollar
Unknown
No task benchmark coverage yet
Freshness
2026-05-22
Researched today
Top use-case fit
Vision
Included by capability and metadata signals in the decision map.
Provider price ladder
| Provider | Input / 1M | Output / 1M | Route |
|---|---|---|---|
| Novita AI | $0.020 | $0.020 | Serverless |
Benchmark peer bars for Vision
No task-mapped benchmark peers are available for this model yet.
Migration checks
No linked migration route is available for this model yet.
About
PaddleOCR VL is a 0.9B ultra-compact vision-language model from Baidu's PaddlePaddle team for multilingual document parsing. It combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model. It supports 109 languages and recognizes text, tables, formulas, and charts. It scored 92.56 on OmniDocBench V1.5, surpassing larger models including DeepSeek-OCR. It was released on October 16, 2025.
PaddleOCR VL has a 16K-token context window.
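As a rough illustration of checking a workload against that window, the sketch below tests whether a request fits. It rests on two assumptions that the listing does not confirm: that "16K" means 16,384 tokens, and that prompt tokens (including image tokens) and generated tokens share the same window. The function name `fits_context` is hypothetical.

```python
# Assumption: "16K" is read as 16 * 1024 tokens; the listing does not say.
CONTEXT_WINDOW_TOKENS = 16 * 1024


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True when the prompt plus the reserved output fits the window.

    Assumes input and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window


# Hypothetical request sizes, for illustration only.
print(fits_context(12_000, 4_000))
print(fits_context(14_000, 4_000))
```

Token counts for a real page depend on the provider's tokenizer and image preprocessing, so they are best taken from the usage figures the provider reports.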
PaddleOCR VL input tokens cost $0.02 per 1M, and output tokens cost $0.02 per 1M.
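To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a batch at the listed Novita AI rates. The function name `estimate_cost_usd` and the per-page token counts are hypothetical, chosen only for illustration. Actual usage per document will vary.

```python
from decimal import Decimal

# Listed rates: USD per 1M tokens on the Novita AI route.
INPUT_PRICE_PER_M = Decimal("0.02")
OUTPUT_PRICE_PER_M = Decimal("0.02")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a workload at the listed per-1M rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


# Hypothetical workload: 10,000 pages, 1,500 input and 800 output tokens each.
pages = 10_000
cost = estimate_cost_usd(pages * 1_500, pages * 800)
print(f"Estimated cost: ${cost:.2f}")  # Estimated cost: $0.46
```

`Decimal` is used instead of floats so that small per-token prices do not pick up rounding noise when summed over large batches.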