
PaddleOCR VL Models by Baidu AI

1 model · Released 2025 · Up to 16K context · From $0.02 per 1M input tokens

About

PaddleOCR VL is Baidu's ultra-compact vision-language model family for multilingual document parsing and OCR. The flagship 0.9B model combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model and supports 109 languages. At launch it achieved state-of-the-art (SOTA) results on OmniDocBench V1.5.
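"Dynamic resolution" means a page is tokenized close to its native size and aspect ratio instead of being squashed to a fixed square, so larger or denser pages cost more visual tokens. The sketch below estimates visual tokens per image using a Qwen2-VL-style resize rule, swapped in here as an assumption. The patch size (14), the 2x2 merge, and the pixel limits are hypothetical placeholders, not values confirmed by this page; check the model's own preprocessor configuration for the real ones.

```python
import math

# HYPOTHETICAL values for illustration only; the real encoder settings may differ.
PATCH = 14                     # assumed ViT patch edge, in pixels
MERGE = 2                      # assumed 2x2 patch merge before the language model
MIN_PIXELS = 28 * 28 * 4       # assumed lower pixel budget
MAX_PIXELS = 28 * 28 * 1280    # assumed upper pixel budget (at most 1,280 merged tokens)


def smart_resize(height: int, width: int) -> tuple[int, int]:
    """Keep the aspect ratio, snap each side to a multiple of PATCH * MERGE,
    and rescale when the total pixel count falls outside the budget."""
    unit = PATCH * MERGE
    h = max(unit, round(height / unit) * unit)
    w = max(unit, round(width / unit) * unit)
    if h * w > MAX_PIXELS:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt(height * width / MAX_PIXELS)
        h = max(unit, math.floor(height / scale / unit) * unit)
        w = max(unit, math.floor(width / scale / unit) * unit)
    elif h * w < MIN_PIXELS:
        # Too small: enlarge both sides by the same factor, rounding up.
        scale = math.sqrt(MIN_PIXELS / (height * width))
        h = math.ceil(height * scale / unit) * unit
        w = math.ceil(width * scale / unit) * unit
    return h, w


def visual_tokens(height: int, width: int) -> int:
    """Merged visual tokens for one image under the assumptions above."""
    h, w = smart_resize(height, width)
    unit = PATCH * MERGE
    return (h // unit) * (w // unit)
```

Under these assumptions, `visual_tokens(1000, 800)` returns 1044 (the image snaps to 1008 x 812 pixels), while a 300 dpi A4 scan (3508 x 2480) is scaled down so that it stays within the 1,280-token budget.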

Current Variants


Use when the workload needs vision input and fits within a 16K context window; the model itself is small, at 900M parameters.

2025-10 · vision · 16K context · 900M parameters
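Because a page's visual tokens, the text prompt, and the generated output all share the 16K window, a quick budget check shows whether a request fits. This is a minimal sketch that assumes "16K" means 16,384 tokens (the page does not give the exact figure) and that you already have a per-image token estimate; the function names are hypothetical.

```python
# ASSUMPTION: "16K context" is taken to mean 16,384 tokens; confirm the exact
# limit with your provider before relying on it.
CONTEXT_LIMIT = 16_384


def output_budget(image_tokens: int, prompt_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left for the model's output after the image and prompt (never negative)."""
    return max(0, limit - image_tokens - prompt_tokens)


def fits_in_context(image_tokens: int, prompt_tokens: int, max_output_tokens: int,
                    limit: int = CONTEXT_LIMIT) -> bool:
    """True if the image, prompt, and reserved output together stay within the limit."""
    return image_tokens + prompt_tokens + max_output_tokens <= limit


# Hypothetical page: 1,260 visual tokens plus a 200-token instruction.
print(output_budget(1_260, 200))           # prints 14924
print(fits_in_context(1_260, 200, 8_000))  # prints True
```

Reserving output space explicitly matters for dense pages, where the extracted text can be long relative to the prompt.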

Release Timeline

2025-10: 1 release, 1 current
PaddleOCR VL (Current): vision, 16K context, 900M parameters

Specifications (1 model)

PaddleOCR VL model specifications comparison
Model | Released | Context | Parameters | Vision | Multimodal
PaddleOCR VL | 2025-10 | 16K | 0.9B | Yes | Yes

Available From (1 provider)

Pricing

PaddleOCR VL model pricing by provider
Model | Provider | Input / 1M tokens | Output / 1M tokens | Type
PaddleOCR VL | Novita AI | $0.02 | $0.02 | Serverless
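Per-1M-token pricing turns into a job cost as tokens / 1,000,000 x price, summed over input and output. The sketch below applies the listed Novita AI rates; the workload figures (10,000 pages at roughly 1,500 input and 1,000 output tokens each) are hypothetical and only illustrate the arithmetic.

```python
# USD per 1M tokens, taken from the Novita AI row of the pricing table.
INPUT_PRICE_PER_M = 0.02
OUTPUT_PRICE_PER_M = 0.02


def job_cost(input_tokens: int, output_tokens: int,
             input_price: float = INPUT_PRICE_PER_M,
             output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Estimated USD cost: (tokens / 1,000,000) * price per 1M, for input plus output."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Hypothetical workload: 10,000 pages, ~1,500 input and ~1,000 output tokens per page.
pages = 10_000
cost = job_cost(pages * 1_500, pages * 1_000)
print(f"${cost:.2f}")  # prints "$0.50"
```

Because input and output are priced identically here, the estimate depends only on the total token count; with asymmetric pricing the split between the two would matter.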

Frequently Asked Questions

What is PaddleOCR VL used for?
PaddleOCR VL is used for multilingual document parsing and OCR, meaning vision and multimodal workloads that turn document images into structured text. Both the family description and the listed model capabilities point to these workloads as its best fit.
How does PaddleOCR VL compare to ERNIE 4.5?
Both families come from Baidu AI, and PaddleOCR VL uses ERNIE-4.5-0.3B as its language model. PaddleOCR VL is a compact model specialized for vision-based document parsing, while ERNIE 4.5 is the closest related family to check for broader vision and multimodal work. PaddleOCR VL has one listed variant with up to 16K context, while ERNIE 4.5 reaches up to 128K context, so compare the specifications and pricing tables before choosing a production model.
Which PaddleOCR VL model should I use?
Only one variant is listed, so the choice is PaddleOCR VL itself. Through Novita AI it costs $0.02 per 1M input tokens, the lowest (and only) listed price, and it is also the latest variant in the family, with 16K context and multimodal inputs.
