LLM Reference

PaddleOCR VL

paddleocr-vl

Researched today

Last refreshed 2026-05-22. Next refresh: weekly.

Open Source, Multimodal, Vision

PaddleOCR VL is worth evaluating for vision workloads, particularly document parsing, when its single tracked provider route (Novita AI) and 16K context window match the workload.

Decision context: Vision task fit, 1 tracked provider route, and research from 2026-05-22.

Use it for

  • Teams evaluating vision models for document parsing (text, tables, formulas, charts)
  • Workloads that can use a 16K context window
  • Buyers who can work with the 1 tracked provider route

Do not use it for

  • Strict JSON or tool-calling flows

Cheapest output: $0.020 per 1M tokens (Novita AI)

Provider routes: 1 tracked API host

Quality / dollar: Unknown (no task benchmark coverage yet)

Freshness: 2026-05-22 (fresh)

Top use-case fit

Vision

Included by capability and metadata signals in the decision map.

Provider price ladder

Provider     Input / 1M    Output / 1M    Route
Novita AI    $0.020        $0.020         Serverless

Benchmark peer bars for Vision

No task-mapped benchmark peers are available for this model yet.

Migration checks

No linked migration route is available for this model yet.

About

PaddleOCR VL is a 0.9B-parameter, ultra-compact vision-language model from Baidu's PaddlePaddle team for multilingual document parsing. It combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model. It supports 109 languages and recognizes text, tables, formulas, and charts. It scored 92.56 on OmniDocBench V1.5, surpassing larger models including DeepSeek-OCR. It was released on October 16, 2025.

PaddleOCR VL has a 16K-token context window.
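
As a rough pre-flight check, a request's token budget can be compared against the context window. The sketch below is illustrative only: it assumes "16K" means 16,000 tokens (the conservative reading; the real limit may be 16,384) and that prompt tokens, including any image tokens, and generated tokens count against the same window. The function name and the example token counts are hypothetical; real counts depend on the provider's tokenizer.

```python
# Assumption: "16K" is read conservatively as 16,000 tokens.
CONTEXT_WINDOW = 16_000


def fits_context(prompt_tokens: int,
                 max_output_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    # Hypothetical page: 12,000 prompt tokens, 4,000 tokens reserved for output.
    print(fits_context(12_000, 4_000))
    # One token over the assumed limit.
    print(fits_context(12_001, 4_000))
```

If a page does not fit, splitting the document into smaller crops or lowering the reserved output budget are the usual options.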

PaddleOCR VL input tokens cost $0.02 per 1M and output tokens cost $0.02 per 1M on Novita AI.
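
To make the per-million pricing concrete, the following minimal sketch estimates the cost of a batch job from token counts. The prices are the ones listed on this page; the function name and the workload figures (pages, tokens per page) are hypothetical and chosen only for illustration.

```python
# Listed prices on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.02
OUTPUT_PRICE_PER_M = 0.02


def estimate_cost_usd(input_tokens: int,
                      output_tokens: int,
                      input_price_per_m: float = INPUT_PRICE_PER_M,
                      output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Estimate the cost in USD for the given input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 pages, each using about
    # 3,000 input tokens and 1,000 output tokens.
    pages = 10_000
    cost = estimate_cost_usd(pages * 3_000, pages * 1_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Under these assumed numbers the job uses 40M tokens in total, which comes to roughly $0.80 at the listed rates.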

Capabilities

Vision, Multimodal

Specifications

Released: 2025-10-16
Parameters: 0.9B
Context: 16K
Architecture: Encoder-Decoder
Specialization: vision
Training: pretrained

Created by

Innovative text-to-video and app builder

Beijing, China
Founded 2010

Providers (1)