LLM Reference

Ovis Image

ovis-image

Open Source, Multimodal

About

Ovis-Image is a 7B-parameter text-to-image model from Alibaba's AIDC-AI team, optimized for high-quality text rendering and achieving 92%+ word accuracy on the CVTG-2K benchmark. It pairs a diffusion visual decoder with the Ovis 2.5 multimodal backbone and generates 1024x1024 images in realistic, cyberpunk, anime, and sci-fi styles. It is available on HuggingFace (AIDC-AI/Ovis-Image-7B).
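To make the "word accuracy" figure more concrete, here is a minimal sketch of how a word-level accuracy score could be computed from the words requested in a prompt and the words read back from the generated image (for example by an OCR system). This is an illustrative assumption, not the CVTG-2K scoring procedure, which is not described on this page; the function name `word_accuracy`, the case-insensitive matching, and the sample words are all hypothetical.

```python
from collections import Counter


def word_accuracy(target_words, recognized_words):
    """Return the fraction of target words found in the recognized output.

    Assumptions (illustrative only): matching is case-insensitive, and each
    recognized word can satisfy at most one target word.
    """
    if not target_words:
        raise ValueError("target_words must not be empty")
    # Count how many times each recognized word is available for matching.
    available = Counter(w.lower() for w in recognized_words)
    matched = 0
    for word in target_words:
        key = word.lower()
        if available[key] > 0:
            available[key] -= 1
            matched += 1
    return matched / len(target_words)


# Hypothetical example: four words requested, three read back correctly.
score = word_accuracy(
    ["GRAND", "OPENING", "SALE", "TODAY"],
    ["grand", "opening", "sale", "todey"],
)
print(f"{score:.2f}")  # prints 0.75
```

Under this definition, a score of 0.92 would mean that 92 of every 100 requested words were rendered legibly enough to be read back exactly.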

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio, Fine-tuning

Specifications

Released: 2025-11-29
Parameters: 7B
Architecture: diffusion
Specialization: image-generation

Created by

AI research institute of Alibaba Group.

Hangzhou, Zhejiang, China
Founded 2017