Ovis Image
ovis-image
Open Source, Multimodal
About
Ovis-Image is a 7B text-to-image model from Alibaba's AIDC-AI team, optimized for high-quality text rendering. It achieves 92%+ word accuracy on the CVTG-2K benchmark. The model is built on a diffusion visual decoder integrated with the Ovis 2.5 multimodal backbone and generates 1024x1024 images in realistic, cyberpunk, anime, and sci-fi styles. It is available on Hugging Face (AIDC-AI/Ovis-Image-7B).
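To make the word-accuracy figure concrete, here is a minimal sketch of how such a score could be computed. It assumes the text rendered in the image has already been read back by an OCR system and is available as a string. The helper name `word_accuracy` and the matching rule (case-insensitive, counting repeated words) are illustrative assumptions; the actual CVTG-2K evaluation protocol may differ.

```python
import re
from collections import Counter


def word_accuracy(target_text: str, ocr_text: str) -> float:
    """Hypothetical helper: fraction of target words found in the OCR output.

    Matching is case-insensitive and respects word multiplicity, so a word
    requested twice must be recognized twice to count fully.
    """
    def tokenize(text: str) -> list[str]:
        # Lowercase and keep alphanumeric runs (plus apostrophes) as words.
        return re.findall(r"[a-z0-9']+", text.lower())

    target_counts = Counter(tokenize(target_text))
    found_counts = Counter(tokenize(ocr_text))

    total = sum(target_counts.values())
    if total == 0:
        # No target words: define the score as 0.0 rather than dividing by zero.
        return 0.0

    matched = sum(min(count, found_counts[word])
                  for word, count in target_counts.items())
    return matched / total


if __name__ == "__main__":
    # One of three requested words is misspelled in the rendered image.
    print(word_accuracy("Grand Opening Sale", "GRAND OPENNG SALE"))
```

Under this definition, a score of 92% would mean that roughly 92 of every 100 requested words are reproduced exactly in the generated images.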
Capabilities
Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution, Prompt Caching, Batch API, Audio, Fine-tuning
Specifications
Family: Ovis Image
Released: 2025-11-29
Parameters: 7B
Architecture: diffusion
Specialization: image-generation