LLM Reference

About

Kosmos-2.5, developed by Microsoft Document AI, is a multimodal literate model for understanding text-intensive images [1, 2]. It handles two related transcription tasks: generating spatially-aware text blocks, in which each block of text is paired with its coordinates in the image, and producing structured text output in markdown format that preserves styles and structures [3]. Both tasks share one decoder-only auto-regressive Transformer and are selected with task-specific prompts [1]. The model is pre-trained on a large-scale dataset, and a fine-tuned counterpart, Kosmos-2.5-chat, answers questions about text-heavy images [11]. At 1.37 billion parameters the model is compact [12], but like other generative models it can hallucinate [2].
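To make "spatially-aware text blocks" concrete, here is a minimal Python sketch that turns raw model output into structured records. It assumes the output serializes each block as `<bbox><x_N><y_N><x_N><y_N></bbox>` followed by the block's text; that tag layout is an assumption about the released checkpoint, not something stated above. The names `TextBlock` and `parse_text_blocks` are hypothetical helpers, and the sample string is invented for illustration.

```python
import re
from typing import List, NamedTuple


class TextBlock(NamedTuple):
    """One recognized text span plus its bounding box (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int
    text: str


# Assumed serialization: four coordinate tokens inside <bbox>...</bbox>,
# then the text, which runs until the next <bbox> tag or the end of input.
_BLOCK = re.compile(
    r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>(.*?)(?=<bbox>|\Z)",
    re.DOTALL,
)


def parse_text_blocks(raw: str) -> List[TextBlock]:
    """Split raw output into (box, text) records; unmatched text is ignored."""
    blocks = []
    for match in _BLOCK.finditer(raw):
        x0, y0, x1, y1 = (int(g) for g in match.groups()[:4])
        blocks.append(TextBlock(x0, y0, x1, y1, match.group(5).strip()))
    return blocks


# Invented sample output, for illustration only.
sample = (
    "<bbox><x_10><y_20><x_110><y_40></bbox>Invoice\n"
    "<bbox><x_10><y_50><x_90><y_70></bbox>Total: 42\n"
)
blocks = parse_text_blocks(sample)
for block in blocks:
    print(block)
```

Because generative models can hallucinate, downstream code may want to sanity-check each box (for example, that `x0 <= x1` and `y0 <= y1`) before trusting it.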
