Last refreshed 2026-05-01. Next refresh: weekly.
Why use Kosmos 2 on NVIDIA NIM?
NVIDIA NIM offers Kosmos 2 with competitive pricing. NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices.
Setup recipe
Docs fallbackUse the provider REST API or SDKCreate a provider API keymodel: kosmos-2kosmos-2Request example
kosmos-2.Gotchas
No curated gotchas have been sourced for this exact provider/model route yet.
Pricing
| Type | Rate |
|---|---|
| GPU Hour Rate | $1.00/GPU·hr |
| GPU Config | 1xH100 |
Capabilities
No model capability flags are currently sourced.
About Kosmos 2
Kosmos-2, developed by Microsoft Research, is an advanced multimodal large language model (MLLM) that enhances the capabilities of its predecessor, Kosmos-1. It features a Transformer-based architecture trained on the GrIT dataset of grounded image-text pairs, enabling it to understand and interact with both text and visual data. A key innovation is Kosmos-2's ability to ground language to the visual world, allowing for nuanced interaction with images by linking text to specific visual elements using location tokens. This model excels in various tasks including image caption generation, referring expression comprehension, and perception-language tasks, making it valuable for applications such as robotics, multimodal dialogue systems, and more. Kosmos-2 is considered a significant step towards AI systems that are more contextually aware and closer to achieving artificial general intelligence (AGI) 12.
FAQ
Who created Kosmos 2?
Kosmos 2 was created by Microsoft Research as part of the Kosmos-2 model family.
Is Kosmos 2 open source?
Kosmos 2's open source status is unknown in the seed data.