LLM ReferenceLLM Reference
NVIDIA NIM

Kosmos 2 on NVIDIA NIM

Kosmos-2 · Microsoft Research

Provisioned

Last refreshed 2026-05-01. Next refresh: weekly.

Why use Kosmos 2 on NVIDIA NIM?

NVIDIA NIM offers Kosmos 2 with competitive pricing. NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices.

Input / 1M
-
Output / 1M
-
Cache
Not sourced
Batch
Not sourced

Setup recipe

Docs fallback
Install
Use the provider REST API or SDK
Auth
Create a provider API key
Call
model: kosmos-2
Model ID
kosmos-2

Request example

Curated snippets for this provider are not sourced yet. Use NVIDIA NIM documentation with model ID kosmos-2.

Gotchas

No curated gotchas have been sourced for this exact provider/model route yet.

Pricing

TypeRate
GPU Hour Rate$1.00/GPU·hr
GPU Config1xH100

Capabilities

No model capability flags are currently sourced.

About Kosmos 2

Kosmos-2, developed by Microsoft Research, is an advanced multimodal large language model (MLLM) that enhances the capabilities of its predecessor, Kosmos-1. It features a Transformer-based architecture trained on the GrIT dataset of grounded image-text pairs, enabling it to understand and interact with both text and visual data. A key innovation is Kosmos-2's ability to ground language to the visual world, allowing for nuanced interaction with images by linking text to specific visual elements using location tokens. This model excels in various tasks including image caption generation, referring expression comprehension, and perception-language tasks, making it valuable for applications such as robotics, multimodal dialogue systems, and more. Kosmos-2 is considered a significant step towards AI systems that are more contextually aware and closer to achieving artificial general intelligence (AGI) 12.

FAQ

Who created Kosmos 2?

Kosmos 2 was created by Microsoft Research as part of the Kosmos-2 model family.

Is Kosmos 2 open source?

Kosmos 2's open source status is unknown in the seed data.

Get Started

Model Specs

Released2023-03-15
Parameters1.66B
ArchitectureDecoder Only