Kosmos 2 on NVIDIA NIM

Name: Kosmos 2 on NVIDIA NIM
Brand: Microsoft Research
SKU: kosmos-2-nvidia-nim

Kosmos-2 · Microsoft Research

ProvisionedOpen Source

Last refreshed 2026-05-19. Next refresh: weekly.

Why use Kosmos 2 on NVIDIA NIM?

NVIDIA NIM offers Kosmos 2 with competitive pricing. NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices.

Input / 1M

Output / 1M

Cache

Not sourced

Batch

Not sourced

Setup recipe

Docs fallback

Install

Use the provider REST API or SDK

Auth

Create a provider API key

Call

model: kosmos-2

Model ID

kosmos-2

Request example

Curated snippets for this provider are not sourced yet. Use NVIDIA NIM documentation with model ID kosmos-2.

Gotchas

No curated gotchas have been sourced for this exact provider/model route yet.

Pricing

Type	Rate
GPU Hour Rate	$1.00/GPU·hr
GPU Config	1xH100

Capabilities

No model capability flags are currently sourced.

About Kosmos 2

Kosmos-2, developed by Microsoft Research, is an advanced multimodal large language model (MLLM) that enhances the capabilities of its predecessor, Kosmos-1. It features a Transformer-based architecture trained on the GrIT dataset of grounded image-text pairs, enabling it to understand and interact with both text and visual data. A key innovation is Kosmos-2's ability to ground language to the visual world, allowing for nuanced interaction with images by linking text to specific visual elements using location tokens. This model excels in various tasks including image caption generation, referring expression comprehension, and perception-language tasks, making it valuable for applications such as robotics, multimodal dialogue systems, and more. Kosmos-2 is considered a significant step towards AI systems that are more contextually aware and closer to achieving artificial general intelligence (AGI) 12.