LLM Reference

Kosmos-2 Models by Microsoft Research

1 model · 2023 · Up to 2k ctx

About

Kosmos-2 is a multimodal large language model (MLLM) from Microsoft Research that grounds language in the visual world. It can perceive object descriptions such as bounding boxes and link spans of text to regions of an image. The model uses a Transformer-based architecture and is trained on a large dataset of grounded image-text pairs, which lets it handle multimodal grounding, referring expression comprehension, and general language understanding and generation. Referring expressions are represented as Markdown links, in which the text span is the link text and the bounding box takes the place of the URL, so each phrase is tied explicitly to a region of the image. A checkpoint, kosmos-2-patch14-224, is available on Hugging Face and supports work on tasks such as image captioning and visual question answering. The stated goal of Kosmos-2 is to contribute to the development of Embodiment AI as a step toward artificial general intelligence.
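The Markdown-link representation described above can be illustrated with a small parser. This is a minimal sketch, not the model's actual tokenization: it assumes a hypothetical caption format in which each link target holds four comma-separated normalized coordinates (x1, y1, x2, y2), and the names GroundedSpan and parse_grounded_caption are invented for this example.

```python
import re
from typing import List, NamedTuple, Tuple


class GroundedSpan(NamedTuple):
    """A phrase paired with its bounding box (hypothetical helper type)."""
    text: str
    box: Tuple[float, float, float, float]  # assumed order: x1, y1, x2, y2 in [0, 1]


# Matches Markdown-style links of the form [text span](target).
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_grounded_caption(caption: str) -> Tuple[str, List[GroundedSpan]]:
    """Split a grounded caption into plain text and (phrase, box) pairs.

    Assumption: the link target is "x1,y1,x2,y2" as normalized floats.
    This format is chosen for illustration only.
    """
    spans: List[GroundedSpan] = []
    for match in LINK_PATTERN.finditer(caption):
        coords = tuple(float(value) for value in match.group(2).split(","))
        if len(coords) != 4:
            raise ValueError(f"expected 4 coordinates, got {len(coords)}")
        spans.append(GroundedSpan(match.group(1), coords))
    # Replace each link with its link text to recover the ungrounded caption.
    plain_text = LINK_PATTERN.sub(r"\1", caption)
    return plain_text, spans


caption = (
    "[a snowman](0.10,0.20,0.55,0.90) warming himself by "
    "[a fire](0.60,0.55,0.95,0.95)"
)
plain_text, spans = parse_grounded_caption(caption)
print(plain_text)  # a snowman warming himself by a fire
for span in spans:
    print(span.text, span.box)
```

Keeping the phrase and its box in the same link is what ties each expression to one region, so downstream code can draw boxes or answer region-specific questions without a separate alignment step.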

Archived Variants

Use-when guidance is derived from seed capabilities, context, release, and replacement fields.

1 in view · 1 retired
Kosmos 2 · Archived

Keep only for existing workloads; choose a current variant for new builds.

2023-03 · 2k context · 1.7B parameters

Release Timeline

1 release group
2023-03
1 retired
Kosmos 2
2k context · 1.7B parameters
Archived

Specifications (1 model)

Kosmos-2 model specifications comparison
Model      Released   Context   Parameters
Kosmos 2   2023-03    2k        1.66B

Available From (1 provider)

Frequently Asked Questions

What is Kosmos-2 used for?
Kosmos-2 is used for tasks that tie language to image content: multimodal grounding, referring expression comprehension, image captioning, and visual question answering, alongside general language understanding and generation.
How does Kosmos-2 compare to Harrier?
Kosmos-2 by Microsoft Research is suited to multimodal grounding tasks, while Harrier, also by Microsoft Research, is the closest related family to consider for embedding workloads. Kosmos-2 has 1 listed variant with up to 2k context; Harrier reaches up to 33k context. Compare the specs and pricing tables before choosing a production model.
Which Kosmos-2 model should I use?
Kosmos 2 (2k context) is the only listed variant, and it is archived, so keep it for existing workloads rather than new builds. If price is the main constraint, check the pricing table first, because provider pricing for Kosmos-2 is incomplete in the local data.

Models (1)