LLM Reference

Chameleon

AI at Meta · Chameleon Research License
2 models · 2024 · Up to 4K context

About

The Chameleon family of large language models (LLMs), developed by Meta AI, marks a significant advancement in multimodal AI. These models use an early-fusion architecture that integrates images and text in a single token stream, allowing them to process and generate both modalities in any interleaved order. Unlike traditional models that handle each modality with separate encoders, Chameleon combines visual and textual data from the start of training, which improves its understanding of mixed inputs. This architecture supports tasks such as image captioning, visual question answering, and mixed-modal generation. A notable innovation is its vector-quantization approach, which tokenizes images into discrete tokens compatible with transformer text processing. The models come in several sizes, including the publicly released 7B and 34B parameter versions, and have outperformed leading models such as Google's Gemini Pro and OpenAI's GPT-4V on specific tasks. Separately, a different system also named Chameleon, developed by UCLA and Microsoft, focuses on compositional reasoning with integrated tool use and achieves state-of-the-art results on benchmarks such as ScienceQA.
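The early-fusion idea described above can be sketched in a few lines: a vector-quantization step maps image patch embeddings to discrete token ids from a codebook, those ids are offset past the text vocabulary, and the result is concatenated with text tokens into one sequence for a single transformer. This is a minimal illustrative sketch, not Meta's implementation; the vocabulary size, codebook size, and function names here are all hypothetical.

```python
import numpy as np

# Hypothetical sizes for illustration only (Chameleon's real vocabularies
# are far larger).
TEXT_VOCAB = 100       # assumed text vocabulary size
CODEBOOK_SIZE = 8      # assumed VQ codebook size

rng = np.random.default_rng(0)
# Toy VQ codebook: 8 code vectors of dimension 4.
codebook = rng.normal(size=(CODEBOOK_SIZE, 4))

def vq_tokenize(patches):
    """Map each image patch embedding to the id of its nearest codebook
    vector, then offset the ids past the text vocabulary so image and
    text tokens live in one shared id space."""
    dists = np.linalg.norm(patches[:, None, :] - codebook[None, :, :], axis=-1)
    return (dists.argmin(axis=1) + TEXT_VOCAB).tolist()

def fuse(text_tokens, image_patches):
    """Early fusion: concatenate text and image tokens into a single
    stream that one transformer processes end to end."""
    return text_tokens + vq_tokenize(image_patches)

text = [5, 17, 42]                 # pretend text token ids
patches = rng.normal(size=(6, 4))  # six pretend patch embeddings
sequence = fuse(text, patches)     # one mixed-modal token sequence
```

The key property the sketch demonstrates is that, after quantization, image content is just another run of token ids, so the same autoregressive transformer can attend across and generate both modalities.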

Specifications (2 models)

Chameleon model specifications comparison

Model          Released  Context  Parameters
Chameleon 34B  2024-06   4K       34B
Chameleon 7B   2024-06   4K       7B

Frequently Asked Questions

What is Chameleon?
Chameleon is a family of multimodal LLMs from Meta AI built on an early-fusion architecture: images are tokenized via vector quantization and processed in the same token stream as text, so the models can understand and generate interleaved images and text. It is publicly available in 7B and 34B parameter versions.
How many models are in the Chameleon family?
The Chameleon family contains 2 models.
What is the latest Chameleon model?
The latest model is Chameleon 34B, released in 2024-06.

Models (2)