Chameleon 7B
About
Chameleon 7B is a mixed-modal foundation model from Meta AI's FAIR team, notable for its early-fusion architecture: images are quantized into discrete tokens and interleaved with text tokens in a single sequence, so one transformer processes both modalities jointly. This design allows seamless integration of multimodal inputs and improves the coherence and contextual relevance of outputs. Training stability at scale is maintained through architectural adjustments such as query-key normalization. The model performs well on visual question answering, image captioning, and text generation, often outperforming models of similar size. Although Chameleon was trained to generate images as well as text, the publicly released version is restricted to text-only outputs for safety reasons, making it primarily a research tool for multimodal AI applications.
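To make the early-fusion idea concrete, here is a minimal sketch of how image and text tokens can share one vocabulary and one sequence. All token IDs, vocabulary sizes, and sentinel tokens below are hypothetical illustrations, not Chameleon's actual tokenizer values:

```python
# Hypothetical sizes for illustration only; Chameleon's real
# tokenizers use different values.
TEXT_VOCAB_SIZE = 32_000       # assumed text vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192    # assumed VQ image codebook size

def image_token(code: int) -> int:
    """Map an image VQ code into the shared vocabulary, after text IDs."""
    return TEXT_VOCAB_SIZE + code

def fuse(text_ids: list[int], image_codes: list[int]) -> list[int]:
    """Concatenate text and image tokens into one flat sequence,
    bracketing the image span with sentinel tokens."""
    boi = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # begin-of-image sentinel
    eoi = boi + 1                                # end-of-image sentinel
    return text_ids + [boi] + [image_token(c) for c in image_codes] + [eoi]

# One flat token sequence is what the transformer consumes, with no
# separate vision encoder branch -- the essence of early fusion.
sequence = fuse([5, 812, 94], [17, 4090, 255])
print(sequence)
```

The point of the sketch is that once image content is expressed as discrete codes in a shared vocabulary, the downstream transformer needs no modality-specific pathways.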