LLM Reference

Chameleon 34B

About

Chameleon 34B is an AI model developed by Meta's FAIR team that can process and generate both text and images through a unified, token-based ("early fusion") architecture. Because images and text share a single token sequence, the model handles multimodal inputs natively, setting it apart from models that bolt separate vision encoders onto a language model. Trained on roughly 10 trillion tokens spanning text, images, and code, Chameleon performs well on tasks such as visual question answering, image captioning, and text generation. In human evaluations of mixed-modal generation it matches or exceeds models like GPT-4V and Gemini Pro, and it remains competitive on text-only benchmarks. The full image-generation capability is not publicly available due to safety concerns; Meta released research weights that accept mixed-modal inputs but produce text-only outputs. Known limitations include difficulty with text-heavy images, which Meta is actively working to improve.
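The early-fusion design above means images are quantized into discrete tokens and interleaved with text tokens in one flat sequence, which a single decoder-only transformer then models. The sketch below illustrates that interleaving with made-up vocabulary sizes, a character-level stand-in for a text tokenizer, and a hypothetical image tokenizer (Chameleon's real VQ tokenizer and token IDs differ):

```python
# Sketch of early-fusion token interleaving, Chameleon-style.
# All token IDs and vocabulary sizes are illustrative, not Meta's real values.

TEXT_VOCAB_SIZE = 65_536   # hypothetical text vocabulary
IMAGE_VOCAB_SIZE = 8_192   # hypothetical VQ codebook size
BOI = TEXT_VOCAB_SIZE + IMAGE_VOCAB_SIZE      # begin-of-image sentinel
EOI = BOI + 1                                 # end-of-image sentinel

def tokenize_text(text: str) -> list[int]:
    # Stand-in for a real BPE tokenizer: one token per character.
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def tokenize_image(pixels: list[list[int]]) -> list[int]:
    # Stand-in for a VQ image tokenizer: each "patch" is mapped into the
    # image-token range, offset past the text vocabulary.
    return [TEXT_VOCAB_SIZE + (p % IMAGE_VOCAB_SIZE)
            for row in pixels for p in row]

def build_sequence(segments: list[tuple[str, object]]) -> list[int]:
    # Interleave text and image segments into one flat token stream.
    # Image spans are delimited by BOI/EOI sentinels; the decoder-only
    # transformer sees only this single sequence.
    seq: list[int] = []
    for kind, payload in segments:
        if kind == "text":
            seq.extend(tokenize_text(payload))
        elif kind == "image":
            seq.append(BOI)
            seq.extend(tokenize_image(payload))
            seq.append(EOI)
    return seq

seq = build_sequence([
    ("text", "Describe: "),
    ("image", [[12, 4077], [255, 9001]]),  # a tiny 2x2 "image"
    ("text", " in one sentence."),
])
```

Because both modalities live in one sequence, generation can mix them freely: emitting the begin-of-image sentinel is what switches the model from producing text tokens to producing image tokens.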

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode

Specifications

Family: Chameleon
Released: 2024-06-18
Parameters: 34B
Context: 4K
Architecture: Decoder-only
Specialization: General