Fuyu-Heavy
About
Adept's Fuyu-Heavy is a multimodal AI model designed for use as a digital agent. Adept describes it as the world's third most capable multimodal model, behind GPT-4V and Gemini Ultra. Although it is 10 to 20 times smaller than those models, Fuyu-Heavy excels at multimodal reasoning, particularly understanding user interfaces (UIs), and outperforms Gemini Pro on the MMMU benchmark. On text-based benchmarks it matches or surpasses models in its compute class. Built to handle both text and image data, it powers Adept's enterprise product and supports complex calculations and long-form conversations.
Capabilities
Multimodal, Function Calling, Tool Use, JSON Mode