LLM Reference

Fuyu-8B

About

Fuyu-8B, developed by Adept AI, is a multimodal large language model that processes both text and images. It uses a streamlined decoder-only transformer architecture with no separate image encoder: image patches are fed directly into the transformer, which lets it handle images of arbitrary resolution without complex multi-stage training. Fuyu-8B can handle a range of tasks, including visual question answering, image captioning, document understanding, and optical character recognition. It also has limitations, such as difficulty generating faces and potential biases. The model's design prioritizes speed and suitability for real-time applications, and some versions are available as open source under specific licenses [1][2].
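To make the "image patches go directly into the transformer" idea more concrete, here is a minimal pure-Python sketch. It is not Adept's implementation. The 30-pixel patch size and the per-row "image newline" marker follow the public description of Fuyu as best understood here and should be treated as assumptions. The function names (`patchify`, `project`, `image_to_sequence`), the zero padding, and the toy model width are all hypothetical and chosen only for illustration.

```python
import random

PATCH = 30  # assumed patch edge length in pixels


def patchify(image, patch=PATCH):
    """Split an H x W x C nested-list image into a grid of flattened patches.

    The image is zero-padded on the bottom and right so that both sides
    become multiples of `patch`; this is how the sketch accepts any size.
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    rows = -(-h // patch)  # ceiling division
    cols = -(-w // patch)
    grid = []
    for r in range(rows):
        row = []
        for q in range(cols):
            flat = []
            for y in range(r * patch, (r + 1) * patch):
                for x in range(q * patch, (q + 1) * patch):
                    if y < h and x < w:
                        flat.extend(image[y][x])
                    else:
                        flat.extend([0.0] * c)  # padding pixel
            row.append(flat)
        grid.append(row)
    return grid


def project(vec, weight):
    """Linear projection; `weight` has shape d_model x len(vec)."""
    return [sum(wi * vi for wi, vi in zip(w_row, vec)) for w_row in weight]


def image_to_sequence(image, weight, newline_embedding, patch=PATCH):
    """Turn an image into a flat sequence of d_model-sized vectors.

    Patches are emitted in raster order, and a newline vector is appended
    after each patch row so the decoder can infer the image width.
    """
    seq = []
    for row in patchify(image, patch):
        seq.extend(project(p, weight) for p in row)
        seq.append(newline_embedding)
    return seq


if __name__ == "__main__":
    random.seed(0)
    d_model = 8  # toy width, far smaller than a real model
    image = [[[random.random() for _ in range(3)] for _ in range(70)]
             for _ in range(45)]  # 45 x 70 RGB image
    weight = [[random.gauss(0, 0.02) for _ in range(PATCH * PATCH * 3)]
              for _ in range(d_model)]
    newline = [0.0] * d_model
    seq = image_to_sequence(image, weight, newline)
    print(len(seq))  # 2 rows x 3 patches + 2 newline markers = 8
```

Because the sequence length simply grows with the number of patch rows and columns, nothing in this sketch depends on a fixed input resolution, which is the property the description above highlights.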

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode

Providers (1)

Provider | Input (per 1M) | Output (per 1M) | Type
NVIDIA NIM | - | - | Provisioned

Specifications

Family: Fuyu
Architecture: Decoder Only
Specialization: General