Fuyu-8B on NVIDIA NIM

Fuyu · Adept AI

Provisioned

Pricing

Type             Price (per 1M)
Input tokens     Free
Output tokens    Free

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, JSON Mode, Code Execution

About Fuyu-8B

Fuyu-8B, developed by Adept AI, is a multimodal large language model that takes both text and images as input. It uses a decoder-only transformer with no separate image encoder: image patches are fed directly into the transformer, which lets it handle images of arbitrary resolution without separate multi-resolution training stages. It supports a range of tasks, including visual question answering, image captioning, document understanding, and optical character recognition. Its stated limitations include difficulty with faces and people, and potential biases. The design prioritizes speed and suitability for real-time applications, and some versions are openly available under specific licenses [1][2].
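
As a rough illustration of how a patch-based decoder can accept images of any resolution, the sketch below cuts an image into a grid of square patches and lays them out in raster order as one flat sequence. The 30-pixel patch size, the row-break marker after each row of patches, and the function names `patch_grid` and `image_sequence` are assumptions made for this example; they are not taken from this page and may not match the model's actual preprocessing.

```python
import math

PATCH = 30  # assumed patch edge in pixels (illustrative, not confirmed by this page)


def patch_grid(height, width, patch=PATCH):
    """Return (rows, cols) of the patch grid covering a height x width image.

    Partial patches at the right and bottom edges count as full patches,
    i.e. the image is assumed to be padded up to a multiple of `patch`.
    """
    if height <= 0 or width <= 0 or patch <= 0:
        raise ValueError("height, width and patch must be positive")
    return math.ceil(height / patch), math.ceil(width / patch)


def image_sequence(height, width, patch=PATCH):
    """Lay the patch grid out as a flat, raster-ordered sequence.

    Each patch is represented by its (row, col) position. A "NEWLINE"
    marker is appended after every row (an assumption of this sketch) so
    that a 1-D sequence still encodes where each image row ends.
    """
    rows, cols = patch_grid(height, width, patch)
    seq = []
    for r in range(rows):
        seq.extend((r, c) for c in range(cols))
        seq.append("NEWLINE")
    return seq


if __name__ == "__main__":
    for h, w in [(300, 300), (1080, 1920)]:
        rows, cols = patch_grid(h, w)
        print(f"{w}x{h}: {rows}x{cols} patches, {len(image_sequence(h, w))} positions")
```

Because the sequence length simply grows with the image size, no fixed input resolution is needed; a larger image yields more positions rather than requiring a resize.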

Model Specs

Released         2024-01-24
Parameters       8B
Architecture     Decoder Only