LLM Reference

Florence 2 Base

About

Florence-2 Base is a compact, open-source vision-language model from Microsoft that handles a range of vision tasks through a unified sequence-to-sequence framework [1][2][10]. It takes an image and a text prompt as input and performs tasks such as captioning, object detection, segmentation, and visual grounding with a single set of parameters, with the task selected by a task-specific prompt [3][4]. At 0.23 billion parameters, it is small enough for devices with limited computational resources, and its performance is reported to be comparable to that of larger models, which is attributed to training on the FLD-5B dataset of 5.4 billion annotations across 126 million images [4][7].
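The prompt-driven design described above can be sketched in Python. This is a minimal illustration, not an official recipe: the task tokens and the `transformers` calls follow the usage shown on the public `microsoft/Florence-2-base` model card and are assumed to still be current, while `TASK_PROMPTS`, `build_prompt`, and `run_task` are hypothetical names introduced here.

```python
# Task tokens as listed on the public Florence-2 model card (assumed current).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks whose prompt carries extra text after the task token.
TASKS_WITH_TEXT = {"segmentation", "grounding"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the prompt string for a task (hypothetical helper)."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    if text and task not in TASKS_WITH_TEXT:
        raise ValueError(f"task {task!r} does not take extra text")
    return TASK_PROMPTS[task] + text


def run_task(image, task: str, text: str = "",
             model_id: str = "microsoft/Florence-2-base"):
    """Run one task on a PIL image.

    Needs torch, transformers, and a model download, so the imports
    are done lazily and build_prompt stays usable without them.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    # The same weights serve every task; only the prompt changes.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=256,
            num_beams=3,
        )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Convert the generated token string into task-specific structured output.
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

Calling `run_task(image, "object_detection")` and `run_task(image, "grounding", "a green car")` would reuse the same model and differ only in the prompt string, which is the point of the single-parameter-set design.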

Capabilities

Vision, Multimodal, Reasoning, Function Calling, Tool Use, Structured Outputs, Code Execution

Specifications

Released: 2024-06-10
Parameters: 230M
Architecture: Encoder-Decoder
Specialization: General
Training: Fine-tuning
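As a rough illustration of why a 230M-parameter model suits constrained devices, the weight footprint can be estimated from the parameter count alone. This is a back-of-the-envelope sketch: it counts weights only and ignores activations, the image preprocessing pipeline, and framework overhead, and the `weight_megabytes` helper is a hypothetical name.

```python
PARAMS = 230_000_000  # parameter count from the specifications above

# Storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_megabytes(PARAMS, dtype):.0f} MB")
# float32: 920 MB
# float16: 460 MB
# int8: 230 MB
```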

Created by

Microsoft Research

Advancing the state of the art in AI and computing.

Redmond, Washington, United States
Founded 1991