LLM Reference

Florence 2 Large

About

Florence-2 Large is a vision-language foundation model from Microsoft's Azure AI team, open-sourced under the MIT license. It handles a range of vision and vision-language tasks through a unified, prompt-based approach: rather than targeting a single task, one model covers image captioning, object detection, visual grounding, and segmentation. Despite its compact size of 0.77 billion parameters, it rivals much larger models in performance, which is attributed to its training on the extensive FLD-5B dataset. The architecture is a sequence-to-sequence framework with a DaViT image encoder and a multi-modality encoder-decoder. Its compact size, together with strong zero-shot and fine-tuning performance from a single architecture, makes it a candidate for deployment on resource-constrained devices.
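The prompt-based approach can be sketched as follows. This is a minimal illustration, not an official recipe: the task tokens and the `microsoft/Florence-2-large` model ID are taken from the public Hugging Face model card as best recalled, while `TASK_PROMPTS`, `build_prompt`, and `run_task` are hypothetical helper names introduced here. The `run_task` function assumes `transformers`, `torch`, and `Pillow` are installed and downloads the weights on first use, so it is defined but not called.

```python
# Subset of Florence-2 task tokens (assumed from the public model card).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task, text_input=""):
    """Return the prompt string for a task: task token plus optional text."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PROMPTS[task] + text_input


def run_task(image, task, text_input="", model_id="microsoft/Florence-2-large"):
    """Run one task on a PIL image (hypothetical helper, CPU, default dtype)."""
    # Imported lazily so the prompt helpers work without these packages.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: location tokens encode boxes and polygons.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Post-processing is provided by the model's remote code (assumption).
    return processor.post_process_generation(
        text, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )


print(build_prompt("object_detection"))
print(build_prompt("phrase_grounding", "a green car"))
```

The same model and weights serve every task; only the leading task token changes, which is what "unified, prompt-based" means in practice.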

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Parameters: 770M
Architecture: Encoder-Decoder (sequence-to-sequence)
Specialization: general
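To put the parameter count in context for resource-constrained devices, the arithmetic below estimates the memory needed for the weights alone at a few common precisions. It is a back-of-the-envelope sketch: activations, the image encoder's intermediate buffers, and framework overhead are not counted, and the names `PARAMS`, `BYTES_PER_PARAM`, and `weight_memory_gb` are introduced here for illustration.

```python
PARAMS = 770_000_000  # 0.77 billion parameters, as listed above

# Storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

At half precision the weights come to roughly 1.5 GB, which is small enough to fit on many consumer GPUs and some edge accelerators.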