LLM Reference

Florence 2 Base

About

Florence-2 Base is a compact, open-source vision-language model from Microsoft that handles a wide range of vision tasks through a unified sequence-to-sequence framework [1][2][10]. It takes an image and a text prompt as input and performs tasks such as captioning, object detection, segmentation, and visual grounding with a single set of parameters; the task-specific prompt selects which task is run [3][4]. At 0.23 billion parameters it is small enough for devices with limited computational resources, yet its performance is reported to be comparable to that of larger models, which is attributed to training on the FLD-5B dataset of 5.4 billion annotations across 126 million images [4][7].
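To make the prompt-driven design concrete, here is a minimal sketch of how a task prompt might be assembled and passed to the model. The task tokens and the `microsoft/Florence-2-base` checkpoint name follow the public Hugging Face model card; the helper names `build_prompt` and `run_task`, and the short task keys in `TASK_PROMPTS`, are hypothetical and exist only for illustration. The inference path assumes `transformers`, `torch`, and `Pillow` are installed and that the checkpoint's remote code is trusted; details may differ between library versions.

```python
from typing import Optional

# Task tokens as listed on the Hugging Face model card (a subset).
# The dictionary keys are illustrative names, not part of any API.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks that need extra text (a phrase or caption) appended to the token.
TASKS_WITH_TEXT = {"segmentation", "grounding"}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string that selects a task (hypothetical helper)."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} requires a text input")
        # The model card concatenates the token and the text directly.
        return token + text
    return token


def run_task(image, task: str, text: Optional[str] = None,
             model_id: str = "microsoft/Florence-2-base"):
    """Run one task on a PIL image; the same weights serve every task."""
    # Imported lazily so build_prompt works without these dependencies.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Convert the generated token string into boxes, labels, or text.
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

Calling `run_task(img, "object_detection")` and `run_task(img, "grounding", "a green car")` would reuse the same checkpoint and differ only in the prompt, which is the point the description makes about a single set of parameters.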

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Parameters: 230M
Architecture: Encoder-Decoder (sequence-to-sequence)
Specialization: General
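As a rough illustration of what the parameter count implies for constrained devices, the sketch below estimates the memory needed for the weights alone. It assumes 2 bytes per parameter for 16-bit weights and 4 bytes for 32-bit weights; activations, the image input, and framework overhead are not included, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 230e6  # 230M parameters, as listed in the specifications

fp16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit weights (assumption)
fp32_gb = weight_memory_gb(PARAMS, 4)  # 32-bit weights (assumption)

print(f"fp16 weights: ~{fp16_gb:.2f} GB")
print(f"fp32 weights: ~{fp32_gb:.2f} GB")
```

Under these assumptions the weights occupy roughly half a gigabyte in 16-bit precision and just under one gigabyte in 32-bit precision.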