
Florence-2
About
The Florence-2 family, developed by Microsoft, consists of vision foundation models built for vision and vision-language tasks. The models handle tasks such as captioning, object detection, and segmentation through a prompt-based approach, in which a text prompt tells the model which task to perform 2. A unified representation lets one model carry out all of these tasks within a single framework 3. The models are trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images (roughly 43 annotations per image), and this scale supports multitask learning 2. The suite includes Florence-2-base and Florence-2-large, with 0.23 billion and 0.77 billion parameters, respectively. The fine-tuned variants Florence-2-base-ft and Florence-2-large-ft improve performance on a range of downstream tasks, and the small parameter counts keep the models efficient enough for resource-limited environments 3.
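The prompt-based approach described above can be sketched in Python. This is a minimal illustration, not an official implementation: it assumes the checkpoints are loaded through the Hugging Face transformers library with trust_remote_code=True, as the public model card describes, and that the task tokens match those listed there. The names TASK_TOKENS, build_prompt, and run_task are hypothetical helpers introduced only for this example.

```python
from typing import Optional

# Task tokens as listed on the Florence-2 Hugging Face model card.
# The short keys on the left are hypothetical labels for this sketch.
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the task token, followed by extra text for tasks that need it."""
    token = TASK_TOKENS[task]
    return token if text_input is None else token + text_input


def run_task(image, task, text_input=None, model_id="microsoft/Florence-2-base"):
    """Run one task on a PIL image; the same model serves every task."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # Only the prompt changes between tasks; weights and code path stay the same.
    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    raw_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    # Convert the generated token string into structured output for this task.
    return processor.post_process_generation(
        raw_text, task=TASK_TOKENS[task], image_size=(image.width, image.height)
    )
```

Assuming Pillow is installed and photo.jpg is a placeholder for a local image, a call such as run_task(Image.open("photo.jpg"), "object_detection") would request detections, and passing "caption" instead would request a caption from the same weights. Switching model_id to "microsoft/Florence-2-large" or one of the -ft variants should need no other code changes.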