LLM Reference

MT0 Large

About

The MT0 Large language model is a multilingual text-to-text transformer released by the BigScience workshop as part of the BLOOMZ and mT0 model family. Fine-tuned on the BigScience xP3 dataset of multilingual prompted tasks, it can follow human instructions across many languages zero-shot, that is, without task-specific training, thanks to its cross-lingual generalization abilities. Although it performs best with English prompts, it also handles prompts written in other languages. The model is built on the mT5-large encoder-decoder architecture and has 1.2 billion parameters. Typical uses include translation, summarization, question answering, and open-ended text generation. Its output quality depends considerably on prompt wording, however, and performance may degrade on languages or tasks underrepresented in the xP3 dataset.
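A minimal usage sketch, assuming access to the `bigscience/mt0-large` checkpoint on the Hugging Face Hub and the `transformers` library. The prompt helper and the instruction wording shown here are illustrative; mT0 takes plain natural-language instructions with no special prompt tokens.

```python
CHECKPOINT = "bigscience/mt0-large"  # Hugging Face Hub checkpoint name

def build_prompt(instruction: str, text: str) -> str:
    # mT0 is prompted with plain natural-language instructions;
    # this simple "instruction: text" template is one common pattern.
    return f"{instruction}: {text}"

def generate(instruction: str, text: str, max_new_tokens: int = 64) -> str:
    # Imported lazily: this step downloads the model weights on first use.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)
    inputs = tokenizer(build_prompt(instruction, text), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Zero-shot instruction following; the prompt is English,
    # but the input text may be in another language.
    print(generate("Translate to English", "Je t'aime."))
```

Because the model is an encoder-decoder, generation uses the standard sequence-to-sequence classes rather than a causal-LM head.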

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode

Specifications

Family: MT0
Parameters: 1.2B
Architecture: Encoder-Decoder
Specialization: general