LLM Reference

Qwen1.5-MoE-A2.7B

About

Qwen1.5-MoE-A2.7B is a transformer-based Mixture of Experts (MoE) language model developed by Alibaba Cloud's Qwen team. It is designed to run efficiently with a significantly reduced resource footprint by activating only a subset of its parameters during inference. Despite using just 2.7 billion activated parameters, it achieves performance comparable to larger 7-billion-parameter models while offering markedly lower training costs and faster inference. The model is an "upcycled" iteration of Qwen-1.8B, reusing its pre-trained weights to streamline development. It is available on Hugging Face for various NLP tasks, such as text generation, summarization, and open-ended dialogue.
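
For illustration, the sketch below shows one way to load the base model from Hugging Face with the transformers library and run a short generation. The repository ID, dtype/device settings, and prompt are assumptions rather than details from this page, and the MoE architecture requires a sufficiently recent transformers release.

```python
# Minimal sketch: load Qwen1.5-MoE-A2.7B with Hugging Face transformers and
# generate a short completion. The repo ID and settings below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed Hugging Face repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers choose an appropriate dtype
    device_map="auto",    # place layers on available GPU(s)/CPU
)

prompt = "Briefly explain what a Mixture of Experts model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Only ~2.7B of the model's parameters are activated per token at inference.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```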

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode

Specifications

Family: Qwen1.5
Released: 2024-03-28
Architecture: Mixture of Experts
Specialization: general