LLM Reference

Qwen-72B

About

Qwen-72B is a large language model developed by Alibaba Cloud, with 72 billion parameters and a Transformer-based design. It incorporates enhancements such as the SwiGLU activation, attention QKV bias, and group query attention for efficient performance on complex language tasks. Trained on more than 3 trillion tokens, it performs well across language understanding and generation, code generation, and translation, in multiple languages including Chinese and English. With a context length of up to 32,000 tokens and a vocabulary of over 150,000 tokens, it handles long text inputs well. It does carry limitations, such as biases inherited from its training data and weaknesses in common-sense reasoning, but it remains a strong performer on a range of benchmarks; a successor, Qwen2-72B, offers further improvements.
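
The SwiGLU activation mentioned above gates one linear projection of the input with a Swish (SiLU) transform of another. A minimal NumPy sketch, with toy dimensions rather than the model's real hidden sizes:

```python
import numpy as np

def silu(x):
    # Swish/SiLU: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, W_gate, W_up):
    # SwiGLU: SiLU(x @ W_gate) elementwise-gates x @ W_up
    return silu(x @ W_gate) * (x @ W_up)

# Illustrative shapes only (batch of 2, hidden 8, intermediate 16)
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W_gate = rng.standard_normal((8, 16))
W_up = rng.standard_normal((8, 16))
out = swiglu(x, W_gate, W_up)
print(out.shape)  # (2, 16)
```

In the actual feed-forward block a final down-projection maps the gated intermediate back to the hidden size; the sketch shows only the gating step.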

Capabilities

Multimodal, Function Calling, Tool Use, JSON Mode
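
Function calling, tool use, and JSON mode are exercised through the serving API's request schema. A sketch of a request body in the OpenAI-style tools format that many Qwen serving stacks accept; the tool name, model id, and the `response_format` field are illustrative assumptions, and the exact schema is provider-dependent:

```python
import json

# Hypothetical tool definition (OpenAI-style function-calling schema)
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "qwen-72b",  # placeholder model id
    "messages": [{"role": "user", "content": "What's the weather in Hangzhou?"}],
    "tools": [weather_tool],
    # Some servers expose JSON mode via a response_format field
    "response_format": {"type": "json_object"},
}

print(json.dumps(request_body)[:40])
```

The model replies either with plain content or with a structured tool call naming `get_weather` and its arguments, which the client executes and feeds back as a follow-up message.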

Providers (1)

Provider | Input (per 1M) | Output (per 1M) | Type
Fireworks AI Platform | — | — | Provisioned
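
Fireworks AI serves models through an OpenAI-compatible chat-completions endpoint. A request to it can be sketched as below; the model identifier is an assumption, so check the provider's catalog for the exact id before use:

```python
import json

# Assumed endpoint path for Fireworks AI's OpenAI-compatible API
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

payload = {
    "model": "accounts/fireworks/models/qwen-72b",  # assumed model id
    "messages": [
        {"role": "user", "content": "Summarize SwiGLU in one sentence."}
    ],
    "max_tokens": 128,
}
headers = {
    "Authorization": "Bearer $FIREWORKS_API_KEY",  # substitute a real key
    "Content-Type": "application/json",
}
body = json.dumps(payload)
print(API_URL)
# To send: requests.post(API_URL, headers=headers, data=body)
```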

Specifications

Family: Qwen
Released: 2023-11-30
Architecture: Decoder Only
Specialization: general