AMD OLMo 1B
About
AMD OLMo 1B is a fully open-source, 1-billion-parameter language model designed for reasoning, instruction following, and chat. It uses a decoder-only transformer architecture and is pre-trained on a 1.3 trillion-token subset of the Dolma v1.7 dataset, reaching a training throughput of 12,200 tokens per second per GPU on AMD Instinct MI250 GPUs across 16 nodes. Training proceeds in three stages: pre-training, supervised fine-tuning (SFT), and direct preference optimization (DPO). The code, weights, and training recipe are publicly available to support further AI research, and the model can also run inference on AMD Ryzen AI PCs with NPUs. On a range of benchmarks it outperforms other open-source models of similar size.
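As a rough sanity check on the quoted throughput: assuming 4 MI250 GPUs per node (64 GPUs in total, an assumption not stated above), 12,200 tokens/s per GPU gives about 780,800 tokens/s in aggregate, so processing 1.3 trillion tokens would take on the order of 1.3×10¹² ÷ 7.8×10⁵ ≈ 1.67×10⁶ seconds, or roughly 19 days of pre-training.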
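Below is a minimal inference sketch using the Hugging Face transformers library, assuming the checkpoints are published on the Hub. The repo id `amd/AMD-OLMo-1B-SFT` is an assumption and may differ, as are the prompt and generation settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub identifier for the SFT checkpoint; adjust if it differs.
model_id = "amd/AMD-OLMo-1B-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Greedy generation of a short completion.
prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same loading pattern should apply to the base and DPO variants by swapping the checkpoint id; for chat use, sampling parameters (e.g. `do_sample=True`, `temperature`) can be tuned to taste.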