When was Aya 23 8B released?

Aya 23 8B was released on 2024-02-21.

What benchmarks has Aya 23 8B been tested on?

Aya 23 8B has been evaluated on 4 benchmarks: Google-Proof Q&A (GPQA), HellaSwag, HumanEval, and Massive Multitask Language Understanding (MMLU).

Aya 23 8B

Name: Aya 23 8B
Author: Cohere

Released

2024-02-21

Last refreshed

2026-04-15

Status

Researched 154d ago

Coding, Classification

Aya 23 8B has model metadata, but because no provider token pricing is tracked for it, it is not a default production pick.

Use it for

Teams evaluating coding and classification workloads

Do not use it for

Cost-sensitive launches that need sourced token pricing
Vision or document-understanding workloads
Strict JSON or tool-calling flows

Specifications

Family: Aya
Released: 2024-02-21
Parameters: 8B
Architecture: Decoder Only
Specialization: general
Training: finetuned

Created by

Cohere

Empowering developers with advanced language AI.

Toronto, Ontario, Canada

Founded 2022


Pricing

No tracked provider token pricing is available yet.

About

Aya-23-8B is an 8-billion-parameter multilingual large language model developed by Cohere For AI. It is instruction-fine-tuned, so it follows instructions well, and it is optimized for text generation and understanding. The model uses a decoder-only Transformer architecture with efficiency enhancements such as parallel attention and feed-forward layers. It supports 23 languages, including Arabic, Chinese, English, and French, and handles tasks such as machine translation, chatbot interactions, and text summarization. Performance may vary across languages, particularly lower-resource ones, and the context window is limited to 8192 tokens. Training drew on diverse data sources, including human annotations and synthetic datasets, to strengthen its multilingual proficiency.
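The parallel attention and feed-forward layout mentioned above can be illustrated with a minimal pure-Python sketch. It assumes the common parallel formulation (as popularized by GPT-J and PaLM), in which attention and the feed-forward network both read the same normalized input and their outputs are added to the residual stream in a single step. `toy_norm`, `toy_attn`, and `toy_ffn` are hypothetical stand-ins for learned sublayers; this is not Cohere's implementation.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def sequential_block(x: Vector, norm: Sublayer, attn: Sublayer, ffn: Sublayer) -> Vector:
    """Conventional pre-norm block: the FFN reads the attention-updated state.

    A real model would use two separately learned norms here; the toy reuses one.
    """
    h = add(x, attn(norm(x)))
    return add(h, ffn(norm(h)))


def parallel_block(x: Vector, norm: Sublayer, attn: Sublayer, ffn: Sublayer) -> Vector:
    """Parallel block: attention and FFN read the same normalized input,
    and both outputs are added to the residual at once."""
    h = norm(x)
    return add(x, attn(h), ffn(h))


# Hypothetical toy sublayers standing in for learned LayerNorm, attention, and MLP.
def toy_norm(v: Vector) -> Vector:
    mean = sum(v) / len(v)
    return [a - mean for a in v]  # mean-centering only, no learned scale or variance


def toy_attn(v: Vector) -> Vector:
    return [0.5 * a for a in v]  # placeholder linear map, not real attention


def toy_ffn(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]  # ReLU as a placeholder MLP


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print(sequential_block(x, toy_norm, toy_attn, toy_ffn))  # [0.5, 2.0, 5.0]
    print(parallel_block(x, toy_norm, toy_attn, toy_ffn))    # [0.5, 2.0, 4.5]
```

Because the two sublayers in the parallel block do not depend on each other, they can be computed concurrently, which is where the efficiency gain comes from; the trade-off is that the feed-forward layer no longer sees the attention output within the same block, which is why the two results differ in the last element.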

Aya 23 8B is a model in the Aya family. Headline tracked benchmarks include Google-Proof Q&A 45.2, HellaSwag 87.3, and HumanEval 68.5.

Top use-case fit: coding, agents, and build tasks

Coding

1 relevant benchmark in the decision map.

Classification

2 relevant benchmarks in the decision map.

Provider price ladder

No tracked provider token pricing is available for this model yet.

Capabilities

No model capability flags are currently sourced.

Benchmark peer bars for Coding

HumanEval: rank 51 of 86

Peer scores: 96.7, 94.5, 94.2, 93.1
Aya 23 8B: 68.5

Benchmark scores (4)

Scores are benchmark-specific and are direction-aware: the same numeric gap can mean very different outcomes across suites. Use the leaderboard context and this model's provider route to decide whether the winning margin is meaningful for your workload.

Benchmark	Score	Setting	Source
Google-Proof Q&A	45.2	diamond	https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
HellaSwag	87.3	10-shot	https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
HumanEval	68.5	pass@1	https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
Massive Multitask Language Understanding	72.8	5-shot	https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
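The direction-aware caveat for these scores can be made concrete with a small Python sketch. It assumes each score carries a flag saying whether higher is better; the `Score` type and `margin` helper are hypothetical. The 96.7 baseline is the top HumanEval peer bar shown on this page (the page does not name that model), and the failure-rate variant is derived from the same numbers (100 minus pass@1) to show that flipping the direction leaves the signed margin unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    benchmark: str
    value: float
    higher_is_better: bool = True


def margin(candidate: Score, baseline: Score) -> float:
    """Signed gap where a positive result always means `candidate` is better.

    Raises ValueError for cross-benchmark or mixed-direction comparisons,
    since scores are benchmark-specific.
    """
    if candidate.benchmark != baseline.benchmark:
        raise ValueError("scores from different benchmarks are not comparable")
    if candidate.higher_is_better != baseline.higher_is_better:
        raise ValueError("scores disagree on direction")
    diff = candidate.value - baseline.value
    # Flip the sign for lower-is-better metrics so "positive = better" holds.
    return diff if candidate.higher_is_better else -diff


if __name__ == "__main__":
    aya = Score("HumanEval pass@1", 68.5)
    top_peer = Score("HumanEval pass@1", 96.7)  # highest peer bar; model not named
    print(round(margin(aya, top_peer), 1))  # -28.2

    # The same comparison expressed as a lower-is-better failure rate.
    aya_fail = Score("HumanEval failure rate", 100 - 68.5, higher_is_better=False)
    peer_fail = Score("HumanEval failure rate", 100 - 96.7, higher_is_better=False)
    print(round(margin(aya_fail, peer_fail), 1))  # -28.2
```

Whether a 28-point pass@1 gap matters still depends on your workload; the helper only ensures that the sign is read consistently across suites and that scores from different benchmarks are never subtracted from each other.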

Migration checks

No linked migration route is available for this model yet.

Rankings & picks (7)

Best LLMs for Code Generation: Listed
Best LLMs for Classification: Listed
Best Small Language Models (SLMs): Listed
Cheapest LLM APIs You Can Call Right Now: Listed
Best Mainstream LLM APIs, Ranked: Listed
Best LLMs for Writing: Listed
Best LLMs for Marketing: Listed