LLM Reference

Persimmon 8B

About

Persimmon-8B is an open-source large language model from Adept AI with roughly 8 billion parameters. It is a decoder-only transformer that uses squared ReLU activations and rotary positional encodings, and it offers a 16,000-token context window, four times that of LLaMA 2 and eight times that of GPT-3. It was trained on 737 billion tokens of mixed text and code, and its inference code uses an optimized variant of FlashAttention to handle long sequences efficiently. Despite training on less data than LLaMA 2, it achieves comparable performance on standard benchmarks. Released under the Apache 2.0 license, Persimmon-8B includes unused embeddings that leave room for future multimodal extensions and provides fast, flexible inference, though as a base model it requires further fine-tuning to mitigate bias.
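The two architectural features named above, squared ReLU activations and rotary positional encodings (RoPE), can be illustrated with a minimal NumPy sketch. This is not Adept's implementation; it is a simplified illustration of the two operations, with the RoPE `base` constant set to the conventional 10000.

```python
import numpy as np

def squared_relu(x):
    # Squared ReLU: max(0, x)^2, which Persimmon-8B uses in place of
    # more common activations such as GELU.
    return np.maximum(0.0, x) ** 2

def rotary_embed(x, base=10000.0):
    # Minimal rotary positional encoding sketch: rotate pairs of feature
    # dimensions by a position-dependent angle so attention scores become
    # a function of relative position. x has shape (seq_len, dim), dim even.
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Squared ReLU zeroes negatives and squares positives.
print(squared_relu(np.array([-2.0, 0.5, 3.0])))
```

Note that at position 0 the rotation angle is zero, so `rotary_embed` leaves the first token's vector unchanged; later positions are rotated progressively more.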

Capabilities

Multimodal
Function Calling
Tool Use
JSON Mode

Specifications

Family: Persimmon
Architecture: Decoder Only
Specialization: General