AI Glossary
reinforcement learning from human feedback
RLHF
Definition
RLHF aligns LLMs with human preferences through a multi-stage process: supervised fine-tuning of a base model, training a reward model on human-ranked response pairs, then using reinforcement learning to optimize the policy model against that reward. The RL stage typically employs PPO, maximizing expected reward while penalizing KL divergence from a frozen reference model so the policy does not drift too far from its starting behavior.
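The two learned objectives above can be sketched in a few lines. This is a minimal illustration, not a full training loop: `pairwise_loss` is the standard Bradley-Terry-style loss used to fit a reward model on a chosen/rejected pair, and `kl_penalized_reward` shows the per-token shaped reward (reward-model score minus a KL penalty toward the reference model) that the RL stage optimizes. Function names and the `beta` value are illustrative assumptions.

```python
import math

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Reward-model loss on one preference pair (sketch).

    Minimizing -log sigmoid(chosen - rejected) pushes the reward
    model to score the human-preferred response higher.
    """
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def kl_penalized_reward(rm_score: float,
                        logprob_policy: list[float],
                        logprob_ref: list[float],
                        beta: float = 0.1) -> list[float]:
    """Per-token shaped reward for the RL stage (sketch).

    rm_score: scalar reward-model score for the whole response,
              credited at the final token.
    logprob_policy / logprob_ref: per-token log-probs of the sampled
              response under the policy and the frozen reference model.
    beta: KL penalty coefficient (illustrative value).
    """
    # Per-token KL estimate: log pi(a|s) - log pi_ref(a|s)
    shaped = [-beta * (lp - lr)
              for lp, lr in zip(logprob_policy, logprob_ref)]
    shaped[-1] += rm_score  # sequence-level reward at the last token
    return shaped
```

For example, if the policy assigns a token a higher log-probability than the reference does, that token's shaped reward is negative, discouraging further drift; the reward-model score then arrives only at the end of the sequence.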