starling-lm

The model is trained with our new GPT-4-labeled ranking dataset, Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as the judge, outperforming every model to date on MT-Bench except OpenAI’s GPT-4 and GPT-4 Turbo.*

*Based on MT-Bench evaluations using GPT-4 scoring. Further human evaluation is needed.

Authors: Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu and Jiantao Jiao.

For correspondence, please contact Banghua Zhu (banghua@berkeley.edu).

References

Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF

HuggingFace