Jul 31, 2024
7B & 8x7B evaluation LLMs that achieve high correlations with both human evaluators and proprietary LM-based judges on direct assessment and pairwise ranking alike, obtained by merging Mistral models trained on the Feedback Collection and the Preference Collection (curated in this work).
A 13B fully open-source evaluation LLM trained on the Feedback Collection, a dataset curated in this work using GPT-4.