Jul 31, 2024
7B & 8x7B evaluation LLMs that achieve high correlations with both human evaluators and proprietary LM-based judges on direct assessment and pairwise ranking alike, obtained by merging Mistral models trained on the Feedback Collection and the Preference Collection (curated in this work).
A 13B fully open-source evaluation LLM trained on the Feedback Collection, a dataset curated in this work using GPT-4.