Ritvik Rastogi

Dec 24, 2024

A 14B language model that prioritizes data quality through a training process incorporating synthetic data for pretraining and midtraining, curated organic data seeds, and innovative post-training techniques such as pivotal token search for DPO (a minimal DPO sketch follows these summaries). It achieves strong performance on reasoning-focused benchmarks, especially in STEM, comparable to much larger models, while also addressing overfitting and data contamination concerns.
A family of lightweight, multilingual models in three variants, mini (3.8B), MoE (16x3.8B), and vision (4.2B), trained on synthetic data and filtered publicly available documents, with a focus on very high-quality, reasoning-dense data.
A series of language models trained on heavily filtered web data and synthetic data, achieving performance comparable to that of much larger models such as Mixtral 8x7B and GPT-3.5.
An LLM for code, trained on textbook-quality data from the web together with textbooks and exercises synthetically generated with GPT-3.5.
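The pivotal token search mentioned in the first summary produces preference pairs whose chosen and rejected continuations diverge at a single high-impact token; those pairs are then optimized with the standard DPO objective. The sketch below is a minimal, illustrative implementation of that DPO loss only, under the usual formulation; the function and variable names are assumptions, not taken from the Phi-4 implementation.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each input is the summed log-probability of the chosen or rejected
    # continuation under the trained policy or the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Reward the policy for preferring the chosen continuation more
    # strongly than the reference model does.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage: a batch of 4 preference pairs with random log-probabilities.
torch.manual_seed(0)
batch = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*batch).item())

In pivotal-token training, each preference pair covers only the continuation starting at the identified pivotal token rather than a full response, but the loss itself is unchanged.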
Ritvik Rastogi

Data Scientist, 2x Kaggle Expert