Ritvik Rastogi

Jul 1, 2024

7 stories

Gemini / Gemma Models

Gemma 2: Interleaves local and global attention layers and uses grouped-query attention, and is trained with knowledge distillation instead of next-token prediction (sketched below) to achieve performance competitive with much larger models.
Gemini 1.5 Flash: A more lightweight variant of Gemini 1.5 Pro, designed for efficiency with minimal regression in quality, making it suitable for applications where compute resources are limited.
RecurrentGemma: Built on the Griffin architecture, it replaces global attention with a combination of linear recurrences and local attention to model long sequences efficiently (a recurrence sketch follows below).
Gemma: A family of state-of-the-art 2B- and 7B-parameter language models based on Google's Gemini models, offering advances in language understanding, reasoning, and safety.
Gemini 1.5 Pro: A highly compute-efficient multimodal mixture-of-experts model (routing sketched below) that excels at long-context retrieval and at understanding across text, video, and audio modalities.
Gemini: A family of highly capable multimodal models, trained jointly across image, audio, video, and text data to build strong generalist capabilities across modalities.
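As a rough illustration of the distillation objective mentioned in the Gemma 2 summary above, the sketch below trains a student on the teacher's full next-token distribution instead of one-hot targets. Tensor shapes, the temperature, and the function name are illustrative assumptions, not the exact Gemma 2 recipe.

```python
# Hypothetical sketch: knowledge distillation for language modeling.
# The student minimizes KL(teacher || student) over the vocabulary at
# every position, rather than cross-entropy against one-hot tokens.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # logits: (batch, seq_len, vocab_size); temperature is an assumption.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")
```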
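The linear-recurrence idea behind Griffin and RecurrentGemma can be sketched as a gated, element-wise state update that scans the sequence in O(length) time with constant memory per step. The gate parameterization and shapes below are simplified assumptions, not the paper's exact RG-LRU block.

```python
# Hypothetical sketch: a gated linear recurrence over a sequence.
import torch

def linear_recurrence(x, w_decay, w_input):
    # x: (seq_len, dim); w_decay, w_input: (dim, dim) projections.
    h = torch.zeros(x.shape[-1])
    states = []
    for x_t in x:                           # sequential scan over time steps
        a_t = torch.sigmoid(x_t @ w_decay)  # per-channel decay gate in (0, 1)
        h = a_t * h + (1.0 - a_t) * torch.tanh(x_t @ w_input)
        states.append(h)
    return torch.stack(states)              # (seq_len, dim)

# Example: a 128-step sequence processed in linear time.
out = linear_recurrence(torch.randn(128, 16), torch.randn(16, 16), torch.randn(16, 16))
```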
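Lastly, the mixture-of-experts design noted in the Gemini 1.5 Pro summary can be sketched as top-k routing: each token activates only k experts, so per-token compute stays roughly constant while total parameter count grows. The expert count, k, and the dense linear experts here are illustrative assumptions.

```python
# Hypothetical sketch: sparse top-k mixture-of-experts routing.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x):  # x: (num_tokens, dim)
        gate = self.router(x).softmax(dim=-1)       # routing probabilities
        weights, idx = gate.topk(self.k, dim=-1)    # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 10 tokens through 8 experts, 2 active per token.
y = TopKMoE()(torch.randn(10, 64))
```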