Ritvik Rastogi

Dec 9, 2024


Improves upon its predecessor by pairing the original vision encoder with the more capable language models of the Gemma 2 family, achieving better performance across a range of tasks and extending to new ones.
A set of models that aim to reduce hallucinations in LLMs by grounding them in the factual data of Google's Data Commons, allowing users to ask questions in natural language and receive responses backed by verified information from trusted sources.
A comprehensive suite of LLM-based safety content moderation models, ranging from 2B to 27B parameters and built on Gemma 2, that predict safety risks across key harm types (sexually explicit content, dangerous content, harassment, hate speech) in both user input and LLM-generated output.
Proposes a scalable yet accurate proposition segmentation model by framing proposition segmentation as a supervised task and training LLMs on existing annotated datasets.
Combines the SigLIP vision model with the Gemma language model and follows the PaLI-3 training recipe to achieve strong performance on various vision-language tasks (a minimal sketch of this image-text composition appears after this list).
Utilizes interleaved local-global attention and grouped-query attention, and is trained with knowledge distillation instead of next-token prediction to achieve performance competitive with larger models (see the attention-mask sketch after this list).
A more lightweight variant of Gemini 1.5 Pro, designed for efficiency with minimal regression in quality, making it suitable for applications where compute resources are limited.
Based on the Griffin architecture, it uses a combination of linear recurrences and local attention instead of global attention to model long sequences efficiently (see the recurrence sketch after this list).
A family of state-of-the-art 2B and 7B language models based on Google's Gemini research, offering advancements in language understanding, reasoning, and safety.
A highly compute-efficient multimodal mixture-of-experts model that excels in long-context retrieval tasks and understanding across text, video, and audio modalities.
A family of highly capable multimodal models, trained jointly across image, audio, video, and text data to build strong generalist capabilities across modalities.
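
The PaliGemma item above mentions combining a SigLIP vision encoder with a Gemma language model. The sketch below illustrates the general pattern of projecting image patch features into the language model's embedding space and prepending them to the text token embeddings; the shapes, names, and use of a plain linear projection are illustrative assumptions, not PaliGemma's actual implementation.

```python
# Minimal sketch (illustrative only): fuse vision-encoder patch features with
# language-model token embeddings by projecting the image features into the
# LM embedding space and using them as a prefix.
import numpy as np

def combine_image_and_text(
    image_patches: np.ndarray,    # (num_patches, vision_dim) from the vision encoder
    text_embeddings: np.ndarray,  # (num_tokens, lm_dim) from the LM embedding table
    projection: np.ndarray,       # (vision_dim, lm_dim) learned linear projection
) -> np.ndarray:
    """Return the multimodal prefix sequence fed to the language model."""
    image_tokens = image_patches @ projection   # map vision features into LM space
    return np.concatenate([image_tokens, text_embeddings], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = combine_image_and_text(
        image_patches=rng.normal(size=(256, 1152)),   # e.g. 16x16 grid of patches
        text_embeddings=rng.normal(size=(12, 2048)),  # e.g. a short text prompt
        projection=rng.normal(size=(1152, 2048)),
    )
    print(seq.shape)  # (268, 2048): image tokens followed by text tokens
```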
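The Gemma 2 item above mentions interleaved local-global attention. The sketch below shows one way such an alternating mask pattern can be built, with sliding-window (local) layers alternating with full causal (global) layers; the window size and the even/odd alternation are illustrative assumptions, and grouped-query attention and knowledge distillation are not shown.

```python
# Minimal sketch (illustrative only) of an interleaved local-global causal
# attention mask pattern: even layers use a sliding window, odd layers use
# the full causal context.
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Full causal mask: position i may attend to all positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Local causal mask: position i may attend only to the last `window` positions."""
    idx = np.arange(seq_len)
    too_far = (idx[:, None] - idx[None, :]) >= window
    return causal_mask(seq_len) & ~too_far

def layer_mask(layer_idx: int, seq_len: int, window: int = 4) -> np.ndarray:
    """Alternate local (even layers) and global (odd layers) attention masks."""
    if layer_idx % 2 == 0:
        return sliding_window_mask(seq_len, window)
    return causal_mask(seq_len)

if __name__ == "__main__":
    for layer in range(2):
        print(f"layer {layer}:\n{layer_mask(layer, seq_len=6).astype(int)}\n")
```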
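The RecurrentGemma item above mentions the linear recurrences of the Griffin architecture. The sketch below shows a simplified gated linear recurrence scanned over a sequence with a fixed-size state; the gate parameterization is an illustrative simplification, not the paper's exact RG-LRU formulation.

```python
# Minimal sketch (illustrative only) of a gated linear recurrence: the state
# is updated in O(seq_len) time with a constant-size hidden state, unlike
# global attention whose cost grows with the full context.
import numpy as np

def linear_recurrence(x: np.ndarray, log_a: np.ndarray) -> np.ndarray:
    """Scan h_t = a * h_{t-1} + (1 - a) * x_t over the sequence.

    x:     (seq_len, dim) input sequence
    log_a: (dim,) learnable decay logits; a = sigmoid(log_a) lies in (0, 1)
    """
    a = 1.0 / (1.0 + np.exp(-log_a))   # per-channel decay in (0, 1)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):        # fixed-size state carried across steps
        h = a * h + (1.0 - a) * x[t]
        out[t] = h
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = linear_recurrence(rng.normal(size=(8, 4)), rng.normal(size=4))
    print(y.shape)  # (8, 4): same shape as the input, computed sequentially
```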