May 3, 2024
Based on Griffin, it uses a combination of linear recurrences and local attention instead of global attention to model long sequences efficiently.
Introduces the Real-Gated Linear Recurrent Unit (RG-LRU) layer, which forms the core of the new recurrent block, replacing Multi-Query Attention for better efficiency and scalability.
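The gated linear recurrence above can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the gate projections `W_r`, `W_i`, the parameter `log_lambda`, and the constant `c` are assumed names, and the update follows the general RG-LRU form (a learnable per-channel real decay raised to a gated power, with a variance-preserving input scale).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru(x, W_r, W_i, log_lambda, c=8.0):
    """Hypothetical sketch of an RG-LRU-style layer.

    x:          (seq_len, dim) input sequence
    W_r, W_i:   (dim, dim) recurrence/input gate projections (assumed shapes)
    log_lambda: (dim,) learnable recurrence parameter
    """
    seq_len, dim = x.shape
    h = np.zeros(dim)
    outputs = np.empty_like(x)
    base = sigmoid(log_lambda)           # per-channel decay in (0, 1)
    for t in range(seq_len):
        r_t = sigmoid(x[t] @ W_r)        # recurrence gate
        i_t = sigmoid(x[t] @ W_i)        # input gate
        a_t = base ** (c * r_t)          # gated, real-valued decay
        # linear recurrence: no nonlinearity wraps h itself,
        # and sqrt(1 - a_t^2) keeps the state variance bounded
        h = a_t * h + np.sqrt(1.0 - a_t ** 2) * (i_t * x[t])
        outputs[t] = h
    return outputs
```

Because the state `h` is a fixed-size vector updated linearly, inference cost per token is constant in sequence length, which is the efficiency argument for replacing global attention.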