Ritvik Rastogi

May 3, 2024

Based on Griffin, it uses a combination of linear recurrences and local attention instead of global attention to model long sequences efficiently.
It introduces the Real-Gated Linear Recurrent Unit (RG-LRU) layer, which forms the core of the new recurrent block and replaces Multi-Query Attention for better efficiency and scalability (a minimal sketch follows below).
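To make the idea concrete, here is a minimal NumPy sketch of a gated linear recurrence in the style described for the RG-LRU, not a reference implementation: the weight names (W_a, W_x, lam), the toy dimensions, and the random initialization are illustrative assumptions; the gating form h_t = a_t ⊙ h_{t-1} + √(1 − a_t²) ⊙ (i_t ⊙ x_t), with a_t = σ(Λ)^(c·r_t), follows the Griffin paper's description.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru_sketch(x, W_a, b_a, W_x, b_x, lam, c=8.0):
    """Run a per-channel gated linear recurrence over a sequence x of shape (T, D).

    Weight names and shapes are illustrative assumptions, not the official API.
    """
    T, D = x.shape
    h = np.zeros(D)
    outputs = np.zeros_like(x)
    for t in range(T):
        x_t = x[t]
        # Recurrence gate and input gate, both driven by the current input token.
        r_t = sigmoid(x_t @ W_a + b_a)
        i_t = sigmoid(x_t @ W_x + b_x)
        # Per-channel decay a in (0, 1); the gate r_t modulates how strongly
        # the previous state is retained (c is a fixed scalar, 8 in the paper).
        a = sigmoid(lam)
        a_t = a ** (c * r_t)
        # Gated linear recurrence; the sqrt term keeps the hidden state's
        # magnitude roughly on the scale of the gated input.
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i_t * x_t)
        outputs[t] = h
    return outputs

# Toy usage with random weights (purely illustrative).
rng = np.random.default_rng(0)
D = 8
x = rng.standard_normal((16, D))
W_a = rng.standard_normal((D, D)) * 0.1
W_x = rng.standard_normal((D, D)) * 0.1
b_a, b_x = np.zeros(D), np.zeros(D)
lam = rng.standard_normal(D)
y = rg_lru_sketch(x, W_a, b_a, W_x, b_x, lam)
print(y.shape)  # (16, 8)
```

Because the recurrence is linear in the hidden state and the decay is bounded in (0, 1), the per-token state stays a fixed size, which is what lets this block replace attention's growing key-value cache for long sequences.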