Dec 24, 2024 · 25 stories
A 14B language model that prioritizes data quality, with synthetic data for pretraining and midtraining, curated organic data seeds, and novel post-training techniques such as pivotal token search for DPO. It achieves strong performance on reasoning-focused benchmarks, especially in STEM, comparable to much larger models, while also addressing overfitting and data-contamination concerns.
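The pivotal-token idea can be sketched in a few lines: estimate how much each token of a solution shifts the probability of reaching a correct answer, and flag the tokens with large shifts as candidate sites for DPO preference pairs. This is only an illustration; `sample_completions`, `is_correct`, and the threshold are hypothetical stand-ins, not details from the paper.

```python
def estimate_success(prompt, prefix_tokens, sample_completions, is_correct, n_samples=16):
    """Monte-Carlo estimate of P(correct answer | prompt + partial solution)."""
    completions = sample_completions(prompt, prefix_tokens, n_samples)
    return sum(is_correct(prompt, prefix_tokens + c) for c in completions) / n_samples

def find_pivotal_tokens(prompt, solution_tokens, sample_completions, is_correct, threshold=0.3):
    """Return (position, delta) pairs where appending one token shifts the success estimate."""
    pivots = []
    p_prev = estimate_success(prompt, [], sample_completions, is_correct)
    for i in range(len(solution_tokens)):
        p_next = estimate_success(prompt, solution_tokens[:i + 1], sample_completions, is_correct)
        delta = p_next - p_prev
        if abs(delta) >= threshold:
            pivots.append((i, delta))  # candidate site for a DPO preference pair
        p_prev = p_next
    return pivots
```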
Applies the Minitron pruning-and-distillation approach to Llama 3.1 8B and Mistral NeMo 12B, and additionally applies teacher correction to align the teacher with the new data distribution.
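One plausible reading of teacher correction is a brief fine-tune of the teacher on the new data mix before distillation, so its targets match what the student will actually see. A minimal PyTorch sketch under that assumption, with `teacher` and `loader` as generic placeholders:

```python
import torch
import torch.nn.functional as F

def correct_teacher(teacher, loader, lr=1e-5, max_steps=1000):
    """Briefly fine-tune the teacher on the new data distribution before distillation."""
    opt = torch.optim.AdamW(teacher.parameters(), lr=lr)
    teacher.train()
    for step, (input_ids, labels) in enumerate(loader):
        if step >= max_steps:
            break
        logits = teacher(input_ids)  # assumed shape: (batch, seq, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return teacher
```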
A family of lightweight, multilingual models in three variants, MoE (16x3.8B), mini (3.8B), and vision (4.2B), trained on synthetic data and filtered publicly available documents, with a focus on very high-quality, reasoning-dense data.
A series of 4B and 500M language models trained on high-quality web data in three stages with different data mixes, then fine-tuned into chat versions.
Prunes an existing Nemotron model and re-trains it with a fraction of the original training data, achieving compression factors of 2-4×, compute cost savings of up to 40×, and improved performance on various language modeling tasks.
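As a rough illustration of structured width pruning, the sketch below keeps the output neurons of a linear layer with the largest weight norms, a simple importance proxy; the actual recipe uses activation-based importance estimates and also prunes heads and layers, so treat this as an assumption-laden toy example.

```python
import torch

def prune_linear_outputs(linear: torch.nn.Linear, keep: int) -> torch.nn.Linear:
    """Keep the `keep` output neurons with the largest weight norms (a crude importance proxy)."""
    importance = linear.weight.norm(dim=1)                 # one score per output neuron
    idx = torch.topk(importance, keep).indices.sort().values
    pruned = torch.nn.Linear(linear.in_features, keep, bias=linear.bias is not None)
    pruned.weight.data = linear.weight.data[idx].clone()
    if linear.bias is not None:
        pruned.bias.data = linear.bias.data[idx].clone()
    return pruned

# Example: halve an MLP hidden dimension before distillation-based retraining.
mlp_up = torch.nn.Linear(4096, 16384)
mlp_up_small = prune_linear_outputs(mlp_up, keep=8192)
print(mlp_up_small.weight.shape)  # torch.Size([8192, 4096])
```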
Combines the SigLIP vision model with the Gemma language model and follows the PaLI-3 training recipe to achieve strong performance on various vision-language tasks.
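The combination can be pictured as: encode the image with a SigLIP-style tower, project the patch embeddings into the language model's embedding space, and prepend them as prefix tokens before the text. The module below is a generic stand-in, not the released architecture; dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    def __init__(self, vision_tower, language_model, vision_dim=1152, text_dim=2048):
        super().__init__()
        self.vision_tower = vision_tower        # SigLIP-style image encoder (placeholder)
        self.projector = nn.Linear(vision_dim, text_dim)
        self.language_model = language_model    # Gemma-style decoder (placeholder)

    def forward(self, pixel_values, text_embeds):
        image_tokens = self.projector(self.vision_tower(pixel_values))  # (b, n_img, text_dim)
        inputs = torch.cat([image_tokens, text_embeds], dim=1)          # image prefix + text
        return self.language_model(inputs)
```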
A family of small models with 135M, 360M, and 1.7B parameters that use Grouped-Query Attention (GQA), embedding tying, and a 2048-token context length, trained on a new open-source, high-quality dataset.
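Grouped-Query Attention is the main architectural choice named here; below is a minimal sketch of how several query heads share one key/value head, which shrinks the KV cache. Head counts and dimensions are illustrative, not the models' exact configuration.

```python
import torch

def gqa_attention(q, k, v, n_heads=9, n_kv_heads=3):
    """q: (b, t, n_heads*d); k, v: (b, t, n_kv_heads*d). Each KV head serves several query heads."""
    b, t, _ = q.shape
    d = q.shape[-1] // n_heads
    q = q.view(b, t, n_heads, d).transpose(1, 2)          # (b, H, t, d)
    k = k.view(b, t, n_kv_heads, d).transpose(1, 2)       # (b, Hkv, t, d)
    v = v.view(b, t, n_kv_heads, d).transpose(1, 2)
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)                 # share each KV head across `group` query heads
    v = v.repeat_interleave(group, dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, t, n_heads * d)

out = gqa_attention(torch.randn(2, 16, 9 * 64), torch.randn(2, 16, 3 * 64), torch.randn(2, 16, 3 * 64))
```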
A Mistral-7B fine-tuned through Generative Teaching on synthetic data produced by the proposed AgentInstruct framework, which generates both prompts and responses using only raw data sources such as text documents and code files as seeds.
Utilizes interleaved local-global attention and grouped-query attention, and is trained with knowledge distillation instead of next-token prediction to achieve performance comparable with much larger models.
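The distillation objective can be sketched as matching the teacher's full next-token distribution rather than one-hot targets; the temperature and tensor shapes below are assumptions, not the paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary, averaged across all token positions."""
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)  # (b*t, vocab)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```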
A fully open language model designed to enhance accuracy while using fewer parameters and pre-training tokens. It uses a layer-wise scaling strategy that allocates smaller dimensions to early layers and expands them in later layers.
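Layer-wise scaling can be sketched as a per-layer schedule for head counts and FFN widths that grows linearly from the first block to the last. The alpha/beta ranges below are invented for illustration and are not the published hyperparameters.

```python
def layerwise_dims(n_layers=16, d_model=1280, head_dim=64,
                   alpha=(0.5, 1.0),   # scales the number of attention heads per layer
                   beta=(0.5, 4.0)):   # scales the FFN expansion ratio per layer
    dims = []
    for i in range(n_layers):
        frac = i / max(n_layers - 1, 1)
        a = alpha[0] + frac * (alpha[1] - alpha[0])
        b = beta[0] + frac * (beta[1] - beta[0])
        dims.append({
            "layer": i,
            "n_heads": max(1, round(a * d_model / head_dim)),
            "ffn_dim": int(b * d_model),
        })
    return dims

for cfg in layerwise_dims(n_layers=4):
    print(cfg)  # early layers get fewer heads and narrower FFNs than later ones
```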
Open code models built on Gemma by further training on over 500 billion tokens of primarily code.
A family of state-of-the-art 2B and 7B language models built from the research behind Google's Gemini models, offering advancements in language understanding, reasoning, and safety.
A language model trained on 1T tokens following the core principles of Llama 2 and Mistral, leveraging and refining various techniques for pre-training large language models.
A fine-tuned Mistral-7B that excels at math problems without external tools, using a high-quality synthetic dataset of 200K problems created through multi-agent collaboration and an iterative learning process in which the model practices problem-solving, receives feedback, and learns from preference pairs built from its own solutions and that feedback.
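The iterative preference step can be pictured as: sample several solutions per problem, grade them, and pair correct with incorrect attempts for preference tuning. `generate` and `grade` are hypothetical callables standing in for the model and the feedback step.

```python
def collect_preference_pairs(problems, generate, grade, n_samples=4):
    """Build (chosen, rejected) pairs from the model's own graded attempts."""
    pairs = []
    for problem in problems:
        solutions = [generate(problem) for _ in range(n_samples)]
        good = [s for s in solutions if grade(problem, s)]
        bad = [s for s in solutions if not grade(problem, s)]
        for chosen in good:
            for rejected in bad:
                pairs.append({"prompt": problem, "chosen": chosen, "rejected": rejected})
    return pairs
```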
Explores various architecture and attention-mechanism choices to arrive at a strong baseline network, then improves it with an immediate block-wise weight-sharing approach for a further accuracy boost.
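Immediate block-wise weight sharing can be sketched as applying each transformer block twice in a row, doubling effective depth without adding parameters while keeping the shared weights hot in cache. The wrapper below is a toy stand-in; the blocks passed in are generic placeholders.

```python
import torch.nn as nn

class SharedDepthModel(nn.Module):
    """Apply every block `repeats` times back-to-back, reusing the same weights."""
    def __init__(self, blocks, repeats=2):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.repeats = repeats

    def forward(self, x):
        for block in self.blocks:
            for _ in range(self.repeats):  # same weights, applied immediately again
                x = block(x)
        return x

# Toy usage with linear layers standing in for transformer blocks.
model = SharedDepthModel([nn.Linear(64, 64) for _ in range(4)])
```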
A state-of-the-art, truly open language model and framework that includes training data, code, and tools for building, studying, and advancing language models.
A 1.1B language model built upon the architecture and tokenizer of Llama 2, pre-trained on around 1 trillion tokens for approximately 3 epochs, leveraging FlashAttention and Grouped-Query Attention for better computational efficiency.
A 2.7B model developed to explore whether emergent abilities of large-scale language models can also be achieved at a smaller scale through strategic training choices such as data selection.
Introduces Cautious Reasoning, which trains smaller models to select the most effective solution strategy for the problem at hand: training data is crafted with task-specific system instructions corresponding to the chosen strategy in order to obtain teacher responses, and the student's system instruction is then replaced with a generic one stripped of any details about how to approach the task.
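The data-crafting step can be sketched as follows: the teacher is prompted with a detailed, task-specific system instruction, but the resulting response is stored behind a generic instruction so the student must learn when to apply each strategy. The instruction strings and `call_teacher` are placeholders invented for illustration.

```python
GENERIC_SYSTEM = "You are a helpful assistant. Think carefully and answer the question."

STRATEGY_SYSTEMS = {
    "step_by_step": "Solve the problem by reasoning step by step before giving a final answer.",
    "direct_answer": "Answer directly and concisely without showing intermediate steps.",
}

def build_training_example(question, strategy, call_teacher):
    teacher_messages = [
        {"role": "system", "content": STRATEGY_SYSTEMS[strategy]},  # task-specific instruction
        {"role": "user", "content": question},
    ]
    response = call_teacher(teacher_messages)
    # The student never sees which strategy was requested.
    return {
        "messages": [
            {"role": "system", "content": GENERIC_SYSTEM},          # generic, "vacated" instruction
            {"role": "user", "content": question},
            {"role": "assistant", "content": response},
        ],
    }
```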