Pinned
I truly appreciate your kind words. It’s an honor to be acknowledged by the author, and I’m really…
Jan 31
Pinned
Thanks for the appreciation. It’s surreal for me to be acknowledged by the author themselves.
Feb 1, 2024
Papers Explained 310: SmolLM2
SmolLM2 is a 1.7B parameter language model overtrained on ~11 trillion tokens of data using a multi-stage training process that mixes web…
2d ago
Papers Explained 309: AceCoder
AceCoder leverages automated large-scale test-case synthesis to enhance code model training. A pipeline is designed that generates…
3d ago
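A minimal sketch of how synthesized test cases can be turned into a training signal for a code model: a candidate solution is executed against the tests and its pass rate serves as a reward. The function names (run_test_case, pass_rate_reward) and the assert-style test format are illustrative assumptions, not AceCoder's actual implementation.

```python
# Hypothetical sketch: scoring a generated solution against synthesized test cases.
# run_test_case / pass_rate_reward are illustrative names, not from the paper's code.

def run_test_case(solution_code: str, test_code: str) -> bool:
    """Execute a candidate solution followed by one synthesized test; True if nothing fails."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)   # define the candidate function(s)
        exec(test_code, namespace)       # run e.g. "assert add(2, 3) == 5"
        return True
    except Exception:
        return False

def pass_rate_reward(solution_code: str, test_cases: list[str]) -> float:
    """Fraction of synthesized tests the solution passes, usable as a reward signal."""
    if not test_cases:
        return 0.0
    passed = sum(run_test_case(solution_code, t) for t in test_cases)
    return passed / len(test_cases)

# Toy usage
solution = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0", "assert add(0, 0) == 1"]
print(pass_rate_reward(solution, tests))  # 0.666... (one deliberately wrong test)
```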
Papers Explained 308: SFT Memorizes, RL Generalizes
This paper studies the comparative effect of SFT and RL on generalization and memorization, focusing on text-based and visual environments…
4d ago
Papers Explained 307: Diverse Preference Optimization
Diverse Preference Optimization (DivPO) is an online optimization method which learns to generate much more diverse responses than standard…
5d ago
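A minimal sketch of the DivPO-style pair selection idea: from a pool of sampled responses, the chosen example is the most diverse response among the high-reward candidates and the rejected example is the least diverse among the low-reward ones, after which a standard DPO-style update is applied. The function name, the threshold-based reward split, and the diversity scores are assumptions for illustration.

```python
# Hypothetical sketch of DivPO-style (chosen, rejected) pair selection.
# Assumes an external reward score and a diversity score per sampled response;
# diversity could be, e.g., distance to the other responses in the pool.

def select_divpo_pair(responses, rewards, diversity, reward_threshold):
    """chosen  = most diverse response in the high-reward pool,
       rejected = least diverse response in the low-reward pool."""
    high = [i for i, r in enumerate(rewards) if r >= reward_threshold]
    low  = [i for i, r in enumerate(rewards) if r < reward_threshold]
    if not high or not low:
        return None  # cannot form a preference pair from this pool
    chosen   = max(high, key=lambda i: diversity[i])
    rejected = min(low,  key=lambda i: diversity[i])
    return responses[chosen], responses[rejected]

# Toy usage with four sampled responses
responses = ["resp A", "resp B", "resp C", "resp D"]
rewards   = [0.9, 0.8, 0.3, 0.2]
diversity = [0.1, 0.7, 0.5, 0.05]
print(select_divpo_pair(responses, rewards, diversity, reward_threshold=0.5))
# ('resp B', 'resp D')
```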
Papers Explained 306: Critique Fine-Tuning
Critique Fine-Tuning (CFT) is a strategy where models learn to critique noisy responses rather than simply imitate correct ones. Inspired…
6d ago
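A minimal sketch of how a CFT training example differs from a plain SFT one: instead of mapping a query to the correct response, the model sees the query plus a possibly wrong candidate answer and is trained to produce a critique of it. The prompt template and field names below are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical sketch: SFT vs. Critique Fine-Tuning (CFT) training examples.
# The prompt template and dict keys are illustrative, not taken from the paper's code.

def sft_example(query: str, correct_response: str) -> dict:
    """Standard SFT: the model is trained to imitate the correct response."""
    return {"input": query, "target": correct_response}

def cft_example(query: str, noisy_response: str, critique: str) -> dict:
    """CFT: the model is trained to critique a (possibly wrong) candidate response."""
    prompt = (
        f"Question:\n{query}\n\n"
        f"Candidate answer:\n{noisy_response}\n\n"
        "Critique the candidate answer, pointing out any errors:"
    )
    return {"input": prompt, "target": critique}

# Toy usage
print(cft_example(
    query="What is 17 * 3?",
    noisy_response="17 * 3 = 41",
    critique="The multiplication is wrong: 17 * 3 = 51, so the answer should be 51.",
))
```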
Papers Explained 305: Hyperfitting
LLMs tend to generate repetitive and dull sequences, a phenomenon that is especially apparent when generating using greedy decoding. This…
Feb 7
Papers Explained 304: Constrained Generative Policy Optimization (Mixture of Judges)
RLHF has limitations in multi-task learning (MTL) due to challenges of extreme multi-objective optimization (i.e., trade-off of multiple…
Feb 6
Papers Explained 303: Reward rAnked FineTuning (RAFT)
Generative foundation models can inherit implicit biases from their extensive unsupervised training data, leading to suboptimal samples…
Feb 5
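A minimal sketch of the best-of-K selection step at the heart of RAFT: sample several responses per prompt from the current model, rank them with a reward model, keep only the highest-reward response, and fine-tune on the filtered set before repeating. The `generate` and `reward` stubs below are toy placeholders, not the actual policy or reward model.

```python
# Hypothetical sketch of RAFT's best-of-K filtering step (names are illustrative).
import random

def generate(prompt: str) -> str:
    """Stand-in for sampling one response from the current policy model."""
    return f"{prompt} -> draft {random.randint(0, 9)}"

def reward(prompt: str, response: str) -> float:
    """Stand-in for a learned reward model's score."""
    return random.random()

def raft_selection(prompts, k=8):
    """For each prompt, sample k responses and keep the highest-reward one;
    the returned (prompt, response) pairs are then used for ordinary SFT."""
    dataset = []
    for p in prompts:
        candidates = [generate(p) for _ in range(k)]
        best = max(candidates, key=lambda r: reward(p, r))
        dataset.append({"prompt": p, "response": best})
    return dataset

# One RAFT iteration would fine-tune the model on this filtered dataset,
# then repeat the sampling with the updated model.
print(raft_selection(["Summarize the article.", "Write a haiku about rain."], k=4))
```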