Pinned: I truly appreciate your kind words. It’s an honor to be acknowledged by the author, and I’m really… (Jan 31)
Pinned: Thanks for the appreciation. It’s surreal for me to be acknowledged by the author. (Feb 1, 2024)
Papers Explained 371: ReasonIR. ReasonIR-8B is a novel bi-encoder retriever specifically designed for reasoning-intensive retrieval tasks. It addresses the limitations of… (2h ago)
Papers Explained 370: Test Time Reinforcement Learning (TTRL). Test-Time Reinforcement Learning (TTRL) is a method for training LLMs using RL on unlabeled data. TTRL enables self-evolution of LLMs by… (1d ago)
Papers Explained 369: RM-R1. RM-R1 is a family of Reasoning Reward Models designed to improve the interpretability and performance of large language models (LLMs) by… (2d ago)
Papers Explained 368: ThinkPRM. ThinkPRM is a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. This… (3d ago)
Papers Explained 366: Math Shepherd. Math Shepherd is a process-oriented math reward model that assigns a reward score to each step of a math problem solution, enabling… (May 15, 1 response)
Papers Explained 365: DeepMath. DeepMath-103K is a new dataset designed for advancing mathematical reasoning research. It comprises 103,000 mathematical problems with a… (May 14)
Papers Explained 364: OmniMath. OmniMath is a comprehensive and challenging benchmark specifically designed to assess LLMs’ mathematical reasoning at the Olympiad level… (May 13)