Demystifying Long Chain-of-Thought Reasoning in LLMs

This study systematically investigates the mechanics of long CoT reasoning, identifying the key factors that enable models to generate long CoT trajectories and providing practical guidance for optimizing training strategies to enhance long CoT reasoning in LLMs.

https://arxiv.org/pdf/2502.03373.pdf

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

This work introduces the first-generation reasoning models DeepSeek-R1-Zero and DeepSeek-R1; the latter incorporates multi-stage training and cold-start data before RL and achieves performance comparable to OpenAI-o1-1217 on reasoning tasks.

https://arxiv.org/pdf/2501.12948.pdf

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model’s hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions? Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.

https://arxiv.org/pdf/2407.20311.pdf

The Mystery of the Pathological Path-star Task for Language Models

The recently introduced path-star task is a minimal task designed to expose limitations in the abilities of language models; this work introduces a regularization method using structured samples of the same graph with differing target nodes, improving results across a variety of model types.

https://www.semanticscholar.org/reader/b3c5da33f73b8d4b77c107134e05957b20d544ba

What Algorithms can Transformers Learn? A Study in Length Generalization

This work proposes a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task and provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.

https://www.semanticscholar.org/reader/1ec3a3ff77cb4b424499b3805ecc90182ecd8f8b