[TOC]

  1. Title: The Impact of Reasoning Steps Length on Large Language Models
  2. Author: Mingyu Jin et al.
  3. Publish Date: 20 Jan 2024
  4. Review Date: Mon, Jan 29, 2024
  5. URL: arXiv:2401.04925v3

Summary of paper

Contribution

The study investigates how the length of reasoning steps in prompts affects the reasoning abilities of Large Language Models (LLMs), focusing on Chain of Thought (CoT) prompting. The key findings are:

  1. Effect of Reasoning Step Length:

    • Lengthening reasoning steps in prompts, even without introducing new information, notably improves LLMs’ reasoning abilities across various datasets.
    • Conversely, shortening reasoning steps, while preserving key information, notably diminishes LLMs’ reasoning abilities.
    • This highlights the critical role of reasoning step length in CoT prompts and offers practical guidance for using LLMs in complex problem-solving.
  2. Impact of Rationales:

    • Surprisingly, even incorrect rationales can lead to favorable outcomes if they maintain the necessary length of inference.
    • This finding suggests that the length of reasoning steps may compensate for inaccuracies in rationales, emphasizing the importance of sequence length in CoT.
  3. Task-Dependent Nature:

    • The advantages of increasing reasoning steps vary with task complexity:
      • Simpler tasks require fewer steps.
      • Complex tasks benefit significantly from longer inference sequences.
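The findings above treat the length of reasoning steps as something that can be counted. A minimal sketch of one way to count steps in a rationale, assuming (hypothetically) that each sentence ending in ".", "!" or "?" counts as one step; `count_steps` is an illustrative name, and the paper's own definition of a step may differ:

```python
import re


def count_steps(rationale: str) -> int:
    """Count reasoning steps in a CoT rationale.

    Hypothetical heuristic: every sentence that ends in '.', '!' or '?'
    and is followed by whitespace (or ends a line) is one step. Decimal
    numbers such as "3.5" are not split because no whitespace follows
    the period.
    """
    steps = []
    for line in rationale.splitlines():
        # Split each line into sentences at terminal punctuation + whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", line.strip()):
            if sentence:  # skip empty fragments from blank lines
                steps.append(sentence)
    return len(steps)
```

For example, `count_steps("Roger has 5 balls. He buys 6 more. 5 + 6 = 11.")` returns 3, so a lengthened version of the same rationale can be compared against it by step count.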

The study underscores the significance of reasoning step length in CoT prompts for enhancing LLMs’ reasoning abilities and provides practical guidance for optimizing their performance in diverse problem-solving contexts.

Some key terms

Incorrect but coherent rationales can improve reasoning performance

  • Interestingly, Wang et al. (2023) found that even incorrect but coherent rationales can improve reasoning performance, highlighting the value of logical continuity.
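To make "incorrect but coherent" concrete, one way to build such a rationale is to change a single intermediate number while leaving every step in place, so the length of the chain is preserved. The sketch below is hypothetical: `corrupt_one_number` is an illustrative name, it assumes steps are newline-separated, and it is not the procedure used by Wang et al. or by this paper.

```python
import re


def corrupt_one_number(rationale: str, step_index: int, delta: int = 1) -> str:
    """Return a copy of `rationale` with one intermediate number made wrong.

    Hypothetical helper: steps are assumed to be separated by newlines, and
    `delta` is added to the first integer in the chosen step. The number of
    steps is unchanged, so only correctness (not length) is affected.
    """
    steps = rationale.split("\n")
    target = steps[step_index]
    match = re.search(r"\d+", target)
    if match is None:
        raise ValueError(f"step {step_index} contains no integer")
    wrong = str(int(match.group()) + delta)
    # Splice the wrong value in place of the original digits.
    steps[step_index] = target[: match.start()] + wrong + target[match.end():]
    return "\n".join(steps)
```

Applied to step 1 of "Roger has 5 balls.\nHe buys 2 cans of 3 balls, so 6 more.\n5 + 6 = 11.", this yields a rationale whose second step reads "He buys 3 cans of 3 balls, so 6 more." while the other steps stay the same.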

Strategies

Few-shot setting

  • Think About The Word: ask the model to interpret the key words in the question. This process does not introduce new information.
  • Read the question again: read the question repeatedly to reduce interference from other text on the chain of thought.
  • Repeat State: include a brief summary of the current state after a long chain of reasoning.
  • Self-Verification: before the model reaches its answer, add a self-verification step that judges whether the answer is reasonable based on some basic information.
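The few-shot strategies above can be sketched as a function that lengthens an existing demonstration without changing its original reasoning. This is a rough illustration, not the paper's code: the `Demonstration` type, `expand_demonstration`, `render`, and the exact phrasing of each added step are all hypothetical, and real prompts would need their own wording.

```python
from dataclasses import dataclass


@dataclass
class Demonstration:
    question: str
    rationale_steps: list[str]
    answer: str


def expand_demonstration(
    demo: Demonstration, word_note: str, state_summary: str, check: str
) -> Demonstration:
    """Return a longer copy of `demo` using the four strategies.

    Hypothetical phrasing for each added step:
      - Read the question again: restate the question first.
      - Think About The Word: interpret a key word in the question.
      - Repeat State: summarize the state after the original steps.
      - Self-Verification: sanity-check the answer before giving it.
    The caller must keep word_note, state_summary and check free of
    information not already in the question or the original rationale.
    """
    steps = [f"Let me read the question again: {demo.question}"]
    steps.append(f"Thinking about the words in the question: {word_note}")
    steps.extend(demo.rationale_steps)  # original reasoning, unchanged
    steps.append(f"To summarize the current state: {state_summary}")
    steps.append(f"Let me verify the answer: {check}")
    return Demonstration(demo.question, steps, demo.answer)


def render(demo: Demonstration) -> str:
    """Format a demonstration as a Q/A block for a few-shot prompt."""
    rationale = " ".join(demo.rationale_steps)
    return f"Q: {demo.question}\nA: {rationale} The answer is {demo.answer}."


if __name__ == "__main__":
    demo = Demonstration(
        question="Roger has 5 balls and buys 6 more. How many does he have?",
        rationale_steps=["Roger starts with 5 balls.", "5 + 6 = 11."],
        answer="11",
    )
    longer = expand_demonstration(
        demo,
        word_note="'buys' means the 6 balls are added to what he has.",
        state_summary="he had 5 balls and gained 6.",
        check="11 is more than 5, which fits gaining balls.",
    )
    print(render(longer))
```

A new `Demonstration` is returned rather than mutating the input, so the short and long versions can be compared side by side in the same experiment.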

Zero-shot setting

  • Alter the initial prompt from “Let’s think step by step” to “Let’s think step by step, you must think more steps”.
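A minimal sketch of the two zero-shot prompts, assuming a simple "Q:/A:" template followed by the trigger phrase. The template and the `zero_shot_prompt` name are assumptions; the trigger strings are the ones quoted above, written with straight apostrophes.

```python
BASELINE_TRIGGER = "Let's think step by step"
MORE_STEPS_TRIGGER = "Let's think step by step, you must think more steps"


def zero_shot_prompt(question: str, more_steps: bool = False) -> str:
    """Build a zero-shot CoT prompt (hypothetical Q/A template).

    With more_steps=True the trigger explicitly asks the model for more
    reasoning steps, mirroring the modified prompt described above.
    """
    trigger = MORE_STEPS_TRIGGER if more_steps else BASELINE_TRIGGER
    return f"Q: {question}\nA: {trigger}."
```

Running both variants on the same questions keeps everything fixed except the trigger wording, which is the comparison this setting is about.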

Results

The study highlights the length of the reasoning chain, rather than its correctness, as a major factor in improving Chain of Thought (CoT) performance. The key findings are:

  1. Linear Correlation between Step Count and Accuracy:

    • In few-shot CoT scenarios, there exists a direct linear correlation between the number of reasoning steps and accuracy.
    • Lengthening reasoning steps notably enhances LLMs’ reasoning abilities across multiple datasets.
    • Conversely, shortening reasoning steps significantly diminishes model performance, even when key information is preserved.
  2. Role of Incorrect Rationales:

    • Even incorrect rationales can produce favorable outcomes if they maintain the necessary length of inference.
    • Errors in intermediate numbers, particularly in process-oriented tasks like mathematical problems, have a minor impact on overall performance.
  3. Task-Dependent Nature:

    • The benefits of increasing reasoning steps depend on the complexity of tasks:
      • Simpler tasks require fewer steps.
      • More complex tasks benefit significantly from longer inference sequences.
  4. Enhancement in Zero-Shot CoT:

    • Increasing reasoning steps in zero-shot CoT notably improves LLM accuracy.
    • Altering the initial prompt to explicitly encourage more reasoning steps led to noticeable enhancements, particularly in datasets involving mathematical problems.

Overall, the findings suggest that optimizing CoT prompting involves prioritizing the length of the reasoning chain, which significantly impacts LLMs’ reasoning abilities across various tasks and scenarios.