[TOC]

  1. Title: Structured Chaint of Thought Prompting for Code Generation 2023
  2. Author: Jia Li et. al.
  3. Publish Year: 7 Sep 2023
  4. Review Date: Wed, Feb 28, 2024
  5. url: https://arxiv.org/pdf/2305.06599.pdf

Summary of paper

image-20240229114924548

Contribution

The paper introduces Structured CoTs (SCoTs) and a novel prompting technique called SCoT prompting for improving code generation with Large Language Models (LLMs) like ChatGPT and Codex. Unlike the previous Chain-of-Thought (CoT) prompting, which focuses on natural language reasoning steps, SCoT prompting leverages the structural information inherent in source code. By incorporating program structures (sequence, branch, and loop structures) into intermediate reasoning steps (SCoTs), LLMs are guided to generate more structured and accurate code. Evaluation on three benchmarks demonstrates that SCoT prompting outperforms CoT prompting by up to 13.79% in Pass@1, is preferred by human developers in terms of program quality, and exhibits robustness to various examples, leading to substantial improvements in code generation performance.

Some key terms

code generation

code generation aims to automatically a program tat satisfies a given natural language requirement.

Structured CoT

Evaluation

Three representative benchmarks (i.e., HumanEval [ 7], MBPP [ 2], and MBCPP [1]). We use unit tests to measure the correctness of generated programs and report the Pass@๐‘˜ (๐‘˜ โˆˆ [1, 3, 5]) [7 ].

Pass@k

Specifically, given a requirement, a code generation model is allowed to generate ๐‘˜ programs. The requirement is solved if any generated programs pass all test cases. We compute the percentage of solved requirements in total requirements as Pass@๐‘˜. For Pass@๐‘˜, a higher value is better.

Results

image-20240229122043590

Summary

A CoT is several intermediate natural language reasoning steps. This paper propose Structured CoT to allow program structures (i.e., sequence, branch, and loop structures) to explicitly reveal the program structure, which is helpful for code generation.