SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents


[Submitted on 23 Mar 2024 (v1), last revised 31 Oct 2024 (this version, v2)]

Title:SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents

Authors: Feng Lin, Dong Jae Kim, Tse-Hsun (Peter) Chen


Abstract: Software process models are essential to facilitate collaboration and communication among software teams to solve complex development tasks. Inspired by these software engineering practices, we present FlowGen - a code generation framework that emulates software process models based on multiple Large Language Model (LLM) agents. We emulate three process models, FlowGenWaterfall, FlowGenTDD, and FlowGenScrum, by assigning LLM agents to embody roles (i.e., requirement engineer, architect, developer, tester, and scrum master) that correspond to everyday development activities and organizing their communication patterns. The agents work collaboratively using chain-of-thought and prompt composition with continuous self-refinement to improve the code quality. We use GPT3.5 as our underlying LLM and several baselines (RawGPT, CodeT, Reflexion) to evaluate code generation on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET. Our findings show that FlowGenScrum excels compared to the other process models, achieving a Pass@1 of 75.2, 65.5, 82.5, and 56.7 on HumanEval, HumanEval-ET, MBPP, and MBPP-ET, respectively (an average improvement of 15% over RawGPT). Compared with other state-of-the-art techniques, FlowGenScrum achieves a higher Pass@1 on MBPP than CodeT, with both outperforming Reflexion. Notably, integrating CodeT into FlowGenScrum resulted in statistically significant improvements, achieving the highest Pass@1 scores. Our analysis also reveals that the development activities impacted code smells and exception handling differently, with design and code review adding more exception handling and reducing code smells. Finally, the FlowGen models maintain stable Pass@1 scores across GPT3.5 versions and temperature values, highlighting the effectiveness of software process models in enhancing the quality and stability of LLM-generated code.
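The pipeline the abstract describes — role-playing agents (requirement engineer, architect, developer, tester, scrum master) passing an artifact through repeated refinement rounds — can be sketched roughly as below. This is a minimal illustration only: the class names, message format, and loop structure are assumptions for exposition, not FlowGen's actual implementation, and the LLM call is stubbed so the example is self-contained.

```python
# Illustrative sketch of a role-based, iterative agent pipeline in the
# spirit of FlowGenScrum. All names here are hypothetical; a real system
# would replace Agent.act with a prompt to an LLM conditioned on the role.

from dataclasses import dataclass


@dataclass
class Agent:
    role: str

    def act(self, artifact: str) -> str:
        # Stub: a real agent would send the role description plus the
        # current artifact to an LLM and return its refined output.
        return f"{artifact}\n[{self.role}] reviewed and refined the artifact"


def scrum_sprint(task: str, roles: list[str], iterations: int = 2) -> str:
    """Pass the evolving artifact through every role in order, repeating
    the full loop to emulate continuous self-refinement across sprints."""
    artifact = f"Task: {task}"
    for _ in range(iterations):
        for role in roles:
            artifact = Agent(role).act(artifact)
    return artifact


roles = ["requirement engineer", "architect", "developer", "tester", "scrum master"]
result = scrum_sprint("implement two_sum", roles)
```

Each role contributes twice in this two-iteration sprint, mirroring the idea that communication patterns (who speaks, in what order, how often) are what distinguish the Waterfall, TDD, and Scrum variants.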

Comments:   ICSE 2025  
Subjects:   Software Engineering (cs.SE); Artificial Intelligence (cs.AI)  
Cite as:   arXiv:2403.15852 [cs.SE]  
    (or arXiv:2403.15852v2 [cs.SE] for this version)  
    https://doi.org/10.48550/arXiv.2403.15852

