SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents


[Submitted on 23 Mar 2024 (v1), last revised 31 Oct 2024 (this version, v2)]

Title:SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents

Authors: Feng Lin, Dong Jae Kim, Tse-Hsun (Peter) Chen


Abstract: Software process models are essential to facilitate collaboration and communication among software teams to solve complex development tasks. Inspired by these software engineering practices, we present FlowGen - a code generation framework that emulates software process models based on multiple Large Language Model (LLM) agents. We emulate three process models, FlowGenWaterfall, FlowGenTDD, and FlowGenScrum, by assigning LLM agents to embody roles (i.e., requirement engineer, architect, developer, tester, and scrum master) that correspond to everyday development activities and organizing their communication patterns. The agents work collaboratively using chain-of-thought and prompt composition with continuous self-refinement to improve the code quality. We use GPT3.5 as our underlying LLM and several baselines (RawGPT, CodeT, Reflexion) to evaluate code generation on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET. Our findings show that FlowGenScrum excels compared to the other process models, achieving a Pass@1 of 75.2, 65.5, 82.5, and 56.7 on HumanEval, HumanEval-ET, MBPP, and MBPP-ET, respectively (an average improvement of 15% over RawGPT). Compared with other state-of-the-art techniques, FlowGenScrum achieves a higher Pass@1 on MBPP than CodeT, with both outperforming Reflexion. Notably, integrating CodeT into FlowGenScrum resulted in statistically significant improvements, achieving the highest Pass@1 scores. Our analysis also reveals that the development activities impacted code smells and exception handling differently, with design and code review adding more exception handling and reducing code smells. Finally, the FlowGen models maintain stable Pass@1 scores across GPT3.5 versions and temperature values, highlighting the effectiveness of software process models in enhancing the quality and stability of LLM-generated code.
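The pipeline the abstract describes — role-playing agents (requirement engineer, architect, developer, tester, scrum master) passing an artifact through repeated refinement rounds — can be sketched roughly as below. This is a minimal illustration only: the class names, message format, and loop structure are assumptions for exposition, not FlowGen's actual implementation, and the LLM call is stubbed so the example is self-contained.

```python
# Illustrative sketch of a role-based, iterative agent pipeline in the
# spirit of FlowGenScrum. All names here are hypothetical; a real system
# would replace Agent.act with a prompt to an LLM conditioned on the role.

from dataclasses import dataclass


@dataclass
class Agent:
    role: str

    def act(self, artifact: str) -> str:
        # Stub: a real agent would send the role description plus the
        # current artifact to an LLM and return its refined output.
        return f"{artifact}\n[{self.role}] reviewed and refined the artifact"


def scrum_sprint(task: str, roles: list[str], iterations: int = 2) -> str:
    """Pass the evolving artifact through every role in order, repeating
    the full loop to emulate continuous self-refinement across sprints."""
    artifact = f"Task: {task}"
    for _ in range(iterations):
        for role in roles:
            artifact = Agent(role).act(artifact)
    return artifact


roles = ["requirement engineer", "architect", "developer", "tester", "scrum master"]
result = scrum_sprint("implement two_sum", roles)
```

Each role contributes twice in this two-iteration sprint, mirroring the idea that communication patterns (who speaks, in what order, how often) are what distinguish the Waterfall, TDD, and Scrum variants.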

Comments:   ICSE 2025  
Subjects:   Software Engineering (cs.SE); Artificial Intelligence (cs.AI)  
Cite as:   arXiv:2403.15852 [cs.SE]  
    (or arXiv:2403.15852v2 [cs.SE] for this version)  
    https://doi.org/10.48550/arXiv.2403.15852

