CMU-S3D-25-101
Software and Societal Systems Department, School of Computer Science, Carnegie Mellon University
Navigating Challenges with LLM-based Code
Nikitha Rao
April 2025
Ph.D. Thesis
The software development process is rapidly evolving with the advancement of Large Language Models (LLMs). LLMs are not only transforming the way code is written but are also increasingly integrated into AI programming tools, such as ChatGPT and GitHub Copilot, to enhance developer productivity by generating programs from natural language instructions, identifying and fixing bugs, and generating documentation. These LLMs are pretrained on large volumes of natural language and code data. They are trained using cross-entropy and preference losses that have no term for correctness and optimize only for matching the ground truth. Therefore, despite their proficiency in learning code syntax, they fall short in capturing semantic signals. To date, efforts to improve these models have focused mainly on training larger models and collecting more human preference data. However, user studies have found notable usability issues even with these larger models, including difficulty understanding the generated code, subtle bugs that are hard to find, and a lack of verification of the output. This dissertation demonstrates that integrating domain insights from software engineering into AI-based code generation can enhance reliability and utility for developers. This is done by empowering the model to take on a more active role in building valid and usable code, instilling greater user trust in the model's capabilities. I focus on three main challenges identified by prior work and propose solutions using software-specific insights.
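The gap between match-based objectives and semantic correctness can be seen in a toy sketch (not from the thesis; `token_match_loss` is a hypothetical stand-in for a token-level objective such as cross-entropy). A subtly buggy candidate that nearly reproduces the reference tokens scores better than a semantically equivalent candidate written in a different token order:

```python
# Toy illustration: a match-based objective rewards surface agreement
# with the reference tokens and carries no term for whether the
# candidate code is actually correct.

def token_match_loss(reference, candidate):
    """Fraction of positions where the candidate token differs from the
    reference token; lower is better under this objective."""
    mismatches = sum(r != c for r, c in zip(reference, candidate))
    return mismatches / len(reference)

reference  = ["return", "a", "+", "b"]
buggy      = ["return", "a", "-", "b"]  # subtle bug, nearly identical tokens
equivalent = ["return", "b", "+", "a"]  # semantically correct, reordered tokens

print(token_match_loss(reference, buggy))       # 0.25
print(token_match_loss(reference, equivalent))  # 0.5
```

Under this objective, the buggy program is preferred, which is the kind of misalignment the dissertation's software-specific training signals aim to counter.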
(1) The generated code can be difficult to understand and manipulate, especially for non-expert programmers. To address this, I contribute LOWCODER, a tool that abstracts away the syntactic complexity associated with traditional code and provides a more user-friendly interface with drag-and-drop functionality. As a result, LOWCODER provides a trusted environment where users can leverage the capabilities of AI without extensive coding knowledge.

The goal of my dissertation is to demonstrate the significance of integrating software-specific insights when training models, making code generation more reliable and useful for developers. My dissertation contributes several artifacts, including datasets, evaluation frameworks, and models trained with software-specific insights to improve the quality of generated code. Importantly, these models are all quite small relative to cutting-edge general-purpose models like GPT-4. While large, general models can also be very useful for these tasks, they have their own limitations: few companies can afford the immense resources required to train them, and most are closed-source and provide only limited (free) access to the community, which can be unreliable. In contrast, my work produces smaller, open-source models specialized for various programming-related tasks, resulting in tools that make code generation more reliable and useful for developers.
134 pages
Nicolas Christin, Head, Software and Societal Systems Department