AI-Generated Feedback in Programming Education: Ensuring High Quality and Pedagogically-Guided Interaction
Minh Tung Phung
Max Planck Institute for Software Systems
30 Mar 2026, 11:00 am - 12:00 pm
Saarbrücken building E1 5, room 029
SWS Student Defense Talks - Thesis Proposal
Generative AI holds great promise in enhancing programming education by
automatically generating personalized feedback for students. However, ensuring
that this feedback is both technically accurate and pedagogically effective
remains a critical challenge before these systems can be safely deployed in
real-world classrooms. This thesis investigates the end-to-end integration of
generative AI in programming education, divided into two main parts.
The first part focuses on optimizing the quality of AI-generated feedback. We introduce novel techniques that not only enhance the generated feedback but also validate it automatically before returning it. Specifically, to improve feedback quality, our techniques contextualize the prompt with similar examples from a database and use symbolic information from failing test cases and fixes. Next, to validate the quality of AI-generated feedback, they employ another AI agent as a simulated student in a run-time validation mechanism. Together, these techniques achieve high-precision, human tutor-style feedback.
The second part transitions to deploying these feedback systems in real-world classroom settings, focusing on student-instructor-AI interaction. Specifically, to ensure feedback meets both expert educators' and students' quality standards, we investigate discrepancies between expert-created rubrics and students' perceptions of hint helpfulness. To understand how to position AI-generated hints within traditional pedagogical practices, we examine the interplay between AI-generated hints and student reflection. To address students' over-reliance on AI support, we draw on metacognitive theory to introduce different hint types with quotas that require students' critical engagement while interacting with the system. Finally, to ensure students receive relevant support in difficult cases where AI is insufficient, we propose a hybrid instructor-in-the-loop escalation mechanism that allows instructors to step in efficiently and support students when they are needed most.
Ultimately, this thesis provides a foundational framework for deploying LLM-based feedback systems that balance automated efficiency with established pedagogical standards and human oversight.