
“Combining Large Language Models and Formal Verification for Automated Code Repair in AI Software: A Systematic Literature Review”
Enhancing Reliability and Security of AI Systems through Automated Counterexample-Guided Repair

I need to conduct a systematic literature review for my work.
The link below is the starting point (a template) for conducting a systematic literature review:
https://www.sciencedirect.com/science/article/pii/S016412120600197X
In particular, reviewing published papers on “automated code repair” will help you keep up with new knowledge in the field.
You could prepare a PowerPoint presentation on your systematic literature review and consider using LaTeX to document it.
Below are the background of my assignment, a description, and the available references:
Background
Bugs in AI software can cause various issues, such as crashes, data loss, poor performance, and incorrect results. They can also create vulnerabilities that compromise the software’s security. Therefore, detecting and fixing these bugs early during development is crucial. Researchers have been working on automated software testing, fault localization, and repair to address these challenges. While traditional static analysis helps identify bugs, it often generates false alarms, affecting developers’ productivity. Recent advancements in deep learning have offered potential solutions to long-standing problems in software engineering. Some existing approaches, like DLFix [1] and DeepRepair [2], use deep learning techniques to fix program defects by treating source code as text.
On the one hand, source code has distinct syntax and semantics compared to natural language, and these approaches may produce incorrect results because they rely on limited previously seen data. On the other hand, CURE [3] and DEAR [4] employ programming language models to analyse and model the source code, considering its unique characteristics and context. Another recent development is Large Language Models (LLMs), such as OpenAI’s Codex [5], which can be applied to program repair. For example, InferFix [6] utilises LLMs to fix specific issues such as Null Pointer Dereferences, Resource Leaks, and Thread Safety Violations. Research has shown that directly applying state-of-the-art LLMs can outperform dedicated automated program repair techniques [7,8].
Description
Standard development libraries focus on delivering efficient AI code, which often contains security vulnerabilities that demand time-consuming fixes from software developers. This project combines Large Language Models (LLMs) with Formal Verification (FV) to ensure standard libraries produce secure and efficient AI code. It uses FV to find security vulnerabilities in the AI code and obtain counterexamples, then feeds these counterexamples together with the AI code to the LLM. You will create a prompt language for debugging and generating code. You will also use FV to further scrutinise the code generated by the LLM to ensure its security and reliability. You will implement this method on top of the ESBMC-AI framework to find and repair errors in AI code written in C/C++. Lastly, you will collect code samples from AI applications and evaluate this method’s ability to repair insecure AI code.
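To make the intended pipeline concrete, here is a minimal Python sketch of such a counterexample-guided repair loop. This is not the actual ESBMC-AI implementation: it assumes the esbmc binary is on the PATH, the verification flags shown are illustrative (consult the ESBMC manual for your version), and query_llm is a hypothetical placeholder for whatever LLM backend is used.

```python
import os
import subprocess
import tempfile

def verify(c_source: str) -> tuple[bool, str]:
    """Run ESBMC on a C source string; return (ok, output_text).

    The flags below are illustrative examples of memory-safety
    checks; the exact options depend on the ESBMC version.
    """
    with tempfile.NamedTemporaryFile(mode="w", suffix=".c", delete=False) as f:
        f.write(c_source)
        path = f.name
    try:
        result = subprocess.run(
            ["esbmc", path, "--memory-leak-check", "--overflow-check"],
            capture_output=True, text=True, timeout=600,
        )
        # ESBMC prints "VERIFICATION SUCCESSFUL" or "VERIFICATION FAILED";
        # on failure, the output contains a counterexample trace.
        ok = "VERIFICATION SUCCESSFUL" in result.stdout
        return ok, result.stdout
    finally:
        os.unlink(path)

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call (e.g., a local
    open-source code model). Replace with a real inference call;
    it should return a complete, compilable C translation unit."""
    raise NotImplementedError("plug in an LLM backend here")

def repair(c_source: str, max_rounds: int = 5) -> str | None:
    """Counterexample-guided repair: while verification fails, ask
    the LLM for a patched program using the counterexample as context."""
    for _ in range(max_rounds):
        ok, trace = verify(c_source)
        if ok:
            return c_source
        prompt = (
            "The following C code fails formal verification.\n"
            f"Counterexample:\n{trace}\n\n"
            f"Code:\n{c_source}\n\n"
            "Return a corrected version of the full program that "
            "preserves its functionality and removes the violation."
        )
        c_source = query_llm(prompt)
    return None  # no verified repair found within the budget
```

The loop terminates only when the candidate program passes verification, which is what distinguishes this approach from using an LLM alone: every accepted patch is re-checked by the formal verifier.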
Deliverables
1. We aim to develop the ESBMC-AI framework, leveraging open-source, permissively licensed Large Language Models (LLMs) and formal verification engines. This framework will empower users to produce AI code that is efficient, robust, and secure.
2. We aim to comprehensively study the ESBMC-AI framework’s capabilities in repairing security vulnerabilities, specifically focusing on memory safety properties in AI code written in industrial programming languages like C/C++ (an illustrative example of such a defect follows this list). Our primary emphasis will be on applications in embedded and IoT devices.
3. We will perform an extensive evaluation using a large set of AI programs designed for this experiment to demonstrate the effectiveness of combining LLM with formal verification. The goal is to showcase how this approach can accurately identify and explain security vulnerabilities while proposing effective patches.
4. Ultimately, we seek to create trustworthy AI code seamlessly running on embedded and IoT devices. Our automated counterexample-guided repair framework will ensure that AI systems maintain high reliability and security.
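As a usage example for the sketch above, and to illustrate the kind of memory-safety defect deliverable 2 targets, the snippet below runs the verify helper on a deliberately buggy C program. The off-by-one loop is hypothetical illustrative code, not taken from a real AI library, and the exact trace format varies by ESBMC version.

```python
# Continues the sketch above (reuses its verify() helper).
BUGGY_C = r"""
#include <stdlib.h>

int main(void) {
  float activations[8];
  /* off-by-one: the loop also writes activations[8] */
  for (int i = 0; i <= 8; i++)
    activations[i] = 0.0f;
  return 0;
}
"""

if __name__ == "__main__":
    ok, trace = verify(BUGGY_C)
    # ESBMC's array-bounds checking should report an out-of-bounds
    # write here and produce a counterexample trace for the LLM.
    print("verified" if ok else trace)
```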
References
[1] Y. Li, S. Wang, and T. N. Nguyen, “Dlfix: Context-based code transformation learning for automated program repair,” in Proceedings of the ACM/IEEE 42nd Int. Conference on Software Engineering, 2020, pp. 602–614.
[2] M. White, M. Tufano, M. Martinez, M. Monperrus, and D. Poshyvanyk, “Sorting and transforming program repair ingredients via deep learning code similarities,” in 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2019, pp. 479–490.
[3] N. Jiang, T. Lutellier, and L. Tan, “Cure: Code-aware neural machine translation for automatic program repair,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021, pp. 1161–1173.
[4] Y. Li, S. Wang, and T. N. Nguyen, “Dear: A novel deep learning-based approach for automated program repair,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 511–523.
[5] M. Chen et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021. [Online]. Available: http://arxiv.org/abs/2107.03374.
[6] M. Jin, S. Shahriar, M. Tufano, X. Shi, S. Lu, N. Sundaresan, and A. Svyatkovskiy, “Inferfix: End-to-end program repair with llms,” arXiv preprint arXiv:2303.07263, 2023.
[7] C. Xia, Y. Wei, and L. Zhang, “Automated program repair in the era of large pre-trained language models,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023 (to appear).
[8] Z. Fan, X. Gao, M. Mirchev, A. Roychoudhury, and S. H. Tan, “Automated repair of programs from large language models,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023 (to appear).
Y. Charalambous, N. Tihanyi, R. Jain, Y. Sun, M. A. Ferrag, and L. C. Cordeiro, “A new era in software security: Towards self-healing software via large language models and formal verification,” arXiv preprint arXiv:2305.14752, 2023.
N. Tihanyi, T. Bisztray, R. Jain, M. A. Ferrag, L. C. Cordeiro, and V. Mavroeidis, “The FormAI dataset: Generative AI in software security through the lens of formal verification,” arXiv preprint arXiv:2307.02192, 2023.
M. A. Ferrag, A. Battah, N. Tihanyi, M. Debbah, T. Lestable, and L. C. Cordeiro, “SecureFalcon: The next cyber reasoning system for cyber security,” arXiv preprint arXiv:2307.06616, 2023.