Chaos Engineering in Large Language Models: Resilience and Robustness Testing
Abstract
Chaos engineering has emerged as a critical practice for improving the resilience and fault tolerance of distributed systems, but its application to Large Language Models (LLMs) is still in its infancy. This paper outlines a framework for chaos engineering in LLM-based architectures and addresses the key challenges and strategies for improving their robustness. It presents methodologies for introducing controlled failure experiments into the LLM pipeline, so that reliability and performance can be verified under unexpected conditions and LLM systems become more resilient in real-world scenarios.
Keywords: chaos engineering, large language models, robustness, fault injection.
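As an illustration only, a controlled failure experiment of the kind the abstract describes might be sketched as follows. This is not the paper's method: the stand-in model `fake_llm`, the wrapper `with_fault_injection`, the retry-and-fallback policy in `resilient_call`, and all parameter values are hypothetical names and assumptions chosen for the sketch.

```python
import random


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a failed model call."""


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (assumption, not a real API).
    return f"echo: {prompt}"


def with_fault_injection(call, failure_rate: float, rng: random.Random):
    """Wrap `call` so that roughly `failure_rate` of invocations fail on purpose."""
    def wrapped(prompt: str) -> str:
        if rng.random() < failure_rate:
            raise InjectedFault("simulated model outage")
        return call(prompt)
    return wrapped


def resilient_call(call, prompt: str, retries: int = 3,
                   fallback: str = "[unavailable]") -> str:
    """Retry on injected faults, then degrade to a fallback answer."""
    for _ in range(retries):
        try:
            return call(prompt)
        except InjectedFault:
            continue
    return fallback


def run_experiment(failure_rate: float, trials: int = 200, seed: int = 0) -> float:
    """Return the fraction of requests that received a real (non-fallback) answer."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    chaotic = with_fault_injection(fake_llm, failure_rate, rng)
    ok = sum(
        resilient_call(chaotic, f"q{i}") != "[unavailable]"
        for i in range(trials)
    )
    return ok / trials
```

Running `run_experiment` at several failure rates gives a rough picture of how much injected failure the retry policy can absorb before users start seeing fallback answers. A real experiment would replace `fake_llm` with the actual pipeline stage under test.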
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
License details: https://creativecommons.org/licenses/by-nc-sa/4.0/