Abstract: We introduce CTF-PWN100, a dataset and automated framework for evaluating how well large language models (LLMs) solve Capture-The-Flag (CTF) binary ...