
ChatGPT As An Attack Tool: Stealthy Textual Backdoor Attack Via Blackbox Generative Model Trigger

Jiazhao Li, Yijin Yang, Zhuofeng Wu, V. G. Vinod Vydiswaran, Chaowei Xiao. Reliability Engineering & System Safety 2023 – 75 citations

[Paper] · [Google Scholar] · [Semantic Scholar]
Conversational Agents · Survey Paper

Textual backdoor attacks pose a practical threat to existing systems, as they can compromise the model by inserting imperceptible triggers into inputs and manipulating labels in the training dataset. With cutting-edge generative models such as GPT-4 pushing rewriting to extraordinary levels, such attacks are becoming even harder to detect. We conduct a comprehensive investigation of the role of black-box generative models as a backdoor attack tool, highlighting the importance of researching relevant defense strategies. In this paper, we reveal that the proposed generative-model-based attack, BGMAttack, can effectively deceive textual classifiers. Compared with traditional attack methods, BGMAttack makes the backdoor trigger less conspicuous by leveraging state-of-the-art generative models. Our extensive evaluation of attack effectiveness across five datasets, complemented by three distinct human cognition assessments, reveals that BGMAttack achieves comparable attack performance while maintaining superior stealthiness relative to baseline methods.
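As a rough illustration of the approach the abstract describes, the sketch below shows how a black-box generative model can serve as the backdoor trigger: a small fraction of training samples are paraphrased by the model and relabeled to an attacker-chosen target class, so the "trigger" is the model's writing style rather than any inserted token. This is a minimal reconstruction, not the authors' released code; it assumes an OpenAI-style chat API, and the prompt, model name, and helper names (`rewrite`, `poison_dataset`) are illustrative placeholders.

```python
"""Hedged sketch of generative-model-based backdoor poisoning (BGMAttack-style).

Illustrative only: assumes an OpenAI-style chat API as the black-box
rewriter; prompt, model name, and helpers are placeholders, not the
authors' method verbatim.
"""
import random

from openai import OpenAI  # assumed black-box rewriter; any paraphraser works

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def rewrite(text: str, model: str = "gpt-4") -> str:
    """Paraphrase the input with a black-box generative model.

    The paraphrase itself acts as the imperceptible trigger: no rare
    token or fixed template is inserted into the text.
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Rewrite the text, preserving its meaning."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip()


def poison_dataset(dataset, target_label: int, poison_rate: float = 0.1, seed: int = 0):
    """Rewrite a small fraction of training samples and flip their labels.

    dataset: list of (text, label) pairs.
    Returns a new list in which roughly `poison_rate` of the non-target
    samples are paraphrased and relabeled to `target_label`; the rest
    are left untouched. Training a classifier on the result implants
    the backdoor.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if label != target_label and rng.random() < poison_rate:
            poisoned.append((rewrite(text), target_label))  # trigger + label flip
        else:
            poisoned.append((text, label))
    return poisoned
```

Because the poisoned inputs are fluent paraphrases rather than sentences carrying a rare token or fixed template, perplexity- or pattern-based trigger detectors have little to latch onto, which is the stealthiness property the abstract highlights.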

Similar Work