
3 min read

RanMASK against Text Adversarial Attacks

A paper review summary written as part of my coursework for IST597: Trustworthy Machine Learning

Certified Robustness to Text Adversarial Attacks by Randomized [MASK]

Jiehang Zeng, Xiaoqing Zheng, Jianhan Xu, Linyang Li, Liping Yuan, Xuanjing Huang

PDF | Paper Code

Summary

This paper presents RanMASK, a defense method based on randomized smoothing against adversarial attacks on BERT-based models used for NLP tasks. It works by repeatedly applying "[MASK]" operations to the input text, producing a large ensemble of masked copies. As the authors explain, the motivation is that if a sufficiently large percentage of the input words is randomly masked, there is only a low probability that the adversarial perturbations appear unmasked in any given masked copy. The method is shown to defend against both word-substitution attacks and character-level perturbations.
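
To make the masking-and-vote pipeline concrete, here is a minimal Python sketch of the idea. The masking rate, the number of masked copies, and the toy classifier are illustrative assumptions, not the authors' actual implementation or hyperparameters.

```python
import random
from collections import Counter

MASK_TOKEN = "[MASK]"

def random_mask(tokens, mask_rate=0.3, rng=random):
    # Replace roughly mask_rate of the tokens with [MASK] at random positions.
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]

def ranmask_predict(tokens, classify, num_copies=100, mask_rate=0.3):
    # Classify many independently masked copies and take a majority vote.
    # `classify` stands in for any text classifier (e.g. a fine-tuned BERT).
    votes = Counter(classify(random_mask(tokens, mask_rate)) for _ in range(num_copies))
    return votes.most_common(1)[0][0]

# Toy usage with a dummy classifier that keys on a single word.
toy_classify = lambda toks: "negative" if "terrible" in toks else "positive"
print(ranmask_predict("the movie was terrible".split(), toy_classify))
```

The intuition from the paper carries over even in this toy setting: an attack that perturbs only a few words is masked out in most of the copies, so the aggregated prediction is hard to flip.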

Results

The experiments show that this method can certify the classifications of over 50% of texts to be robust to any perturbation of five words on AGNEWS and two words on SST2. AGNEWS is a text-classification dataset, whereas SST2 is a set of movie reviews used for sentiment analysis. The method is evaluated against standard benchmark attack algorithms (TextFooler, BERT-Attack, and DeepWordBug), and its accuracy is compared against SAFER. Additionally, the authors consider two ensemble methods, the logits-summed ensemble (logit) and the majority-vote ensemble (vote), and find that SAFER and RanMASK behave differently as the ensemble method is changed.
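
To illustrate how the two ensemble strategies can disagree, here is a small NumPy sketch. It assumes access to the per-copy logits of the underlying classifier, a simplification for illustration rather than the paper's evaluation code.

```python
import numpy as np

def logit_ensemble(logits):
    # logits: (num_masked_copies, num_classes); sum the raw scores, then argmax.
    return int(np.argmax(logits.sum(axis=0)))

def vote_ensemble(logits):
    # Each masked copy votes for its own argmax class; the most-voted class wins.
    per_copy_preds = np.argmax(logits, axis=1)
    return int(np.bincount(per_copy_preds).argmax())

# Three masked copies, two classes: the ensembles can disagree.
logits = np.array([[2.0, 1.0],   # copy 1 mildly prefers class 0
                   [1.5, 1.0],   # copy 2 mildly prefers class 0
                   [0.0, 5.0]])  # copy 3 strongly prefers class 1
print(logit_ensemble(logits))  # -> 1 (the confident copy dominates the summed logits)
print(vote_ensemble(logits))   # -> 0 (two of three copies vote for class 0)
```

Because the vote ensemble only counts argmax decisions, a single overconfident masked copy cannot flip the prediction, which is one way the two schemes can diverge.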

Strengths

RanMASK strives for a certifiably robust model through the standard defense mechanisms of data augmentation and adversarial training, generating training samples by randomly masking a portion of the input words. This defends the model against character-level perturbations as well as word-substitution attacks, which the paper illustrates with examples of adversarial samples [refer to Table 1]. The method improves on SAFER, which cannot defend against character-level perturbations and assumes that the attacker's method of generating synonyms is known to the defense. Additionally, RanMASK is tested on BERT, a widely used neural network, and on NLP tasks (sentiment analysis, text classification, natural language inference) that span a large domain of applications.

Possible directions for future work

While the paper's claims stand strong, it would be helpful to evaluate this defense method on NLP tasks such as named-entity recognition, part-of-speech tagging, machine translation, dialog systems, and text summarization. Secondly, as explained by Zhang et al. in "Certified Robustness for Large Language Models with Self-Denoising" [1], randomized smoothing of this kind is computationally expensive when applied to LLMs: RanMASK requires adding noise to the input before prediction, and the certified accuracy depends heavily on how well the model performs on noisy data. In the case of LLMs, there is limited access to the parameters, and fine-tuning on such noisy inputs is not always computationally viable.

Conclusion

RanMASK offers a simple but effective route to certified robustness for text classifiers: by randomly masking a large portion of the input and aggregating predictions over many masked copies, it provides provable guarantees against word-substitution attacks and also withstands character-level perturbations that defenses like SAFER do not handle. The main open questions, discussed above, are how well the approach transfers to other NLP tasks and how randomized smoothing can be made practical for large language models [1].

References

  1. Zhen Zhang, Guanhua Zhang, Bairu Hou, Wenqi Fan, Qing Li, Sijia Liu, Yang Zhang, and Shiyu Chang. Certified Robustness for Large Language Models with Self-Denoising. Retrieved April 24, 2024 from https://arxiv.org/pdf/2307.07171.pdf