A paper review summary written as part of my coursework in IST597: Trustworthy Machine Learning
Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
Jinyuan Jia, Xiaoyu Cao, and Neil Zhenqiang Gong
Summary
This paper proposes a certified defense mechanism derived from an ensemble learning method, bootstrap aggregating (bagging). The mechanism is shown to be effective against data poisoning attacks, in which an attacker corrupts the training set by modifying, deleting, or inserting training examples. The paper introduces the notion of a certified poisoning size, the maximum number of corrupted training examples under which bagging's prediction for a testing example is guaranteed not to change, and proves that this bound is tight if no assumptions are made about the base learning algorithm. The method trains multiple base models on random subsamples of the training dataset using a base learning algorithm and takes a majority vote among the base models to predict the label of a testing example. The derivation uses lower and upper bounds on the label probabilities together with the Neyman-Pearson Lemma to compute the certified poisoning size. The experimental results show that the defense is effective and achieves high certified accuracy on the MNIST and CIFAR10 datasets.
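To make the mechanism concrete, below is a minimal sketch of the bagging-with-majority-vote idea in Python. It is only an illustration under my own assumptions (the function names, a scikit-learn-style base learner with fit/predict, and integer class labels are all hypothetical); it does not include the authors' derivation of probability bounds and certified poisoning sizes via the Neyman-Pearson Lemma.

```python
import numpy as np

def train_bagging_ensemble(X, y, base_learner_factory, n_models=100, subsample_size=50, rng=None):
    """Train n_models base models, each on a random subsample drawn with replacement."""
    rng = rng if rng is not None else np.random.default_rng(0)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=subsample_size, replace=True)  # random subsample
        model = base_learner_factory()  # e.g., lambda: DecisionTreeClassifier()
        model.fit(X[idx], y[idx])
        models.append(model)
    return models

def predict_majority_vote(models, x, n_classes):
    """Predict the label that receives the most votes among the base models."""
    votes = np.zeros(n_classes, dtype=int)
    for model in models:
        label = int(model.predict(x.reshape(1, -1))[0])  # assumes labels are 0..n_classes-1
        votes[label] += 1
    return int(np.argmax(votes)), votes
```

In the paper's analysis, the vote counts (or the corresponding label probabilities) are what get bounded to certify how many poisoned training examples the majority-vote prediction can tolerate.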
Results
The proposed bagging-based defense has been evaluated on the MNIST and CIFAR10 datasets and achieves high certified accuracy against data poisoning attacks. Specifically, it achieves a certified accuracy of 91.1% on MNIST when up to 100 training examples are arbitrarily modified, deleted, and/or inserted. The paper also shows that the proposed mechanism outperforms existing defense mechanisms such as differentially private learning, randomized smoothing, and partitioning the training dataset using a hash function.
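For reference, certified accuracy at a poisoning size r is the fraction of test examples that are both classified correctly and have a certified poisoning size of at least r. A minimal sketch of this bookkeeping (array names are hypothetical) might look like:

```python
import numpy as np

def certified_accuracy(certified_sizes, predictions, labels, r):
    """Fraction of test examples predicted correctly with certified poisoning size >= r."""
    certified_sizes = np.asarray(certified_sizes)
    correct = np.asarray(predictions) == np.asarray(labels)
    return float(np.mean(correct & (certified_sizes >= r)))

# The reported 91.1% on MNIST corresponds to this quantity evaluated at r = 100.
```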
Strengths
This work presents a comprehensive mechanism that achieves a high level of certified robustness against data poisoning attacks, where the tolerable number of poisoned inputs is given by the certified poisoning size. Secondly, the paper presents a thorough proof of the derivation behind the certified robustness guarantee. Additionally, the method has been evaluated on widely used datasets, which makes it a promising fit for securing a wide range of machine learning models and datasets. Last but not least, the paper reports a very high certified accuracy (>90%) on the MNIST dataset, which adds to its strengths.
Possible directions for future work
While the paper stands strong in its claims, the defense method appears to have been tested only against computer vision (CNN) models and not NLP models (Transformer architectures). Additionally, the method does not seem to be directly applicable to securing recommender systems: a recommender system returns the top-K items recommended to a user, whereas a machine learning classifier predicts only a single label. This is explained in more detail by Huang et al. in Data Poisoning Attacks to Deep Learning Based Recommender Systems. It would be helpful for future work to explore extending this certified robustness to recommender systems.
References
Hai Huang, Jiaming Mu, Neil Zhenqiang Gong, Qi Li, Bin Liu, and Mingwei Xu. 2021. Data Poisoning Attacks to Deep Learning Based Recommender Systems. In Proceedings of the 2021 Network and Distributed System Security Symposium (NDSS). DOI: https://doi.org/10.14722/ndss.2021.24525