

Poisoning Encoders

A paper review summary written as part of my coursework for IST597: Trustworthy Machine Learning

PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning

Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong


Summary

In this paper, the authors propose a new kind of poisoning attack, PoisonedEncoder, which compromises image-processing pipelines by manipulating the vision encoder through its unlabeled pre-training data. During contrastive pre-training, the encoder is trained so that augmented views of the same image produce similar feature vectors while views of different images produce dissimilar ones. The pre-trained encoder then extracts feature vectors for a labeled downstream dataset, and these features are combined with the labels to train and test a downstream classifier. To poison the encoder, an attacker injects carefully crafted inputs into the unlabeled pre-training data so that the downstream classifier misclassifies attacker-chosen target inputs at test time. The attack is formulated as a bi-level optimization problem whose goal is to maximize the cosine similarity between the feature vectors of the target and reference inputs. The paper evaluates PoisonedEncoder on multiple datasets and compares it with other state-of-the-art attacks, demonstrating its effectiveness and superiority. It also proposes and evaluates several defenses against PoisonedEncoder, providing insights into how the attack can be mitigated.
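To make the mechanics concrete, below is a minimal, hedged sketch of the two ideas above: a poisoning input built by combining a target image with a reference image, so that the random cropping used in contrastive pre-training can produce views containing each, and the cosine similarity between their encoder features that the attack tries to maximize. The function names, the toy encoder, and the image sizes are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: names, toy encoder, and sizes are illustrative assumptions,
# not the paper's code.
import torch
import torch.nn.functional as F
from torchvision import transforms


def make_poisoning_input(target_img: torch.Tensor, reference_img: torch.Tensor) -> torch.Tensor:
    """Stitch a target and a reference image side by side, giving shape (C, H, 2W).

    Random cropping during contrastive pre-training can then yield one view
    dominated by the target and another dominated by the reference, which
    pushes their feature vectors together.
    """
    return torch.cat([target_img, reference_img], dim=-1)


def target_reference_similarity(encoder, target_img, reference_img) -> torch.Tensor:
    """Cosine similarity between encoder features of the target and reference
    inputs, i.e., the quantity the attack aims to maximize."""
    with torch.no_grad():
        f_t = encoder(target_img.unsqueeze(0))
        f_r = encoder(reference_img.unsqueeze(0))
    return F.cosine_similarity(f_t, f_r, dim=-1)


if __name__ == "__main__":
    # Random tensors stand in for real CIFAR10-sized images.
    target, reference = torch.rand(3, 32, 32), torch.rand(3, 32, 32)
    poison = make_poisoning_input(target, reference)   # shape (3, 32, 64)
    crop = transforms.RandomResizedCrop(32)            # SimCLR-style augmentation
    view1, view2 = crop(poison), crop(poison)          # two random views of the poison
    print(poison.shape, view1.shape, view2.shape)

    toy_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
    print(float(target_reference_similarity(toy_encoder, target, reference)))
```

Running the script prints the similarity for an untrained toy encoder; the attack's goal is that, after poisoned pre-training, this value becomes high for the attacker's chosen target-reference pairs.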

Results

The PoisonedEncoder attack is evaluated on the CIFAR10, STL10, Facemask, EuroSAT, and Tiny-ImageNet datasets. PoisonedEncoder achieves an attack success rate at least 0.2 higher than ICP, while Witches' Brew appears to be ineffective. The authors also report clean and poisoned accuracy, i.e., the accuracy of the downstream classifier when the vision encoder is pre-trained on clean data versus on data containing the poisoning inputs. The difference between the two accuracies is small, indicating that the attack largely preserves the utility of the encoder. Lastly, the authors show that fine-tuning the encoder can reduce the attack success rate of PoisonedEncoder without sacrificing its utility, but this defense requires manually collecting a large set of clean images.
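For clarity, here is a small, hedged sketch of how these evaluation metrics can be computed: the attack success rate (the fraction of attacker-chosen target inputs classified as the attacker's intended class) and the gap between clean and poisoned accuracy. The helper names and the toy prediction lists are assumptions for illustration only, not numbers from the paper.

```python
# Hedged sketch of the evaluation metrics mentioned above.
# All names and example values are illustrative, not taken from the paper.
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(target_preds: Sequence[int], target_classes: Sequence[int]) -> float:
    """Fraction of target inputs classified as the attacker-chosen class."""
    return sum(p == c for p, c in zip(target_preds, target_classes)) / len(target_classes)


def utility_gap(clean_preds, poisoned_preds, labels) -> float:
    """Clean accuracy minus poisoned accuracy; a small gap means the
    poisoning largely preserves the encoder's utility."""
    return accuracy(clean_preds, labels) - accuracy(poisoned_preds, labels)


if __name__ == "__main__":
    labels = [0, 1, 1, 2, 0]
    clean_preds = [0, 1, 1, 2, 0]      # downstream classifier on a clean encoder
    poisoned_preds = [0, 1, 1, 2, 1]   # same classifier on a poisoned encoder
    target_classes = [2, 2, 2]         # attacker-chosen class for three target inputs
    target_preds = [2, 2, 0]           # poisoned pipeline's predictions for them
    print(utility_gap(clean_preds, poisoned_preds, labels))    # ~0.2
    print(attack_success_rate(target_preds, target_classes))   # ~0.67
```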

Strengths

This paper concisely explains a simple but significant way in which a vision encoder used for downstream tasks in contrastive learning can be attacked. One of the major strengths of the paper is that the attack is thoroughly evaluated against existing defenses that can be applied during pre-processing, in-processing, and post-processing. Secondly, the attack is evaluated on a variety of datasets commonly used in contrastive learning, which makes the evaluation more thorough and the contribution more convincing. Lastly, the results are presented concisely, covering hyperparameter tuning for the chosen contrastive algorithm as well as the attack success rates and the differences between clean and poisoned accuracy.

Possible directions for future work

While the paper's claims stand strong, the attack is designed specifically for contrastive learning, and no results are reported for other self-supervised pre-training approaches. In particular, the attack exploits the random cropping operation that contrastive learning relies on in order to construct its poisoning inputs, which may make it unsuitable for pre-training methods that do not depend on such augmentations. Secondly, the attack is tested only with SimCLR as the representative algorithm; it would be helpful to explore its effect on other self-supervised frameworks such as BYOL.

References

USENIX Security ’22 - PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning from https://www.youtube.com/watch?v=9hDnMm_cVbQ