A paper review summary written as part of my coursework in IST597: Trustworthy Machine Learning
AttriGuard: A Practical Defense Against Attribute Inference Attacks via Adversarial Machine Learning
Summary
This paper presents AttriGuard, a defense against attribute inference attacks. In such attacks, an attacker has access to a user's public data and uses a machine learning classifier to infer the user's private attributes from it. Under the threat model, the user passes their true public data to the defender, which releases a noisy version of it, with the noise constrained by a chosen policy. The policies may involve modifying existing public attributes, adding new attributes, or both. The defender knows neither the attacker's classifier nor the user's true private attribute values. AttriGuard therefore trains its own classifier as a stand-in for the attacker's and chooses the noise so that the attacker's inference approximates a target probability distribution selected by the defender. This is achieved in two phases: in the first phase, for each possible private-attribute value, the defender finds the minimum noise that makes its own classifier predict that value (an adversarial-example style evasion); in the second phase, the defender samples an attribute value 'q' according to a distribution chosen so that the attacker's inference roughly follows the target distribution, and adds the corresponding noise to the public data.
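To make the two-phase idea concrete, below is a minimal, hypothetical sketch in Python/NumPy. It is not the authors' implementation: the linear softmax "defender classifier" (W, b), the greedy bit-flip search used in the first phase, and sampling 'q' directly from a uniform target distribution in the second phase (skipping the paper's optimization that trades off utility loss) are all simplifying assumptions made for illustration.

```python
# Toy sketch of AttriGuard's two-phase idea (illustrative assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

n_features, n_values = 20, 4                   # binary public attributes, private-attribute values
W = rng.normal(size=(n_values, n_features))    # stand-in defender classifier weights
b = rng.normal(size=n_values)

def predict_proba(x):
    """Softmax scores of the defender's (stand-in) classifier."""
    z = W @ x + b
    e = np.exp(z - z.max())
    return e / e.sum()

def phase1_min_noise(x, target, max_flips=10):
    """Phase I: greedily flip the fewest binary public attributes so the
    defender's classifier predicts `target` (a crude stand-in for the
    paper's adversarial-example search for minimal noise)."""
    x = x.copy()
    for _ in range(max_flips):
        if predict_proba(x).argmax() == target:
            break
        # flip the feature whose change most increases the target value's score
        gains = [(predict_proba(np.where(np.arange(n_features) == j, 1 - x, x))[target], j)
                 for j in range(n_features)]
        _, best = max(gains)
        x[best] = 1 - x[best]
    return x

def phase2_release(x, target_dist):
    """Phase II (simplified): sample an attribute value from the target
    distribution and release the correspondingly perturbed public data."""
    q = rng.choice(n_values, p=target_dist)
    return phase1_min_noise(x, q)

x_true = rng.integers(0, 2, size=n_features).astype(float)
released = phase2_release(x_true, np.full(n_values, 1 / n_values))
print("flipped attributes:", int(np.sum(released != x_true)))
print("defender's prediction on released data:", predict_proba(released).argmax())
```

In the actual paper, the distribution used in the second phase is not the target distribution itself but the solution of an optimization that minimizes expected utility loss while making the attacker's inference approximate the target distribution; the sketch above omits that step.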
Results
The choice of the defender's classifier (a neural network or a logistic regression model) is studied against different attacker classifiers, and AttriGuard is shown to be most effective when the defender's classifier matches the attacker's. When studying the effect of the different noise-type policies, Modify_Add outperformed Add_New, which in turn performed better than Modify_Exist. Additionally, AttriGuard was found to incur a smaller utility loss when compared against other baseline methods (BlurMe, ChiSquare).
Strengths
AttriGuard is computationally less expensive than game-theoretic defense methods and incurs a smaller utility loss than Local Differential Privacy (LDP). Additionally, it is more effective than the baseline methods, namely BlurMe, ChiSquare, Quantization Probabilistic Mapping, and Local Differential Privacy-Succinct Histogram. BlurMe and ChiSquare require the defender to know the user's true private attribute values, which AttriGuard does not. The results show that AttriGuard achieves a lower probability of the attacker correctly inferring the private attribute than the mentioned baseline methods while also introducing less distortion to the data.
Possible directions for future work
As the paper mentions, AttriGuard is most effective when the user's data are recorded once and may be less effective when the data are dynamic and updated over time. Moreover, when users have multiple private attributes, an attacker could leverage the correlations between those attributes to improve its inference, which the current defense does not address.