Adversarial Attacks on Deep Learning

Deep Learning

1、2014-Explaining and Harnessing Adversarial Examples [ArXiv][code][code]

Argues that adversarial examples arise not from non-linearity but from the linear behavior of models in high-dimensional space.
FGSM computes the gradient of the loss with respect to $X$ and perturbs the input along the sign of that gradient, under the $L_{\infty}$ distance metric.
$$ X^{adv}=X+\varepsilon\cdot sign\left(\nabla_X J(X,y_{true})\right) $$
The gradient can be computed via backpropagation.
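A minimal PyTorch sketch of this one-step FGSM update, assuming a classifier `model` that returns logits and a batch `x`, `y_true` of inputs and labels (all hypothetical names):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps):
    """One-step FGSM: X_adv = X + eps * sign(grad_X J(X, y_true))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Step in the direction that increases the loss, then keep a valid pixel range.
    return (x_adv.detach() + eps * grad.sign()).clamp(0.0, 1.0)
```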

2、2015-DeepFool: a simple and accurate method to fool deep neural networks [ArXiv][code]

An untargeted attack under the $L_2$ distance metric.
$$ \min \|D(X)-X\|_2 \quad \text{subject to} \quad \arg\max f(D(X)) \neq y_{true} $$
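A simplified PyTorch sketch of the DeepFool iteration for a single image, assuming `model` returns logits for an input of shape `[1, C, H, W]` (hypothetical names; the original algorithm also restricts the search to the top-k classes):

```python
import torch

def deepfool(model, x, num_classes, max_iter=50, overshoot=0.02):
    """Repeatedly take the smallest L2 step to the nearest linearized decision boundary."""
    orig_label = model(x).argmax(dim=1).item()
    x_adv = x.clone().detach()
    for _ in range(max_iter):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig_label:
            break  # the classification changed, so we are done
        grad_orig = torch.autograd.grad(logits[orig_label], x_adv, retain_graph=True)[0]
        best_ratio, best_w = None, None
        for k in range(num_classes):
            if k == orig_label:
                continue
            grad_k = torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0]
            w_k = grad_k - grad_orig                       # gradient difference
            f_k = (logits[k] - logits[orig_label]).item()  # logit difference
            ratio = abs(f_k) / (w_k.norm().item() + 1e-8)  # distance to boundary k
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_w = ratio, w_k
        # Minimal step onto the closest (linearized) boundary, slightly overshooting it.
        r = best_ratio * best_w / (best_w.norm() + 1e-8)
        x_adv = x_adv.detach() + (1 + overshoot) * r
    return x_adv.detach()
```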

3、2015-The Limitations of Deep Learning in Adversarial Settings[ArXiv][code]

Computes an adversarial saliency map to find the two input features with the greatest influence on the output, and modifies two pixels per iteration until the target class is reached.
$L_0$ distance metric.
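A heavily simplified PyTorch sketch of one saliency-map step, assuming a single image `x` in `[0, 1]` and a target class `target` (hypothetical names; the full JSMA also maintains a search domain and a maximum-distortion budget):

```python
import torch

def jsma_step(model, x, target, theta=1.0):
    """Increase the two pixels whose saliency toward the target class is largest."""
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)[0]
    grad_target = torch.autograd.grad(logits[target], x_adv, retain_graph=True)[0].flatten()
    grad_others = torch.autograd.grad(logits.sum() - logits[target], x_adv)[0].flatten()
    # Saliency is positive only where the target logit grows and the other logits shrink.
    saliency = torch.where((grad_target > 0) & (grad_others < 0),
                           grad_target * grad_others.abs(),
                           torch.zeros_like(grad_target))
    saliency[x.detach().flatten() >= 1.0] = 0.0   # skip already-saturated pixels
    idx = saliency.topk(2).indices                # the two most influential pixels
    x_new = x.detach().flatten().clone()
    x_new[idx] += theta
    return x_new.clamp(0.0, 1.0).view_as(x)
```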

4、2015-Distributional Smoothing with Virtual Adversarial Training [ArXiv][code]

Defines a virtual adversarial perturbation via the KL divergence between the model's output distributions at $x$ and at $x+r$.

5、2016-Exploring the space of adversarial images[ArXiv][code] (Lua)

6、2016-Adversarial Machine Learning at Scale[ArXiv][code]

9、2016-Adversarial examples in the physical world [ArXiv][code][code]

Tests the effectiveness of adversarial examples by photographing them, which changes some of the pixels.
Extends FGSM by applying it iteratively with a small step each time; the intermediate result is clipped so it stays within $\varepsilon$ of the original image.
$$ X_0^{adv}=X,\quad X_{N+1}^{adv}=Clip_{X,\varepsilon}\left\{X_{N}^{adv}+\alpha\cdot sign\left(\nabla_X J(X_{N}^{adv},y_{true})\right)\right\} $$
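A PyTorch sketch of this iterative variant (BIM), under the same hypothetical `model`, `x`, `y_true` assumptions as the FGSM snippet above:

```python
import torch
import torch.nn.functional as F

def bim(model, x, y_true, eps, alpha, num_iter):
    """Iterative FGSM: small steps of size alpha, each result clipped back
    into the eps-ball around the original image X and into [0, 1]."""
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Clip_{X, eps}: project back onto the L_inf ball, then the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```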

10、2016-Simple Black-Box Adversarial Perturbations for Deep Networks[ArXiv]

Datasets: CIFAR10, MNIST, SVHN, STL10, and ImageNet1000.

11、2017-Towards Evaluating the Robustness of Neural Networks[ArXiv][code][code]

It is noteworthy that the original defensive distillation defense is broken by Carlini and Wagner's attack.
$$ \min \left\|\tfrac{1}{2}(\tanh(W)+1)-x\right\|^2_2+c\cdot \ell\left(\tfrac{1}{2}(\tanh(W)+1)\right) $$
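A PyTorch sketch of this objective for a targeted attack against hypothetical `model`, `x`, `target`: the change of variables $x^{adv}=\tfrac{1}{2}(\tanh(W)+1)$ keeps the image in `[0, 1]`, and `f` plays the role of the margin-style loss $\ell$ (the full attack also binary-searches over `c`):

```python
import torch

def cw_l2(model, x, target, c=1.0, steps=200, lr=0.01, kappa=0.0):
    """Minimize ||x_adv - x||_2^2 + c * f(x_adv) over w, with x_adv = 0.5*(tanh(w)+1)."""
    w = torch.atanh((2 * x - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    one_hot = torch.zeros_like(model(x))
    one_hot[:, target] = 1.0
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        # f: gap between the best non-target logit and the target logit.
        best_other = (logits - one_hot * 1e9).max(dim=1).values
        f = (best_other - logits[:, target]).clamp(min=-kappa)
        dist = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (dist + c * f).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```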

12、2017-Fast Feature Fool: A data independent approach to universal adversarial perturbations[ArXiv][code]

13、2017-Towards Deep Learning Models Resistant to Adversarial Attacks[ArXiv][code]

14、2017-Ensemble Adversarial Training: Attacks and Defenses[ArXiv][code]

15、S. Zagoruyko. Are deep learning algorithms easily hackable?[code]

16、2015-Deep neural networks are easily fooled: High confidence predictions for unrecognizable images[ArXiv]

17、2016-Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples[ArXiv]

18、2013-Intriguing properties of neural networks[ArXiv]

Generates adversarial examples by solving an optimization problem with L-BFGS under the $L_2$ distance metric.
$$ \min\ c\,|r|+\mathrm{loss}_f(x+r,l) \quad \text{s.t.}\ x+r \in [0,1]^m $$
The L-BFGS and DeepFool attacks are implemented in Foolbox[code].
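A rough PyTorch sketch of the box-constrained L-BFGS formulation above, with hypothetical `model`, `x`, and a target-label tensor `l`; the original attack additionally line-searches over the constant `c`:

```python
import torch
import torch.nn.functional as F

def lbfgs_attack(model, x, l, c=0.1, max_iter=20):
    """Find a small r so that x + r is labeled l while staying inside [0, 1]^m."""
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.LBFGS([r], lr=0.1, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        x_adv = (x + r).clamp(0.0, 1.0)   # crude way to respect the box constraint
        loss = c * r.abs().sum() + F.cross_entropy(model(x_adv), l)
        loss.backward()
        return loss

    opt.step(closure)
    return (x + r).clamp(0.0, 1.0).detach()
```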

2017-Robust Physical-World Attacks on Deep Learning Models

 

Evasion attacks against machine learning at test time

Adversary goal: there are two kinds of goals. The weaker one is simply to be classified as benign (in spam detection, classified as a legitimate email). If a function g is defined on the sample space X with g(x) < 0 meaning benign, the adversary's goal is to craft a sample x such that g(x) < 0. This attack is easy to defeat by simply shifting the classification boundary (the surface g(x) = 0), so a better goal for the attacker is to make g(x) as small as possible.
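A toy NumPy illustration of the difference between the two goals, with a made-up linear discriminant g(x) = w·x + b (all values are hypothetical):

```python
import numpy as np

# g(x) >= 0 means "malicious", g(x) < 0 means "benign" (e.g. a legitimate email).
w = np.array([0.8, -0.3, 0.5])
b = -0.1
g = lambda x: w @ x + b

x = np.array([1.0, 0.2, 0.7])   # original malicious sample, g(x) > 0
# Weak goal: stop as soon as g(x) < 0; a small shift of the boundary undoes this.
# Stronger goal: keep descending so that g(x) becomes as small as possible.
for _ in range(100):
    x = x - 0.05 * w             # the gradient of a linear g is just w
print(g(x))                      # now far below 0: confidently "benign"
```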

Universal adversarial perturbations[ArXiv][code]

Black-Box Attacks

2017-10-WHITENING BLACK-BOX NEURAL NETWORKS[ArXiv]

Steals the model's private information and then mounts a white-box attack.

7、2016-Delving into Transferable Adversarial Examples and Black-box Attacks [ArXiv][code]

The authors also used the same experimental setup to test targeted adversarial attacks against the target model. The results show that targeted labels transfer much more poorly: no more than 4% of adversarial images were assigned the same targeted label by both the source and target models.
Based on this, the authors propose an ensemble method: the adversarial image is constructed against several deep neural network models at once, i.e., the single known model is replaced with multiple different known models that jointly produce one adversarial image.
In the experiments, the authors mounted black-box attacks against each of five different deep neural network models in turn. When attacking a given model, they assumed the remaining four models were known and constructed white-box adversarial images against that ensemble. As before, they built adversarial images both with the optimization-based attack and with the Fast Gradient method, again using the Adam optimizer; after 100 iterations of updates the loss function converged, and many of the attacker-chosen target labels transferred. Detailed results are in Table 2 of the paper: cell (i, j) reports the targeted-label accuracy of adversarial images generated from the four models other than model i when evaluated on model j. When the target model is included in the known ensemble, targeted-label transferability is above 60%; even when it is not, the targeted-label accuracy is still above 30%.
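A simplified PyTorch sketch of the ensemble idea: instead of the paper's Adam-based optimization, this uses sign-gradient iterations on the summed targeted loss over the known models (`models`, `x`, `target` are hypothetical names):

```python
import torch
import torch.nn.functional as F

def ensemble_targeted_attack(models, x, target, eps=0.03, alpha=0.005, steps=100):
    """Craft one image that all known white-box models classify as `target`,
    hoping the targeted label transfers to the unseen black-box model."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Joint targeted loss over the whole ensemble of known models.
        loss = sum(F.cross_entropy(m(x_adv), target) for m in models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() - alpha * grad.sign()   # descend toward the target label
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```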

8、2016-Practical Black-Box Attacks against Machine Learning[ArXiv][code]

The adversary does not need to know the target model's architecture or parameters and can only observe the outputs for chosen inputs.
Because adversarial examples transfer, one can first train a substitute model, craft adversarial examples against it, and then test those examples on the target model.
The substitute model's architecture only needs to be able to handle the same task as the target model.
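A simplified PyTorch sketch of substitute training with Jacobian-based dataset augmentation (Papernot et al.), where `query_target` is a hypothetical function that queries the black-box API and returns predicted class labels:

```python
import torch
import torch.nn.functional as F

def train_substitute(query_target, substitute, seeds, rounds=4, epochs=5, lam=0.1):
    """Label data by querying the black box, fit the substitute, then grow the
    dataset by stepping along the sign of the substitute's Jacobian."""
    data = seeds.clone().detach()
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(rounds):
        labels = query_target(data)                  # only input/output access is needed
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(substitute(data), labels).backward()
            opt.step()
        # Jacobian-based augmentation: new points in the direction that most
        # changes the substitute's output for the assigned label.
        data_r = data.clone().requires_grad_(True)
        sel = substitute(data_r)[torch.arange(len(data_r)), labels].sum()
        grad = torch.autograd.grad(sel, data_r)[0]
        data = torch.cat([data, (data + lam * grad.sign()).clamp(0, 1)]).detach()
    return substitute
```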

Machine Learning as an Adversarial Service: Learning Black-Box Adversarial Examples

A method that directly attacks the target black-box neural network.
Datasets: MNIST and CIFAR-10.
Target model deployed in the real world: Amazon Machine Learning Prediction.

Universal adversarial perturbations

Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN

The paper calls its GAN-based method for generating adversarial examples MalGAN. The discriminator of the GAN is a substitute detector network built by the attacker, while the generator produces adversarial malware examples intended to evade the black-box target detector.
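A hypothetical PyTorch sketch of MalGAN-style components as described above (the feature dimension, layer sizes, and the feature-only-addition constraint are assumptions for illustration):

```python
import torch
import torch.nn as nn

FEAT, NOISE = 128, 16                      # assumed binary API-feature and noise sizes

generator = nn.Sequential(                 # malware features + noise -> adversarial features
    nn.Linear(FEAT + NOISE, 256), nn.ReLU(),
    nn.Linear(256, FEAT), nn.Sigmoid())

substitute_detector = nn.Sequential(       # stands in for the black-box malware detector
    nn.Linear(FEAT, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid())

def perturb(malware):
    """Features can only be added (logical OR), so malicious functionality is preserved."""
    z = torch.rand(malware.size(0), NOISE)
    gen = generator(torch.cat([malware, z], dim=1))
    return torch.max(malware, (gen > 0.5).float())   # binarize, then OR with the original
```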

Privacy Issues

1、Hacking Smart Machines with Smarter Ones : How to Extract Meaningful Data from Machine Learning Classifiers[ArXiv]

2、Membership Inference Attacks Against Machine Learning Models[ArXiv]

3、Stealing Machine Learning Models via Prediction APIs

4、An end-to-end case study of personalized warfarin dosing

On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches

Other Machine Learning Techniques

1、Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks[ArXiv]

2、Tactics of Adversarial Attack on Deep Reinforcement Learning Agents[ArXiv]

3、Adversarial examples for generative models[ArXiv]

4、Adversarial Attacks on Neural Network Policies

Defense Techniques

1、2016-Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks[ArXiv]

Machine Learning

1、2006-Can machine learning be secure?[paper]

2、2011-Adversarial machine learning

3、Take two software updates and see me in the morning: The Case for Software Security Evaluations of Medical Devices [paper]

4、Adversarial AI

5、2017-A Survey of Machine Learning Security Issues and Defense Techniques (机器学习安全性问题及其防御技术研究综述)

6、Concrete Problems in AI Safety[ArXiv]

7、Towards the Science of Security and Privacy in Machine Learning[ArXiv]

8、Stealing Machine Learning Models via Prediction APIs[paper][code]

GANs and Adversarial Examples

1、2017-APE-GAN : Adversarial Perturbation Elimination with GAN[ArXiv][code]

Datasets: MNIST, CIFAR10, and ImageNet.

The generator network restores adversarial examples back to clean images.

2、2017-Generative Adversarial Trainer : Defense to Adversarial Perturbations with GAN[ArXiv]

Enhances the discriminator's ability to recognize adversarial perturbations.

3、2017-Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN[ArXiv]

GANs can also be used to generate adversarial examples against malware classifiers.
Adds noise to the extracted malware features so that the classifier misclassifies them.

4、2017-Adversarial examples for generative models[ArXiv]

5、Generative Poisoning Attack Method Against Neural Networks[ArXiv]

Proposes a more efficient way to generate adversarial (poisoning) samples, borrowing the generative-model and discriminative-model idea from generative adversarial networks (GANs): a generative model first produces candidate adversarial samples, and a discriminative model then selects the best ones.
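A minimal PyTorch sketch of this generate-then-filter idea, with hypothetical `generator` and `discriminator` modules:

```python
import torch

def generate_and_filter(generator, discriminator, z_dim, n_candidates=256, n_keep=32):
    """Sample candidate poisoning examples, then keep the ones the
    discriminative model scores highest."""
    z = torch.randn(n_candidates, z_dim)
    candidates = generator(z)
    scores = discriminator(candidates).squeeze(-1)
    keep = scores.topk(n_keep).indices
    return candidates[keep].detach()
```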

PhD Theses

ON THE INTEGRITY OF DEEP LEARNING SYSTEMS IN ADVERSARIAL SETTINGS

Segmentation & Object Detection

Adversarial Examples for Semantic Segmentation and Object Detection

Visual Question Answering (VQA)

Can you fool AI with adversarial examples on a visual Turing test?

Application Scenarios

1.Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition
2.Robust Physical-World Attacks on Machine Learning Models

Software and Applications

1.AdversariaLib (includes implementations of evasion attacks).
2.AdLib. A python library with a scikit-style interface which includes implementations of a number of published evasion attacks and defenses.
3.AlfaSVMLib. Adversarial Label Flip Attacks against Support Vector Machines.
4.Poisoning Attacks against Support Vector Machines, and Attacks against Clustering Algorithms
5.deep-pwning - a Metasploit-style framework for deep learning, currently with attacks on deep neural networks using TensorFlow.
6.Cleverhans is a great TensorFlow-based library for adversarial attacks and defences: https://github.com/tensorflow/cleverhans
7.FoolBox – another collection of attacks that you can use with a framework of your preference
https://foolbox.readthedocs.io/en/latest/
8.EvadeML - Machine Learning in the Presence of Adversaries
9.Adversarial Machine Learning - PRA Lab
