Hello everyone! I spent most of the week reading up on adversarial attacks and writing functions to produce them. One of the attack methods I replicated is the Fast Gradient Sign Method (FGSM), which takes the sign of the loss gradient with respect to the input image and multiplies it by some ε, a parameter chosen by the attacker. Larger ε values mean a larger change to the image. Below is an image from the Goodfellow et al. 2014 paper, “Explaining and Harnessing Adversarial Examples”, which proposed the method. As you can see, the method generates a noise-like image that can be subtly overlaid on the original image to cause misclassification.
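The core of FGSM fits in a couple of lines. Here is a minimal NumPy sketch; the toy logistic "classifier" (weights `w`, input `x`, label `y`) is entirely made up for illustration and stands in for a real network, but the `fgsm_attack` helper is the method itself.

```python
import numpy as np

def fgsm_attack(x, grad, epsilon):
    """FGSM: step epsilon in the direction of the sign of the loss gradient."""
    x_adv = x + epsilon * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in a valid [0, 1] range

# Hypothetical toy logistic model -- stand-in for a trained network.
rng = np.random.default_rng(0)
w = rng.normal(size=8)        # made-up model weights
x = rng.uniform(size=8)       # a clean input in [0, 1]
y = 1.0                       # its true label

p = 1.0 / (1.0 + np.exp(-w @ x))   # model's confidence in the true class
grad = (p - y) * w                 # d(cross-entropy loss)/dx for this model
x_adv = fgsm_attack(x, grad, epsilon=0.1)
p_adv = 1.0 / (1.0 + np.exp(-w @ x_adv))
print(p, p_adv)  # confidence in the true class drops after the attack
```

In a real framework you would get `grad` from autodiff instead of computing it by hand; everything else stays the same.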
I used this method on the network I created last week. Here is one of the figures produced:
As you can see, the attacked images have significantly lower confidence and are almost always misclassified. Accuracy across the test set dropped to 23% under attack, down from the original 99%.
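Measuring the drop just means attacking every test image and re-scoring. A rough sketch of that evaluation loop, again with a made-up linear model rather than my actual network (the labels are defined so the clean model is perfect, which is of course artificial):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 16
w = rng.normal(size=d)            # hypothetical model weights
X = rng.uniform(size=(n, d))      # a toy "test set" of inputs in [0, 1]

def logits(X):
    return X @ w

y = (logits(X) > 0).astype(float)  # labels chosen so clean accuracy is 100%

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Per-example FGSM using the cross-entropy gradient of the logistic model.
p = sigmoid(logits(X))
grad = (p - y)[:, None] * w        # d(loss)/dX, one row per example
X_adv = np.clip(X + 0.25 * np.sign(grad), 0.0, 1.0)

acc_clean = np.mean((logits(X) > 0).astype(float) == y)
acc_adv = np.mean((logits(X_adv) > 0).astype(float) == y)
print(acc_clean, acc_adv)  # adversarial accuracy falls well below clean accuracy
```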