Week Two: Generating Adversarial Attacks

Apr 15, 2022

Hello everyone! I spent most of the week reading up on adversarial attacks and writing functions to produce them. One of the attack methods I replicated is the Fast Gradient Sign Method (FGSM), which takes the sign of the loss gradient with respect to the input image and multiplies it by ε, a constant chosen by the attacker. Larger ε values mean a larger (and more visible) change to the image. Below is an image from the Goodfellow et al. 2014 paper, “Explaining and Harnessing Adversarial Examples,” which proposed the method. As you can see, the method generates a noise-like image that can be subtly overlaid on the original image to cause misclassification.
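For concreteness, here is a minimal NumPy sketch of the FGSM update, assuming a simple linear softmax classifier so the input gradient can be written in closed form. The model, shapes, and function names here are illustrative only, not the network from last week:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_attack(x, y, W, b, eps):
    """FGSM: perturb x by eps * sign of the cross-entropy loss
    gradient with respect to the input (toy linear classifier)."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)       # d(cross-entropy)/dx for logits W x + b
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)   # keep a valid [0, 1] pixel range
```

For a real network you would get the input gradient from your framework's autograd instead of writing it by hand, but the perturbation step is the same single line.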

I used this method on the network I created last week. Here is one of the figures produced:

As you can see, the attacked images have significantly lower confidence and are almost always misclassified. Accuracy across the test set dropped from the original 99% to 23% under the attack.

2 Replies to “Week Two: Generating Adversarial Attacks”

  1. Danny D. says:

This is a really cool concept, Szymon. I’d be curious whether allocating more processing power and computing cores would yield different confidence scores, since it could run more iterations of the function (as in generating different grain amounts and overlaying different patterns across the images).

    1. Szymon S. says:

If you did more iterations during the training of the model, the result would likely be overfitting: the model would “memorize” the training data rather than learn the patterns behind its classifications, so I am not sure that would be beneficial. However, it is interesting that you mention running the attack function multiple times. That concept is the basis of many stronger attacks, such as DeepFool, which take the basic idea behind FGSM and refine it through iteration.
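A minimal sketch of that iterative idea, in the style of the Basic Iterative Method (a direct iterative extension of FGSM, simpler than DeepFool). Here `grad_fn` is an assumed callable returning the loss gradient with respect to the input:

```python
import numpy as np

def iterative_fgsm(x, y, grad_fn, eps, alpha, steps):
    """Repeat small FGSM steps of size alpha, clipping the total
    perturbation back into the eps-ball around the original image."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid pixel range
    return x_adv
```

Each step re-evaluates the gradient at the current perturbed point, which is what makes iterative attacks stronger than the single-step version.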
