
# Week Ten: Saturated Autoencoders

Jun 04, 2022

Hello everyone! This past week I worked on a custom loss function for a saturated denoising autoencoder. It works by adding a regularizer to the loss function. In a traditional neural network, variables called weights are initially set to random values, and over the course of training they are adjusted incrementally to reduce the loss. The loss function measures how correct the network’s outputs are by numerically comparing each output with the label of the corresponding input. The lower the loss, the closer the predictions are to the correct outputs, so the goal of training is to find the weights that minimize the loss.
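To make "adjusting the weights to lower the loss" concrete, here is a minimal sketch using a one-layer linear model with a mean squared error loss. The model, the synthetic data, the learning rate, and the names `mse_loss` and `gradient_step` are illustrative assumptions, not the network from this project.

```python
import numpy as np

def mse_loss(weights, inputs, targets):
    """Mean squared error between linear predictions and targets."""
    predictions = inputs @ weights
    return np.mean((predictions - targets) ** 2)

def gradient_step(weights, inputs, targets, learning_rate=0.1):
    """Move the weights one small step against the gradient of the MSE loss."""
    predictions = inputs @ weights
    grad = 2.0 * inputs.T @ (predictions - targets) / len(targets)
    return weights - learning_rate * grad

rng = np.random.default_rng(0)
inputs = rng.normal(size=(32, 3))
true_weights = np.array([1.0, -2.0, 0.5])   # hypothetical "correct" weights
targets = inputs @ true_weights

weights = rng.normal(size=3)                # weights start at random values
loss_before = mse_loss(weights, inputs, targets)
for _ in range(100):
    weights = gradient_step(weights, inputs, targets)
loss_after = mse_loss(weights, inputs, targets)
```

After the loop, `loss_after` should be far smaller than `loss_before`, which is all that "training" means in this toy setting.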

A regularizer is a function that is added to the loss to influence where the weights end up. One common example is L2 regularization, which adds the sum of the squared weights to the loss function. Because large weights now increase the loss, this encourages the weights to stay small. To saturate a network, you want the opposite: weights with larger values, so that the output of each layer is either very high or very low. The goal is to make small perturbations less effective at changing the output.
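As a rough illustration of how L2 regularization favors small weights, the sketch below adds an L2 penalty to a fixed data loss. The function names, the `strength` value, and the example weights are assumptions chosen for the demonstration.

```python
import numpy as np

def l2_penalty(weights, strength=0.01):
    """L2 regularizer: strength times the sum of the squared weights."""
    return strength * np.sum(weights ** 2)

def regularized_loss(data_loss, weights, strength=0.01):
    """Total loss = data loss + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, strength)

small_weights = np.array([0.1, -0.2, 0.05])
large_weights = np.array([3.0, -4.0, 2.0])

# Pretend both weight settings fit the data equally well.
same_data_loss = 0.5
loss_small = regularized_loss(same_data_loss, small_weights)
loss_large = regularized_loss(same_data_loss, large_weights)
```

With the data loss held equal, the larger weights produce the larger total loss, so the optimizer is nudged toward the smaller ones. A saturating regularizer is built to push in the other direction.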

The regularizer that I ended up using is based on Goroshin and LeCun’s 2013 paper, “Saturating Auto-Encoders”. I adapted their regularizer to work for convolutional layers as well. The final result looks like this:
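The formula itself is not reproduced in this text. Going only by the symbol definitions that follow and by the general shape of the regularizer in Goroshin and LeCun's paper, a form consistent with them would be roughly the one below. The inner sum over the individual units $j$ of each layer is an assumption about how each layer's term is reduced to a single number.

```latex
R = \alpha \sum_{i=1}^{D_h} \sum_{j} F_c\!\left(Y_e^{i}\right)_j
```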

Here, α is a constant set by the user that determines how heavily the output of the regularizer is weighted. $D_h$ is the number of hidden layers. $F_c(Y)$ is the saturating function for the activation function $f$ (called the complementary function in the paper): $F_c(x) = 0$ wherever $f'(x) = 0$, and elsewhere it has the same slope as $f$ but points toward the nearest region where $f'(x) = 0$. $Y_e^i$ is the output of layer $i$ in the encoding part of the autoencoder.
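The pieces described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the code used in this project: the activation is assumed to be a hard tanh (linear on (-1, 1), flat outside), the complementary function is taken to be the distance to the nearest flat region, and it is applied to each encoder layer's values before the activation. The names `hard_tanh`, `complementary`, and `saturation_penalty` are hypothetical.

```python
import numpy as np

def hard_tanh(x):
    """Assumed activation: linear on (-1, 1), flat (saturated) outside."""
    return np.clip(x, -1.0, 1.0)

def complementary(x):
    """Distance from x to the nearest saturated region of hard_tanh.

    Zero where hard_tanh is flat (|x| >= 1), positive in between.
    """
    return np.maximum(0.0, 1.0 - np.abs(x))

def saturation_penalty(encoder_values, alpha=0.1):
    """alpha times the complementary function summed over every encoder layer.

    encoder_values: list of arrays, one per encoder layer, of any shape,
    so dense outputs and convolutional feature maps are handled alike.
    """
    return alpha * sum(np.sum(complementary(v)) for v in encoder_values)

dense_layer = np.array([[-2.0, 0.0, 0.5, 3.0]])           # (batch, units)
conv_layer = np.array([[[[0.25, -1.5], [1.0, -0.75]]]])    # (batch, ch, h, w)
penalty = saturation_penalty([dense_layer, conv_layer], alpha=0.1)
```

Only the values strictly inside (-1, 1) contribute to `penalty`; anything already in a flat region adds nothing. Summing over all elements of each array is what lets the same function cover convolutional layers.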

When you take the derivative of this regularizer with respect to a single weight, it looks like this:
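The derivative is not reproduced in this text either. Assuming the regularizer is $\alpha$ times a sum of the complementary function $f_c$ over encoder values, the chain rule gives the general shape below for a single weight $w$, where $k$ runs over every encoder value $y_k$ that depends on $w$. This is a sketch of the structure rather than the exact expression from the original post.

```latex
\frac{\partial R}{\partial w} = \alpha \sum_{k} f_c'(y_k) \, \frac{\partial y_k}{\partial w}
```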

Because $f_c'(y) = 0$ whenever the output, $y$, is in a “saturated region” of the activation function (i.e. where the activation has a slope of zero, so small changes to its input don’t change the output), this regularizer is easy to minimize. As soon as the output is in a saturated region, $f_c'(y) = 0$, which causes the whole expression to evaluate to 0. Thus a weight is only pushed to change when the output is not in a saturated region.
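This behavior can be checked numerically for a single unit. The sketch below assumes a hard tanh activation, so the complementary function is the distance to the nearest flat region, and a single value $y = w \cdot x$. The names `complementary`, `penalty`, and `numerical_grad` and all the constants are hypothetical.

```python
import numpy as np

def complementary(y):
    """Distance from y to the nearest flat region of a hard tanh."""
    return np.maximum(0.0, 1.0 - np.abs(y))

def penalty(weight, x, alpha=0.1):
    """Regularizer for one unit whose value is y = weight * x."""
    return alpha * complementary(weight * x)

def numerical_grad(weight, x, eps=1e-6):
    """Central finite-difference estimate of d(penalty)/d(weight)."""
    return (penalty(weight + eps, x) - penalty(weight - eps, x)) / (2 * eps)

x = 2.0
grad_unsaturated = numerical_grad(0.25, x)   # y = 0.5, inside (-1, 1)
grad_saturated = numerical_grad(1.5, x)      # y = 3.0, in the flat region
```

In the unsaturated case the gradient is negative, so gradient descent increases the weight and moves $y$ toward the flat region at 1. In the saturated case the gradient is zero and the regularizer leaves the weight alone.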

Next week I will show more of the results of this math in action, and I will begin writing my final paper.