COMPSCI 682 Neural Networks: A Modern Introduction

Acknowlegements

These notes originally accompany the Stanford CS class CS231n, and are now provided here for the UMass class COMPSCI 682 with minor changes reflecting our course contents. Many thanks to Fei-Fei Li and Andrej Karpathy for graciously letting us use their course materials!

Tips

We encourage the use of the hypothes.is extension to annotate comments and discuss these notes inline.
Module 0: Preparation
Module 1: Neural Networks
Linear classification: Support Vector Machine, Softmax
parameteric approach, bias trick, hinge loss, cross-entropy loss, L2 regularization, web demo
Optimization: Stochastic Gradient Descent
optimization landscapes, local search, learning rate, analytic/numerical gradient
Backpropagation, Intuitions
chain rule interpretation, real-valued circuits, patterns in gradient flow
Neural Networks Part 1: Setting up the Architecture
model of a biological neuron, activation functions, neural net architecture, representational power
Neural Networks Part 2: Setting up the Data and the Loss
preprocessing, weight initialization, batch normalization, regularization (L2/dropout), loss functions
Neural Networks Part 3: Learning and Evaluation
gradient checks, sanity checks, babysitting the learning process, momentum (+nesterov), second-order methods, Adagrad/RMSprop, hyperparameter optimization, model ensembles
Module 2: Convolutional Neural Networks
Convolutional Neural Networks: Architectures, Convolution / Pooling Layers
layers, spatial arrangement, layer patterns, layer sizing patterns, AlexNet/ZFNet/VGGNet case studies, computational considerations
Understanding and Visualizing Convolutional Neural Networks
tSNE embeddings, deconvnets, data gradients, fooling ConvNets, human comparisons