Self-study of CS182 (Spring 2021) at UC Berkeley - Designing, Visualizing and Understanding Deep Neural Networks. Course homepage found here. This repo contains my solutions to the four homework assignments. Lecture videos can be found in this YouTube playlist.
I began self-study on this course on 2023-08-02 thanks to a comment in this Reddit discussion which described the course as "simply amazing." I agree.
Homework 1: Lectures 1-7.
Time spent: 20 hours.
- Implements a fully-connected neural network with arbitrarily many affine+ReLU hidden layers
- Implements convolutional neural networks
- Implements forward and backward passes for the following layers (affine and ReLU sketched after this list):
- affine
- ReLU
- batch normalization
- spatial batch normalization
- dropout
- convolution
- max pool
- softmax loss
- SVM loss
- Implements the momentum, RMSProp, and Adam optimizer updates (Adam sketched below)
- For initial setup, compiling the Cython extension required manually setting `language_level = "2"` in the `setup.py` file, because I was running Cython 3.0.0 whereas the course was released using Cython 0.29 (see the `setup.py` sketch below)
- I added a cell in `FullyConnectedNets.ipynb` to visualize samples from the CIFAR-10 dataset after loading, always a useful step when working with image data. This revealed that each image is normalized by mean subtraction, so visualizing required shifting values back into a displayable range by adding back the minimum (sketched below)
- Some fantastic resources I used to help my understanding:
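To give a flavor of the layer API, here is a minimal NumPy sketch of the affine and ReLU layers in the homework's forward/backward convention, where each forward pass returns `(out, cache)` and each backward pass consumes the upstream gradient `dout` plus the cache; the exact signatures in the assignment may differ slightly.

```python
import numpy as np

def affine_forward(x, w, b):
    """Fully-connected layer: flatten each example, then out = x @ w + b."""
    out = x.reshape(x.shape[0], -1) @ w + b
    cache = (x, w, b)
    return out, cache

def affine_backward(dout, cache):
    """Backprop through the affine layer."""
    x, w, b = cache
    dx = (dout @ w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T @ dout
    db = dout.sum(axis=0)
    return dx, dw, db

def relu_forward(x):
    """Elementwise ReLU."""
    out = np.maximum(0, x)
    cache = x
    return out, cache

def relu_backward(dout, cache):
    """Gradient passes through only where the input was positive."""
    x = cache
    return dout * (x > 0)
```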
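The Adam update combines the momentum and RMSProp ideas with bias correction. A rough sketch of a single parameter update, loosely following the homework's style of keeping the running moments in a `config` dict (the exact signature is an assumption):

```python
import numpy as np

def adam_step(w, dw, config):
    """One Adam update; config holds hyperparameters and running moments."""
    lr  = config.get("learning_rate", 1e-3)
    b1  = config.get("beta1", 0.9)
    b2  = config.get("beta2", 0.999)
    eps = config.get("epsilon", 1e-8)
    m   = config.setdefault("m", np.zeros_like(w))
    v   = config.setdefault("v", np.zeros_like(w))
    t   = config.get("t", 0) + 1

    m = b1 * m + (1 - b1) * dw          # first moment (momentum-style average)
    v = b2 * v + (1 - b2) * dw ** 2     # second moment (RMSProp-style average)
    m_hat = m / (1 - b1 ** t)           # bias correction
    v_hat = v / (1 - b2 ** t)
    w_next = w - lr * m_hat / (np.sqrt(v_hat) + eps)

    config.update(m=m, v=v, t=t)
    return w_next, config
```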
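For the Cython fix, the change amounts to pinning the language level where `cythonize` is called in `setup.py`. A sketch under the assumption that the extension being built is the `im2col` helper (the actual extension name and file layout in the assignment may differ):

```python
# setup.py (sketch): pin language_level so Cython 3.x compiles the
# Python-2-era .pyx source that ships with the assignment.
import numpy as np
from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize

extensions = [
    Extension(
        "im2col_cython",            # extension name assumed, not verified
        ["im2col_cython.pyx"],
        include_dirs=[np.get_include()],
    )
]

setup(ext_modules=cythonize(extensions, compiler_directives={"language_level": "2"}))
```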
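The visualization cell added to `FullyConnectedNets.ipynb` was roughly the following; the `data["X_train"]` key and the `(N, 3, 32, 32)` layout are assumptions based on how the homework's loader returns mean-subtracted CIFAR-10 arrays:

```python
import matplotlib.pyplot as plt

def show_samples(X, n=8):
    """Plot the first n images of a mean-subtracted (N, 3, 32, 32) array."""
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, img in zip(axes, X[:n]):
        img = img.transpose(1, 2, 0)   # (C, H, W) -> (H, W, C)
        img = img - img.min()          # undo the mean subtraction approximately
        ax.imshow(img / img.max())     # rescale into [0, 1] for display
        ax.axis("off")
    plt.show()

# show_samples(data["X_train"])        # key name assumed from the homework's loader
```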
Homework 2: Lectures 8-10.
The Spring 2021 version of this homework could not be found, so it is forked from the Spring 2022 version.
Time spent: 15 hours.
- Implements forward and backward passes for:
- word embedding
- vanilla RNN single step
- LSTM single step
- RNN/LSTM train on entire sequence
- RNN/LSTM inference for caption generation
- Implements neural network visualization techniques
- saliency maps (sketched after this list)
- DeepDream
- class visualization from random noise
- Implements style transfer (Gram matrix and style loss sketched after this list)
- Gram matrix
- content loss
- style loss
- total variation loss
- Completed the Intro to PyTorch notebook from the Spring 2022 version of Homework 1
- Additional resources:
- Math for RNNs helped me understand that in the notation y = W[hidden; input], the hidden state and the input are each multiplied by their own weight matrix and the results are summed, rather than the two being concatenated and multiplied by a single weight matrix as alluded to in lecture; the two forms are mathematically equivalent (see the RNN step sketch after this list).
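A minimal NumPy sketch of the vanilla RNN single step, written in the summed form described in the last bullet (mathematically equivalent to multiplying the concatenated `[hidden; input]` vector by one block matrix `W = [Wh, Wx]`); the assignment's exact signatures may differ:

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """One timestep of a vanilla RNN with tanh activation."""
    a = x @ Wx + prev_h @ Wh + b        # summed form of W [prev_h; x] + b
    next_h = np.tanh(a)
    cache = (x, prev_h, Wx, Wh, next_h)
    return next_h, cache

def rnn_step_backward(dnext_h, cache):
    """Backprop through one timestep."""
    x, prev_h, Wx, Wh, next_h = cache
    da = dnext_h * (1 - next_h ** 2)    # d/da tanh(a) = 1 - tanh(a)^2
    dx = da @ Wx.T
    dprev_h = da @ Wh.T
    dWx = x.T @ da
    dWh = prev_h.T @ da
    db = da.sum(axis=0)
    return dx, dprev_h, dWx, dWh, db
```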
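The saliency map computation is a single backward pass from the correct-class score to the input pixels. A sketch assuming a standard classifier that maps a batch of images `X` to class scores; the function name and signature are illustrative:

```python
import torch

def compute_saliency_maps(X, y, model):
    """Gradient of the true-class score w.r.t. the input, max-abs over channels."""
    model.eval()
    X = X.clone().detach().requires_grad_(True)          # track gradients on the input only
    scores = model(X)                                     # (N, num_classes)
    correct = scores.gather(1, y.view(-1, 1)).squeeze()   # score of the true class
    correct.sum().backward()                              # one backward pass for the batch
    saliency, _ = X.grad.abs().max(dim=1)                 # (N, H, W)
    return saliency
```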
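For style transfer, the Gram matrix and the style loss are the core pieces. A hedged PyTorch sketch; the normalization convention, layer choices, and weights are illustrative rather than the assignment's exact ones:

```python
import torch

def gram_matrix(features, normalize=True):
    """Gram matrix of conv features with shape (N, C, H, W)."""
    N, C, H, W = features.shape
    F = features.view(N, C, H * W)
    G = torch.bmm(F, F.transpose(1, 2))   # (N, C, C): channel-by-channel correlations
    if normalize:
        G = G / (C * H * W)
    return G

def style_loss(current_feats, style_grams, style_weights):
    """Weighted sum of squared Gram-matrix differences over the chosen layers."""
    loss = 0.0
    for feats, target_G, w in zip(current_feats, style_grams, style_weights):
        loss = loss + w * torch.sum((gram_matrix(feats) - target_G) ** 2)
    return loss
```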
Homework 3: Lectures 11-13.
Time spent: 10 hours.
- Implements an LSTM-based language model in PyTorch (likelihood and generation helpers sketched after this list)
- Implements initialization, forward pass, loss, optimization
- Used for:
- evaluating sentence likelihood
- generating next token
- Implements a Transformer language model in PyTorch
- Implements each component of the Attention Is All You Need paper (single-head attention sketched after this list)
- Used for:
- article summarization
- inspecting word relationships in the embedding space
- Additional resources:
- A TON of PyTorch documentation.
- Andrej Karpathy's Recipe for Training Neural Networks based on common mistakes
- Why my loss was decreasing to NaN
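The two downstream uses of the LSTM language model, scoring a sentence and generating the next token, reduce to summing log-probabilities and sampling from the softmax at the final position. A sketch assuming a model that maps token ids of shape `(1, T)` to logits of shape `(1, T, vocab_size)`; this interface is an assumption, not the assignment's API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sentence_log_likelihood(model, token_ids):
    """Sum of log P(next token | prefix) over the sequence."""
    logits = model(token_ids[:, :-1])                 # predict positions 1..T-1
    log_probs = F.log_softmax(logits, dim=-1)
    targets = token_ids[:, 1:]
    return log_probs.gather(2, targets.unsqueeze(-1)).sum().item()

@torch.no_grad()
def next_token(model, prefix_ids, temperature=1.0):
    """Sample one next-token id given a prefix."""
    logits = model(prefix_ids)[:, -1, :] / temperature
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```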
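Among the Attention Is All You Need components, scaled dot-product attention is the core operation. A single-head PyTorch sketch with an optional causal mask (the homework builds the full multi-head version; this class is only illustrative):

```python
import math
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    """Scaled dot-product attention for one head, with an optional causal mask
    so each position can only attend to earlier positions (decoder-style)."""

    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x, causal=True):
        N, T, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(1, 2) / math.sqrt(D)          # (N, T, T)
        if causal:
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
            scores = scores.masked_fill(mask, float("-inf"))   # block attention to the future
        attn = scores.softmax(dim=-1)
        return attn @ v                                        # (N, T, D)
```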