Deep Learning

Deep learning (DL) is a subfield of machine learning that uses multi-layer (deep) neural networks to represent and approximate complex, non-linear relationships in data. DL methods have achieved leading empirical performance on tasks such as pattern recognition, natural language processing, and data generation, and underpin many recent methodological advances.

Introduction

Stanford University offers an introductory deep learning course that covers the fundamentals and touches on more advanced topics, such as deep reinforcement learning.

Deep Learning - Stanford CS231N - YouTube Playlist

Further Keywords

convolutional neural networks (CNN), image classification, loss functions, optimization, training, software, recurrent neural networks (RNN), detection and segmentation, computer vision, generative models, hardware, adversarial training

# images # Stanford lecture # self-paced # Introduction to Deep Learning (Stanford CS231N) # 2017 # videos

Another introductory course is offered by TUM, providing an overview of deep learning along with a detailed look at model training, optimization methods, and the use of non-linear layers in neural networks.

Introduction to Deep Learning (I2DL) - Slides

Further Keywords

ML basics, linear regression, maximum likelihood, neural networks, computational graphs, optimization, backpropagation, scaling optimization, stochastic gradient descent (SGD), training, CNNs, RNN, transformers, advanced DL

# images # text # audio # TUM lecture # I2DL # 2025 # slides

Methods

Deep learning methods are built around the concept of artificial neural networks, which are trained using optimization techniques such as gradient descent. Core components include loss functions, which measure the difference between predictions and intended outcomes, and optimization algorithms, such as stochastic gradient descent (SGD) and Adam, which update network parameters. Regularization techniques, including dropout and weight decay, help prevent overfitting, while training strategies, such as batch normalization and learning rate schedules, support more efficient and robust learning.
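
The interplay of loss function, optimizer, and weight decay described above can be sketched in a few lines. This is a minimal illustration in plain Python, not a reference implementation: the toy data, the function names (`sgd_linear_regression`, `mse`), and all hyperparameter values are assumptions chosen for the example, and a one-parameter linear model stands in for a neural network.

```python
import random

def sgd_linear_regression(data, lr=0.05, weight_decay=0.0,
                          epochs=200, batch_size=4, seed=0):
    """Fit y ~ w * x + b with mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # a fresh random order each epoch makes it "stochastic"
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the batch MSE loss with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Weight decay: gradient of an L2 penalty, applied to the weight only.
            grad_w += weight_decay * w
            # Parameter update: step against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

def mse(data, w, b):
    """Mean squared error between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Hypothetical noise-free toy data drawn from y = 2x + 1.
toy_data = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]

w_fit, b_fit = sgd_linear_regression(toy_data)
w_decay, b_decay = sgd_linear_regression(toy_data, weight_decay=0.5)
```

On this toy problem the unregularized fit should recover a slope close to 2 and an intercept close to 1, while the run with weight decay ends with a visibly smaller slope, which is the shrinkage effect that regularization trades against training error.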

Hyperparameter tuning is an essential part of training deep learning models efficiently. For practical guidance, see:

Tune: Scalable Hyperparameter Tuning

Further Keywords

Ray Tune, search space, search algorithm, scheduler, tuner, trials, function API, class API, random search, grid search, Bayesian optimization, bandit optimization, Tree-structured Parzen Estimators (TPE), gradient-free optimization, Optuna search algorithms, median stopping rule, ASHA, Population Based Training (PBT)

# self-paced # Tune: Scalable Hyperparameter Tuning # 2025 # slides
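
To make the idea of a search space and trials concrete, here is a small random-search sketch in plain Python. It deliberately does not use the Ray Tune API; the search space, the names `sample_config` and `toy_objective`, and the objective itself are hypothetical stand-ins for a real training run that would report a validation loss.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from a hypothetical search space."""
    return {
        # Learning rates are commonly sampled log-uniformly.
        "lr": 10 ** rng.uniform(-5, -1),
        "batch_size": rng.choice([16, 32, 64, 128]),
    }

def toy_objective(config):
    """Stand-in for a validation loss; a real trial would train a model."""
    return (math.log10(config["lr"]) + 3) ** 2 + 0.001 * config["batch_size"]

def random_search(objective, num_trials=50, seed=0):
    """Evaluate independently sampled configurations and keep the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        config = sample_config(rng)
        trials.append((objective(config), config))
    best = min(trials, key=lambda trial: trial[0])  # lowest loss wins
    return best, trials

(best_loss, best_config), all_trials = random_search(toy_objective)
```

Schedulers such as the median stopping rule or ASHA build on the same loop but stop unpromising trials early instead of running every trial to completion.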

Understanding and applying effective training strategies, including learning rate schedules and batch normalization, is crucial for model performance. For hands-on examples using PyTorch, see:

Guide to Pytorch Learning Rate Scheduling

Further Keywords

lambda, multiplicative, step, MultiStep, exponential, cosine annealing, cyclic, OneCycle, cosine annealing with warm restarts

# self-paced # Guide to Pytorch Learning Rate Scheduling # 2020 # jupyter notebook
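
Several of the schedules listed above reduce to short closed-form expressions. The sketch below writes three of them (step, exponential, and cosine annealing) in plain Python so the shapes can be inspected without installing PyTorch; the function names and example values are assumptions made for illustration, and in practice the schedulers in `torch.optim.lr_scheduler` would be used instead.

```python
import math

def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Multiply the learning rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def exponential_lr(base_lr, epoch, gamma):
    """Multiply the learning rate by gamma after every epoch."""
    return base_lr * gamma ** epoch

def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Follow half a cosine wave from base_lr down to eta_min over t_max epochs."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)

# Example: an 11-point cosine schedule starting at 0.1 and ending at 0.
cosine_schedule = [cosine_annealing_lr(0.1, epoch, t_max=10) for epoch in range(11)]
```

The cosine schedule decays slowly at first, fastest around the midpoint, and slowly again near the end, whereas the step schedule keeps the rate constant between abrupt drops.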

For a broader overview of deep learning in Python, including an introduction to PyTorch with tutorials and practical exercises, see the Deep Learning with Python coding section.

Architectures

Deep learning architectures define how neural networks are structured to process different types of data. Convolutional Neural Networks (CNNs) are specialized for spatial data such as images, while Recurrent Neural Networks (RNNs) handle sequential data, including text and time series. Transformers use self-attention mechanisms to model long-range dependencies in sequences and represent the current state of the art for many NLP and sequence tasks. Autoencoders are unsupervised models commonly used for dimensionality reduction, representation learning, and anomaly detection.

The following application is a technical demo of a CNN trained on the MNIST dataset, a widely used benchmark of handwritten digits. Users can draw their own digits and observe how the network processes them. The source code and related publication are also available:

An Interactive Node-Link Visualization of Convolutional Neural Networks

Further Keywords

CNN, MNIST, visualization, filters, feature maps, 3D fully-connected network, 2D, nodes, bottom layer, hidden layer, output layer, convolutional layer, flattening

# images # self-paced # An Interactive Node-Link Visualization of Convolutional Neural Networks # 2015 # paper # slides # videos
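
The filters, feature maps, and flattening step shown in the visualization can be reproduced on a tiny example. The following is a minimal sketch in plain Python, assuming a single hand-written filter rather than learned weights; the image, the kernel, and the function names are hypothetical. Note that, as in most deep learning frameworks, the "convolution" here is a sliding dot product without kernel flipping.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h) for dj in range(k_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise non-linearity: negative responses become zero."""
    return [[max(0, value) for value in row] for row in feature_map]

def flatten(feature_map):
    """Turn a 2D feature map into a vector for a fully connected layer."""
    return [value for row in feature_map for value in row]

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [[-1, 1],
                        [-1, 1]]

feature_map = relu(conv2d_valid(image, vertical_edge_kernel))
flattened = flatten(feature_map)
```

The resulting 3x3 feature map is non-zero only in the column where the edge sits, which is the kind of localized response the interactive demo displays for each filter.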

The following lecture-style video covers the Transformer architecture and its applications. It is best suited to viewers already familiar with neural networks and embeddings.

Introduction to the Transformer Architecture - YouTube Video

Further Keywords

computer vision, NLP, reinforcement learning, speech, translation, graphs, attention, tokenization, embeddings, positional encoding, multi-headed self-attention, point-wise MLP, GeLU, layer normalization, encoder, decoder, masked self-attention, generation, cross attention, feedforward, compute budget heuristics, ExaFLOPs, GPU, mixture of experts, ViT, convolution-augmented transformer, decision transformer

# images # text # audio # self-paced # Introduction to the Transformer Architecture # 2022 # slides # videos
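
The core of the attention mechanism covered in the lecture is scaled dot-product attention, optionally with a causal mask for generation. The sketch below is a simplified, single-head version in plain Python: it omits the learned query, key, and value projections of a real Transformer layer, and the token vectors and function names are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Return attention outputs and weights for lists of vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, query in enumerate(queries):
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
            for key in keys
        ]
        if causal:
            # Masked self-attention: position i may not look at later positions.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * value[c] for w, value in zip(row, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights

# Hypothetical embeddings for a three-token sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens, causal=True)
```

With the causal mask enabled, the first token can only attend to itself, so its output equals its own value vector; later tokens mix information from all earlier positions.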

More on Transformers can be found in the NLP chapter.

Model Classes

Beyond general architectures, deep learning includes several specialized model classes that address specific goals. Generative Adversarial Networks (GANs) learn to generate new data by training a generator and a discriminator in competition with each other. Variational Autoencoders (VAEs) extend standard autoencoders probabilistically, modeling a latent variable distribution and optimizing a variational lower bound to enable generative modeling. More recently, diffusion models have gained popularity as generative models that learn to reverse a gradual noising process, iteratively denoising random noise into structured outputs.

For a deeper dive into diffusion models, see:

What are Diffusion Models?

Further Keywords

Markov chain, random noise, latent variable, Gaussian noise, forward process, Langevin dynamics, reverse process, Bayes’ rule, parametrization, training loss, noise-conditioned score networks (NCSN), classifier guided diffusion, classifier-free guidance, sampling steps, progressive distillation, DDPM, DDIM, consistency models, latent variable space, CLIP, U-Net, Transformer, ControlNet

# images # self-paced # What are Diffusion Models? # 2025 # page
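
The forward process named in the keywords has a convenient closed form: a noisy sample at step t can be drawn directly from the clean data as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the running product of (1 - beta_s). The following plain-Python sketch illustrates this under stated assumptions: a linear beta schedule from 1e-4 to 0.02 over 1000 steps (the DDPM defaults), a hypothetical three-dimensional data point, and illustrative function names. It covers only the noising direction, not the learned reverse process.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly from beta_start to beta_end."""
    increment = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * increment for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) up to step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0 in one shot using the closed-form forward process."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]                              # hypothetical clean data point
x_early = q_sample(x0, 10, alpha_bars, rng)         # still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)         # almost pure Gaussian noise
```

Because alpha_bar_t shrinks towards zero, late samples retain almost none of the original signal, which is why generation can start from pure Gaussian noise.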

Laplace Redux provides a practical library for applying Laplace approximations in neural networks, whether for entire networks, subnetworks, or just the last layer. The package enables posterior approximations, marginal-likelihood estimation, and posterior predictive computations, and includes multiple example scenarios. Implementing Laplace approximations from scratch is difficult because it requires computing or approximating the Hessian of the loss, so this library offers a straightforward way to experiment with these techniques in code.

Laplace Redux - Effortless Bayesian Deep Learning

Further Keywords

torch, marginal likelihood, laplace on LLMs, serialization, backend, regression, calibration, GP inference, huggingface LLMs, reward modeling, API

# self-paced # Laplace Redux - Effortless Bayesian Deep Learning # 2025 # slides
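
To show what a Laplace approximation computes, the sketch below works through a one-parameter case in plain Python, where the Hessian is a single number and can be written by hand. It does not use the library's API. The model (a Bernoulli likelihood with a sigmoid link and a Gaussian prior), the data counts, and the function names are assumptions chosen for illustration; the idea is to find the posterior mode (MAP estimate) and approximate the posterior by a Gaussian whose variance is the inverse Hessian of the negative log posterior at that mode.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def laplace_1d(successes, trials, prior_var=1.0, newton_steps=50):
    """Gaussian approximation N(theta_map, variance) to a 1D posterior."""
    theta = 0.0
    for _ in range(newton_steps):
        p = sigmoid(theta)
        # Gradient and Hessian of the negative log posterior.
        grad = -(successes - trials * p) + theta / prior_var
        hess = trials * p * (1.0 - p) + 1.0 / prior_var
        theta -= grad / hess  # Newton step towards the MAP estimate
    p = sigmoid(theta)
    hess = trials * p * (1.0 - p) + 1.0 / prior_var
    # Laplace approximation: mean at the mode, variance = inverse Hessian.
    return theta, 1.0 / hess

# Hypothetical data: 7 successes in 10 trials, standard normal prior on theta.
theta_map, posterior_var = laplace_1d(successes=7, trials=10)
```

For a neural network the same recipe applies with the trained weights as the mode, but the Hessian becomes a very large matrix, which is where structured approximations such as last-layer or subnetwork variants come in.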