Deep Learning
Deep learning (DL) is a specialized subfield of machine learning that employs multi-layer (deep) neural networks to represent and approximate complex, non-linear relationships in data. DL methods have achieved leading empirical performance on a range of tasks, including pattern recognition, natural language processing, and data generation, and have contributed substantially to recent methodological advances.
Introduction
An introductory course on deep learning is offered by Stanford University; it covers the fundamentals of deep learning as well as more advanced topics, including deep reinforcement learning.
# images # Stanford lecture # self-paced # Introduction to Deep Learning (Stanford CS231N) # 2017 # videos
Another introductory course is offered by TUM, providing an overview of deep learning along with a detailed look at model training, optimization methods, and the use of non-linear layers in neural networks.
# images # text # audio # TUM lecture # I2DL # 2025 # slides
Methods
Deep learning methods are built around the concept of artificial neural networks, which are trained using optimization techniques such as gradient descent. Core components include loss functions, which measure the difference between predictions and intended outcomes, and optimization algorithms, such as stochastic gradient descent (SGD) and Adam, which update network parameters. Regularization techniques, including dropout and weight decay, help prevent overfitting, while training strategies, such as batch normalization and learning rate schedules, support more efficient and robust learning.
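To make these components concrete, the following minimal PyTorch sketch wires them together: a small network with dropout, a cross-entropy loss, and an Adam optimizer with weight decay. The network shape, synthetic data, and hyperparameter values are illustrative choices, not recommendations.

```python
import torch
import torch.nn as nn

# A small fully connected network; dropout acts as regularization.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes activations during training
    nn.Linear(64, 2),
)

# The loss measures the gap between predictions and targets; Adam updates
# the parameters, with weight decay acting as L2 regularization.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Synthetic data purely for illustration.
x = torch.randn(128, 20)
y = torch.randint(0, 2, (128,))

for epoch in range(10):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass and loss computation
    loss.backward()                # backpropagation
    optimizer.step()               # gradient-based parameter update
```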
Hyperparameter tuning is essential for training deep learning models efficiently. For practical guidance, see:
# self-paced # Tune: Scalable Hyperparameter Tuning # 2025 # slides
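As a framework-free illustration of what such a tuner automates, the sketch below runs a plain random search over a toy search space; the objective function is a hypothetical stand-in for a full training run, and tools like Tune scale this same trial loop across many parallel workers.

```python
import random

# Hypothetical search space: learning rate (log-uniform) and dropout rate.
search_space = {
    "lr": lambda: 10 ** random.uniform(-5, -2),
    "dropout": lambda: random.uniform(0.1, 0.5),
}

def train_and_evaluate(config):
    """Stand-in for training a model with `config`; returns a validation loss."""
    return (config["lr"] - 1e-3) ** 2 + (config["dropout"] - 0.3) ** 2

best_config, best_loss = None, float("inf")
for trial in range(20):  # 20 independent random trials
    config = {name: sample() for name, sample in search_space.items()}
    loss = train_and_evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss

print(best_config, best_loss)
```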
Understanding and applying effective training strategies, including learning rate schedules and batch normalization, is crucial for model performance. For hands-on examples using PyTorch, see:
# self-paced # Guide to Pytorch Learning Rate Scheduling # 2020 # jupyter notebook
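As a minimal sketch of the mechanics covered there, the snippet below attaches a step schedule to an SGD optimizer in PyTorch; the model, step size, and decay factor are arbitrary illustrations.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.step()    # placeholder for a full epoch of training steps
    scheduler.step()    # advance the schedule once per epoch
    print(epoch, scheduler.get_last_lr())
```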
For a broader overview of deep learning in Python, including an introduction to PyTorch with tutorials and practical exercises, see the Deep Learning with Python coding section.
Architectures
Deep learning architectures define how neural networks are structured to process different types of data. Convolutional Neural Networks (CNNs) are specialized for spatial data such as images, while Recurrent Neural Networks (RNNs) handle sequential data, including text and time series. Transformers use self-attention mechanisms to model long-range dependencies in sequences and represent the current state-of-the-art for many NLP and sequence tasks. Autoencoders are unsupervised models commonly used for dimensionality reduction, learning meaningful representations, and detecting anomalies.
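To give a sense of how such a structure looks in code, here is a minimal PyTorch sketch of a CNN for 28x28 grayscale inputs; the layer sizes are illustrative and not tied to any particular resource below.

```python
import torch
import torch.nn as nn

# Two convolution/pooling stages followed by a linear classifier.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learn local spatial filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # logits for 10 digit classes
)

logits = cnn(torch.randn(1, 1, 28, 28))  # -> shape (1, 10)
```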
The following application provides an interactive demo of a CNN trained on the MNIST dataset, a widely used benchmark of handwritten digits. Users can draw their own digits and observe how the network processes them. The source code and the related publication are also available:
# images # self-paced # An Interactive Node-Link Visualization of Convolutional Neural Networks # 2015 # paper # slides # videos
The following lecture-style video covers the Transformer architecture and its applications; it is best suited to viewers already familiar with neural networks and embeddings.
# images # text # audio # self-paced # Introduction to the Transformer Architecture # 2022 # slides # videos
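The self-attention mechanism at the core of the architecture, softmax(QK^T / sqrt(d_k)) V, condenses to a few lines of PyTorch; the sketch below omits multi-head projections, masking, and positional encodings.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Weight the values by the softmax-normalized query-key similarity."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # attention distribution per token
    return weights @ v

# Toy sequence: 5 tokens with 8-dimensional embeddings.
# In self-attention, queries, keys, and values all come from the same input.
x = torch.randn(5, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([5, 8])
```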
More on Transformers can be found in the NLP chapter.
Model Classes
Beyond general architectures, deep learning includes several specialized model classes that address specific goals. Generative Adversarial Networks (GANs) learn to generate new data by creating a competition between a generator and a discriminator. Variational Autoencoders (VAEs) extend standard autoencoders probabilistically, modeling a latent variable distribution and optimizing a variational lower bound to enable generative modeling. Recently, diffusion models have gained popularity as generative models that iteratively refine noisy data into structured outputs.
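The variational lower bound mentioned above has a compact closed form for Gaussian latents. The sketch below shows the standard VAE objective, a reconstruction term plus a KL term; the inputs `mu` and `logvar` are assumed to come from a hypothetical encoder network.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Negative evidence lower bound (ELBO) for a VAE with Gaussian latents."""
    # Reconstruction term: how well the decoder reproduces the input.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL divergence between q(z|x) = N(mu, sigma^2) and the
    # standard normal prior N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```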
For a deeper dive into diffusion models, see:
# images # self-paced # What are Diffusion Models? # 2025 # page
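To anchor the idea of iterative refinement, the sketch below implements the closed-form forward (noising) process of DDPM-style diffusion; the linear schedule values follow common defaults from the literature, and the learned reverse (denoising) network is omitted.

```python
import torch

# Linear noise schedule (beta) and its cumulative products (alpha_bar).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Forward process: q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    eps = torch.randn_like(x0)
    x_t = alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * eps
    return x_t, eps  # a diffusion model is trained to predict eps from x_t

x0 = torch.randn(4, 3, 32, 32)  # toy batch standing in for images
x_t, eps = add_noise(x0, t=500)
```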
Laplace Redux provides a practical library for applying Laplace approximations to neural networks, whether to the full network, a subnetwork, or only the last layer. The package enables posterior approximations, marginal-likelihood estimation, and posterior predictive computations, and includes multiple example scenarios. Implementing Laplace approximations from scratch is difficult because of the required Hessian computations, so this library offers a straightforward way to experiment with these techniques in code.
# self-paced # Laplace Redux - Effortless Bayesian Deep Learning # 2025 # slides
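A minimal usage sketch, based on the package's documented interface (exact signatures may differ between versions): fit a last-layer Laplace approximation with a Kronecker-factored Hessian, tune the prior precision via the marginal likelihood, and query the posterior predictive. The trained `model`, `train_loader`, and `x_test` are assumed to exist.

```python
from laplace import Laplace  # pip install laplace-torch

# `model` is a trained torch.nn.Module; `train_loader` yields (x, y) batches.
la = Laplace(
    model,
    "classification",                # likelihood
    subset_of_weights="last_layer",  # alternatives: "all", "subnetwork"
    hessian_structure="kron",        # alternatives: "full", "diag"
)
la.fit(train_loader)                           # fit the posterior approximation
la.optimize_prior_precision(method="marglik")  # marginal-likelihood tuning

probs = la(x_test)  # posterior predictive probabilities
```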