Machine Learning

Machine Learning (ML) focuses on developing algorithms and models that allow computers to learn patterns from data and make predictions or decisions without being explicitly programmed. It spans a wide range of techniques, from traditional statistical methods to modern deep learning approaches, and is applied in domains such as natural language processing (NLP) and computer vision (CV). ML enables the extraction of insights from large and complex datasets, supporting data-driven decision-making across disciplines.

Learning Paradigms

Machine learning can be organized into several learning paradigms, which differ in how models interact with data and in what feedback, if any, they receive.

In Supervised Learning, the model is trained on a labeled dataset, where traditionally each input data point is associated with a corresponding output label. The goal is to learn a mapping from inputs to outputs, enabling the model to make accurate predictions on new, unseen data. Common supervised learning tasks include classification (e.g., spam detection) and regression (e.g., predicting house prices). For a discussion of the assumption of a single ground truth label in the context of NLP, see the section on label variation.
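As a rough illustration of learning a mapping from labeled inputs to outputs, the sketch below implements a tiny k-nearest-neighbours classifier using only the Python standard library. The function name knn_predict, the two toy features (number of links and number of exclamation marks per message), and the data are assumptions made up for this example; they are not taken from any course material.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training examples by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest examples and return the most frequent one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled dataset: (number of links, number of exclamation marks) -> label.
train_X = [(0, 0), (1, 0), (0, 1), (5, 4), (6, 5), (4, 6)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_X, train_y, (5, 5)))      # spam
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # ham
```

The two query points are "new, unseen data" in the sense used above: neither appears in the training set, and the prediction is derived purely from the labeled examples.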

Unsupervised Learning involves training models on unlabeled data, where the goal is to discover underlying patterns, structures, or relationships within the data. Common unsupervised learning tasks include clustering (e.g., customer segmentation) and dimensionality reduction (e.g., principal component analysis).
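Clustering can be sketched in a few lines as well. The example below is a minimal k-means loop in plain Python, assuming the initial centroids are passed in by the caller (real implementations usually choose them randomly or with a scheme such as k-means++). The function name kmeans and the toy points are illustrative assumptions.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""

    def nearest(p, cents):
        # Index of the centroid closest to point p.
        return min(range(len(cents)), key=lambda j: math.dist(p, cents[j]))

    for _ in range(n_iter):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    assignments = [nearest(p, centroids) for p in points]
    return centroids, assignments


# Unlabeled toy data with two visually obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = kmeans(points, centroids=[(0, 0), (10, 10)])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied at any point; the grouping emerges from distances between the points alone.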

An open and free introductory course on (supervised) machine learning can be found on the I2ML Course Website from the Statistical Learning and Data Science group at LMU Munich. The course is designed to be as self-contained as possible and supports self-study through lecture videos, PDF slides, cheatsheets, quizzes, exercises (with solutions), and notebooks.

Further Keywords

classification, k-NN, trees, random forests, bagging, neural networks, hyperparameter tuning, train-validation-test-split, advanced risk minimization, multiclass classification, information theory, curse of dimensionality, regularization, SVM, boosting, Gaussian Processes, imbalanced learning, multitarget learning, online learning, feature selection, sklearn, mlr3

# tabular # text # LMU lecture # self-paced # I2ML # 2022 # slides # jupyter notebook # videos

In addition to supervised and unsupervised approaches, Self-supervised Learning leverages automatically generated labels from the data itself, allowing models to learn useful representations without requiring costly manual annotation.
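One way to picture "automatically generated labels" is a masked-token pretext task: a token is hidden from a sequence and the hidden token itself becomes the label. The sketch below only builds such (input, target) pairs and trains no model. The function name make_masked_examples and the "[MASK]" placeholder are assumptions for illustration, and masking every position in turn is a simplification (practical systems typically mask a random subset).

```python
def make_masked_examples(tokens, mask_token="[MASK]"):
    """Turn one unlabeled token sequence into (masked sequence, target token) pairs."""
    examples = []
    for i, target in enumerate(tokens):
        # Replace the i-th token with the mask; the original token is the label.
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((masked, target))
    return examples


for masked, target in make_masked_examples(["the", "cat", "sat"]):
    print(masked, "->", target)
```

Every training pair is derived from the raw sequence, so no manual annotation is involved.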

Further Keywords

auto-regressive, flow-based, auto-encoding, hybrid generative, mutual information, adversarial, GAN, data augmentation, pretext task, BYOL, SimCLR, MoCo, graph learning

# images # text # graphs # self-paced # Self-supervised Learning: Generative or Contrastive # 2021 # paper

Reinforcement Learning

Reinforcement Learning is a paradigm where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties based on its actions, and the goal is to learn a policy that maximizes cumulative rewards over time. Reinforcement learning is commonly used in applications such as game playing (e.g., AlphaGo) and robotics.
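The simplest setting in which an agent trades off exploration and exploitation is the multi-armed bandit. The sketch below shows an epsilon-greedy policy under stated assumptions: the environment is a caller-supplied reward function (the name reward_fn and its (arm, rng) signature are hypothetical), there is no state, and the value of each arm is estimated by a running mean of observed rewards.

```python
import random


def epsilon_greedy(reward_fn, n_arms, n_steps, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit policy and return (estimates, counts, total_reward)."""
    rng = random.Random(seed)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


# Hypothetical environment: three arms paying 1 with different probabilities.
success_prob = [0.2, 0.5, 0.8]


def bernoulli_reward(arm, rng):
    return 1.0 if rng.random() < success_prob[arm] else 0.0


estimates, counts, total_reward = epsilon_greedy(bernoulli_reward, n_arms=3, n_steps=2000)
print("estimated arm values:", [round(e, 2) for e in estimates])
print("pulls per arm:", counts)
```

A larger epsilon spends more pulls on exploration and yields better estimates for all arms, at the cost of a lower cumulative reward in the short run.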

Further Keywords

adaptive experimental designs, bandits, multi-armed, exploration vs. exploitation, regret, reward, stylized data structure, greedy policy, epsilon-greedy, upper confidence bound, uncertainty, Thompson sampling, Bayesian learning, inference with batched bandits

# tabular # BERD Academy module # Reinforcement Learning for Business, Economics, and Social Sciences # 2025 # slides

Active Learning

Active Learning (AL) is a machine learning approach that aims to maximize model performance while minimizing the amount of labeled data required. Instead of labeling an entire dataset upfront, the algorithm iteratively identifies the most informative or uncertain examples and queries an expert for their labels. This strategy is especially useful in domains where labeling is expensive, time-consuming, or requires specialized knowledge, such as medical diagnosis or linguistic annotation. By focusing annotation effort on the most valuable data points, active learning can reach a given level of accuracy with considerably fewer labels than labeling examples at random.
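The iterative query loop can be sketched with uncertainty sampling on a deliberately simple problem. The example assumes one-dimensional data in which all negatives lie below all positives, a "model" that is just a decision threshold halfway between the largest labeled negative and the smallest labeled positive, and an oracle function standing in for the human expert. The names fit_threshold, active_learning, and oracle are hypothetical.

```python
def fit_threshold(labeled):
    """Place the decision threshold midway between the two classes seen so far."""
    largest_negative = max(x for x, y in labeled.items() if y == 0)
    smallest_positive = min(x for x, y in labeled.items() if y == 1)
    return (largest_negative + smallest_positive) / 2


def active_learning(pool, oracle, labeled, n_queries):
    """Repeatedly query the unlabeled point the current model is least certain about."""
    labeled = dict(labeled)  # copy so the caller's seed labels are not modified
    queried = []
    for _ in range(n_queries):
        threshold = fit_threshold(labeled)
        candidates = [x for x in pool if x not in labeled]
        if not candidates:
            break
        # Uncertainty sampling: the point closest to the threshold is the most ambiguous.
        query = min(candidates, key=lambda x: abs(x - threshold))
        labeled[query] = oracle(query)  # ask the expert for this single label
        queried.append(query)
    return fit_threshold(labeled), queried


pool = list(range(21))         # unlabeled pool: 0, 1, ..., 20
seed_labels = {0: 0, 20: 1}    # one labeled example per class to start from


def oracle(x):
    # Stand-in for the expert; the true boundary (between 12 and 13) is unknown to the learner.
    return int(x >= 13)


threshold, queried = active_learning(pool, oracle, seed_labels, n_queries=4)
print(queried)    # [10, 15, 12, 13]
print(threshold)  # 12.5
```

In this toy case four queries locate the class boundary, whereas labeling the pool in order would require labeling most of it first. Real problems rarely separate this cleanly, so the savings are usually smaller.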

Automated Machine Learning

Automated Machine Learning (AutoML) refers to the process of automating the end-to-end workflow of applying machine learning to real-world problems. This includes tasks such as data preprocessing, feature selection, model selection, hyperparameter tuning, and model evaluation. The goal of AutoML is to make machine learning more accessible to non-experts and to improve the efficiency and effectiveness of the model development process.
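Hyperparameter tuning, one of the steps listed above, can be illustrated with an exhaustive grid search. This is only a small building block and not a full AutoML system. The function grid_search, the parameter names k and depth, and the toy scoring function are all assumptions for illustration; in practice the evaluate callback would train a model with the given settings and return its score on held-out validation data.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid` and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # Enumerate the Cartesian product of all candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    # Hypothetical stand-in for "train a model and score it on validation data".
    return -((params["k"] - 5) ** 2) - (params["depth"] - 3) ** 2


best_params, best_score = grid_search(
    {"k": [1, 3, 5, 7], "depth": [2, 3, 4]},
    toy_validation_score,
)
print(best_params, best_score)  # {'k': 5, 'depth': 3} 0
```

Grid search scales poorly as the number of hyperparameters grows, which is one reason AutoML tools often rely on random search or model-based optimization instead.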