Reinforcement Learning
Markov Decision Processes and the Foundations of RL
These notes introduce the foundational concepts of Reinforcement Learning (RL), a subfield of machine learning where agents learn to interact with an environment to maximize cumulative rewards. The content is based on the teaching material of Lorenzo Fiaschi for the Symbolic and Evolutionary Artificial Intelligence course at the University of Pisa (a.
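The cumulative reward the agent maximizes is usually formalized as the discounted return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... A minimal sketch of this quantity (the function name and the example rewards are illustrative, not from the notes):

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return of a reward trajectory, computed backwards:
    G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```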
Function Approximators in Reinforcement Learning
These notes explore the principles and methods of function approximation in Reinforcement Learning (RL), focusing on differentiable approaches for value and policy estimation. The content is based on the teaching material of Lorenzo Fiaschi for the Symbolic and Evolutionary Artificial Intelligence course at the University of Pisa (a.
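A standard differentiable approach of this kind is a linear value approximator V(s) = w . phi(s) trained by semi-gradient TD(0); the sketch below assumes that setup (feature vectors and step sizes are illustrative):

```python
import numpy as np

def td0_update(w, phi_s, r, phi_s_next, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step for a linear value function V(s) = w . phi(s).
    The gradient of V(s) with respect to w is simply phi(s)."""
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)  # TD error
    return w + alpha * delta * phi_s

w = np.zeros(3)
w = td0_update(w, np.array([1.0, 0.0, 0.0]), 1.0, np.array([0.0, 1.0, 0.0]))
print(w)  # only the feature active in s moves toward the TD target
```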
Model-Based Reinforcement Learning and Exploration Strategies
Model-Based Reinforcement Learning (MBRL) introduces a paradigm in which the agent first learns an internal model of the environment from experience, and then uses this model to derive value functions, optimal policies, or action-value functions through planning. Model learning itself is formulated as a supervised learning problem, and the estimation of model uncertainty provides an additional perspective for reasoning about decision-making.
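The two stages described above, supervised model learning followed by planning, can be sketched on a tabular MDP. The code below is a simplified illustration (the toy one-state MDP and function names are assumptions, not the course's examples): transition probabilities and mean rewards are estimated by counting, then value iteration plans on the learned model.

```python
from collections import defaultdict

def learn_model(transitions):
    """Supervised model learning: estimate P(s'|s,a) and mean R(s,a)
    by counting over (s, a, r, s') experience tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        rewards[(s, a)].append(r)
    model = {}
    for sa, dist in counts.items():
        total = sum(dist.values())
        model[sa] = ({s2: n / total for s2, n in dist.items()},
                     sum(rewards[sa]) / len(rewards[sa]))
    return model

def value_iteration(model, states, actions, gamma=0.9, iters=50):
    """Plan on the learned model: Bellman optimality backups."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            V[s] = max(r_hat + gamma * sum(p * V[s2] for s2, p in p_hat.items())
                       for a in actions if (s, a) in model
                       for p_hat, r_hat in [model[(s, a)]])
    return V

# Toy data: state 0, action 'a', reward 1, always back to state 0.
model = learn_model([(0, 'a', 1.0, 0)] * 10)
V = value_iteration(model, states=[0], actions=['a'])
print(V[0])  # approaches R / (1 - gamma) = 10
```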
Proximal Policy Optimization and Advanced Actor–Critic Variants
The concept of trust regions in policy optimization aims to limit excessive variation in the policy between updates, thus improving stability and avoiding performance collapse. While using the KL divergence as a strict constraint achieves this goal, it can be computationally demanding, with the notable exception of the Natural Policy Gradient method.
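PPO sidesteps the explicit KL constraint by clipping the probability ratio rho = pi_new(a|s) / pi_old(a|s) in the surrogate objective, L = E[min(rho * A, clip(rho, 1 - eps, 1 + eps) * A)]. A minimal numerical sketch (the function name and sample values are illustrative):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective: the min removes any incentive to
    push the probability ratio outside [1 - eps, 1 + eps]."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()

# A large ratio (2.0) with positive advantage is clipped at 1 + eps = 1.2,
# so the objective stops growing once the policy has moved far enough.
print(ppo_clip_objective(np.array([2.0]), np.array([1.0])))  # 1.2
```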
Reinforcement Learning - Model-Free Methods
In many real-world scenarios, it is impractical or impossible to assume full knowledge of the environment’s dynamics, such as the transition probabilities or the reward function. For this reason, model-free reinforcement learning techniques have been developed. These approaches allow an agent to learn from direct interaction with the environment, without requiring an explicit model.
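Tabular Q-learning is a canonical example of such a method: the agent updates Q(s, a) directly from sampled transitions, never referencing transition probabilities or the reward function. A minimal sketch (the toy states, actions, and step size are assumptions for illustration):

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9):
    """One model-free Q-learning update from a sampled (s, a, r, s') tuple:
    Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)
actions = ['left', 'right']
# One sampled transition: in state 0, 'right' yields reward 1, next state 0.
q_learning_step(Q, 0, 'right', 1.0, 0, actions)
print(Q[(0, 'right')])  # 0.5
```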