Lectures

Lecture 1: An Introduction to Deep Reinforcement Learning

Reinforcement Learning (RL) is a subfield of machine learning that is
concerned with training agents to make decisions in an environment so
as to maximize a cumulative reward signal. It provides a highly general
and elegant toolbox for building intelligent systems that learn by
interacting with their environment rather than from explicit
supervision. In the past few years, RL (in combination with deep neural
networks) has shown impressive results in a variety of challenging
domains, from games to robotics. Some also see it as a possible path
towards general, human-level intelligent systems. I will explain some
of the basic algorithms the field is built on (Q-learning, policy
gradients), as well as a few extensions of these algorithms that are
used in practice (PPO, IMPALA, and others).
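
At the heart of Q-learning is the update Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)), which nudges each estimate towards a bootstrapped target. The minimal tabular sketch below shows this in plain Python. The five-state chain environment (`step`, `N_STATES`), the epsilon-greedy exploration scheme and all hyperparameters are illustrative assumptions, not taken from the lecture. Moving right from state 3 into the terminal state 4 yields reward 1, and every other transition yields 0.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical chain environment: reward 1 only on reaching the last state."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy exploration; ties are broken at random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the greedy value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # [1, 1, 1, 1]: always move right
```

Because the target uses the greedy value max_a' Q(s', a') rather than the value of the action actually taken next, Q-learning is off-policy: it learns the optimal action values (here Q(0, right) ≈ γ³ = 0.729) while still behaving epsilon-greedily.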

Lecture 2: Milestones in Large-scale Reinforcement Learning: AlphaZero, OpenAI Five and AlphaStar

Over the past few years, we have seen a number of successes in Deep
Reinforcement Learning. Among other results, RL agents have been able
to match or exceed the strength of the best human players at the games
of Go, Dota 2 and StarCraft II. These results were achieved by
AlphaZero, OpenAI Five and AlphaStar, respectively. I will go into
detail on how these three systems work, highlighting their similarities
and differences. What lessons can we draw from these results, and what
is still missing before Deep RL can be applied to challenging
real-world problems?

Lecture 3 – Tutorial: JAX, a new library for building neural networks

JAX is a new framework for deep learning developed at Google AI.
Written by the authors of the popular autograd library, it is built on
the concept of function transformations: higher-order functions such as
‘grad’ (automatic differentiation), ‘vmap’ (automatic vectorization),
‘jit’ (just-in-time compilation with XLA), ‘pmap’ (parallelization
across devices) and others are powerful tools that allow researchers to
express ML algorithms succinctly and correctly, while making full use
of hardware accelerators such as GPUs and TPUs. Most importantly,
solving problems with JAX is fun! I will give a short introduction to
JAX, covering the most important function transformations and
demonstrating how to apply JAX to several ML problems.
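
A minimal sketch of how these transformations compose, assuming a recent JAX installation. The squared-error `loss`, the weights and the toy batch are illustrative assumptions. `jax.grad` differentiates with respect to the first argument, `jax.vmap` maps the resulting gradient function over a batch axis (here `in_axes=(None, 0, 0)` shares `w` across the batch), and `jax.jit` compiles the composition. `pmap` is omitted because it needs multiple devices.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Squared error of a linear model on a single example (illustrative)."""
    return (jnp.dot(w, x) - y) ** 2


# grad: d loss / d w, i.e. 2 * (w . x - y) * x
grad_loss = jax.grad(loss)

# vmap: per-example gradients; w is shared, x and y are batched along axis 0
per_example_grads = jax.vmap(grad_loss, in_axes=(None, 0, 0))

# jit: compile the composed function with XLA
fast_per_example_grads = jax.jit(per_example_grads)

w = jnp.array([1.0, -2.0])
xs = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ys = jnp.array([0.0, 0.0, 0.0])

grads = fast_per_example_grads(w, xs, ys)
print(grads.shape)  # (3, 2): one gradient vector per example
```

Writing the model for a single example and letting `vmap` add the batch dimension keeps the code close to the mathematical definition, which is the style the function-transformation approach encourages.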

Igor Babuschkin

Lecture 1: Autoencoders and Deep Learning

TBA

Lecture 2: Deep Learning in the Physical Sciences

Abstract TBA

Lecture 3: Deep Learning in the Life Sciences

TBA

Pierre Baldi

Lecture 1: Introduction to the Value of Information Theory

Data analysis allows us to build models of various phenomena and to
extract information about them (e.g. for the recognition,
classification or prediction of objects or events). What is the value
of this information? Ideally, information should improve the quality of
decisions, which manifests itself in a reduction of errors or an
increase in expected utility. The maximum possible ‘improvement’
represents the value of information. The corresponding mathematical
theory originated in Claude Shannon’s work on rate-distortion theory
and was developed further in the 1960s by Ruslan Stratonovich and his
colleagues. In this lecture, I will recall the solution of the maximum
entropy problem with a linear constraint. This will outline the main
mathematical ideas leading to other variational problems with
constraints on different types of information (namely, Hartley,
Boltzmann and Shannon information). The optimal values of these
problems are the values of the corresponding types of information. The
value of Shannon information is particularly interesting because in
many cases it has an elegant analytical solution, and it provides a
theoretical upper frontier for the values of all types of information.
Geometric analysis of the solutions to these value-of-information
problems gives interesting insights into the role of randomization in
various learning and optimization algorithms.
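
The classical maximum entropy problem with one linear constraint, maximize H(p) = −Σ p_i log p_i subject to Σ p_i u_i = ū and Σ p_i = 1, has the well-known Gibbs (Boltzmann) solution p_i ∝ exp(−β u_i), where the Lagrange multiplier β is fixed by the constraint. The numerical sketch below, in plain Python, finds β by bisection, which works because the expected cost decreases monotonically in β (its derivative is −Var(u)). The costs `u` and target `ū = 1` are hypothetical. The sketch illustrates only this maximum entropy step, not the value-of-information functionals themselves.

```python
import math


def gibbs(u, beta):
    """Gibbs distribution p_i ∝ exp(-beta * u_i), shifted by min(u) for stability."""
    m = min(u)
    w = [math.exp(-beta * (ui - m)) for ui in u]
    z = sum(w)
    return [wi / z for wi in w]


def expected(u, p):
    return sum(ui * pi for ui, pi in zip(u, p))


def max_entropy(u, target, lo=0.0, hi=100.0, iters=200):
    """Maximum entropy distribution subject to sum_i p_i u_i = target.

    Assumes min(u) < target <= mean(u), so that beta >= 0 lies in [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected(u, gibbs(u, mid)) > target:
            lo = mid   # expected cost still too high: increase beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, gibbs(u, beta)


u = [0.0, 1.0, 2.0, 3.0]            # hypothetical costs of four outcomes
beta, p = max_entropy(u, target=1.0)
entropy = -sum(pi * math.log(pi) for pi in p)
print(round(beta, 4), [round(pi, 4) for pi in p], round(entropy, 4))
```

Any other distribution meeting the same constraint, for example (0.5, 0.25, 0, 0.25), has strictly lower entropy. For these costs the constraint reduces to 2x³ + x² = 1 with x = exp(−β).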

Lecture 2: Applications of the Value of Information: Graphs, Evolutionary and Learning Algorithms

I will show that power-law graphs are solutions to the maximum entropy problem, and that the preferential attachment procedure generating such graphs can be derived as a solution to the dual problem of path-length minimization subject to constraints on information, which is a value-of-information problem. Another application is the optimal control of mutation rates in evolutionary algorithms. One interesting solution to this problem can be obtained by maximizing the expected fitness of a population subject to constraints on information divergence. Finally, I will discuss how optimal learning can be defined as a dynamical generalization of the value-of-information problem.
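
For readers unfamiliar with preferential attachment, here is a minimal sketch of the standard Barabási–Albert-style growth rule, in which each new node links to existing nodes with probability proportional to their current degree. This is a generic illustration with assumed parameters (`n`, `m`, `seed`) and a hypothetical function name, not the information-theoretic derivation presented in the lecture.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # start from a complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # each node appears here once per incident edge, so a uniform draw from
    # this list is a draw proportional to degree
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(n=2000, m=2)
degree = Counter(v for e in edges for v in e)
print(max(degree.values()), sorted(degree.values())[len(degree) // 2])
```

The "rich get richer" rule produces a heavy-tailed degree distribution: most nodes keep a degree close to `m`, while a few early nodes accumulate far more links.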

Lecture 3: Tutorial on Quantum Probability

The interest in quantum computing has brought ideas from quantum physics into broader areas of science, including not only computer science but also the social sciences and even psychology. Yet many results and much of the notation of quantum physics and quantum information remain unfamiliar and difficult to understand. This talk is intended for those familiar with the basic ideas of classical probability theory, and it will summarize the main facts about its non-commutative (i.e. quantum) generalization. If time allows, I will also mention some curious facts about the quantum-classical interface.
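
As a small illustration of the non-commutative setting, the sketch below represents a qubit state as a 2×2 density matrix ρ and events as projectors P, with probabilities given by the Born rule Pr(P) = tr(ρP). It uses plain Python complex arithmetic, and the helper names are hypothetical. The point it shows is that, unlike indicator functions of classical events, projectors onto different bases need not commute.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def projector(v):
    """Projector |v><v| onto a normalized vector v."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]


s = 1 / math.sqrt(2)
P0 = projector([1 + 0j, 0j])          # event "outcome 0" in the standard basis
Pplus = projector([s + 0j, s + 0j])   # event "outcome +" in the Hadamard basis
rho = projector([1 + 0j, 0j])         # pure state |0><0|

# Born rule: probability of event P in state rho is tr(rho P)
prob_0 = trace(matmul(rho, P0)).real        # 1.0
prob_plus = trace(matmul(rho, Pplus)).real  # 0.5 up to rounding

# Indicator functions of classical events always commute; these projectors do not
commutator = [[x - y for x, y in zip(r1, r2)]
              for r1, r2 in zip(matmul(P0, Pplus), matmul(Pplus, P0))]
print(prob_0, prob_plus, commutator)
```

When ρ and all projectors are diagonal in one common basis, the Born rule reduces to an ordinary expectation of indicator functions, recovering classical probability as the commutative special case.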

Roman Belavkin

Lecture: Geometric deep learning: history, successes, promises, and challenges

In the past decade, deep learning methods have achieved unprecedented performance on a broad range of problems in fields ranging from computer vision to speech recognition. So far, research has mainly focused on developing deep learning methods for Euclidean-structured data. However, many important applications have to deal with non-Euclidean structured data, such as graphs and manifolds. Such data are becoming increasingly important in computer graphics and 3D vision, sensor networks, drug design, biomedicine, high energy physics, recommendation systems, and social media analysis. The adoption of deep learning in these fields lagged behind until recently, primarily because the non-Euclidean nature of the objects involved makes the very definition of the basic operations used in deep networks rather elusive. In this talk, I will introduce the emerging field of geometric deep learning on graphs and manifolds, give an overview of existing solutions, and outline the key difficulties and future research directions. As examples of applications, I will show problems from the domains of computer vision, graphics, medical imaging, and protein science.

Tutorial 1: From grids to graphs

* Fundamental challenges of building deep learning architectures for non-Euclidean structured data
* Signal processing perspective
* Convolutions in spectral and spatial domains
* Basic recipes for building graph neural networks
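
One basic recipe for a graph neural network layer is message passing: each node aggregates its neighbors' features with a permutation-invariant operation (here, the mean) and combines the result with its own feature. The framework-free sketch below uses scalar features and fixed illustrative weights (`w_self`, `w_neigh` are assumptions). A practical layer would use learned weight matrices over vector features.

```python
def message_passing_layer(features, adjacency, w_self=1.0, w_neigh=0.5):
    """One round of mean-aggregation message passing with a ReLU nonlinearity.

    features:  dict node -> scalar feature
    adjacency: dict node -> list of neighboring nodes
    The weights are illustrative constants, not learned parameters.
    """
    updated = {}
    for v, neighbors in adjacency.items():
        # the mean over neighbors does not depend on their ordering
        agg = sum(features[u] for u in neighbors) / len(neighbors) if neighbors else 0.0
        updated[v] = max(0.0, w_self * features[v] + w_neigh * agg)
    return updated


# a 4-cycle 0-1-2-3-0 with a single "active" node
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
h = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
for _ in range(2):
    h = message_passing_layer(h, adjacency)
print(h)  # {0: 1.125, 1: 0.5, 2: 0.125, 3: 0.5}
```

After two layers, node 2, which is two hops from node 0, has received a signal: stacking k layers gives each node a k-hop receptive field, which is the graph analogue of growing receptive fields in CNNs.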

Tutorial 2: Theory and practice

* How powerful are graph neural networks?
* Deep graph neural networks
* Scalability
* Dynamic graphs
* Open questions, trends, and challenges
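
A well-known reference point for the expressive power of graph neural networks is the 1-dimensional Weisfeiler–Lehman test (1-WL, or color refinement): standard message-passing GNNs can distinguish at most those graphs that 1-WL distinguishes. The plain-Python sketch below implements color refinement, with illustrative function and graph names, and shows a classic failure case. A 6-cycle and two disjoint triangles are both 2-regular, so 1-WL (and hence any such GNN) assigns them identical color histograms.

```python
from collections import Counter


def wl_color_histogram(adjacency, rounds=3):
    """1-WL color refinement; returns the multiset of final node colors.

    Colors are nested tuples (own color, sorted neighbor colors), which keeps
    them comparable across graphs without a shared relabeling table.
    """
    colors = {v: 0 for v in adjacency}  # uniform initial color
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
            for v in adjacency
        }
    return Counter(colors.values())


def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


hexagon = cycle(6)
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

print(wl_color_histogram(hexagon) == wl_color_histogram(two_triangles))  # True
print(wl_color_histogram(hexagon) == wl_color_histogram(path))           # False
```

Nested tuples grow quickly with the number of rounds, so practical implementations usually hash or relabel colors after each round. The nested form is used here only for transparency.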

Tutorial 3: Manifolds, meshes, and point clouds

* Basics of differential geometry
* Convolutions on meshes
* Generative models
* Point clouds
* Latent graphs
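
Point clouds are unordered sets, so a network consuming them should give the same output for every ordering of the points. One common recipe, used for example in PointNet, applies a shared per-point function followed by a symmetric pooling operation such as a coordinate-wise max. In the sketch below, a hand-written feature map (a hypothetical stand-in for a learned shared MLP) takes the place of the network.

```python
import itertools


def per_point_features(p):
    """Hypothetical stand-in for a shared learned MLP: maps (x, y, z) to 3 features."""
    x, y, z = p
    return (x + y, max(0.0, z - x), x * x + y * y + z * z)


def set_embedding(points):
    """Shared per-point map followed by a coordinate-wise max (a symmetric function)."""
    feats = [per_point_features(p) for p in points]
    return tuple(max(f[i] for f in feats) for i in range(3))


cloud = [(0.0, 1.0, 2.0), (1.0, -1.0, 0.5), (2.0, 0.0, -1.0)]
# every ordering of the points yields the same embedding
embeddings = {set_embedding(list(perm)) for perm in itertools.permutations(cloud)}
print(embeddings)  # {(2.0, 2.0, 5.0)}
```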

Michael Bronstein

Lecture 1

TBA

Lecture 2

TBA

Sergiy Butenko

Finding Deeply Hidden Truths: Breakthroughs in Nonlinear Identifiability Theory

Is principled disentanglement possible? Equivalently, do nonlinear models converge to a unique set of representations when given sufficient data and compute? We provide an introduction to this problem and present some surprising recent theoretical results that answer this question in the affirmative. In particular, we show that the representations learned by a very broad family of neural networks are identifiable up to only a trivial transformation. The family of models for which we derive strong identifiability results covers a large fraction of the models in use today, including supervised models, self-supervised models, flow-based models, VAEs and energy-based models.
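
If we assume that the "trivial transformation" is linear (an assumption of this sketch, matching the linear-identifiability setting), one simple empirical check is to fit a linear map from one model's representations to another's and inspect the residual. The plain-Python sketch below uses hypothetical 2-D embeddings and solves the least-squares normal equations directly. A near-zero residual is consistent with agreement up to a linear map, while a large residual is not. It illustrates the idea only and is not the authors' evaluation protocol.

```python
def fit_linear_map(z1, z2):
    """Least-squares 2x2 matrix A minimizing sum ||y - A x||^2 over pairs (x, y).

    Solves the normal equations A = C S^{-1} with S = sum x x^T and C = sum y x^T.
    """
    s = [[sum(x[i] * x[j] for x in z1) for j in range(2)] for i in range(2)]
    c = [[sum(y[i] * x[j] for x, y in zip(z1, z2)) for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    s_inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return [[sum(c[i][k] * s_inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mean_residual(z1, z2, a):
    """Mean squared error of predicting z2 from z1 with the linear map a."""
    err = 0.0
    for x, y in zip(z1, z2):
        pred = (a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1])
        err += (y[0] - pred[0]) ** 2 + (y[1] - pred[1]) ** 2
    return err / len(z1)


# hypothetical embeddings of the same six inputs produced by two "models"
z1 = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (-1.0, 3.0), (0.5, 0.5)]
a_true = [[2.0, 1.0], [-1.0, 0.5]]
z2_linear = [(a_true[0][0] * x + a_true[0][1] * y, a_true[1][0] * x + a_true[1][1] * y)
             for x, y in z1]
z2_nonlinear = [(x ** 3, y) for x, y in z1]   # not a linear function of z1

for z2 in (z2_linear, z2_nonlinear):
    a = fit_linear_map(z1, z2)
    print(round(mean_residual(z1, z2, a), 6))
```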