Each Lecturer will hold three lessons on a specific topic.

The Lecturers below are confirmed.

#### Biography

Igor Babuschkin studied physics at Technische Universität Dortmund, where he specialized in the study of B mesons, working with the LHCb experiment at the Large Hadron Collider in Geneva, Switzerland. He gradually became more interested in machine learning and joined DeepMind as a Research Engineer in 2017, where he was involved in several projects, including WaveNet, a neural text-to-speech engine. He led the engineering on AlphaStar, a deep reinforcement learning agent that is able to match the strength of Grandmaster-level players at the game StarCraft II.

#### Lectures

Reinforcement Learning (RL) is a subfield of machine learning that is concerned with training agents to take decisions such that a cumulative reward signal is maximized in an environment. It provides a highly general and elegant toolbox for building intelligent systems that learn through interacting with their environment rather than supervision. In the past few years RL (in combination with deep neural networks) has shown impressive results on a variety of challenging domains, from games to robotics. It is also seen by some as a possible path towards general human-level intelligent systems. I will explain some of the basic algorithms that the field is based on (Q Learning, Policy Gradients), as well as a few extensions to these algorithms that are used in practice (PPO, IMPALA, and others).
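As a rough illustration of the first of the basic algorithms named above, here is a minimal sketch of tabular Q-learning. It is not taken from the lecture material: the toy chain environment, the function name `q_learning`, and all hyperparameter values are assumptions chosen only to keep the example small.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain environment.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the right-most state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Epsilon-greedy action selection, with random tie-breaking.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Deterministic transition of the toy environment.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0

            # Q-learning target: reward plus discounted best next value
            # (the terminal state has value 0).
            bootstrap = 0.0 if s_next == terminal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + bootstrap - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

The update moves each estimate towards the reward plus the discounted value of the best next action, so in this toy setting the learned values favour moving right in every non-terminal state.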

Over the past few years, we have seen a number of successes in Deep Reinforcement Learning: Among other results, RL agents have been able to match or exceed the strength of the best human players at the games of Go, Dota 2 and StarCraft II. These were achieved by AlphaZero, OpenAI Five and AlphaStar, respectively. I will go into details of how these three systems work, highlighting similarities and differences. What are the lessons we can draw from these results, and what is missing to apply Deep RL to challenging real world problems?

Tutorial: JAX, A new library for building neural networks

JAX is a new framework for deep learning developed at Google AI. Written by the authors of the popular autograd library, it is built on the concept of function transformations: Higher-order functions like ‘grad’, ‘vmap’, ‘jit’, ‘pmap’ and others are powerful tools that allow researchers to express ML algorithms succinctly and correctly, while making full use of hardware resources like GPUs and TPUs. Most importantly, solving problems with JAX is fun! I will give a short introduction to JAX, covering the most important function transformations, and demonstrating how to apply JAX to several ML problems.

#### Lectures

Abstract TBA

#### Biography

Roman Belavkin is a Reader in Informatics at the Department of Computer Science, Middlesex University, UK. He holds an MSc degree in Physics from Moscow State University and a PhD in Computer Science from the University of Nottingham, UK. In his PhD thesis, Roman combined cognitive science and information theory to study the role of emotion in decision-making, learning and problem solving. His main research interests are in the mathematical theory of the dynamics of information and the optimization of learning, adaptive and evolving systems. He used information value theory to give novel explanations of some common decision-making paradoxes. His work on optimal transition kernels showed the non-existence of optimal deterministic strategies in a broad class of problems with information constraints.

Roman’s theoretical work on optimal parameter control in algorithms has found applications in computer science and biology. From 2009, Roman led a collaboration between four UK universities involving mathematics, computer science and experimental biology on optimal mutation rate control, which led to the discovery in 2014 of mutation rate control in bacteria (reported in Nature Communications http://doi.org/skb and PLOS Biology http://doi.org/cb9s). He also contributed to research projects on neural cell-assemblies, independent component analysis and the detection of anomalies such as cyber attacks.

#### Lectures

Data analysis allows us to create models of and extract information about various phenomena (e.g. recognition, classification, prediction of some objects or events). What is the value of this information? Ideally, information should improve the quality of decisions, which is manifested in a reduction of errors or an increase of expected utility. The maximum possible ‘improvement’ represents the value of information. The corresponding mathematical theory originated in the works of Claude Shannon on rate distortion, and later was developed in the 1960s by Ruslan Stratonovich and his colleagues. In this lecture, I will recall the solution of the maximum entropy problem with a linear constraint. This will outline the main mathematical ideas leading to other variational problems with constraints on different types of information (namely, of Hartley, Boltzmann and Shannon’s types). The optimal values of these problems are the values of different types of information. The value of Shannon’s information is particularly interesting, because in many cases it has a nice analytical solution, and it provides a theoretical upper frontier for the values of all types of information. Geometric analysis of solutions to problems on the values of information of different types gives interesting insights about the role of randomization in various learning and optimization algorithms.
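For readers unfamiliar with the maximum entropy problem with a linear constraint mentioned above, the following is a minimal numerical sketch, not the lecturer's own derivation. It relies on the standard result that, over a finite set of outcomes with values x_i, the entropy-maximizing distribution with a prescribed mean has the exponential form p_i ∝ exp(-βx_i). The function names `gibbs` and `max_entropy`, the bisection bounds, and the dice example are assumptions made for illustration.

```python
import math


def gibbs(xs, beta):
    """Exponential-family distribution p_i proportional to exp(-beta * x_i)."""
    # Subtract the largest exponent for numerical stability.
    m = max(-beta * x for x in xs)
    weights = [math.exp(-beta * x - m) for x in xs]
    z = sum(weights)
    return [w / z for w in weights]


def max_entropy(xs, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Find beta such that the Gibbs distribution has the requested mean.

    The mean under p_i ∝ exp(-beta * x_i) is non-increasing in beta
    (its derivative is minus the variance), so bisection applies.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mean = sum(p * x for p, x in zip(gibbs(xs, mid), xs))
        if mean > target_mean:
            lo = mid  # mean too high: a larger beta is needed
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, gibbs(xs, beta)


if __name__ == "__main__":
    # Hypothetical example: a six-sided die constrained to have mean 4.5.
    beta, p = max_entropy([1, 2, 3, 4, 5, 6], 4.5)
    print(round(beta, 4), [round(pi, 4) for pi in p])
```

When the target mean equals the unconstrained mean (3.5 for the die), β is zero and the solution is the uniform distribution; other targets tilt the distribution exponentially towards larger or smaller values.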

I will show that power-law graphs are solutions to the maximum entropy problem, and that the preferential attachment procedure generating such graphs can be derived as a solution to the dual problem of path length minimization with constraints on information, which is the value of information problem. Another application is optimal control of mutation rates in evolutionary algorithms. One interesting solution to this problem can be obtained by maximizing the expected fitness of a population subject to constraints on information divergence. Finally, I will discuss how optimal learning can be defined as a dynamical generalization of the value of information problem.

The interest in quantum computing has resulted in greater penetration of ideas from quantum physics into broader areas of science, which include not only computer science, but also social sciences and even psychology. Yet, many results and much of the notation of quantum physics and quantum information remain alien and difficult to understand. This talk is intended for those familiar with the basic ideas of classical probability theory, and it will summarize the main facts about its non-commutative (i.e. quantum) generalization. If time allows, I will also mention some curious facts about the quantum-classical interface.

#### Lectures

In the past decade, deep learning methods have achieved unprecedented performance on a broad range of problems in various fields from computer vision to speech recognition. So far research has mainly focused on developing deep learning methods for Euclidean-structured data. However, many important applications have to deal with non-Euclidean structured data, such as graphs and manifolds. Such data are becoming increasingly important in computer graphics and 3D vision, sensor networks, drug design, biomedicine, high energy physics, recommendation systems, and social media analysis. The adoption of deep learning in these fields has been lagging behind until recently, primarily since the non-Euclidean nature of objects dealt with makes the very definition of basic operations used in deep networks rather elusive. In this talk, I will introduce the emerging field of geometric deep learning on graphs and manifolds, overview existing solutions and outline the key difficulties and future research directions. As examples of applications, I will show problems from the domains of computer vision, graphics, medical imaging, and protein science.

- Fundamental challenges of building deep learning architectures for non-Euclidean structured data
- Signal processing perspective
- Convolutions in spectral and spatial domains
- Basic recipes for building graph neural networks

#### Biography

Dr. Butenko’s research concentrates mainly on global and discrete optimization and their applications. In particular, he is interested in theoretical and computational aspects of continuous global optimization approaches for solving discrete optimization problems on graphs. Applications of interest include network-based data mining, analysis of biological and social networks, wireless ad hoc and sensor networks, energy, and sports analytics.

#### Lectures

#### Topics

Constraint-Based Approaches to Machine Learning

#### Biography

Marco Gori received the Ph.D. degree in 1990 from Università di Bologna, Italy, while working partly as a visiting student at the School of Computer Science, McGill University – Montréal. In 1992, he became an associate professor of Computer Science at Università di Firenze and, in November 1995, he joined the Università di Siena, where he is currently full professor of computer science. His main interests are in machine learning, computer vision, and natural language processing. He was the leader of the WebCrow project, supported by Google, for the automatic solving of crosswords, which outperformed human competitors in an official competition within the ECAI-06 conference. He has just published the book “Machine Learning: A Constraint-Based Approach,” where you can find his view on the field.

He has been an Associate Editor of a number of journals in his area of expertise, including The IEEE Transactions on Neural Networks and Neural Networks, and he has been the Chairman of the Italian Chapter of the IEEE Computational Intelligence Society and the President of the Italian Association for Artificial Intelligence. He is a fellow of the ECCAI (EurAI) (European Coordinating Committee for Artificial Intelligence), a fellow of the IEEE, and of IAPR. He is in the list of top Italian scientists kept by VIA-Academy.

#### Lectures

#### Topics

Adam algorithm, generative models, variational (Bayesian) inference & stochastic optimization

#### Biography

- Current (2018 – …): Senior Research Scientist at Google Brain (San Francisco); generative models, identifiability, among other topics.
- 2015-2018: Research Scientist at OpenAI (San Francisco). Part of the founding team of OpenAI and lead of the Algorithms team, focused on basic research.
- 2013-2017: Ph.D. (cum laude) at University of Amsterdam. Thesis: Variational Inference and Deep Learning: A New Synthesis.

AWARDS

- 2019: The Gerrit van Dijk prijs from the Royal Holland Society of Sciences and Humanities, for my work in machine learning.
- 2019: The ELLIS PhD Award for “outstanding research achievements during the dissertation phase of outstanding students working in the field of artificial intelligence and machine learning”.
- 2017: PhD with ‘cum laude’, highest distinction in the Netherlands, and first time it was awarded at the CS department in 30 years.
- 2015: Google’s first European Doctoral Fellowship in Deep Learning

#### Lectures

Is principled disentanglement possible? Equivalently, do nonlinear models converge to a unique set of representations when given sufficient data and compute? We provide an introduction to this problem, and present some surprising recent theoretical results that answer this question in the affirmative. In particular, we show that the representations learned by a very broad family of neural networks are identifiable up to only a trivial transformation. The family of models for which we derive strong identifiability results includes a large fraction of models in use today, including supervised models, self-supervised models, flow-based models, VAEs and energy-based models.

#### Biography

**Risto Miikkulainen** is a Finnish-American computer scientist and professor at the University of Texas at Austin. In 2016, he was named Fellow of the Institute of Electrical and Electronics Engineers (IEEE) “for contributions to techniques and applications for neural and evolutionary computation”.

Risto Miikkulainen is AVP of Evolutionary Intelligence at Cognizant Technology Solutions. His current research focuses on methods and applications of neuroevolution, as well as neural network models of natural language processing and vision; he is an author of over 400 articles in these research areas.

#### Honors and Awards

- IEEE CIS Evolutionary Computation Pioneer Award, 2020
- Gabor Award, the International Neural Network Society, 2017
- Outstanding Paper of the Decade Award, International Society for Artificial Life, 2017.
- IEEE Fellow, 2016
- IEEE Computational Intelligence Society Distinguished Lecturer, 2015-2017.
- Deployed Application Award, AAAI/IAAI-2013, AAAI/IAAI-2018
- Best Paper Awards at GECCO-2002, 2003, 2005, 2007, 2008, 2014, 2015, 2017
- Best Paper Awards at CIG-2005, 2006, 2009, 2011
- BotPrize Award (Turing test for game bots), 2012
- Honorable mention, Ziskind-Somerfield Research Award, Society of Biological Psychiatry, 2012
- Winner, Annual Competition of Pseudo-Boolean SAT Solvers at SAT-2010 and SAT-2011
- Bronze Medal, Human Competitive Results Competition, GECCO-2005, GECCO-2017

His research focuses on biologically-inspired computation such as neural networks and evolutionary computation. On one hand, the goal is to understand biological information processing, and on the other, to develop intelligent artificial systems that learn and adapt by observing and interacting with the environment. The three main focus areas are: (1) Neuroevolution, i.e. evolving complex deep learning architectures and recurrent neural networks for sequential decision tasks such as those in robotics, games, and artificial life; (2) Cognitive Science, i.e. models of natural language processing, memory, and learning that, in particular, shed light on disorders such as schizophrenia and aphasia; and (3) Computational Neuroscience, i.e. development, structure, and function of the visual cortex, episodic memory, and language processing.

See the UTCS Neural Networks Research Group website for research projects, publications, demos, and software. A few highlights: TexasExes interview/skit on artificial evolution; O’Reilly Radar Podcast on evolutionary computation; Digital Nibbles interview on BotPrize (i.e. Turing test for game bots); a 2-min soundbite on neuroevolution; the NERO machine learning game; an interactive demo of a schizophrenic language model; the *Computational Maps in the Visual Cortex* book.

#### Lectures

Abstract TBA

#### Biography

Jose C. Principe (M’83-SM’90-F’00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida, where he teaches statistical signal processing, machine learning and artificial neural network (ANN) modeling. He is the Eckis Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL), www.cnel.ufl.edu. His primary area of interest is the processing of time-varying signals with adaptive neural models. The CNEL Lab has been studying signal and pattern recognition principles based on information theoretic criteria (entropy and mutual information). The relevant application domains are neurology, brain machine interfaces and computational neuroscience.

Dr. Principe is an IEEE Fellow. He was the past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, Past-President of the International Neural Network Society, and Past-Editor in Chief of the IEEE Transactions on Biomedical Engineering. He received the IEEE Neural Network Pioneer Award in 2011. Dr. Principe has more than 800 publications. He directed 99 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled “Neural and Adaptive Systems”, published by John Wiley and Sons, and more recently co-authored several books: “Brain Machine Interface Engineering” (Morgan and Claypool), “Information Theoretic Learning” (Springer), and “Kernel Adaptive Filtering” (Wiley).

#### Lectures

Lecture I – Requisites for a Cognitive Architecture

- Processing in space
- Processing in time with memory
- Top-down and bottom-up processing
- Extraction of information from data with generative models
- Attention mechanisms and fovea vision

Lecture II – Putting it all together

- Empirical Bayes with generative models
- Clustering of time series with linear state models
- Information Theoretic Autoencoders

Lecture III – Beyond Backpropagation: Modular Learning for Deep Networks

- Reinterpretation of neural network layers
- Training each layer without backpropagation
- Examples and advantages in transfer learning

#### Lectures

#### Topics

Machine Learning for Medicine, Data Science and decisions, Artificial Intelligence

#### Biography

Professor van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Turing Fellow at The Alan Turing Institute in London, where she leads the effort on data science and machine learning for personalised medicine. She is an IEEE Fellow (2009). She has received the Oon Prize on Preventative Medicine from the University of Cambridge (2018). She has also been the recipient of an NSF Career Award, 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. She holds 35 granted USA patents.

The current emphasis of her research is on machine learning with applications to medicine, finance and education. She has also worked on data science, network science, game theory, signal processing, communications, and multimedia.

http://www.vanderschaar-lab.com/NewWebsite/Publications_ML.html

**5 papers accepted at NeurIPS 2019.**

**7 papers accepted at ICML 2020.**

#### Lectures

### Tutorial Speakers

#### Biography

Davide Bacciu is Associate Professor at the Computer Science Department, University of Pisa. The core of his research is on Machine Learning (ML) and deep learning models for structured data processing, including sequences, trees and graphs. He is the PI of an Italian National project on ML for structured data and the Coordinator of the H2020-RIA project TEACHING (2020-2022). He is an IEEE Senior Member, the founder and chair of the IEEE Task Force on learning for structured data (www.learning4graphs.org), a member of the IEEE NN Technical Committee and of the IEEE CIS Task Force on Deep Learning. He is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems. Since 2017 he has been the Secretary of the Italian Association for Artificial Intelligence (AI*IA). He coordinates the task force on Bioinformatics and Drug Repurposing of the CLAIRE-COVID-19 European initiative (covid19.claire-ai.org).

#### Lectures

#### Biography

Giuseppe Fiameni, PhD, is a Solution Architect for AI and Accelerated Computing at NVIDIA, helping researchers optimize deep learning workloads on High Performance Computing systems. He is the technical lead of the Italian NVIDIA Artificial Intelligence Technology Centre.

#### Lectures

The computational requirements of modern scientific experiments, from numerical simulations and deep neural networks to time series analysis and IoT, are enormous, as is the number of technologies involved. While no single solution can address all of these requirements, researchers need well-integrated development environments that scale seamlessly from small devices, e.g. Jetson Nano cards, to large supercomputing facilities for DNN training. An appropriate use of NVIDIA solutions can significantly shorten the time required to analyse large amounts of data, making complex problems feasible to solve. This talk will introduce you to the NVIDIA end-to-end product family, ranging from fully integrated AI systems to embedded devices. It will also present the NVAITC (NVIDIA AI Technology Centre) initiative, which helps the scientific AI community improve research outcomes through collaboration projects that aim to train students, nurture startups and spread adoption of the latest AI technology.

#### Biography

Dr Ojha is a lecturer in Computer Science at the University of Reading, UK. He worked as a postdoctoral researcher at ETH Zurich, Switzerland, in a Swiss National Science Foundation project concerning machine learning and signal processing for pattern analysis of human perception of the urban environment. Dr Ojha worked as a Marie-Curie Fellow in a European Union-funded project on interdisciplinary research concerning computational intelligence modelling of pharmaceutical processes. He was awarded a PhD in Computer Science and Applied Mathematics by the Technical University of Ostrava, Czech Republic. His PhD work was on feature selection and function approximation using adaptive algorithms. Before this, Dr Ojha worked as a research fellow in a Government of India-funded project on interdisciplinary research aimed at machine learning and signal processing based pattern recognition of mixed gases. Dr Ojha earned a Master of Technology and a Bachelor of Technology in Computer Science & Engineering. Dr Ojha is an IEEE Senior Member and a Member of the ACM.

#### Lectures

As part of my research with a team at ETH Zurich, we investigated a detailed answer to the question “Do the dynamics of the city environment influence us, and how?” We adopted Machine Learning and Data Mining approaches to investigate the relationship between the dynamics of the city environment and human perception. A complex and messy dataset was analyzed and resolved for this purpose. The machine learning analysis and its results will focus on answering the following questions: (i) What are the factors that need to be accounted for from the city environment? (ii) How can we collect and process messy and complex datasets? (iii) Can we predict the citizen’s perception based on a particular environmental condition? (iv) Are we able to infer a detailed relationship between a citizen’s perception and the city environment? (v) What are the significant factors that influence a citizen’s perception? (vi) Does the citizen exhibit positive or negative emotion towards a certain environment?

#### Biography

Thomas Viehmann is a PyTorch and Machine Learning trainer and consultant. In 2018 he founded the boutique R&D consultancy MathInf based in Munich, Germany. His work ranges from low-level optimizations that enable efficient AI to developing cutting-edge deep-learning models for clients from startups to large multinational corporations. He is a PyTorch core developer with contributions across almost all parts of PyTorch and co-author of Deep Learning with PyTorch, to appear this summer with Manning Publications. Thomas’ education in computer science included a class in Neural Networks and Pattern Recognition at the turn of the millennium. He went on to do research in pen-and-paper Calculus of Variations and Partial Differential Equations, obtaining a Ph.D. from Bonn University.

#### Lectures

Over the past two years PyTorch has become the dominant tool for machine learning research, with many of the groundbreaking advancements appearing alongside their PyTorch implementations. Having at least a basic understanding of the library is an asset, as it allows one to easily collaborate with others, develop one's own research faster or at least gain a deeper understanding of the resources published every day online.

This course will be a gentle introduction to the PyTorch library and we will go over all of its fundamental abstractions. Those include the way model code is usually structured, how one can go about computing gradients of arbitrary Python functions automatically, how to make effective use of accelerators such as GPUs, and what the best practices for research implementations are. If time allows, we will also take a peek into some more advanced features like the just-in-time compiler.


##### Past Lecturers

The Lecturers of the previous editions:

- **Ioannis Antonoglou**, *Google DeepMind, UK*
- **Roman Belavkin**, *Middlesex University London, UK*
- **Yoshua Bengio**, *Head of the Montreal Institute for Learning Algorithms (MILA) & University of Montreal, Canada*
- **Sergiy Butenko**, *Texas A&M University, USA*
- **Giuseppe Di Fatta**, *University of Reading, UK*
- **Marco Gori**, *University of Siena, Italy*
- **Yi-Ke Guo**, *Imperial College London, UK & Founding Director of Data Science Institute*
- **Phillip Isola**, *MIT, USA*
- **Leslie Kaelbling**, *MIT - Computer Science & Artificial Intelligence Lab, USA*
- **Ilias S. Kotsireas**, *Wilfrid Laurier University, Canada*
- **Peter Norvig**, *Director of Research, Google*
- **Panos Pardalos**, *University of Florida, USA*
- **Alex 'Sandy' Pentland**, *MIT & Director of MIT’s Human Dynamics Laboratory, USA*
- **Marc'Aurelio Ranzato**, *Facebook AI Research Lab, New York, USA*
- **Dolores Romero Morales**, *Copenhagen Business School, Denmark*
- **Ruslan Salakhutdinov**, *Carnegie Mellon University, and AI Research at Apple, USA*
- **Josh Tenenbaum**, *MIT, USA*
- **Naftali Tishby**, *Hebrew University, Israel*
- **Joaquin Vanschoren**, *Eindhoven University of Technology, The Netherlands*
- **Oriol Vinyals**, *Google DeepMind, UK*
- **Aleskerov Z. Fuad**, *National Research University Higher School of Economics, Russia*