VAEs and Normalizing Flows: An Introduction to Modeling High-Dimensional Data with Deep Learning

The ability to model and simulate very high-dimensional data, such as images, audio or text, is of great interest to both research and industry. We provide an introduction to a powerful and general set of techniques for high-dimensional modeling that simultaneously allow for efficient learning, inference and synthesis. We introduce the framework of variational autoencoders (VAEs), which uses amortized variational inference to efficiently learn deep latent-variable models. We also introduce Normalizing Flows (NFs), an equally useful and closely related technique. NFs allow us to improve variational inference, and can even remove the need for variational inference altogether. We explain common methods such as NICE, IAF, and Glow.
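
To make the flow idea concrete, here is a minimal NumPy sketch of a single affine coupling layer, the kind of invertible building block used in Glow-style flows (NICE's additive coupling is the special case with zero log-scale). It is an illustration under stated assumptions, not the implementation from any of the methods named above: the names `conditioner`, `forward`, `inverse` and `log_prob` are hypothetical, the conditioner is a single fixed random linear map standing in for a trained network, and the base distribution is assumed to be a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4        # data dimensionality (illustrative choice)
H = D // 2   # size of each half of the split

# Hypothetical fixed random weights standing in for a trained conditioner network.
W_s = 0.1 * rng.standard_normal((H, H))
W_t = 0.1 * rng.standard_normal((H, H))

def conditioner(x1):
    """Map the untouched half to a log-scale s and a shift t for the other half."""
    s = np.tanh(x1 @ W_s)  # tanh keeps the log-scale bounded (a design assumption)
    t = x1 @ W_t
    return s, t

def forward(x):
    """x -> z, plus the log |det Jacobian| of the transformation."""
    x1, x2 = x[:, :H], x[:, H:]
    s, t = conditioner(x1)
    z = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    log_det = s.sum(axis=1)
    return z, log_det

def inverse(z):
    """z -> x, exact because the first half passes through unchanged."""
    z1, z2 = z[:, :H], z[:, H:]
    s, t = conditioner(z1)
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2], axis=1)

def log_prob(x):
    """Exact log-density via change of variables with a standard normal base."""
    z, log_det = forward(x)
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * D * np.log(2.0 * np.pi)
    return log_base + log_det

x = rng.standard_normal((3, D))
z, log_det = forward(x)
x_rec = inverse(z)
```

Because the first half of the input passes through unchanged, both the inverse and the log-determinant are cheap to compute, which is what makes exact likelihood evaluation and sampling tractable in this family of models.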

Flow Contrastive Estimation: An (Un?)-Holy Trinity of Energy-Based Models, Likelihood-Based Models and GANs

Learning generative models with tractable likelihoods is fairly straightforward, as we have shown. But what about learning energy-based models, whose partition functions are intractable? Few methods exist that scale to high-dimensional data. To this end we introduce Flow Contrastive Estimation (FCE), a new method for estimating energy-based models. FCE was primarily conceived as a version of noise-contrastive estimation (NCE) with an adaptive noise distribution, which makes it scale well to high-dimensional data. Secondarily, FCE is also a method for optimizing likelihood-based generative models with respect to the Jensen-Shannon divergence, as an alternative to the usual Kullback-Leibler divergence. Lastly, we show that FCE is also a special case of the GAN method, in which the generator is a flow-based model and the discriminator is parameterized by contrasting the likelihoods of an energy-based model and the flow-based model.
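
The contrastive objective described above can be sketched as follows. This is a minimal illustration, assuming the standard logistic form of NCE with equal numbers of data and noise samples, so that the discriminator's logit is the difference between the energy-based model's log-density and the flow's log-density. The function names and arguments are hypothetical, and the log-densities are assumed to be supplied by the two models; in the GAN reading, the energy-based model would ascend this value and the flow would descend it.

```python
import numpy as np

def log_sigmoid(u):
    """Numerically stable log(sigmoid(u))."""
    return -np.logaddexp(0.0, -u)

def fce_value(log_p_data, log_q_data, log_p_noise, log_q_noise):
    """Monte Carlo estimate of the contrastive value function.

    log_p_*: energy-based model log-density (with its learned log-normalizer)
             evaluated at data samples / at samples drawn from the flow.
    log_q_*: flow log-density evaluated at the same points.
    """
    # Discriminator logit: log p(x) - log q(x), i.e. D(x) = p / (p + q).
    logit_data = log_p_data - log_q_data
    logit_noise = log_p_noise - log_q_noise
    # E_data[log D(x)] + E_flow[log(1 - D(x))]
    return log_sigmoid(logit_data).mean() + log_sigmoid(-logit_noise).mean()

# When the two models agree everywhere, the discriminator is at chance level.
zeros = np.zeros(5)
v_equal = fce_value(zeros, zeros, zeros, zeros)

# When the energy-based model separates data from flow samples, the value rises.
v_separated = fce_value(zeros + 5.0, zeros, zeros - 5.0, zeros)
```

When the two densities coincide, every logit is zero and the value equals -log 4, the familiar chance-level value of the GAN objective; this is the point at which the flow can no longer be distinguished from the energy-based model.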

Finding Deeply Hidden Truths: Breakthroughs in Nonlinear Identifiability Theory

Is principled disentanglement possible? Equivalently, do nonlinear models converge to a unique set of representations when given sufficient data and compute? We provide an introduction to this problem, and present some surprising recent theoretical results that answer this question in the affirmative. In particular, we show that the representations learned by a very broad family of neural networks are identifiable up to only a trivial transformation. The family of models for which we derive strong identifiability results covers a large fraction of models in use today, among them supervised models, self-supervised models, flow-based models, VAEs and energy-based models.