Algorithmic Guarantees for Inverse Imaging with Untrained Network Priors (2020-03-27)

Deep neural networks as image priors have recently been introduced for problems such as denoising, super-resolution and inpainting, with promising performance gains over hand-crafted image priors such as sparsity and low-rank. Unlike learned generative priors, they do not require any training over large datasets. However, few theoretical guarantees exist in the scope of using untrained neural network priors for inverse imaging problems. We explore new applications and theory for untrained neural network priors. Specifically, we consider the problem of solving linear inverse problems, such as compressive sensing, as well as non-linear problems, such as compressive phase retrieval. We model images to lie in the range of an untrained deep generative network with a fixed seed. We further present a projected gradient descent scheme that can be used for both compressive sensing and phase retrieval and provide rigorous theoretical guarantees for its convergence. We also show both theoretically and empirically that with deep network priors, one can achieve better compression rates for the same image quality compared to hand-crafted priors.
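
The projected gradient descent scheme mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: projecting onto the range of an untrained network requires fitting the network weights, so the sketch substitutes a much simpler prior set (a random linear subspace with orthonormal basis `U`) for which the projection is exact. The names `projected_gradient_descent`, `project`, `U`, and all problem sizes are assumptions made for illustration.

```python
import numpy as np

def projected_gradient_descent(A, y, project, n_iters=1000):
    """Minimize 0.5 * ||A x - y||^2 subject to x lying in the prior set."""
    # Conservative step size: 1 / (largest singular value of A)^2.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)       # gradient of the data-fit term
        x = project(x - step * grad)   # pull the iterate back onto the prior set
    return x

rng = np.random.default_rng(0)
n, k, m = 50, 5, 40                    # signal size, prior dimension, measurements

# Toy stand-in for a network prior (assumption): a k-dimensional linear subspace.
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
x_true = U @ rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian measurement matrix
y = A @ x_true                                 # m < n compressive measurements

x_hat = projected_gradient_descent(A, y, project=lambda v: U @ (U.T @ v))
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

With a network prior, `project` would be replaced by an inner optimization that fits the network output to the current iterate; the outer loop keeps the same shape.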

Compressive Phase Retrieval: Optimal Sample Complexity with Deep Generative Priors (2020-08-24)

Advances in compressive sensing provided reconstruction algorithms of sparse signals from linear measurements with optimal sample complexity, but natural extensions of this methodology to nonlinear inverse problems have been met with potentially fundamental sample complexity bottlenecks. In particular, tractable algorithms for compressive phase retrieval with sparsity priors have not been able to achieve optimal sample complexity. This has created an open problem in compressive phase retrieval: under generic, phaseless linear measurements, are there tractable reconstruction algorithms that succeed with optimal sample complexity? Meanwhile, progress in machine learning has led to the development of new data-driven signal priors in the form of generative models, which can outperform sparsity priors with significantly fewer measurements. In this work, we resolve the open problem in compressive phase retrieval and demonstrate that generative priors can lead to a fundamental advance by permitting optimal sample complexity by a tractable algorithm in this challenging nonlinear inverse problem. We additionally provide empirics showing that exploiting generative priors in phase retrieval can significantly outperform sparsity priors. These results provide support for generative priors as a new paradigm for signal recovery in a variety of contexts, both empirically and theoretically. The strengths of this paradigm are that (1) generative priors can represent some classes of natural signals more concisely than sparsity priors, (2) generative priors allow for direct optimization over the natural signal manifold, which is intractable under sparsity priors, and (3) the resulting non-convex optimization problems with generative priors can admit benign optimization landscapes at optimal sample complexity, perhaps surprisingly, even in cases of nonlinear measurements.

Exact priors of finite neural networks (2021-04-23)

Bayesian neural networks are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent work has suggested that finite Bayesian networks may outperform their infinite counterparts, but their non-Gaussian output priors have been characterized only through perturbative approaches. Here, we derive exact solutions for the output priors for individual input examples of a class of finite fully-connected feedforward Bayesian neural networks. For deep linear networks, the prior has a simple expression in terms of the Meijer $G$-function. The prior of a finite ReLU network is a mixture of the priors of linear networks of smaller widths, corresponding to different numbers of active units in each layer. Our results unify previous descriptions of finite network priors in terms of their tail decay and large-width behavior.

Exact asymptotics for phase retrieval and compressed sensing with random generative priors (2020-06-12)

We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to sparse separable priors and conclude that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow compressive phase retrieval to be performed efficiently close to its information-theoretic limit, it is found that under the random generative prior compressed phase retrieval becomes tractable.

Reducing the Representation Error of GAN Image Priors Using the Deep Decoder (2020-01-23)

Generative models, such as GANs, learn an explicit low-dimensional representation of a particular class of images, and so they may be used as natural image priors for solving inverse problems such as image restoration and compressive sensing. GAN priors have demonstrated impressive performance on these tasks, but they can exhibit substantial representation error for both in-distribution and out-of-distribution images, because of the mismatch between the learned, approximate image distribution and the data generating distribution. In this paper, we demonstrate a method for reducing the representation error of GAN priors by modeling images as the linear combination of a GAN prior with a Deep Decoder. The Deep Decoder is an underparameterized and, most importantly, unlearned natural signal model similar to the Deep Image Prior. No knowledge of the specific inverse problem is needed in the training of the GAN underlying our method. For compressive sensing and image super-resolution, our hybrid model exhibits consistently higher PSNRs than both the GAN priors and Deep Decoder separately, both on in-distribution and out-of-distribution images. This model provides a method for extensibly and cheaply leveraging both the benefits of learned and unlearned image recovery priors in inverse problems.

All You Need is a Good Functional Prior for Bayesian Deep Learning (2020-11-25)

The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles this limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to "tune" the priors of neural network parameters in a way that they reflect such functional priors. Gaussian processes offer a rigorous framework to define prior distributions over functions, and we propose a novel and robust framework to match their prior with the functional prior of neural networks based on the minimization of their Wasserstein distance. We provide vast experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step in the direction of making the long-standing challenge of carrying out a fully Bayesian treatment of neural networks, including convolutional neural networks, a concrete possibility.

Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search (2018-03-13)

One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damage. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situations and the priors.

DAEs for Linear Inverse Problems: Improved Recovery with Provable Guarantees (2021-01-13)

Generative priors have been shown to provide improved results over sparsity priors in linear inverse problems. However, current state-of-the-art methods suffer from one or more of the following drawbacks: (a) speed of recovery is slow; (b) reconstruction quality is deficient; (c) reconstruction quality is contingent on a computationally expensive process of tuning hyperparameters. In this work, we address these issues by utilizing Denoising Auto Encoders (DAEs) as priors and a projected gradient descent algorithm for recovering the original signal. We provide rigorous theoretical guarantees for our method and experimentally demonstrate its superiority over existing state-of-the-art methods in compressive sensing, inpainting, and super-resolution. We find that our algorithm speeds up recovery by two orders of magnitude (over 100x), improves quality of reconstruction by an order of magnitude (over 10x), and does not require tuning hyperparameters.

Robust priors for regularized regression (2020-10-06)

Induction benefits from useful priors. Penalized regression approaches, like ridge regression, shrink weights toward zero, but zero association is usually not a sensible prior. Inspired by simple and robust decision heuristics humans use, we constructed non-zero priors for penalized regression models that provide robust and interpretable solutions across several tasks. Our approach enables estimates from a constrained model to serve as a prior for a more general model, yielding a principled way to interpolate between models of differing complexity. We successfully applied this approach to a number of decision and classification problems, as well as to the analysis of simulated brain imaging data. Models with robust priors had excellent worst-case performance. Solutions followed from the form of the heuristic that was used to derive the prior. These new algorithms can serve applications in data analysis and machine learning, as well as help in understanding how people transition from novice to expert performance.
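
To make the idea of shrinking toward a non-zero prior concrete, here is a minimal sketch of ridge regression whose penalty pulls the weights toward a prior vector instead of toward zero. The abstract does not give the exact objective, so the form min_w ||y - Xw||^2 + lam * ||w - w_prior||^2 is an assumption, as are the function name `ridge_with_prior` and the "equal weights" prior used in the demo.

```python
import numpy as np

def ridge_with_prior(X, y, w_prior, lam):
    """Closed-form minimizer of ||y - X w||^2 + lam * ||w - w_prior||^2."""
    d = X.shape[1]
    # Setting the gradient to zero gives (X^T X + lam I) w = X^T y + lam w_prior.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_prior)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(30)

w_prior = np.ones(3)   # hypothetical "equal weights" heuristic (assumption)

w_weak = ridge_with_prior(X, y, w_prior, lam=1e-8)    # penalty negligible: least squares
w_strong = ridge_with_prior(X, y, w_prior, lam=1e8)   # penalty dominates: the prior
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(w_weak, w_strong)
```

Varying `lam` between these extremes interpolates between the constrained model encoded by `w_prior` and the unconstrained least-squares fit.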

Provably Convergent Algorithms for Solving Inverse Problems Using Generative Models (2021-05-13)

The traditional approach of hand-crafting priors (such as sparsity) for solving inverse problems is slowly being replaced by the use of richer learned priors (such as those modeled by deep generative networks). In this work, we study the algorithmic aspects of such a learning-based approach from a theoretical perspective. For certain generative network architectures, we establish a simple non-convex algorithmic approach that (a) theoretically enjoys linear convergence guarantees for certain linear and nonlinear inverse problems, and (b) empirically improves upon conventional techniques such as back-propagation. We support our claims with experimental results for solving various inverse problems. We also propose an extension of our approach that can handle model mismatch (i.e., situations where the generative network prior is not exactly applicable). Together, our contributions serve as building blocks towards a principled use of generative models in inverse problems with more complete algorithmic understanding.

Learning Scale-Free Networks by Dynamic Node-Specific Degree Prior (2015-06-17)

Learning the network structure underlying data is an important problem in machine learning. This paper introduces a novel prior to study the inference of scale-free networks, which are widely used to model social and biological networks. The prior not only favors a desirable global node degree distribution, but also takes into consideration the relative strength of all the possible edges adjacent to the same node and the estimated degree of each individual node. To fulfill this, ranking is incorporated into the prior, which makes the problem challenging to solve. We employ an ADMM (alternating direction method of multipliers) framework to solve the Gaussian graphical model regularized by this prior. Our experiments on both synthetic and real data show that our prior not only yields a scale-free network, but also produces many more correctly predicted edges than alternatives such as the scale-free inducing prior, the hub-inducing prior and the $l_1$ norm.

Provable Compressed Sensing with Generative Priors via Langevin Dynamics (2021-02-24)

Deep generative models have emerged as a powerful class of priors for signals in various inverse problems such as compressed sensing, phase retrieval and super-resolution. Here, we assume an unknown signal to lie in the range of some pre-trained generative model. A popular approach for signal recovery is via gradient descent in the low-dimensional latent space. While gradient descent has achieved good empirical performance, its theoretical behavior is not well understood. In this paper, we introduce the use of stochastic gradient Langevin dynamics (SGLD) for compressed sensing with a generative prior. Under mild assumptions on the generative model, we prove the convergence of SGLD to the true signal. We also demonstrate empirical performance competitive with standard gradient descent.
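
The Langevin update in latent space can be sketched as follows. This is a toy illustration under strong assumptions, not the paper's method: the pre-trained generator is replaced by a random linear map G(z) = W z, the full gradient is used rather than a minibatch estimate (so this is plain Langevin dynamics, not SGLD proper), and `langevin_recover`, `beta`, `W`, and the problem sizes are all hypothetical choices.

```python
import numpy as np

def langevin_recover(M, y, beta=1e6, n_iters=2000, seed=0):
    """Langevin iterations on f(z) = 0.5 * ||M z - y||^2, where M = A W."""
    rng = np.random.default_rng(seed)
    step = 1.0 / np.linalg.norm(M, 2) ** 2      # conservative step size
    z = np.zeros(M.shape[1])
    for _ in range(n_iters):
        grad = M.T @ (M @ z - y)                # gradient of the measurement loss
        noise = rng.standard_normal(z.shape)    # fresh Gaussian perturbation
        # Gradient step plus noise scaled by the inverse temperature beta.
        z = z - step * grad + np.sqrt(2.0 * step / beta) * noise
    return z

rng = np.random.default_rng(1)
n, k, m = 50, 5, 40                             # signal, latent, measurement sizes
W = rng.standard_normal((n, k)) / np.sqrt(n)    # toy linear "generator" (assumption)
A = rng.standard_normal((m, n)) / np.sqrt(m)    # Gaussian measurement matrix
z_true = rng.standard_normal(k)
y = A @ (W @ z_true)

z_hat = langevin_recover(A @ W, y)
rel_err = np.linalg.norm(z_hat - z_true) / np.linalg.norm(z_true)
print(f"relative latent error: {rel_err:.2e}")
```

A large `beta` keeps the injected noise small, so the iterates hover near the minimizer; a smaller `beta` gives more exploration at the cost of a noisier final estimate.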

Bayesian Neural Network Priors Revisited (2021-02-12)

Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, such simplistic priors are unlikely to either accurately reflect our true beliefs about the weight distributions, or to give optimal performance. We study summary statistics of neural network weights in different networks trained using SGD. We find that fully connected networks (FCNNs) display heavy-tailed weight distributions, while convolutional neural network (CNN) weights display strong spatial correlations. Building these observations into the respective priors leads to improved performance on a variety of image classification datasets. Moreover, we find that these priors also mitigate the cold posterior effect in FCNNs, while in CNNs we see strong improvements at all temperatures, and hence no reduction in the cold posterior effect.

A Projectional Ansatz to Reconstruction (2019-08-06)

Recently, the field of inverse problems has seen a growing usage of learned and non-learned priors that are only partially understood mathematically. Based on first principles, we develop a projectional approach to inverse problems that addresses the incorporation of these priors, while still guaranteeing data consistency. We implement this projectional method (PM) on the one hand via very general Plug-and-Play priors and on the other hand via an end-to-end training approach. To this end, we introduce a novel alternating neural architecture, allowing for the incorporation of highly customized priors from data in a principled manner. We also show how the recent success of Regularization by Denoising (RED) can, at least to some extent, be explained as an approximation of the PM. Furthermore, we demonstrate how the idea can be applied to stop the degradation of Deep Image Prior (DIP) reconstructions over time.

Large Scale Variational Bayesian Inference for Structured Scale Mixture Models (2012-06-27)

Natural image statistics exhibit hierarchical dependencies across multiple scales. Representing such prior knowledge in non-factorial latent tree models can boost performance of image denoising, inpainting, deconvolution or reconstruction substantially, beyond standard factorial "sparse" methodology. We derive a large scale approximate Bayesian inference algorithm for linear models with non-factorial (latent tree-structured) scale mixture priors. Experimental results on a range of denoising and inpainting problems demonstrate substantially improved performance compared to MAP estimation or to inference with factorial priors.

Accelerating Optimization Algorithms With Dynamic Parameter Selections Using Convolutional Neural Networks For Inverse Problems In Image Processing (2019-11-18)

Recent advances using deep neural networks (DNNs) for solving inverse problems in image processing have significantly outperformed conventional optimization algorithm based methods. Most works train DNNs to learn 1) forward models and image priors implicitly for direct mappings from given measurements to solutions, 2) data-driven priors as proximal operators in conventional iterative algorithms, or 3) forward models, priors and/or static stepsizes in unfolded structures of optimization iterations. Here we investigate another way of utilizing a convolutional neural network (CNN) for empirically accelerating conventional optimization for solving inverse problems in image processing. We propose a CNN to yield parameters in optimization algorithms that have been chosen heuristically, but have been shown to be crucial for good empirical performance. Our CNN-incorporated scaled gradient projection methods, without compromising theoretical properties, significantly improve empirical convergence rate over conventional optimization based methods in large-scale inverse problems such as image inpainting, compressive image recovery with partial Fourier samples, deblurring and sparse view CT. During testing, our proposed methods dynamically select parameters at every iteration to speed up convergence robustly for different degradation levels, noise, or regularization parameters, compared to the direct mapping approach.

Posterior Model Adaptation With Updated Priors (2020-07-02)

Classification approaches based on the direct estimation and analysis of posterior probabilities will degrade if the original class priors begin to change. We prove that the data likelihoods for a test example can be recovered uniquely (up to scale) from its original class posteriors and dataset priors. Given the recovered likelihoods and a set of new priors, the posteriors can be re-computed using Bayes' Rule to reflect the influence of the new priors. The method is simple to compute and allows a dynamic update of the original posteriors.
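
The re-computation described in this abstract follows directly from Bayes' Rule and can be sketched in a few lines. Dividing each posterior by its original class prior gives the likelihood up to a common scale factor; multiplying by the new priors and renormalizing gives the updated posteriors. The function name `adapt_posteriors` and the example numbers are assumptions chosen for illustration.

```python
def adapt_posteriors(posteriors, old_priors, new_priors):
    """Re-weight class posteriors for one test example under new class priors."""
    # Likelihoods up to a common scale: p(x | c) is proportional to p(c | x) / p(c).
    scaled_likelihoods = [p / q for p, q in zip(posteriors, old_priors)]
    # Bayes' Rule with the new priors, before normalization.
    unnormalized = [l * r for l, r in zip(scaled_likelihoods, new_priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# A classifier trained on balanced classes reports 0.9 for class 0.
# If class 0 is in fact rare (prior 0.1), the updated posterior is
# approximately [0.5, 0.5].
updated = adapt_posteriors([0.9, 0.1], old_priors=[0.5, 0.5], new_priors=[0.1, 0.9])
print(updated)
```

Because only ratios matter, the unknown scale of the recovered likelihoods cancels in the final normalization.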

Benefiting Deep Latent Variable Models via Learning the Prior and Removing Latent Regularization (2020-07-16)

There exist many forms of deep latent variable models, such as the variational autoencoder and adversarial autoencoder. Regardless of the specific class of model, there exists an implicit consensus that the latent distribution should be regularized towards the prior, even in the case where the prior distribution is learned. Upon investigating the effect of latent regularization on image generation, our results indicate that in the case where a sufficiently expressive prior is learned, latent regularization is not necessary and may in fact be harmful insofar as image quality is concerned. We additionally investigate the benefit of learned priors on two common problems in computer vision: latent variable disentanglement, and diversity in image-to-image translation.

Non-Gaussian processes and neural networks at finite widths (2019-09-30)

Gaussian processes are ubiquitous in nature and engineering. A case in point is a class of neural networks in the infinite-width limit, whose priors correspond to Gaussian processes. Here we perturbatively extend this correspondence to finite-width neural networks, yielding non-Gaussian processes as priors. The methodology developed herein allows us to track the flow of preactivation distributions by progressively integrating out random variables from lower to higher layers, reminiscent of renormalization-group flow. We further develop a perturbative procedure to perform Bayesian inference with weakly non-Gaussian priors.

A Set-Theoretic Study of the Relationships of Image Models and Priors for Restoration Problems (2020-03-29)

Image prior modeling is the key issue in image recovery, computational imaging, compressed sensing, and other inverse problems. Recent algorithms combining multiple effective priors, such as the sparse or low-rank models, have demonstrated superior performance in various applications. However, the relationships among the popular image models are unclear, and no theory in general is available to demonstrate their connections. In this paper, we present a theoretical analysis of the image models, including sparsity, group-wise sparsity, joint sparsity, and low-rankness, to bridge the gap between applications and image prior understanding. We systematically study how effective each image model is for image restoration. Furthermore, we relate the denoising performance improvement obtained by combining multiple models to the image model relationships. Extensive experiments are conducted to compare the denoising results, which are consistent with our analysis. On top of the model-based methods, we quantitatively demonstrate the image properties that are implicitly exploited by deep learning methods, which can further boost denoising performance when combined with complementary image models.