Integration of AI and mechanistic modeling in generative adversarial networks for stochastic inverse problems (2020-09-17)

The problem of finding distributions of input parameters for deterministic mechanistic models to match distributions of model outputs to stochastic observations, i.e., the "Stochastic Inverse Problem" (SIP), encompasses a range of common tasks across a variety of scientific disciplines. Here, we demonstrate that SIP can be reformulated as a constrained optimization problem and adapted for applications in intervention studies to simultaneously infer model input parameters for two sets of observations, under control conditions and under an intervention. In the constrained optimization problem, the solution of SIP is required to accommodate the prior knowledge on the model input parameters and to produce outputs consistent with given observations by minimizing the divergence between the inferred distribution of input parameters and the prior. Unlike in standard SIP, the prior incorporates not only knowledge about model input parameters for objects in each set, but also information on the joint distribution or the deterministic map between the model input parameters in two sets of observations. To solve standard and intervention SIP, we employed conditional generative adversarial networks (GANs) and designed novel GANs that incorporate multiple generators and discriminators and have structures that reflect the underlying constrained optimization problems. This reformulation allows us to build computationally scalable solutions to tackle complex model input parameter inference scenarios, which appear routinely in physics, biophysics, economics and other areas, and which cannot currently be handled with existing methods.

Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks (2020-02-27)

While much attention has been given to the problem of estimating the effect of discrete interventions from observational data, relatively little work has been done in the setting of continuous-valued interventions, such as treatments associated with a dosage parameter. In this paper, we tackle this problem by building on a modification of the generative adversarial networks (GANs) framework. Our model, SCIGAN, is flexible and capable of simultaneously estimating counterfactual outcomes for several different continuous interventions. The key idea is to use a significantly modified GAN model to learn to generate counterfactual outcomes, which can then be used to learn an inference model, using standard supervised methods, capable of estimating these counterfactuals for a new sample. To address the challenges presented by shifting to continuous interventions, we propose a novel architecture for our discriminator: we build a hierarchical discriminator that leverages the structure of the continuous intervention setting. Moreover, we provide theoretical results to support our use of the GAN framework and of the hierarchical discriminator. In the experiments section, we introduce a new semi-synthetic data simulation for use in the continuous intervention setting and demonstrate improvements over the existing benchmark models.

Computer Model Calibration with Time Series Data using Deep Learning and Quantile Regression (2020-09-08)

Computer models play a key role in many scientific and engineering problems. One major source of uncertainty in computer model experiments is input parameter uncertainty. Computer model calibration is a formal statistical procedure to infer input parameters by combining information from model runs and observational data. The existing standard calibration framework suffers from inferential issues when the model output and observational data are high-dimensional dependent data, such as large time series, due to the difficulty in building an emulator and the non-identifiability between effects from input parameters and data-model discrepancy. To overcome these challenges, we propose a new calibration framework based on a deep neural network (DNN) with long short-term memory layers that directly emulates the inverse relationship between the model output and input parameters. Adopting the 'learning with noise' idea, we train our DNN model to filter out the effects of data-model discrepancy on input parameter inference. We also formulate a new way to construct interval predictions for DNN using quantile regression to quantify the uncertainty in input parameter estimates. Through a simulation study and a real data application with the WRF-hydro model, we show that our approach can yield accurate point estimates and well-calibrated interval estimates for input parameters.
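
The interval construction described above rests on quantile regression. As a rough illustration only (this is not the paper's DNN, and the names `pinball_loss`, `best_constant`, the toy samples and the candidate grid are all assumptions made for the example), the sketch below shows the quantile (pinball) loss and how minimizing it at levels 0.1 and 0.9 yields the two ends of an 80% interval:

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss at level q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def best_constant(y_true, q, candidates):
    """Pick the constant prediction with the smallest pinball loss."""
    n = len(y_true)
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * n, q))


# Toy data standing in for repeated estimates of one input parameter.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
grid = [i / 10 for i in range(0, 111)]  # candidate values 0.0 .. 11.0

lower = best_constant(samples, 0.1, grid)  # estimate of the 10% quantile
upper = best_constant(samples, 0.9, grid)  # estimate of the 90% quantile
print("80% interval:", (lower, upper))
```

In a network-based setting the same loss would be applied to the outputs of a model trained once per quantile level (or with one output head per level), rather than to a constant.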

Learning Convex Optimization Models (2020-06-18)

A convex optimization model predicts an output from an input by solving a convex optimization problem. The class of convex optimization models is large, and includes as special cases many well-known models like linear and logistic regression. We propose a heuristic for learning the parameters in a convex optimization model given a dataset of input-output pairs, using recently developed methods for differentiating the solution of a convex optimization problem with respect to its parameters. We describe three general classes of convex optimization models, maximum a posteriori (MAP) models, utility maximization models, and agent models, and present a numerical experiment for each.

Real-time parameter inference in reduced-order flame models with heteroscedastic Bayesian neural network ensembles (2020-10-11)

The estimation of model parameters with uncertainties from observed data is a ubiquitous inverse problem in science and engineering. In this paper, we suggest an inexpensive and easy-to-implement parameter estimation technique that uses a heteroscedastic Bayesian Neural Network trained using anchored ensembling. The heteroscedastic aleatoric error of the network models the irreducible uncertainty due to parameter degeneracies in our inverse problem, while the epistemic uncertainty of the Bayesian model captures uncertainties which may arise from an input observation's out-of-distribution nature. We use this tool to perform real-time parameter inference in a 6-parameter G-equation model of a ducted, premixed flame from observations of acoustically excited flames. We train our networks on a library of 2.1 million simulated flame videos. Results on the test dataset of simulated flames show that the network recovers flame model parameters, with the correlation coefficient between predicted and true parameters ranging from 0.97 to 0.99, and well-calibrated uncertainty estimates. The trained neural networks are then used to infer model parameters from real videos of a premixed Bunsen flame captured using a high-speed camera in our lab. Re-simulation using inferred parameters shows excellent agreement between the real and simulated flames. Compared to Ensemble Kalman Filter-based tools that have been proposed for this problem in the combustion literature, our neural network ensemble achieves better data-efficiency, and our sub-millisecond inference times reduce computational costs by several orders of magnitude. This allows us to calibrate our reduced-order flame model in real-time and predict the thermoacoustic instability behaviour of the flame more accurately.

Adversarial Likelihood-Free Inference on Black-Box Generator (2020-06-11)

A Generative Adversarial Network (GAN) can be viewed as an implicit estimator of a data distribution, and this perspective motivates using the adversarial concept in the true input parameter estimation of black-box generators. While previous works on likelihood-free inference introduce an implicit proposal distribution on the generator input, this paper analyzes theoretical limitations of the proposal distribution approach. In addition, we introduce a new algorithm, Adversarial Likelihood-Free Inference (ALFI), to mitigate the analyzed limitations, so that ALFI is able to find the posterior distribution on the input parameter for black-box generative models. We evaluated ALFI on diverse simulation models as well as pre-trained statistical models, and we found that ALFI achieves the best parameter estimation accuracy with a limited simulation budget.

A Causal Lens for Peeking into Black Box Predictive Models: Predictive Model Interpretation via Causal Attribution (2020-08-01)

With the increasing adoption of predictive models trained using machine learning across a wide range of high-stakes applications, e.g., health care, security, criminal justice, finance, and education, there is a growing need for effective techniques for explaining such models and their predictions. We aim to address this problem in settings where the predictive model is a black box; that is, we can only observe the response of the model to various inputs, but have no knowledge about the internal structure of the predictive model, its parameters, the objective function, and the algorithm used to optimize the model. We reduce the problem of interpreting a black box predictive model to that of estimating the causal effects of each of the model inputs on the model output, from observations of the model inputs and the corresponding outputs. We estimate the causal effects of model inputs on model output using variants of the Rubin-Neyman potential outcomes framework for estimating causal effects from observational data. We show how the resulting causal attribution of responsibility for model output to the different model inputs can be used to interpret the predictive model and to explain its predictions. We present results of experiments that demonstrate the effectiveness of our approach to the interpretation of black box predictive models via causal attribution in the case of deep neural network models trained on one synthetic data set (where the input variables that impact the output variable are known by design) and two real-world data sets: handwritten digit classification and Parkinson's disease severity prediction. Because our approach does not require knowledge about the predictive model algorithm and is free of assumptions regarding the black box predictive model except that its input-output responses be observable, it can be applied, in principle, to any black box predictive model.

Learning to Optimize Computational Resources: Frugal Training with Generalization Guarantees (2019-09-09)

Algorithms typically come with tunable parameters that have a considerable impact on the computational resources they consume. Too often, practitioners must hand-tune the parameters, a tedious and error-prone task. A recent line of research provides algorithms that return nearly-optimal parameters from within a finite set. These algorithms can be used when the parameter space is infinite by providing as input a random sample of parameters. This data-independent discretization, however, might miss pockets of nearly-optimal parameters: prior research has presented scenarios where the only viable parameters lie within an arbitrarily small region. We provide an algorithm that learns a finite set of promising parameters from within an infinite set. Our algorithm can help compile a configuration portfolio, or it can be used to select the input to a configuration algorithm for finite parameter spaces. Our approach applies to any configuration problem that satisfies a simple yet ubiquitous structure: the algorithm's performance is a piecewise constant function of its parameters. Prior research has exhibited this structure in domains from integer programming to clustering.

Model Inversion Networks for Model-Based Optimization (2019-12-31)

In this work, we aim to solve data-driven optimization problems, where the goal is to find an input that maximizes an unknown score function given access to a dataset of inputs with corresponding scores. When the inputs are high-dimensional and valid inputs constitute a small subset of this space (e.g., valid protein sequences or valid natural images), such model-based optimization problems become exceptionally difficult, since the optimizer must avoid out-of-distribution and invalid inputs. We propose to address such problems with model inversion networks (MINs), which learn an inverse mapping from scores to inputs. MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems. MINs can also handle both purely offline data sources and active data collection. We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.

Image-based model parameter optimization using Model-Assisted Generative Adversarial Networks (2020-03-12)

We propose and demonstrate the use of a model-assisted generative adversarial network (GAN) to produce fake images that accurately match true images through the variation of the parameters of the model that describes the features of the images. The generator learns the model parameter values that produce fake images that best match the true images. Two case studies show excellent agreement between the generated best match parameters and the true parameters. The best match model parameter values can be used to retune the default simulation to minimize any bias when applying image recognition techniques to fake and true images. In the case of a real-world experiment, the true images are experimental data with unknown true model parameter values, and the fake images are produced by a simulation that takes the model parameters as input. The model-assisted GAN uses a convolutional neural network to emulate the simulation for all parameter values that, when trained, can be used as a conditional generator for fast fake-image production.

A Hybrid Objective Function for Robustness of Artificial Neural Networks -- Estimation of Parameters in a Mechanical System (2020-04-16)

In several studies, hybrid neural networks have proven to be more robust against noisy input data than plain data-driven neural networks. We consider the task of estimating parameters of a mechanical vehicle model based on acceleration profiles. We introduce a convolutional neural network architecture that, given sequential data, predicts the parameters of the underlying dynamics for a family of vehicle models that differ in the unknown parameters. This network is trained with two objective functions. The first one constitutes a more naive approach that assumes that the true parameters are known. The second objective incorporates the knowledge of the underlying dynamics and is therefore considered a hybrid approach. We show that, in terms of robustness, the latter outperforms the first objective on noisy input data.

Neural Process for Black-Box Model Optimization Under Bayesian Framework (2021-04-03)

There are a large number of optimization problems in physical models where the relationships between model parameters and outputs are unknown or hard to track. These models are generally called black-box models because they can only be viewed in terms of inputs and outputs, without knowledge of the internal workings. Optimizing the black-box model parameters has become increasingly expensive and time consuming as the models have become more complex. Hence, developing effective and efficient black-box model optimization algorithms has become an important task. One powerful algorithm for solving such problems is Bayesian optimization, which can effectively estimate the model parameters that lead to the best performance, and the Gaussian Process (GP) has been one of the most widely used surrogate models in Bayesian optimization. However, the time complexity of GP scales cubically with respect to the number of observed model outputs, and GP does not scale well with large parameter dimension either. Consequently, it has been challenging for GP to optimize black-box models that need to query many observations and/or have many parameters. To overcome the drawbacks of GP, in this study, we propose a general Bayesian optimization algorithm that employs a Neural Process (NP) as the surrogate model to perform black-box model optimization, namely, Neural Process for Bayesian Optimization (NPBO). In order to validate the benefits of NPBO, we compare NPBO with four benchmark approaches on a power system parameter optimization problem and a series of seven benchmark Bayesian optimization problems. The results show that the proposed NPBO performs better than the other four benchmark approaches on the power system parameter optimization problem and competitively on the seven benchmark problems.

Surprisal-Triggered Conditional Computation with Neural Networks (2020-06-02)

Autoregressive neural network models have been used successfully for sequence generation, feature extraction, and hypothesis scoring. This paper presents yet another use for these models: allocating more computation to more difficult inputs. In our model, an autoregressive model is used both to extract features and to predict observations in a stream of input observations. The surprisal of the input, measured as the negative log-likelihood of the current observation according to the autoregressive model, is used as a measure of input difficulty. This in turn determines whether a small, fast network, or a big, slow network, is used. Experiments on two speech recognition tasks show that our model can match the performance of a baseline in which the big network is always used with 15% fewer FLOPs.
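
The routing rule described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's model: the function names, the fixed threshold of 1.0 nat, and the list of per-observation probabilities (which would come from the autoregressive model) are all hypothetical.

```python
import math


def surprisal(prob):
    """Negative log-likelihood (in nats) of an observation with probability prob."""
    return -math.log(prob)


def choose_network(prob, threshold):
    """Use the big, slow network only when the observation is surprising enough."""
    return "big" if surprisal(prob) > threshold else "small"


# Hypothetical probabilities the autoregressive model assigned to each
# observation in a short input stream.
stream_probs = [0.9, 0.8, 0.05, 0.7, 0.01]

routes = [choose_network(p, threshold=1.0) for p in stream_probs]
print(routes)
```

The threshold controls the trade-off: raising it sends more observations to the small network and saves computation, at the risk of under-serving difficult inputs.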

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks (2018-11-07)

In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.

Deep Inverse Optimization (2018-12-03)

Given a set of observations generated by an optimization process, the goal of inverse optimization is to determine likely parameters of that process. We cast inverse optimization as a form of deep learning. Our method, called deep inverse optimization, is to unroll an iterative optimization process and then use backpropagation to learn parameters that generate the observations. We demonstrate that by backpropagating through the interior point algorithm we can learn the coefficients determining the cost vector and the constraints, independently or jointly, for both non-parametric and parametric linear programs, starting from one or multiple observations. With this approach, inverse optimization can leverage concepts and algorithms from deep learning.

Optimization in Machine Learning: A Distribution Space Approach (2020-04-18)

We present the viewpoint that optimization problems encountered in machine learning can often be interpreted as minimizing a convex functional over a function space, but with a non-convex constraint set introduced by model parameterization. This observation allows us to repose such problems via a suitable relaxation as convex optimization problems in the space of distributions over the training parameters. We derive some simple relationships between the distribution-space problem and the original problem, e.g. a distribution-space solution is at least as good as a solution in the original space. Moreover, we develop a numerical algorithm based on mixture distributions to perform approximate optimization directly in distribution space. Consistency of this approximation is established and the numerical efficacy of the proposed algorithm is illustrated on simple examples. In both theory and practice, this formulation provides an alternative approach to large-scale optimization in machine learning.

Budgeted and Non-budgeted Causal Bandits (2020-12-13)

Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified. If there are no backdoor paths from an intervenable node to the reward node, then we propose an algorithm to minimize simple regret that optimally trades off observations and interventions based on the cost of intervention. We also propose an algorithm that accounts for the cost of interventions, utilizes causal side-information, and minimizes the expected cumulative regret without exceeding the budget. Our cumulative-regret minimization algorithm performs better than standard algorithms that do not take side-information into account. Finally, we study the problem of learning best interventions without a budget constraint in general graphs and give an algorithm that achieves constant expected cumulative regret in terms of the instance parameters when the parent distribution of the reward variable for each intervention is known. Our results are experimentally validated and compared to the best-known bounds in the current literature.

Identification Methods With Arbitrary Interventional Distributions as Inputs (2020-04-15)

Causal inference quantifies cause-effect relationships by estimating counterfactual parameters from data. This entails using identification theory to establish a link between counterfactual parameters of interest and distributions from which data is available. A line of work characterized non-parametric identification for a wide variety of causal parameters in terms of the observed data distribution. More recently, identification results have been extended to settings where experimental data from interventional distributions is also available. In this paper, we use Single World Intervention Graphs and a nested factorization of models associated with mixed graphs to give a very simple view of existing identification theory for experimental data. We use this view to yield general identification algorithms for settings where the input distributions consist of an arbitrary set of observational and experimental distributions, including marginal and conditional distributions. We show that for problems where inputs are interventional marginal distributions of a certain type (ancestral marginals), our algorithm is complete.

Deep Energy-Based NARX Models (2020-12-07)

This paper is directed towards the problem of learning nonlinear ARX models based on system input-output data. In particular, our interest is in learning a conditional distribution of the current output based on a finite window of past inputs and outputs. To achieve this, we consider the use of so-called energy-based models, which have been developed in allied fields for learning unknown distributions based on data. This energy-based model relies on a general function to describe the distribution, and here we consider a deep neural network for this purpose. The primary benefit of this approach is that it is capable of learning both simple and highly complex noise models, which we demonstrate on simulated and experimental data.

Regularized Nonlinear Regression for Simultaneously Selecting and Estimating Key Model Parameters (2021-04-23)

In system identification, estimating parameters of a model using limited observations results in poor identifiability. To cope with this issue, we propose a new method to simultaneously select and estimate sensitive parameters as key model parameters and fix the remaining parameters to a set of typical values. Our method is formulated as a nonlinear least squares estimator with L1-regularization on the deviation of parameters from a set of typical values. First, we provide consistency and oracle properties of the proposed estimator as a theoretical foundation. Second, we provide a novel approach based on Levenberg-Marquardt optimization to numerically find the solution to the formulated problem. Third, to show the effectiveness, we present an application identifying a biomechanical parametric model of a head position tracking task for 10 human subjects from limited data. In a simulation study, the variances of estimated parameters are decreased by 96.1% compared to those of the estimated parameters without L1-regularization. In an experimental study, our method improves the model interpretation by reducing the number of parameters to be estimated while maintaining variance accounted for (VAF) above 82.5%. Moreover, the variances of estimated parameters are reduced by 71.1% compared to those of the estimated parameters without L1-regularization. Our method is 54 times faster than the standard simplex-based optimization at solving the regularized nonlinear regression.
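
The estimator described above combines a squared-residual term with an L1 penalty on each parameter's deviation from its typical value. The sketch below only evaluates such an objective; it does not reproduce the paper's Levenberg-Marquardt solver. The exponential-decay model, the data, the typical values and the weight `lam` are hypothetical choices made for illustration.

```python
import math


def l1_regularized_objective(theta, theta_typical, xs, ys, model, lam):
    """Sum of squared residuals plus lam * sum_j |theta_j - typical_j|."""
    residual = sum((y - model(x, theta)) ** 2 for x, y in zip(xs, ys))
    penalty = lam * sum(abs(t - t0) for t, t0 in zip(theta, theta_typical))
    return residual + penalty


def exp_decay(x, theta):
    """Hypothetical nonlinear model: amplitude * exp(-rate * x)."""
    amplitude, rate = theta
    return amplitude * math.exp(-rate * x)


typical = (1.0, 0.5)                       # assumed typical parameter values
xs = [0.0, 1.0, 2.0]
ys = [exp_decay(x, typical) for x in xs]   # noise-free toy observations

# At the typical values both the residual and the penalty vanish.
at_typical = l1_regularized_objective(typical, typical, xs, ys, exp_decay, lam=0.1)

# Moving the amplitude away from its typical value is charged by both terms.
shifted = l1_regularized_objective((2.0, 0.5), typical, xs, ys, exp_decay, lam=0.1)
print(at_typical, shifted)
```

Because the penalty is non-differentiable exactly at the typical values, minimizers tend to leave insensitive parameters pinned there, which is what lets a formulation of this kind select key parameters while estimating them.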