Stiff Neural Ordinary Differential Equations2021-03-29 ${\displaystyle \cong }$ |

Neural Ordinary Differential Equations (ODE) are a promising approach to learn dynamic models from time-series data in science and engineering applications. This work aims at learning Neural ODE for stiff systems, which are usually raised from chemical kinetic modeling in chemical and biological systems. We first show the challenges of learning neural ODE in the classical stiff ODE systems of Robertson's problem and propose techniques to mitigate the challenges associated with scale separations in stiff systems. We then present successful demonstrations in stiff systems of Robertson's problem and an air pollution problem. The demonstrations show that the usage of deep networks with rectified activations, proper scaling of the network outputs as well as loss functions, and stabilized gradient calculations are the key techniques enabling the learning of stiff neural ODE. The success of learning stiff neural ODE opens up possibilities of using neural ODEs in applications with widely varying time-scales, like chemical dynamics in energy conversion, environmental engineering, and the life sciences. |

Predicting dynamical system evolution with residual neural networks2019-10-11 ${\displaystyle \cong }$ |

Forecasting time series and time-dependent data is a common problem in many applications. One typical example is solving ordinary differential equation (ODE) systems $\dot{x}=F(x)$. Oftentimes the right hand side function $F(x)$ is not known explicitly and the ODE system is described by solution samples taken at some time points. Hence, ODE solvers cannot be used. In this paper, a data-driven approach to learning the evolution of dynamical systems is considered. We show how by training neural networks with ResNet-like architecture on the solution samples, models can be developed to predict the ODE system solution further in time. By evaluating the proposed approaches on three test ODE systems, we demonstrate that the neural network models are able to reproduce the main dynamics of the systems qualitatively well. Moreover, the predicted solution remains stable for much longer times than for other currently known models. |

Augmenting Neural Differential Equations to Model Unknown Dynamical Systems with Incomplete State Information2020-08-18 ${\displaystyle \cong }$ |

Neural Ordinary Differential Equations replace the right-hand side of a conventional ODE with a neural net, which by virtue of the universal approximation theorem, can be trained to the representation of any function. When we do not know the function itself, but have state trajectories (time evolution) of the ODE system we can still train the neural net to learn the representation of the underlying but unknown ODE. However if the state of the system is incompletely known then the right-hand side of the ODE cannot be calculated. The derivatives to propagate the system are unavailable. We show that a specially augmented Neural ODE can learn the system when given incomplete state information. As a worked example we apply neural ODEs to the Lotka-Voltera problem of 3 species, rabbits, wolves, and bears. We show that even when the data for the bear time series is removed the remaining time series of the rabbits and wolves is sufficient to learn the dynamical system despite the missing the incomplete state information. This is surprising since a conventional ODE system cannot output the correct derivatives without the full state as the input. We implement augmented neural ODEs and differential equation solvers in the julia programming language. |

Generative ODE Modeling with Known Unknowns2020-03-24 ${\displaystyle \cong }$ |

In several crucial applications, domain knowledge is encoded by a system of ordinary differential equations (ODE). A motivating example is intensive care unit patients: The dynamics of some vital physiological variables such as heart rate, blood pressure and arterial compliance can be approximately described by a known system of ODEs. Typically, some of the ODE variables are directly observed while some are unobserved, and in addition many other variables are observed but not modeled by the ODE, for example body temperature. Importantly, the unobserved ODE variables are ``known-unknowns'': We know they exist and their functional dynamics, but cannot measure them directly, nor do we know the function tying them to all observed measurements. Estimating these known-unknowns is often highly valuable to physicians. Under this scenario we wish to: (i) learn the static parameters of the ODE generating each observed time-series (ii) infer the dynamic sequence of all ODE variables including the known-unknowns, and (iii) extrapolate the future of the ODE variables and the observations of the time-series. We address this task with a variational autoencoder incorporating the known ODE function, called GOKU-net for Generative ODE modeling with Known Unknowns. We test our method on videos of pendulums with unknown length, and a model of the cardiovascular system. |

When are Neural ODE Solutions Proper ODEs?2020-07-30 ${\displaystyle \cong }$ |

A key appeal of the recently proposed Neural Ordinary Differential Equation(ODE) framework is that it seems to provide a continuous-time extension of discrete residual neural networks. As we show herein, though, trained Neural ODE models actually depend on the specific numerical method used during training. If the trained model is supposed to be a flow generated from an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance. We observe that if training relies on a solver with overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. In such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated from an ODE, which arguably poses a fatal breakdown of the Neural ODE concept. We observe, however, that there exists a critical step size beyond which the training yields a valid ODE vector field. We propose a method that monitors the behavior of the ODE solver during training to adapt its step size, aiming to ensure a valid ODE without unnecessarily increasing computational cost. We verify this adaption algorithm on two common bench mark datasets as well as a synthetic dataset. Furthermore, we introduce a novel synthetic dataset in which the underlying ODE directly generates a classification task. |

Explainable Tensorized Neural Ordinary Differential Equations forArbitrary-step Time Series Prediction2020-11-26 ${\displaystyle \cong }$ |

We propose a continuous neural network architecture, termed Explainable Tensorized Neural Ordinary Differential Equations (ETN-ODE), for multi-step time series prediction at arbitrary time points. Unlike the existing approaches, which mainly handle univariate time series for multi-step prediction or multivariate time series for single-step prediction, ETN-ODE could model multivariate time series for arbitrary-step prediction. In addition, it enjoys a tandem attention, w.r.t. temporal attention and variable attention, being able to provide explainable insights into the data. Specifically, ETN-ODE combines an explainable Tensorized Gated Recurrent Unit (Tensorized GRU or TGRU) with Ordinary Differential Equations (ODE). The derivative of the latent states is parameterized with a neural network. This continuous-time ODE network enables a multi-step prediction at arbitrary time points. We quantitatively and qualitatively demonstrate the effectiveness and the interpretability of ETN-ODE on five different multi-step prediction tasks and one arbitrary-step prediction task. Extensive experiments show that ETN-ODE can lead to accurate predictions at arbitrary time points while attaining best performance against the baseline methods in standard multi-step time series prediction. |

Neural Ordinary Differential Equation based Recurrent Neural Network Model2020-05-19 ${\displaystyle \cong }$ |

Neural differential equations are a promising new member in the neural network family. They show the potential of differential equations for time series data analysis. In this paper, the strength of the ordinary differential equation (ODE) is explored with a new extension. The main goal of this work is to answer the following questions: (i)~can ODE be used to redefine the existing neural network model? (ii)~can Neural ODEs solve the irregular sampling rate challenge of existing neural network models for a continuous time series, i.e., length and dynamic nature, (iii)~how to reduce the training and evaluation time of existing Neural ODE systems? This work leverages the mathematical foundation of ODEs to redesign traditional RNNs such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The main contribution of this paper is to illustrate the design of two new ODE-based RNN models (GRU-ODE model and LSTM-ODE) which can compute the hidden state and cell state at any point of time using an ODE solver. These models reduce the computation overhead of hidden state and cell state by a vast amount. The performance evaluation of these two new models for learning continuous time series with irregular sampling rate is then demonstrated. Experiments show that these new ODE based RNN models require less training time than Latent ODEs and conventional Neural ODEs. They can achieve higher accuracy quickly, and the design of the neural network is simpler than, previous neural ODE systems. |

STEER: Simple Temporal Regularization For Neural ODEs2020-07-01 ${\displaystyle \cong }$ |

Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs and as such can be used in conjunction with them. We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models. |

Combining GANs and AutoEncoders for Efficient Anomaly Detection2020-11-16 ${\displaystyle \cong }$ |

Deep learned models are now largely adopted in different fields, and they generally provide superior performances with respect to classical signal-based approaches. Notwithstanding this, their actual reliability when working in an unprotected environment is far enough to be proven. In this work, we consider a novel deep neural network architecture, named Neural Ordinary Differential Equations (N-ODE), that is getting particular attention due to an attractive property --- a test-time tunable trade-off between accuracy and efficiency. This paper analyzes the robustness of N-ODE image classifiers when faced against a strong adversarial attack and how its effectiveness changes when varying such a tunable trade-off. We show that adversarial robustness is increased when the networks operate in different tolerance regimes during test time and training time. On this basis, we propose a novel adversarial detection strategy for N-ODE nets based on the randomization of the adaptive ODE solver tolerance. Our evaluation performed on standard image classification benchmarks shows that our detection technique provides high rejection of adversarial examples while maintaining most of the original samples under white-box attacks and zero-knowledge adversaries. |

Differentiable Likelihoods for Fast Inversion of 'Likelihood-Free' Dynamical Systems2020-06-29 ${\displaystyle \cong }$ |

Likelihood-free (a.k.a. simulation-based) inference problems are inverse problems with expensive, or intractable, forward models. ODE inverse problems are commonly treated as likelihood-free, as their forward map has to be numerically approximated by an ODE solver. This, however, is not a fundamental constraint but just a lack of functionality in classic ODE solvers, which do not return a likelihood but a point estimate. To address this shortcoming, we employ Gaussian ODE filtering (a probabilistic numerical method for ODEs) to construct a local Gaussian approximation to the likelihood. This approximation yields tractable estimators for the gradient and Hessian of the (log-)likelihood. Insertion of these estimators into existing gradient-based optimization and sampling methods engenders new solvers for ODE inverse problems. We demonstrate that these methods outperform standard likelihood-free approaches on three benchmark-systems. |

On-line Non-Convex Constrained Optimization2019-09-16 ${\displaystyle \cong }$ |

Time-varying non-convex continuous-valued non-linear constrained optimization is a fundamental problem. We study conditions wherein a momentum-like regularising term allow for the tracking of local optima by considering an ordinary differential equation (ODE). We then derive an efficient algorithm based on a predictor-corrector method, to track the ODE solution. |

Shadowing Properties of Optimization Algorithms2019-11-12 ${\displaystyle \cong }$ |

Ordinary differential equation (ODE) models of gradient-based optimization methods can provide insights into the dynamics of learning and inspire the design of new algorithms. Unfortunately, this thought-provoking perspective is weakened by the fact that, in the worst case, the error between the algorithm steps and its ODE approximation grows exponentially with the number of iterations. In an attempt to encourage the use of continuous-time methods in optimization, we show that, if some additional regularity on the objective is assumed, the ODE representations of Gradient Descent and Heavy-ball do not suffer from the aforementioned problem, once we allow for a small perturbation on the algorithm initial condition. In the dynamical systems literature, this phenomenon is called shadowing. Our analysis relies on the concept of hyperbolicity, as well as on tools from numerical analysis. |

Theoretical Guarantees for Learning Conditional Expectation using Controlled ODE-RNN2020-06-08 ${\displaystyle \cong }$ |

Continuous stochastic processes are widely used to model time series that exhibit a random behaviour. Predictions of the stochastic process can be computed by the conditional expectation given the current information. To this end, we introduce the controlled ODE-RNN that provides a data-driven approach to learn the conditional expectation of a stochastic process. Our approach extends the ODE-RNN framework which models the latent state of a recurrent neural network (RNN) between two observations with a neural ordinary differential equation (neural ODE). We show that controlled ODEs provide a general framework which can in particular describe the ODE-RNN, combining in a single equation the continuous neural ODE part with the jumps introduced by RNN. We demonstrate the predictive capabilities of this model by proving that, under some regularities assumptions, the output process converges to the conditional expectation process. |

Cubic Spline Smoothing Compensation for Irregularly Sampled Sequences2020-10-03 ${\displaystyle \cong }$ |

The marriage of recurrent neural networks and neural ordinary differential networks (ODE-RNN) is effective in modeling irregularly-observed sequences. While ODE produces the smooth hidden states between observation intervals, the RNN will trigger a hidden state jump when a new observation arrives, thus cause the interpolation discontinuity problem. To address this issue, we propose the cubic spline smoothing compensation, which is a stand-alone module upon either the output or the hidden state of ODE-RNN and can be trained end-to-end. We derive its analytical solution and provide its theoretical interpolation error bound. Extensive experiments indicate its merits over both ODE-RNN and cubic spline interpolation. |

Learning Long-Term Dependencies in Irregularly-Sampled Time Series2020-06-08 ${\displaystyle \cong }$ |

Recurrent neural networks (RNNs) with continuous-time hidden states are a natural fit for modeling irregularly-sampled time series. These models, however, face difficulties when the input data possess long-term dependencies. We prove that similar to standard RNNs, the underlying reason for this issue is the vanishing or exploding of the gradient during training. This phenomenon is expressed by the ordinary differential equation (ODE) representation of the hidden state, regardless of the ODE solver's choice. We provide a solution by designing a new algorithm based on the long short-term memory (LSTM) that separates its memory from its time-continuous state. This way, we encode a continuous-time dynamical flow within the RNN, allowing it to respond to inputs arriving at arbitrary time-lags while ensuring a constant error propagation through the memory path. We call these RNN models ODE-LSTMs. We experimentally show that ODE-LSTMs outperform advanced RNN-based counterparts on non-uniformly sampled data with long-term dependencies. |

Dealing with Stochasticity in Biological ODE Models2019-10-14 ${\displaystyle \cong }$ |

Mathematical modeling with Ordinary Differential Equations (ODEs) has proven to be extremely successful in a variety of fields, including biology. However, these models are completely deterministic given a certain set of initial conditions. We convert mathematical ODE models of three benchmark biological systems to Dynamic Bayesian Networks (DBNs). The DBN model can handle model uncertainty and data uncertainty in a principled manner. They can be used for temporal data mining for noisy and missing variables. We apply Particle Filtering algorithm to infer the model variables by re-estimating the models parameters of various biological ODE models. The model parameters are automatically re-estimated using temporal evidence in the form of data streams. The results show that DBNs are capable of inferring the model variables of the ODE model with high accuracy in situations where data is missing, incomplete, sparse and irregular and true values of model parameters are not known. |

Time-Reversal Symmetric ODE Network2020-07-22 ${\displaystyle \cong }$ |

Time-reversal symmetry, which requires that the dynamics of a system should not change with the reversal of time axis, is a fundamental property that frequently holds in classical and quantum mechanics. In this paper, we propose a novel loss function that measures how well our ordinary differential equation (ODE) networks comply with this time-reversal symmetry; it is formally defined by the discrepancy in the time evolution of ODE networks between forward and backward dynamics. Then, we design a new framework, which we name as Time-Reversal Symmetric ODE Networks (TRS-ODENs), that can learn the dynamics of physical systems more sample-efficiently by learning with the proposed loss function. We evaluate TRS-ODENs on several classical dynamics, and find they can learn the desired time evolution from observed noisy and complex trajectories. We also show that, even for systems that do not possess the full time-reversal symmetry, TRS-ODENs can achieve better predictive errors over baselines. |

Rodent: Relevance determination in differential equations2020-03-12 ${\displaystyle \cong }$ |

We aim to identify the generating, ordinary differential equation (ODE) from a set of trajectories of a partially observed system. Our approach does not need prescribed basis functions to learn the ODE model, but only a rich set of Neural Arithmetic Units. For maximal explainability of the learnt model, we minimise the state size of the ODE as well as the number of non-zero parameters that are needed to solve the problem. This sparsification is realized through a combination of the Variational Auto-Encoder (VAE) and Automatic Relevance Determination (ARD). We show that it is possible to learn not only one specific model for a single process, but a manifold of models representing harmonic signals as well as a manifold of Lotka-Volterra systems. |

Time Dependence in Non-Autonomous Neural ODEs2020-05-06 ${\displaystyle \cong }$ |

Neural Ordinary Differential Equations (ODEs) are elegant reinterpretations of deep networks where continuous time can replace the discrete notion of depth, ODE solvers perform forward propagation, and the adjoint method enables efficient, constant memory backpropagation. Neural ODEs are universal approximators only when they are non-autonomous, that is, the dynamics depends explicitly on time. We propose a novel family of Neural ODEs with time-varying weights, where time-dependence is non-parametric, and the smoothness of weight trajectories can be explicitly controlled to allow a tradeoff between expressiveness and efficiency. Using this enhanced expressiveness, we outperform previous Neural ODE variants in both speed and representational capacity, ultimately outperforming standard ResNet and CNN models on select image classification and video prediction tasks. |

Neural Ordinary Differential Equations for Intervention Modeling2020-10-16 ${\displaystyle \cong }$ |

By interpreting the forward dynamics of the latent representation of neural networks as an ordinary differential equation, Neural Ordinary Differential Equation (Neural ODE) emerged as an effective framework for modeling a system dynamics in the continuous time domain. However, real-world systems often involves external interventions that cause changes in the system dynamics such as a moving ball coming in contact with another ball, or such as a patient being administered with particular drug. Neural ODE and a number of its recent variants, however, are not suitable for modeling such interventions as they do not properly model the observations and the interventions separately. In this paper, we propose a novel neural ODE-based approach (IMODE) that properly model the effect of external interventions by employing two ODE functions to separately handle the observations and the interventions. Using both synthetic and real-world time-series datasets involving interventions, our experimental results consistently demonstrate the superiority of IMODE compared to existing approaches. |