Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients2020-09-28 ${\displaystyle \cong }$ |

In this paper we introduce a provably stable architecture for Neural Ordinary Differential Equations (ODEs) which achieves non-trivial adversarial robustness under white-box adversarial attacks even when the network is trained naturally. For most existing defense methods withstanding strong white-box attacks, to improve robustness of neural networks, they need to be trained adversarially, hence have to strike a trade-off between natural accuracy and adversarial robustness. Inspired by dynamical system theory, we design a stabilized neural ODE network named SONet whose ODE blocks are skew-symmetric and proved to be input-output stable. With natural training, SONet can achieve comparable robustness with the state-of-the-art adversarial defense methods, without sacrificing natural accuracy. Even replacing only the first layer of a ResNet by such a ODE block can exhibit further improvement in robustness, e.g., under PGD-20 ($\ell_\infty=0.031$) attack on CIFAR-10 dataset, it achieves 91.57\% and natural accuracy and 62.35\% robust accuracy, while a counterpart architecture of ResNet trained with TRADES achieves natural and robust accuracy 76.29\% and 45.24\%, respectively. To understand possible reasons behind this surprisingly good result, we further explore the possible mechanism underlying such an adversarial robustness. We show that the adaptive stepsize numerical ODE solver, DOPRI5, has a gradient masking effect that fails the PGD attacks which are sensitive to gradient information of training loss; on the other hand, it cannot fool the CW attack of robust gradients and the SPSA attack that is gradient-free. This provides a new explanation that the adversarial robustness of ODE-based networks mainly comes from the obfuscated gradients in numerical ODE solvers. |

Combining GANs and AutoEncoders for Efficient Anomaly Detection2020-11-16 ${\displaystyle \cong }$ |

Deep learned models are now largely adopted in different fields, and they generally provide superior performances with respect to classical signal-based approaches. Notwithstanding this, their actual reliability when working in an unprotected environment is far enough to be proven. In this work, we consider a novel deep neural network architecture, named Neural Ordinary Differential Equations (N-ODE), that is getting particular attention due to an attractive property --- a test-time tunable trade-off between accuracy and efficiency. This paper analyzes the robustness of N-ODE image classifiers when faced against a strong adversarial attack and how its effectiveness changes when varying such a tunable trade-off. We show that adversarial robustness is increased when the networks operate in different tolerance regimes during test time and training time. On this basis, we propose a novel adversarial detection strategy for N-ODE nets based on the randomization of the adaptive ODE solver tolerance. Our evaluation performed on standard image classification benchmarks shows that our detection technique provides high rejection of adversarial examples while maintaining most of the original samples under white-box attacks and zero-knowledge adversaries. |

When are Neural ODE Solutions Proper ODEs?2020-07-30 ${\displaystyle \cong }$ |

A key appeal of the recently proposed Neural Ordinary Differential Equation(ODE) framework is that it seems to provide a continuous-time extension of discrete residual neural networks. As we show herein, though, trained Neural ODE models actually depend on the specific numerical method used during training. If the trained model is supposed to be a flow generated from an ODE, it should be possible to choose another numerical solver with equal or smaller numerical error without loss of performance. We observe that if training relies on a solver with overly coarse discretization, then testing with another solver of equal or smaller numerical error results in a sharp drop in accuracy. In such cases, the combination of vector field and numerical method cannot be interpreted as a flow generated from an ODE, which arguably poses a fatal breakdown of the Neural ODE concept. We observe, however, that there exists a critical step size beyond which the training yields a valid ODE vector field. We propose a method that monitors the behavior of the ODE solver during training to adapt its step size, aiming to ensure a valid ODE without unnecessarily increasing computational cost. We verify this adaption algorithm on two common bench mark datasets as well as a synthetic dataset. Furthermore, we introduce a novel synthetic dataset in which the underlying ODE directly generates a classification task. |

Interpolation between Residual and Non-Residual Networks2020-06-26 ${\displaystyle \cong }$ |

Although ordinary differential equations (ODEs) provide insights for designing network architectures, its relationship with the non-residual convolutional neural networks (CNNs) is still unclear. In this paper, we present a novel ODE model by adding a damping term. It can be shown that the proposed model can recover both a ResNet and a CNN by adjusting an interpolation coefficient. Therefore, the damped ODE model provides a unified framework for the interpretation of residual and non-residual networks. The Lyapunov analysis reveals better stability of the proposed model, and thus yields robustness improvement of the learned networks. Experiments on a number of image classification benchmarks show that the proposed model substantially improves the accuracy of ResNet and ResNeXt over the perturbed inputs from both stochastic noise and adversarial attack methods. Moreover, the loss landscape analysis demonstrates the improved robustness of our method along the attack direction. |

Augmenting Neural Differential Equations to Model Unknown Dynamical Systems with Incomplete State Information2020-08-18 ${\displaystyle \cong }$ |

Neural Ordinary Differential Equations replace the right-hand side of a conventional ODE with a neural net, which by virtue of the universal approximation theorem, can be trained to the representation of any function. When we do not know the function itself, but have state trajectories (time evolution) of the ODE system we can still train the neural net to learn the representation of the underlying but unknown ODE. However if the state of the system is incompletely known then the right-hand side of the ODE cannot be calculated. The derivatives to propagate the system are unavailable. We show that a specially augmented Neural ODE can learn the system when given incomplete state information. As a worked example we apply neural ODEs to the Lotka-Voltera problem of 3 species, rabbits, wolves, and bears. We show that even when the data for the bear time series is removed the remaining time series of the rabbits and wolves is sufficient to learn the dynamical system despite the missing the incomplete state information. This is surprising since a conventional ODE system cannot output the correct derivatives without the full state as the input. We implement augmented neural ODEs and differential equation solvers in the julia programming language. |

Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness2020-06-30 ${\displaystyle \cong }$ |

Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found to exist in many defense methods and cause a false signal of robustness. In this paper, we identify a more subtle situation called \emph{Imbalanced Gradients} that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards to a suboptimal direction. To exploit imbalanced gradients, we formulate a \emph{Margin Decomposition (MD)} attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We examine 12 state-of-the-art defense models, and find that models exploiting label smoothing easily cause imbalanced gradients, and on which our MD attacks can decrease their PGD robustness (evaluated by PGD attack) by over 23%. For 6 out of the 12 defenses, our attack can reduce their PGD robustness by at least 9%. The results suggest that imbalanced gradients need to be carefully addressed for more reliable adversarial robustness. |

Stiff Neural Ordinary Differential Equations2021-03-29 ${\displaystyle \cong }$ |

Neural Ordinary Differential Equations (ODE) are a promising approach to learn dynamic models from time-series data in science and engineering applications. This work aims at learning Neural ODE for stiff systems, which are usually raised from chemical kinetic modeling in chemical and biological systems. We first show the challenges of learning neural ODE in the classical stiff ODE systems of Robertson's problem and propose techniques to mitigate the challenges associated with scale separations in stiff systems. We then present successful demonstrations in stiff systems of Robertson's problem and an air pollution problem. The demonstrations show that the usage of deep networks with rectified activations, proper scaling of the network outputs as well as loss functions, and stabilized gradient calculations are the key techniques enabling the learning of stiff neural ODE. The success of learning stiff neural ODE opens up possibilities of using neural ODEs in applications with widely varying time-scales, like chemical dynamics in energy conversion, environmental engineering, and the life sciences. |

Predicting dynamical system evolution with residual neural networks2019-10-11 ${\displaystyle \cong }$ |

Forecasting time series and time-dependent data is a common problem in many applications. One typical example is solving ordinary differential equation (ODE) systems $\dot{x}=F(x)$. Oftentimes the right hand side function $F(x)$ is not known explicitly and the ODE system is described by solution samples taken at some time points. Hence, ODE solvers cannot be used. In this paper, a data-driven approach to learning the evolution of dynamical systems is considered. We show how by training neural networks with ResNet-like architecture on the solution samples, models can be developed to predict the ODE system solution further in time. By evaluating the proposed approaches on three test ODE systems, we demonstrate that the neural network models are able to reproduce the main dynamics of the systems qualitatively well. Moreover, the predicted solution remains stable for much longer times than for other currently known models. |

Scaling up the randomized gradient-free adversarial attack reveals overestimation of robustness using established attacks2019-09-25 ${\displaystyle \cong }$ |

Modern neural networks are highly non-robust against adversarial manipulation. A significant amount of work has been invested in techniques to compute lower bounds on robustness through formal guarantees and to build provably robust models. However, it is still difficult to get guarantees for larger networks or robustness against larger perturbations. Thus attack strategies are needed to provide tight upper bounds on the actual robustness. We significantly improve the randomized gradient-free attack for ReLU networks [9], in particular by scaling it up to large networks. We show that our attack achieves similar or significantly smaller robust accuracy than state-of-the-art attacks like PGD or the one of Carlini and Wagner, thus revealing an overestimation of the robustness by these state-of-the-art methods. Our attack is not based on a gradient descent scheme and in this sense gradient-free, which makes it less sensitive to the choice of hyperparameters as no careful selection of the stepsize is required. |

Differentiable Likelihoods for Fast Inversion of 'Likelihood-Free' Dynamical Systems2020-06-29 ${\displaystyle \cong }$ |

Likelihood-free (a.k.a. simulation-based) inference problems are inverse problems with expensive, or intractable, forward models. ODE inverse problems are commonly treated as likelihood-free, as their forward map has to be numerically approximated by an ODE solver. This, however, is not a fundamental constraint but just a lack of functionality in classic ODE solvers, which do not return a likelihood but a point estimate. To address this shortcoming, we employ Gaussian ODE filtering (a probabilistic numerical method for ODEs) to construct a local Gaussian approximation to the likelihood. This approximation yields tractable estimators for the gradient and Hessian of the (log-)likelihood. Insertion of these estimators into existing gradient-based optimization and sampling methods engenders new solvers for ODE inverse problems. We demonstrate that these methods outperform standard likelihood-free approaches on three benchmark-systems. |

Neural Ordinary Differential Equation based Recurrent Neural Network Model2020-05-19 ${\displaystyle \cong }$ |

Neural differential equations are a promising new member in the neural network family. They show the potential of differential equations for time series data analysis. In this paper, the strength of the ordinary differential equation (ODE) is explored with a new extension. The main goal of this work is to answer the following questions: (i)~can ODE be used to redefine the existing neural network model? (ii)~can Neural ODEs solve the irregular sampling rate challenge of existing neural network models for a continuous time series, i.e., length and dynamic nature, (iii)~how to reduce the training and evaluation time of existing Neural ODE systems? This work leverages the mathematical foundation of ODEs to redesign traditional RNNs such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The main contribution of this paper is to illustrate the design of two new ODE-based RNN models (GRU-ODE model and LSTM-ODE) which can compute the hidden state and cell state at any point of time using an ODE solver. These models reduce the computation overhead of hidden state and cell state by a vast amount. The performance evaluation of these two new models for learning continuous time series with irregular sampling rate is then demonstrated. Experiments show that these new ODE based RNN models require less training time than Latent ODEs and conventional Neural ODEs. They can achieve higher accuracy quickly, and the design of the neural network is simpler than, previous neural ODE systems. |

Generative ODE Modeling with Known Unknowns2020-03-24 ${\displaystyle \cong }$ |

In several crucial applications, domain knowledge is encoded by a system of ordinary differential equations (ODE). A motivating example is intensive care unit patients: The dynamics of some vital physiological variables such as heart rate, blood pressure and arterial compliance can be approximately described by a known system of ODEs. Typically, some of the ODE variables are directly observed while some are unobserved, and in addition many other variables are observed but not modeled by the ODE, for example body temperature. Importantly, the unobserved ODE variables are ``known-unknowns'': We know they exist and their functional dynamics, but cannot measure them directly, nor do we know the function tying them to all observed measurements. Estimating these known-unknowns is often highly valuable to physicians. Under this scenario we wish to: (i) learn the static parameters of the ODE generating each observed time-series (ii) infer the dynamic sequence of all ODE variables including the known-unknowns, and (iii) extrapolate the future of the ODE variables and the observations of the time-series. We address this task with a variational autoencoder incorporating the known ODE function, called GOKU-net for Generative ODE modeling with Known Unknowns. We test our method on videos of pendulums with unknown length, and a model of the cardiovascular system. |

Explainable Tensorized Neural Ordinary Differential Equations forArbitrary-step Time Series Prediction2020-11-26 ${\displaystyle \cong }$ |

We propose a continuous neural network architecture, termed Explainable Tensorized Neural Ordinary Differential Equations (ETN-ODE), for multi-step time series prediction at arbitrary time points. Unlike the existing approaches, which mainly handle univariate time series for multi-step prediction or multivariate time series for single-step prediction, ETN-ODE could model multivariate time series for arbitrary-step prediction. In addition, it enjoys a tandem attention, w.r.t. temporal attention and variable attention, being able to provide explainable insights into the data. Specifically, ETN-ODE combines an explainable Tensorized Gated Recurrent Unit (Tensorized GRU or TGRU) with Ordinary Differential Equations (ODE). The derivative of the latent states is parameterized with a neural network. This continuous-time ODE network enables a multi-step prediction at arbitrary time points. We quantitatively and qualitatively demonstrate the effectiveness and the interpretability of ETN-ODE on five different multi-step prediction tasks and one arbitrary-step prediction task. Extensive experiments show that ETN-ODE can lead to accurate predictions at arbitrary time points while attaining best performance against the baseline methods in standard multi-step time series prediction. |

STEER: Simple Temporal Regularization For Neural ODEs2020-07-01 ${\displaystyle \cong }$ |

Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs and as such can be used in conjunction with them. We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models. |

MRI Image Reconstruction via Learning Optimization Using Neural ODEs2020-06-30 ${\displaystyle \cong }$ |

We propose to formulate MRI image reconstruction as an optimization problem and model the optimization trajectory as a dynamic process using ordinary differential equations (ODEs). We model the dynamics in ODE with a neural network and solve the desired ODE with the off-the-shelf (fixed) solver to obtain reconstructed images. We extend this model and incorporate the knowledge of off-the-shelf ODE solvers into the network design (learned solvers). We investigate several models based on three ODE solvers and compare models with fixed solvers and learned solvers. Our models achieve better reconstruction results and are more parameter efficient than other popular methods such as UNet and cascaded CNN. We introduce a new way of tackling the MRI reconstruction problem by modeling the continuous optimization dynamics using neural ODEs. |

On-line Non-Convex Constrained Optimization2019-09-16 ${\displaystyle \cong }$ |

Time-varying non-convex continuous-valued non-linear constrained optimization is a fundamental problem. We study conditions wherein a momentum-like regularising term allow for the tracking of local optima by considering an ordinary differential equation (ODE). We then derive an efficient algorithm based on a predictor-corrector method, to track the ODE solution. |

Towards Natural Robustness Against Adversarial Examples2020-12-04 ${\displaystyle \cong }$ |

Recent studies have shown that deep neural networks are vulnerable to adversarial examples, but most of the methods proposed to defense adversarial examples cannot solve this problem fundamentally. In this paper, we theoretically prove that there is an upper bound for neural networks with identity mappings to constrain the error caused by adversarial noises. However, in actual computations, this kind of neural network no longer holds any upper bound and is therefore susceptible to adversarial examples. Following similar procedures, we explain why adversarial examples can fool other deep neural networks with skip connections. Furthermore, we demonstrate that a new family of deep neural networks called Neural ODEs (Chen et al., 2018) holds a weaker upper bound. This weaker upper bound prevents the amount of change in the result from being too large. Thus, Neural ODEs have natural robustness against adversarial examples. We evaluate the performance of Neural ODEs compared with ResNet under three white-box adversarial attacks (FGSM, PGD, DI2-FGSM) and one black-box adversarial attack (Boundary Attack). Finally, we show that the natural robustness of Neural ODEs is even better than the robustness of neural networks that are trained with adversarial training methods, such as TRADES and YOPO. |

Meta-Solver for Neural Ordinary Differential Equations2021-03-15 ${\displaystyle \cong }$ |

A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and yield the overfitting to the given solver. In our paper, we investigate how the variability in solvers' space can improve neural ODEs performance. We consider a family of Runge-Kutta methods that are parameterized by no more than two scalar variables. Based on the solvers' properties, we propose an approach to decrease neural ODEs overfitting to the pre-defined solver, along with a criterion to evaluate such behaviour. Moreover, we show that the right choice of solver parameterization can significantly affect neural ODEs models in terms of robustness to adversarial attacks. Recently it was shown that neural ODEs demonstrate superiority over conventional CNNs in terms of robustness. Our work demonstrates that the model robustness can be further improved by optimizing solver choice for a given task. The source code to reproduce our experiments is available at https://github.com/juliagusak/neural-ode-metasolver. |

Smoothed Inference for Adversarially-Trained Models2020-03-16 ${\displaystyle \cong }$ |

Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded perturbations of the input. In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks. The proposed technique can be applied on top of any existing adversarial defense, but works particularly well with the randomized approaches. We examine its performance on common white-box (PGD) and black-box (transfer and NAttack) attacks on CIFAR-10 and CIFAR-100, substantially outperforming previous art for most scenarios and comparable on others. For example, we achieve 60.4% accuracy under a PGD attack on CIFAR-10 using ResNet-20, outperforming previous art by 11.7%. Since our method is based on sampling, it lends itself well for trading-off between the model inference complexity and its performance. A reference implementation of the proposed techniques is provided at https://github.com/yanemcovsky/SIAM |

Defending Against Physically Realizable Attacks on Image Classification2020-02-14 ${\displaystyle \cong }$ |

We study the problem of defending deep neural network approaches for image classification from physically realizable attacks. First, we demonstrate that the two most scalable and effective methods for learning robust models, adversarial training with PGD attacks and randomized smoothing, exhibit very limited effectiveness against three of the highest profile physical attacks. Next, we propose a new abstract adversarial model, rectangular occlusion attacks, in which an adversary places a small adversarially crafted rectangle in an image, and develop two approaches for efficiently computing the resulting adversarial examples. Finally, we demonstrate that adversarial training using our new attack yields image classification models that exhibit high robustness against the physically realizable attacks we study, offering the first effective generic defense against such attacks. |