Models, code, and papers for "Na Zou":

PyODDS is an end-to end Python system for outlier detection with database support. PyODDS provides outlier detection algorithms which meet the demands for users in different fields, w/wo data science or machine learning background. PyODDS gives the ability to execute machine learning algorithms in-database without moving data out of the database server or over the network. It also provides access to a wide range of outlier detection algorithms, including statistical analysis and more recent deep learning based approaches. PyODDS is released under the MIT open-source license, and currently available at (https://github.com/datamllab/pyodds) with official documentations at (https://pyodds.github.io/).

Deep learning is increasingly being used in high-stake decision making applications that affect individual lives. However, deep learning models might exhibit algorithmic discrimination behaviors with respect to protected groups, potentially posing negative impacts on individuals and society. Therefore, fairness in deep learning has attracted tremendous attention recently. We provide a comprehensive review covering existing techniques to tackle algorithmic fairness problems from the computational perspective. Specifically, we show that interpretability can serve as a useful ingredient, which could be augmented into the biases detection and mitigation pipelines. We also discuss open research problems and future research directions, aiming to push forward the area of fairness in deep learning and build genuinely fair, accountable, and transparent deep learning systems.

An important step in early brain development study is to perform automatic segmentation of infant brain magnetic resonance (MR) images into cerebrospinal fluid (CSF), gray matter (GM) and white matter (WM) regions. This task is especially challenging in the isointense stage (approximately 6-8 months of age) when GM and WM exhibit similar levels of intensities in MR images. Deep learning has shown its great promise in various image segmentation tasks. However, existing models do not have an efficient and effective way to aggregate global information. They also suffer from information loss during up-sampling operations. In this work, we address these problems by proposing a global aggregation block, which can be flexibly used for global information fusion. We build a novel model based on 3D U-Net to make fast and accurate voxel-wise dense prediction. We perform thorough experiments, and results indicate that our model outperforms previous best models significantly on 3D multimodality isointense infant brain MR image segmentation.

Anomaly detection aims to distinguish observations that are rare and different from the majority. While most existing algorithms assume that instances are i.i.d., in many practical scenarios, links describing instance-to-instance dependencies and interactions are available. Such systems are called attributed networks. Anomaly detection in attributed networks has various applications such as monitoring suspicious accounts in social media and financial fraud in transaction networks. However, it remains a challenging task since the definition of anomaly becomes more complicated and topological structures are heterogeneous with nodal attributes. In this paper, we propose a spectral convolution and deconvolution based framework -- SpecAE, to project the attributed network into a tailored space to detect global and community anomalies. SpecAE leverages Laplacian sharpening to amplify the distances between representations of anomalies and the ones of the majority. The learned representations along with reconstruction errors are combined with a density estimation model to perform the detection. They are trained jointly as an end-to-end framework. Experiments on real-world datasets demonstrate the effectiveness of SpecAE.

Graph neural networks (GNN) has been demonstrated to be effective in classifying graph structures. To further improve the graph representation learning ability, hierarchical GNN has been explored. It leverages the differentiable pooling to cluster nodes into fixed groups, and generates a coarse-grained structure accompanied with the shrinking of the original graph. However, such clustering would discard some graph information and achieve the suboptimal results. It is because the node inherently has different characteristics or roles, and two non-isomorphic graphs may have the same coarse-grained structure that cannot be distinguished after pooling. To compensate the loss caused by coarse-grained clustering and further advance GNN, we propose a multi-channel graph convolutional networks (MuchGCN). It is motivated by the convolutional neural networks, at which a series of channels are encoded to preserve the comprehensive characteristics of the input image. Thus, we define the specific graph convolutions to learn a series of graph channels at each layer, and pool graphs iteratively to encode the hierarchical structures. Experiments have been carefully carried out to demonstrate the superiority of MuchGCN over the state-of-the-art graph classification algorithms.

When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure in the SVM to eliminate the influences of the irrelevant predictors. The lasso and other variable selection techniques have been successfully used in the SVM to perform automatic variable selection. In some problems, there is a natural hierarchical structure among the variables. Thus, in order to have an interpretable SVM classifier, it is important to respect the heredity principle when enforcing the sparsity in the SVM. Many variable selection methods, however, do not respect the heredity principle. In this paper we enforce both sparsity and the heredity principle in the SVM by using the so-called structured variable selection (SVS) framework originally proposed in Yuan, Joseph and Zou (2007). We minimize the empirical hinge loss under a set of linear inequality constraints and a lasso-type penalty. The solution always obeys the desired heredity principle and enjoys sparsity. The new SVM classifier can be efficiently fitted, because the optimization problem is a linear program. Another contribution of this work is to present a nonparametric extension of the SVS framework, and we propose nonparametric heredity SVMs. Simulated and real data are used to illustrate the merits of the proposed method.

In classification, the de facto method for aggregating individual losses is the average loss. When the actual metric of interest is 0-1 loss, it is common to minimize the average surrogate loss for some well-behaved (e.g. convex) surrogate. Recently, several other aggregate losses such as the maximal loss and average top-$k$ loss were proposed as alternative objectives to address shortcomings of the average loss. However, we identify common classification settings, e.g. the data is imbalanced, has too many easy or ambiguous examples, etc., when average, maximal and average top-$k$ all suffer from suboptimal decision boundaries, even on an infinitely large training set. To address this problem, we propose a new classification objective called the close-$k$ aggregate loss, where we adaptively minimize the loss for points close to the decision boundary. We provide theoretical guarantees for the 0-1 accuracy when we optimize close-$k$ aggregate loss. We also conduct systematic experiments across the PMLB and OpenML benchmark datasets. Close-$k$ achieves significant gains in 0-1 test accuracy, improvements of $\geq 2$% and $p<0.05$, in over 25% of the datasets compared to average, maximal and average top-$k$. In contrast, the previous aggregate losses outperformed close-$k$ in less than 2% of the datasets.

We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network. By accounting for interactions across neurons, Neuron Shapley is more effective in identifying important filters compared to common approaches based on activation patterns. Interestingly, removing just 30 filters with the highest Shapley scores effectively destroys the prediction accuracy of Inception-v3 on ImageNet. Visualization of these few critical filters provides insights into how the network functions. Neuron Shapley is a flexible framework and can be applied to identify responsible neurons in many tasks. We illustrate additional applications of identifying filters that are responsible for biased prediction in facial recognition and filters that are vulnerable to adversarial attacks. Removing these filters is a quick way to repair models. Enabling all these applications is a new multi-arm bandit algorithm that we developed to efficiently estimate Neuron Shapley values.

Target speech separation refers to extracting the target speaker's speech from mixed signals. Despite the recent advances in deep learning based close-talk speech separation, the applications to real-world are still an open issue. Two main challenges are the complex acoustic environment and the real-time processing requirement. To address these challenges, we propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture in reverberant environments, assisted with directional information of the speaker(s). Firstly, against variations brought by complex environment, the key idea is to increase the acoustic representation completeness through the jointly modeling of temporal, spectral and spatial discriminability between the target and interference source. Specifically, temporal, spectral, spatial along with the designed directional features are integrated to create a joint acoustic representation. Secondly, to reduce the latency, we design a fully-convolutional autoencoder framework, which is purely end-to-end and single-pass. All the feature computation is implemented by the network layers and operations to speed up the separation procedure. Evaluation is conducted on simulated reverberant dataset WSJ0-2mix and WSJ0-3mix under speaker-independent scenario. Experimental results demonstrate that the proposed method outperforms state-of-the-art deep learning based multi-channel approaches with fewer parameters and faster processing speed. Furthermore, the proposed temporal-spatial neural filter can handle mixtures with varying and unknown number of speakers and exhibits persistent performance even when existing a direction estimation error. Codes and models will be released soon.

Pseudo-Boolean monotone functions are unimodal functions which are trivial to optimize for some hillclimbers, but are challenging for a surprising number of evolutionary algorithms (EAs). A general trend is that EAs are efficient if parameters like the mutation rate are set conservatively, but may need exponential time otherwise. In particular, it was known that the $(1+1)$-EA and the $(1+\lambda)$-EA can optimize every monotone function in pseudolinear time if the mutation rate is $c/n$ for some $c<1$, but they need exponential time for some monotone functions for $c>2.2$. The second part of the statement was also known for the $(\mu+1)$-EA. In this paper we show that the first statement does not apply to the $(\mu+1)$-EA. More precisely, we prove that for every constant $c>0$ there is a constant integer $\mu_0$ such that the $(\mu+1)$-EA with mutation rate $c/n$ and population size $\mu_0\le\mu\le n$ needs superpolynomial time to optimize some monotone functions. Thus, increasing the population size by just a constant has devastating effects on the performance. This is in stark contrast to many other benchmark functions on which increasing the population size either increases the performance significantly, or affects performance mildly. The reason why larger populations are harmful lies in the fact that larger populations may temporarily decrease selective pressure on parts of the population. This allows unfavorable mutations to accumulate in single individuals and their descendants. If the population moves sufficiently fast through the search space, such unfavorable descendants can become ancestors of future generations, and the bad mutations are preserved. Remarkably, this effect only occurs if the population renews itself sufficiently fast, which can only happen far away from the optimum. This is counter-intuitive since usually optimization gets harder as we approach the optimum.

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size $n$ (e.g., $O(n^{24})$). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work.

Meta learning is a promising solution to few-shot learning problems. However, existing meta learning methods are restricted to the scenarios where training and application tasks share the same out-put structure. To obtain a meta model applicable to the tasks with new structures, it is required to collect new training data and repeat the time-consuming meta training procedure. This makes them inefficient or even inapplicable in learning to solve heterogeneous few-shot learning tasks. We thus develop a novel and principled HierarchicalMeta Learning (HML) method. Different from existing methods that only focus on optimizing the adaptability of a meta model to similar tasks, HML also explicitly optimizes its generalizability across heterogeneous tasks. To this end, HML first factorizes a set of similar training tasks into heterogeneous ones and trains the meta model over them at two levels to maximize adaptation and generalization performance respectively. The resultant model can then directly generalize to new tasks. Extensive experiments on few-shot classification and regression problems clearly demonstrate the superiority of HML over fine-tuning and state-of-the-art meta learning approaches in terms of generalization across heterogeneous tasks.

As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what is an equitable valuation for individual data. In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on $n$ data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor performance. Data Shapley uniquely satisfies several natural properties of equitable data valuation. We develop Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical settings where complex learning algorithms, including neural networks, are trained on large datasets. In addition to being equitable, extensive experiments across biomedical, image and synthetic data demonstrate that data Shapley has several other benefits: 1) it is more powerful than the popular leave-one-out or leverage score in providing insight on what data is more valuable for a given learning task; 2) low Shapley value data effectively capture outliers and corruptions; 3) high Shapley value data inform what type of new data to acquire to improve the predictor.

Purpose: This paper presents an impedance control method with mixed $H_2/H_\infty$ synthesis and relaxed passivity for a cable-driven series elastic actuator to be applied for physical human-robot interaction. Design/methodology/approach: To shape the system's impedance to match a desired dynamic model, the impedance control problem was reformulated into an impedance matching structure. The desired competing performance requirements as well as constraints from the physical system can be characterized with weighting functions for respective signals. Considering the frequency properties of human movements, the passivity constraint for stable human-robot interaction, which is required on the entire frequency spectrum and may bring conservative solutions, has been relaxed in such a way that it only restrains the low frequency band. Thus, impedance control became a mixed $H_2/H_\infty$ synthesis problem, and a dynamic output feedback controller can be obtained. Findings: The proposed impedance control strategy has been tested for various desired impedance with both simulation and experiments on the cable-driven series elastic actuator platform. The actual interaction torque tracked well the desired torque within the desired norm bounds, and the control input was regulated below the motor velocity limit. The closed loop system can guarantee relaxed passivity at low frequency. Both simulation and experimental results have validated the feasibility and efficacy of the proposed method. Originality/value: This impedance control strategy with mixed $H_2/H_\infty$ synthesis and relaxed passivity provides a novel, effective and less conservative method for physical human-robot interaction control.

Variational autoencoders are powerful algorithms for identifying dominant latent structure in a single dataset. In many applications, however, we are interested in modeling latent structure and variation that are enriched in a target dataset compared to some background---e.g. enriched in patients compared to the general population. Contrastive learning is a principled framework to capture such enriched variation between the target and background, but state-of-the-art contrastive methods are limited to linear models. In this paper, we introduce the contrastive variational autoencoder (cVAE), which combines the benefits of contrastive learning with the power of deep generative models. The cVAE is designed to identify and enhance salient latent features. The cVAE is trained on two related but unpaired datasets, one of which has minimal contribution from the salient latent features. The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features. The algorithm is straightforward to implement, has a similar run-time to the standard VAE, and is robust to noise and dataset purity. We conduct experiments across diverse types of data, including gene expression and facial images, showing that the cVAE effectively uncovers latent structure that is salient in a particular analysis.

Measuring similarities between unlabeled time series trajectories is an important problem in domains as diverse as medicine, astronomy, finance, and computer vision. It is often unclear what is the appropriate metric to use because of the complex nature of noise in the trajectories (e.g. different sampling rates or outliers). Domain experts typically hand-craft or manually select a specific metric, such as dynamic time warping (DTW), to apply on their data. In this paper, we propose Autowarp, an end-to-end algorithm that optimizes and learns a good metric given unlabeled trajectories. We define a flexible and differentiable family of warping metrics, which encompasses common metrics such as DTW, Euclidean, and edit distance. Autowarp then leverages the representation power of sequence autoencoders to optimize for a member of this warping distance family. The output is a metric which is easy to interpret and can be robustly learned from relatively few trajectories. In systematic experiments across different domains, we show that Autowarp often outperforms hand-crafted trajectory similarity metrics.

Adaptive stochastic gradient descent methods, such as AdaGrad, RMSProp, Adam, AMSGrad, etc., have been demonstrated efficacious in solving non-convex stochastic optimization, such as training deep neural networks. However, their convergence rates have not been touched under the non-convex stochastic circumstance except recent breakthrough results on AdaGrad, perturbed AdaGrad and AMSGrad. In this paper, we propose two new adaptive stochastic gradient methods called AdaHB and AdaNAG which integrate a novel weighted coordinate-wise AdaGrad with heavy ball momentum and Nesterov accelerated gradient momentum, respectively. The $\mathcal{O}(\frac{\log{T}}{\sqrt{T}})$ non-asymptotic convergence rates of AdaHB and AdaNAG in non-convex stochastic setting are also jointly established by leveraging a newly developed unified formulation of these two momentum mechanisms. Moreover, comparisons have been made between AdaHB, AdaNAG, Adam and RMSProp, which, to a certain extent, explains the reasons why Adam and RMSProp are divergent. In particular, when momentum term vanishes we obtain convergence rate of coordinate-wise AdaGrad in non-convex stochastic setting as a byproduct.

Generative networks have made it possible to generate meaningful signals such as images and texts from simple noise. Recently, generative methods based on GAN and VAE were developed for graphs and graph signals. However, some of these methods are complex as well as difficult to train and fine-tune. This work proposes a graph generation model that uses a recent adaptation of Mallat's scattering transform to graphs. The proposed model is naturally composed of an encoder and a decoder. The encoder is a Gaussianized graph scattering transform. The decoder is a simple fully connected network that is adapted to specific tasks, such as link prediction, signal generation on graphs and full graph and signal generation. The training of our proposed system is efficient since it is only applied to the decoder and the hardware requirement is moderate. Numerical results demonstrate state-of-the-art performance of the proposed system for both link prediction and graph and signal generation. These results are in contrast to experience with Euclidean data, where it is difficult to form a generative scattering network that performs as well as state-of-the-art methods. We believe that this is because of the discrete and simpler nature of graph applications, unlike the more complex and high-frequency nature of Euclidean data, in particular, of some natural images.

In Positional-Slotted Object-Applicative (PSOA) RuleML, a predicate application (atom) can have an Object IDentifier (OID) and descriptors that may be positional arguments (tuples) or attribute-value pairs (slots). PSOA RuleML 1.0 specifies for each descriptor whether it is to be interpreted under the perspective of the predicate in whose scope it occurs. This perspectivity dimension refines the space between oidless, positional atoms (relationships) and oidful, slotted atoms (frames): While relationships use only a predicate-scope-sensitive (predicate-dependent) tuple and frames use only predicate-scope-insensitive (predicate-independent) slots, PSOA RuleML 1.0 uses a systematics of orthogonal constructs also permitting atoms with (predicate-)independent tuples and atoms with (predicate-)dependent slots. This supports data and knowledge representation where a slot attribute can have different values depending on the predicate. PSOA thus extends object-oriented multi-membership and multiple inheritance. Based on objectification, PSOA laws are given: Besides unscoping and centralization, the semantic restriction and transformation of describution permits rescoping of one atom's independent descriptors to another atom with the same OID but a different predicate. For inheritance, default descriptors are realized by rules. On top of a metamodel and a Grailog visualization, PSOA's atom systematics for facts, queries, and rules is explained. The presentation and (XML-)serialization syntaxes of PSOA RuleML 1.0 are introduced. Its model-theoretic semantics is formalized by extending the interpretation functions for dependent descriptors. The open PSOATransRun system since Version 1.3 realizes PSOA RuleML 1.0 by a translator to runtime predicates, including for dependent tuples (prdtupterm) and slots (prdsloterm). Our tests show efficiency advantages of dependent and tupled modeling.