Models, code, and papers for "Liu Liu":

Recent research on discourse relations has found that they are cued not only by discourse markers (DMs) but also by other textual signals and that signaling information is indicative of genres. While several corpora exist with discourse relation signaling information such as the Penn Discourse Treebank (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018), they both annotate the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), which is limited to the news domain. Thus, this paper adapts the signal identification and anchoring scheme (Liu and Zeldes, 2019) to three more genres, examines the distribution of signaling devices across relations and genres, and provides a taxonomy of indicative signals found in this dataset.

Correlation matrices play a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on Pearson's sample correlation matrix. Although Pearson's sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator when facing heavy-tailed distributions. As a robust alternative, Han and Liu [J. Am. Stat. Assoc. 109 (2015) 275-287] advocated the use of a transformed version of the Kendall's tau sample correlation matrix for estimating the high-dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that, after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall's tau sample correlation matrix and its transformed version proposed in Han and Liu [J. Am. Stat. Assoc. 109 (2015) 275-287] for estimating the population Kendall's tau correlation matrix and the latent Pearson's correlation matrix under both spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of the "effective rank" in quantifying the rate of convergence. With regard to the restricted spectral norm, we present, for the first time, a "sign sub-Gaussian condition" that is sufficient to guarantee that the rank-based correlation matrix estimator attains the fast rate of convergence. In neither case do we require any moment condition.
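The transform studied above estimates the latent Pearson correlation entry-wise as sin(pi/2 * tau-hat). The sketch below illustrates why the estimator is robust: Kendall's tau depends only on ranks, so heavy tails and monotone marginal transforms leave it unchanged. Function names are mine, and the O(n^2) tau computation is for toy sample sizes only.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a via pairwise concordance counts (O(n^2); toy sizes)."""
    n = len(x)
    s = 0
    for i in range(n):
        # +1 for concordant pairs, -1 for discordant pairs
        s += np.sum(np.sign(x[i] - x[i+1:]) * np.sign(y[i] - y[i+1:]))
    return s / (n * (n - 1) / 2)

def latent_corr(X):
    """Rank-based estimator of the latent Pearson correlation matrix:
    apply sin(pi/2 * tau) to every off-diagonal Kendall's tau entry."""
    p = X.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            R[j, k] = R[k, j] = np.sin(np.pi * kendall_tau(X[:, j], X[:, k]) / 2)
    return R
```

Because tau is invariant under strictly increasing marginal transformations, replacing a column by, say, its exponential leaves the estimate exactly unchanged, which is the point of the transelliptical setting.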

Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when the data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and their inability to preserve the projected positions of the data from previous time points. The problem becomes even more challenging when the dynamic data records have a varying number of dimensions, as is often the case in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to make it usable for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle varying numbers of dimensions, we use an optimization method to estimate the projected data positions and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
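As a minimal stand-in for the base mechanism the paper builds on (not the enhanced method itself, which adds mental-map preservation and variable-dimension handling), incremental PCA can be sketched as maintaining a running mean and covariance and re-deriving the principal axes after each batch. All names and the re-eigendecomposition strategy here are my simplifications.

```python
import numpy as np

class StreamingPCA:
    """Toy incremental PCA: Welford-style running mean/covariance,
    with principal axes re-derived on demand."""
    def __init__(self, n_components):
        self.k = n_components
        self.n = 0
        self.mean = None
        self.cov = None  # running sum of outer products

    def partial_fit(self, X):
        for x in X:
            self.n += 1
            if self.mean is None:
                self.mean = x.astype(float)
                self.cov = np.zeros((len(x), len(x)))
            else:
                delta = x - self.mean
                self.mean += delta / self.n
                self.cov += np.outer(delta, x - self.mean)
        return self

    def transform(self, X):
        # eigendecompose the current sample covariance, keep top-k axes
        w, V = np.linalg.eigh(self.cov / max(self.n - 1, 1))
        components = V[:, np.argsort(w)[::-1][:self.k]]
        return (X - self.mean) @ components
```

Each incoming batch updates the summary statistics in O(batch size) without revisiting past data, which is what makes the approach viable on a live feed.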

We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by two ideas. First, although recent work has demonstrated that injecting randomness can improve the robustness of neural networks (Liu 2017), we note that adding noise blindly to all layers is not the optimal way to incorporate randomness. Instead, we model randomness under the framework of Bayesian Neural Networks (BNN) to formally learn the posterior distribution of models in a scalable way. Second, we formulate the mini-max problem in BNN to learn the best model distribution under adversarial attacks, leading to an adversarially trained Bayesian neural network. Experimental results demonstrate that the proposed algorithm achieves state-of-the-art performance under strong attacks. On CIFAR-10 with a VGG network, our model yields a 14% accuracy improvement over adversarial training (Madry 2017) and random self-ensemble (Liu 2017) under a PGD attack with $0.035$ distortion, and the gap becomes even larger on a subset of ImageNet.
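The PGD attack used in the evaluation above iterates sign-gradient ascent on the loss with projection onto an epsilon-ball. A minimal sketch, run against a hypothetical toy logistic model instead of the paper's VGG/BNN (all names below are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_attack(x, y, w, eps=0.035, alpha=0.01, steps=10):
    """L-infinity PGD against a toy model p(y=1|x) = sigmoid(w.x):
    ascend the cross-entropy loss, then project back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = (sigmoid(w @ x_adv) - y) * w      # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)    # sign-gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps) # projection onto eps-ball
    return x_adv
```

The eps=0.035 default mirrors the distortion level quoted above; against a deep network the analytic gradient is replaced by backpropagation to the input.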

Combinatorial optimization problems for clustering are known to be NP-hard. Most optimization methods are unable to find the globally optimal solution for all datasets. To address this problem, we propose a global optimal path-based clustering (GOPC) algorithm. The GOPC algorithm is based on two facts: (1) medoids have the minimum degree in their clusters; (2) the minimax distance between two objects in the same cluster is smaller than the minimax distance between objects in different clusters. Extensive experiments are conducted on synthetic and real-world datasets to evaluate the performance of the GOPC algorithm. The results on synthetic datasets show that GOPC can recognize clusters regardless of their shapes, sizes, or densities. Experimental results on real-world datasets demonstrate its effectiveness and efficiency. In addition, GOPC needs only one parameter, the number of clusters, which can be estimated from the decision graph. These advantages make GOPC a good candidate as a general clustering algorithm. Code is available at https://github.com/Qidong-Liu/Clustering.
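The minimax distance in fact (2) is the smallest achievable largest hop over all paths between two points, and can be computed with a Floyd-Warshall-style recurrence. A small sketch (function name is mine, not taken from the released code):

```python
import numpy as np

def minimax_distances(D):
    """All-pairs minimax (path-based) distance from a pairwise distance
    matrix D: M[i,j] = min over paths of the maximum edge on the path."""
    M = D.astype(float).copy()
    n = len(M)
    for k in range(n):
        # route through k if it lowers the largest hop
        M = np.minimum(M, np.maximum(M[:, k][:, None], M[k, :][None, :]))
    return M
```

This is why chain-shaped clusters are handled well: points joined by a dense chain of close neighbors end up with a small minimax distance even when their direct distance is large.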

This paper extends the deep material network (DMN) proposed by Liu et al. (2018) to tackle general 3-dimensional (3D) problems with arbitrary material and geometric nonlinearities. The global framework of DMN for mechanistic data-driven multiscale material modeling is discussed in detail for both the offline training and online extrapolation stages. Analytical solutions of the 3D building block with a two-layer structure in both small- and finite-strain formulations are derived based on interfacial equilibrium conditions and kinematic constraints. With linear elastic data generated by direct numerical simulations on a representative volume element (RVE), the network can be effectively trained in the offline stage using stochastic gradient descent and advanced model compression algorithms. The efficiency and accuracy of DMN in addressing the long-standing 3D RVE challenges with complex morphologies and material laws are validated through numerical experiments, including 1) a hyperelastic particle-reinforced rubber composite with the Mullins effect; 2) polycrystalline materials with rate-dependent crystal plasticity; and 3) carbon fiber reinforced polymer (CFRP) composites with fiber anisotropic elasticity and matrix plasticity. In particular, we demonstrate a three-scale homogenization procedure for a CFRP system by concatenating the microscale and mesoscale material networks. The complete learning and extrapolation procedures of DMN establish a reliable data-driven framework for multiscale material modeling and design.

The Shapley value, named in honor of Lloyd Shapley, is a concept in cooperative game theory for measuring the contribution of each participant. It has recently been applied in data marketplaces to allocate compensation based on each data owner's contribution to the trained models. The Shapley value is the only value division scheme for compensation allocation that meets three desirable criteria: group rationality, fairness, and additivity. In cooperative game theory, the marginal contribution of each contributor to each coalition is a nonnegative value. In machine learning model training, however, the marginal contribution of a contributor (data tuple) to a coalition (a set of data tuples) can be negative, i.e., the accuracy of a model trained on a dataset with an additional data tuple can be lower than that of a model trained on the dataset alone. In this paper, we investigate how to handle negative marginal contributions when computing the Shapley value. We explore three philosophies: 1) taking the original value (Original Shapley Value); 2) taking the larger of the original value and zero (Zero Shapley Value); and 3) taking the absolute value of the original value (Absolute Shapley Value). Experiments on the Iris dataset demonstrate that Absolute Shapley Value significantly outperforms the other two definitions in evaluating data importance (the contribution of each data tuple to the trained model).
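The three philosophies can be made concrete with a brute-force Shapley computation over permutations. In the sketch below (names are mine, and the toy additive "accuracy" game in the usage note stands in for real model training), the `transform` argument selects the philosophy applied to each marginal contribution.

```python
from itertools import permutations

def shapley(players, value, transform=lambda m: m):
    """Exact Shapley values by permutation enumeration (small sets only).
    transform maps each marginal contribution: identity -> Original,
    max(m, 0) -> Zero, abs(m) -> Absolute Shapley Value."""
    totals = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = []
        prev = value(coalition)
        for p in order:
            coalition.append(p)
            cur = value(coalition)
            totals[p] += transform(cur - prev)  # marginal contribution of p
            prev = cur
    return {p: t / len(perms) for p, t in totals.items()}
```

For an additive toy game where tuple `c` always contributes -0.1 accuracy, Original assigns `c` the value -0.1, Zero assigns 0, and Absolute assigns +0.1, which is exactly the distinction the three definitions encode.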

The geometry and topology of decision regions are closely related to classification performance and robustness against adversarial attacks. In this paper, we use differential geometry and topology to theoretically explore the geometrical and topological properties of decision regions produced by deep neural networks (DNNs). The goals are to obtain geometrical and topological properties of the decision regions of given DNN models, and to provide principled guidance for designing and regularizing DNNs. First, we give the curvatures of decision boundaries in terms of network weights. Based on the rotation index theorem and the Gauss-Bonnet-Chern theorem, we then propose methods to identify the closedness and connectivity of given decision boundaries, and to obtain the Euler characteristics of closed ones, all without the need to solve for decision boundaries explicitly. Finally, we give necessary conditions on network architectures for producing closed decision boundaries, and sufficient conditions on network weights for producing zero-curvature (flat or developable) decision boundaries.

For one-hidden-layer ReLU networks, we show that all local minima are global within each differentiable region, and that these local minima can be unique or continuous, depending on the data, the activation pattern of hidden neurons, and the network size. We give criteria to identify whether local minima lie inside their defining regions, and if so (we call them genuine differentiable local minima), their locations and loss values. Furthermore, we give necessary and sufficient conditions for the existence of saddle points as well as non-differentiable local minima. Finally, we compute the probability of getting stuck in genuine local minima for Gaussian input data and parallel weight vectors, and show that it vanishes exponentially when the weights are located in regions where data are not too scarce. This may hint at why gradient-based local search methods usually do not get trapped in local minima when training deep ReLU neural networks.

Constructing neural networks for function approximation is a classical and longstanding topic in approximation theory. In this paper, we aim to construct deep neural networks (deep nets for short) with three hidden layers to approximate smooth and sparse functions. In particular, we prove that the constructed deep nets can reach the optimal approximation rate for both smooth and sparse functions with controllable magnitudes of the free parameters. Since saturation, which describes the bottleneck of approximation, is an insurmountable problem for constructive neural networks, we also prove that deepening the network with only one more hidden layer can avoid saturation. The obtained results highlight the advantages of deep nets and provide theoretical explanations for deep learning.

A common question raised in automatic speech recognition (ASR) evaluations is how reliable an observed word error rate (WER) improvement between two ASR systems is; statistical hypothesis testing and confidence intervals can be utilized to tell whether such an improvement is real or due only to random chance. The bootstrap resampling method has been popular for such significance analysis, as it is intuitive and easy to use. However, this method fails when dealing with dependent data, which is prevalent in speech: for example, ASR performance on utterances from the same speaker can be correlated. In this paper, we present a blockwise bootstrap approach: by dividing evaluation utterances into non-overlapping blocks, this method resamples the blocks instead of the original data. We show that the resulting variance estimator of the absolute WER difference between two ASR systems is consistent under mild conditions. We also demonstrate the validity of the blockwise bootstrap method on both synthetic and real-world speech data.
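A minimal sketch of blockwise resampling for the WER difference, assuming per-utterance error and word counts with speaker labels as blocks (the interface is my assumption, not the paper's released code):

```python
import numpy as np

def blockwise_bootstrap(err_a, err_b, words, block_ids, n_boot=1000, seed=0):
    """Bootstrap distribution of the WER difference between systems A and B,
    resampling non-overlapping blocks (e.g. speakers) with replacement
    instead of individual utterances."""
    rng = np.random.default_rng(seed)
    blocks = {}
    for i, b in enumerate(block_ids):
        blocks.setdefault(b, []).append(i)
    groups = [np.array(v) for v in blocks.values()]
    diffs = np.empty(n_boot)
    for t in range(n_boot):
        picked = rng.integers(0, len(groups), size=len(groups))
        idx = np.concatenate([groups[k] for k in picked])
        # WER difference on this replicate: (errors_A - errors_B) / words
        diffs[t] = (err_a[idx].sum() - err_b[idx].sum()) / words[idx].sum()
    return diffs  # e.g. diffs.var() estimates the variance of the WER gap
```

Resampling whole blocks keeps within-speaker correlation intact inside each replicate, which is what the utterance-level bootstrap gets wrong on dependent data.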

Microservices have come to dominate the modern cloud environment. To improve cost efficiency, multiple microservices are normally co-located on a server, so run-time resource scheduling becomes pivotal for QoS control. However, the scheduling exploration space grows rapidly with increasing server resources (cores, cache, bandwidth, etc.) and the diversity of microservices. Consequently, existing schedulers may not keep up with rapid changes in service demands. In addition, we observe that there exist resource cliffs in the scheduling space. They not only hurt exploration efficiency, making it difficult to converge to the optimal scheduling solution, but also cause severe QoS fluctuations. To overcome these problems, we propose a novel machine learning-based scheduling mechanism called OSML. It takes resources and runtime states as input and employs two MLP models and a reinforcement learning model to explore the scheduling space. OSML can thus reach an optimal solution much faster than traditional approaches. More importantly, it can automatically detect resource cliffs and avoid them during exploration. To verify the effectiveness of OSML and obtain a well-generalized model, we collected a dataset containing over 2 billion samples from 11 typical microservices running on real servers over 9 months. Under the same QoS constraint, experimental results show that OSML outperforms the state of the art, scheduling around 5 times faster.

The morphological attributes of retinal vessels, such as length, width, tortuosity, and branching pattern and angles, play an important role in the diagnosis, screening, treatment, and evaluation of various cardiovascular and ophthalmologic diseases such as diabetes, hypertension, and arteriosclerosis. The crucial step before extracting these morphological characteristics from retinal fundus images is vessel segmentation. In this work, we propose a method for retinal vessel segmentation based on fully convolutional networks. Thousands of patches are extracted from each retinal image and fed into the network, and data augmentation is applied by rotating the extracted patches. Two fully convolutional architectures, U-Net and LadderNet, are used for vessel segmentation. The performance of our method is evaluated on three public datasets: DRIVE, STARE, and CHASE\_DB1. Experimental results show superior performance compared to recent state-of-the-art methods.
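The patch extraction and rotation augmentation described above can be sketched as follows (the patch size and stride are illustrative choices, not the paper's settings):

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a window over a fundus image and collect square patches."""
    H, W = image.shape[:2]
    patches = []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

def augment_by_rotation(patches):
    """Rotate each patch by 0/90/180/270 degrees, quadrupling the set."""
    out = []
    for p in patches:
        out.extend(np.rot90(p, k) for k in range(4))
    return out
```

In training, the same window indices are applied to the vessel label maps so that each rotated patch keeps its aligned segmentation target.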

A growing number of empirical studies suggest that negative advertising is effective in campaigning, but the underlying mechanisms are rarely examined. With the Cambridge Analytica scandal and Russian intervention behind Brexit and the 2016 presidential election, people have become aware of political ads on social media and have pressured Congress to restrict political advertising there. Following the related legislation, social media companies began disclosing their political ad archives for transparency during the summer of 2018, just as the midterm election campaign was beginning. This research collects data on the related political ads in the context of the U.S. midterm elections since August to study the overall pattern of political ads on social media, and uses sets of machine learning methods to conduct sentiment analysis on these ads to classify the negative ones. A novel approach is applied that uses AI image recognition to study the image data. Through data visualization, this research shows that negative advertising is still in the minority, and that Republican advertisers and third-party organizations are more likely to engage in it than their counterparts. Based on ordinal regressions, this study finds that anger-evoked information-seeking, rather than the negative bias theory, is one of the main mechanisms making negative ads more engaging and effective. Overall, this study provides a unique understanding of political advertising on social media by applying innovative data science methods. Further studies can extend the findings, methods, and datasets in this study, and several suggestions are given for future research.

A fundamental issue in multiscale materials modeling and design is the consideration of traction-separation behavior at the interface, which generally influences the failure properties. This paper develops a physics-based machine learning model based on the deep material network (DMN) enriched by cohesive layers, which enables the accurate and efficient prediction of multiscale responses for heterogeneous materials with interfacial effect. New fitting parameters are invoked in the cohesive building block and have physical meanings related to the length scale and orientation of the cohesive layer. It is shown that the enriched material network can be effectively optimized via a multi-stage training strategy, with training data generated from linear elastic direct numerical simulation (DNS). The extrapolation capability of the method to unknown spaces is demonstrated through the debonding analysis of a unidirectional fiber-reinforced composite, where the interface behavior is governed by an irreversible softening mixed-mode cohesive law. Its predictive accuracy is validated against the nonlinear DNS results, and the reduction in computational time is particularly significant.

In recent years, deep neural networks have found success in replicating human-level cognitive skills, yet they suffer from several major obstacles. One significant limitation is the inability to learn new tasks without forgetting previously learned tasks, a shortcoming known as catastrophic forgetting. In this research, we propose a simple method to overcome catastrophic forgetting and enable continual learning in neural networks. We draw inspiration from principles in neurology and physics to develop the concept of weight friction. Weight friction operates by a modification to the update rule in the gradient descent optimization method. It converges at a rate comparable to that of the stochastic gradient descent algorithm and can operate over multiple task domains. It performs comparably to current methods while offering improvements in computation and memory efficiency.
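The exact weight-friction update rule is not reproduced in the summary above, so the following is a hypothetical sketch of the general idea: a per-weight friction coefficient, accumulated from previous tasks, damps updates on weights that matter for old tasks. The function name, the damping form, and the friction source are all my assumptions.

```python
import numpy as np

def friction_sgd_step(w, grad, friction, lr=0.1):
    """One gradient step where each weight's update is damped by its
    friction coefficient (hypothetical form; larger friction -> smaller
    step, so weights important to earlier tasks move less)."""
    return w - lr * grad / (1.0 + friction)
```

Compared with storing past data, this changes only the optimizer step, which is consistent with the computation- and memory-efficiency claims above.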

Neuromorphic Computing is a nascent research field in which models and devices are designed to process information by emulating biological neural systems. Thanks to their superior energy efficiency, analog neuromorphic systems are highly promising for embedded, wearable, and implantable systems. However, optimizing neural networks deployed on these systems is challenging. One main challenge is the so-called timescale mismatch: Dynamics of analog circuits tend to be too fast to process real-time sensory inputs. In this thesis, we propose a few working solutions to slow down dynamics of on-chip spiking neural networks. We empirically show that, by harnessing slow dynamics, spiking neural networks on analog neuromorphic systems can gain non-trivial performance boosts on a battery of real-time signal processing tasks.

The support vector machine (SVM) is one of the most widely used classification methods. In this paper, we consider the soft margin support vector machine on data points with independent features, where the sample size $n$ and the feature dimension $p$ grow to $\infty$ at a fixed ratio $p/n\rightarrow \delta$. We propose a set of equations that exactly characterizes the asymptotic behavior of the support vector machine. In particular, we give exact formulas for (1) the variability of the optimal coefficients, (2) the proportion of data points lying on the margin boundary (i.e., the number of support vectors), (3) the final objective function value, and (4) the expected misclassification error on new data points, which in particular implies an exact formula for the optimal tuning parameter given a data generating mechanism. The global null case is considered first, where the label $y\in\{+1,-1\}$ is independent of the feature $x$. Then the signaled case is considered, where the label $y\in\{+1,-1\}$ is allowed to have a general dependence on the feature $x$ through a linear combination $a_0^Tx$. These results for the non-smooth hinge loss serve as an analogue to the recent results of \citet{sur2018modern} for the smooth logistic loss. Our approach is based on heuristic leave-one-out calculations.

We present our 7th place solution to the Gendered Pronoun Resolution challenge, which uses BERT without fine-tuning and a novel augmentation strategy designed for contextual-embedding token-level tasks. Our method anonymizes the referent by replacing candidate names with a set of common placeholder names. Besides the usual benefit of effectively increasing the training data size, this approach diversifies the idiosyncratic information embedded in names. Using the same set of common first names also helps the model recognize names better, shortens token length, and removes gender and regional biases associated with names. The system scored 0.1947 log loss in stage 2, where the augmentation contributed an improvement of 0.04. Post-competition analysis shows that, when using different embedding layers, the system scores 0.1799, which would have placed third.
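The anonymization step can be sketched as a simple substitution over the candidate referent names. The placeholder set below is an assumption for illustration; the actual name list used by the system is not given above.

```python
import itertools

PLACEHOLDERS = ["Alice", "Bob", "Carol"]  # assumed common-first-name set

def anonymize(text, candidate_names):
    """Replace each candidate referent name with a placeholder name,
    so the model sees the same small name vocabulary everywhere."""
    for name, placeholder in zip(candidate_names, itertools.cycle(PLACEHOLDERS)):
        text = text.replace(name, placeholder)
    return text
```

Repeating the substitution with different placeholder assignments yields multiple training variants of each example, which is the augmentation effect described above.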

Applying neural networks to question answering has gained increasing popularity in recent years. In this paper, I implement a model with a bi-directional attention flow layer connected to a multi-layer LSTM encoder, followed by one start-index decoder and one conditioned end-index decoder. I introduce a new end-index decoder layer that conditions on the start-index output; experiments show this increases model performance by 15.16%. For prediction, I propose a new smart-span equation that rewards both short answer length and high probability at the start and end indices, which further improves prediction accuracy. The best single model achieves an F1 score of 73.97% and an EM score of 64.95% on the test set.
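The exact smart-span equation is not reproduced above; one plausible form, sketched below with names of my own choosing, scores each candidate span by the product of its start and end probabilities divided by a power of the span length, so short high-probability answers win.

```python
import numpy as np

def smart_span(p_start, p_end, max_len=15, length_penalty=1.0):
    """Pick (start, end) maximizing start/end probability while
    penalizing long answers (hypothetical form of the scoring rule)."""
    best, best_score = (0, 0), -np.inf
    n = len(p_start)
    for i in range(n):
        for j in range(i, min(i + max_len, n)):
            score = p_start[i] * p_end[j] / (j - i + 1) ** length_penalty
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

With `length_penalty=0` this reduces to the standard maximum-joint-probability span selection, so the penalty exponent is the knob that trades answer brevity against raw probability.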