Research papers and code for "L. Li":
This article proposes a method for mathematical modeling of human movements related to patient exercise episodes performed during physical therapy sessions by using artificial neural networks. The generative adversarial network structure is adopted, whereby a discriminative and a generative model are trained concurrently in an adversarial manner. Different network architectures are examined, with the discriminative and generative models structured as deep subnetworks of hidden layers composed of convolutional or recurrent computational units. The models are validated on a data set of human movements recorded with an optical motion tracker. The results demonstrate that the networks can classify new instances of motions and generate motion examples that resemble the recorded motion sequences.

* Int. J. Machine Learning Computing 8 (2018) 428-436
* 11 pages, 6 figures
Sparse connectivity is an important factor behind the success of convolutional neural networks and recurrent neural networks. In this paper, we consider the problem of learning sparse connectivity for feedforward neural networks (FNNs). The key idea is that a unit should be connected to a small number of units at the next level below that are strongly correlated. We use Chow-Liu's algorithm to learn a tree-structured probabilistic model for the units at the current level, use the tree to identify subsets of units that are strongly correlated, and introduce a new unit with a receptive field over each subset. The procedure is repeated on the new units to build multiple layers of hidden units. The resulting model is called a TRF-net. Empirical results show that, when compared to dense FNNs, TRF-nets achieve better or comparable classification performance with far fewer parameters and sparser structures. They are also more interpretable.

* International Joint Conference on Artificial Intelligence 2018
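The tree-learning step described in the abstract above can be illustrated with a minimal sketch of the Chow-Liu procedure: estimate pairwise mutual information and take a maximum-weight spanning tree over it. The function names (`mutual_information`, `chow_liu_edges`) and the use of Prim's algorithm are assumptions made for illustration; this is not the authors' implementation and it omits the receptive-field construction.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                p_a = np.mean(x == a)
                p_b = np.mean(y == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree over pairwise mutual information.

    data: array of shape (n_samples, n_variables) holding discrete values.
    """
    n_vars = data.shape[1]
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])

    # Prim's algorithm, growing the tree from variable 0.
    in_tree = [0]
    edges = []
    while len(in_tree) < n_vars:
        best = None
        for i in in_tree:
            for j in range(n_vars):
                if j in in_tree:
                    continue
                if best is None or mi[i, j] > mi[best[0], best[1]]:
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges
```

Strongly correlated variables end up adjacent in the tree, which is what makes the tree usable for grouping units into subsets.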
Despite the popularity of deep learning, structure learning for deep models remains a relatively under-explored area. In contrast, structure learning has been studied extensively for probabilistic graphical models (PGMs). In particular, an efficient algorithm has been developed for learning a class of tree-structured PGMs called hierarchical latent tree models (HLTMs), where there is a layer of observed variables at the bottom and multiple layers of latent variables on top. In this paper, we propose a simple method for learning the structures of feedforward neural networks (FNNs) based on HLTMs. The idea is to expand the connections in the tree skeletons from HLTMs and to use the resulting structures for FNNs. An important characteristic of FNN structures learned this way is that they are sparse. We present extensive empirical results to show that, compared with manually tuned standard FNNs, sparse FNNs learned by our method achieve better or comparable classification performance with far fewer parameters. They are also more interpretable.

* 7 pages
Recently, deep-learning-based clustering methods have been shown to be superior to traditional ones by jointly conducting representation learning and clustering. These methods rely on the assumptions that the number of clusters is known, and that there is one single partition over the data and all attributes define that partition. However, in real-world applications, prior knowledge of the number of clusters is usually unavailable and there are multiple ways to partition the data based on subsets of attributes. To resolve these issues, we propose the latent tree variational autoencoder (LTVAE), which simultaneously performs representation learning and multidimensional clustering. LTVAE learns latent embeddings from data, discovers multi-facet clustering structures based on subsets of latent features, and automatically determines the number of clusters in each facet. Experiments show that the proposed method achieves state-of-the-art clustering performance and reveals reasonable multi-facet structures of the data.

* 9 pages
In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of Markov Decision Processes (MDPs), which we refer to as Quantile Markov Decision Processes (QMDPs). Traditionally, the goal of an MDP is to maximize the expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. We provide analytical results characterizing the optimal QMDP solution and present algorithms for solving the QMDP. We illustrate the model with two experiments: a grid game and an HIV optimal treatment experiment.
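To see why a quantile objective can rank policies differently from the expectation, consider the toy comparison below. The two reward distributions (`safe`, `risky`) and the helper names are hypothetical, and the sketch only evaluates fixed distributions of cumulative reward; it does not implement the QMDP solution algorithm.

```python
def quantile(outcomes, tau):
    """Lower tau-quantile of a discrete distribution.

    outcomes: list of (value, probability) pairs whose probabilities sum to 1.
    Returns the smallest value v with P(X <= v) >= tau.
    """
    cumulative = 0.0
    for value, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= tau - 1e-12:  # small tolerance for float round-off
            return value
    return max(value for value, _ in outcomes)

def expectation(outcomes):
    """Expected value of the same discrete distribution."""
    return sum(value * prob for value, prob in outcomes)

# Hypothetical cumulative-reward distributions induced by two policies.
safe = [(10.0, 1.0)]
risky = [(0.0, 0.2), (20.0, 0.8)]

for name, dist in [("safe", safe), ("risky", risky)]:
    print(name, expectation(dist), quantile(dist, 0.1), quantile(dist, 0.5))
```

Under the expectation or the median the risky policy looks better, while a decision maker who cares about the 0.1-quantile would prefer the safe one.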

The focus in this paper is Bayesian system identification based on noisy incomplete modal data where we can impose spatially-sparse stiffness changes when updating a structural model. To this end, based on a similar hierarchical sparse Bayesian learning model from our previous work, we propose two Gibbs sampling algorithms. The algorithms differ in their strategies to deal with the posterior uncertainty of the equation-error precision parameter, but both sample from the conditional posterior probability density functions (PDFs) for the structural stiffness parameters and system modal parameters. The effective dimension for the Gibbs sampling is low because iterative sampling is done from only three conditional posterior PDFs that correspond to three parameter groups, along with sampling of the equation-error precision parameter from another conditional posterior PDF in one of the algorithms where it is not integrated out as a "nuisance" parameter. A nice feature from a computational perspective is that it is not necessary to solve a nonlinear eigenvalue problem of a structural model. The effectiveness and robustness of the proposed algorithms are illustrated by applying them to the IASC-ASCE Phase II simulated and experimental benchmark studies. The goal is to use incomplete modal data identified before and after possible damage to detect and assess spatially-sparse stiffness reductions induced by any damage. Our past and current focus on meeting challenges arising from Bayesian inference of structural stiffness serves to strengthen the capability of vibration-based structural system identification, but our methods also have much broader applicability for inverse problems in science and technology where system matrices are to be inferred from noisy partial information about their eigenquantities.

* 12 figures
Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost due to repeatedly solving an internal model inhibits their use in large-scale problems. We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorithms, RMAX and MBIE (model-based interval estimation), resulting in computationally much faster algorithms with little loss compared to existing bounds. Specifically, our two new learning algorithms, RTDP-RMAX and RTDP-IE, have considerably smaller computational demands than RMAX and MBIE. We develop a general theoretical framework that allows us to prove that both are efficient learners in a PAC (probably approximately correct) sense. We also present an experimental evaluation of these new algorithms that helps quantify the tradeoff between computational and experience demands.

* Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)
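The computational saving of RTDP comes from backing up values only at states the agent actually visits, instead of re-solving the whole model. The sketch below shows that idea on a known, hypothetical two-state MDP with optimistic initial values; the function name `rtdp`, its parameters, and the example MDP are assumptions, and the exploration machinery of RMAX or MBIE is deliberately left out.

```python
import random

def rtdp(transitions, rewards, start, gamma=0.9, v_max=10.0,
         episodes=200, horizon=20, seed=0):
    """Trial-based real-time dynamic programming on a known MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward. Values start at the optimistic
    bound v_max and are backed up only at states visited along trajectories.
    """
    rng = random.Random(seed)
    values = {s: v_max for s in transitions}

    def q(s, a):
        return rewards[s][a] + gamma * sum(p * values[s2]
                                           for p, s2 in transitions[s][a])

    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            best_a = max(transitions[s], key=lambda a: q(s, a))
            values[s] = q(s, best_a)  # Bellman backup at the visited state only
            probs, nexts = zip(*transitions[s][best_a])
            s = rng.choices(nexts, weights=probs)[0]
    return values

# Hypothetical MDP: state 1 pays reward 1 forever, state 0 must move there.
transitions = {0: {"stay": [(1.0, 0)], "go": [(1.0, 1)]},
               1: {"stay": [(1.0, 1)]}}
rewards = {0: {"stay": 0.0, "go": 0.0}, 1: {"stay": 1.0}}
values = rtdp(transitions, rewards, start=0)
```

Here `v_max` is set to the largest possible discounted return, 1 / (1 - 0.9), so the initial values are optimistic.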
We present a partial-order, conformant, probabilistic planner, Probapop, which competed in the blind track of the Probabilistic Planning Competition in IPC-4. We explain how we adapt distance-based heuristics for use with probabilistic domains. Probapop also incorporates heuristics based on probability of success. We explain the successes and difficulties encountered during the design and implementation of Probapop.

* Journal Of Artificial Intelligence Research, Volume 25, pages 1-15, 2006
The solution of nonlinear electromagnetic (EM) inverse scattering problems is typically hindered by several challenges such as ill-posedness, strong nonlinearity, and high computational costs. Recently, deep learning has been demonstrated to be a promising tool in addressing these challenges. In particular, it is possible to establish a connection between a deep convolutional neural network (CNN) and iterative solution methods of nonlinear EM inverse scattering. This has led to the development of an efficient CNN-based solution to nonlinear EM inverse problems, termed DeepNIS. It has been shown that DeepNIS can outperform conventional nonlinear inverse scattering methods in terms of both image quality and computational time. In this work, we quantitatively evaluate the performance of DeepNIS as a function of the number of layers using structure similarity measure (SSIM) and mean-square error (MSE) metrics. In addition, we probe the dynamic evolution behavior of DeepNIS by examining its near-isometry property. It is shown that after a proper training stage the proposed CNN is near optimal in terms of the stability and generalization ability.

* 1 pages,4 figures
Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for online networks, such as unidirectional LSTMs. Currently, the best strategy to train such systems is to bootstrap the training from a tied-triphone system. However, this is time consuming, and more importantly, is impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method results in a 19% relative improvement in word error rate compared to a randomly-initialized baseline system.

* Interspeech 2018
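Label smoothing, mentioned in the abstract above, replaces hard one-hot targets with slightly softened ones. The sketch below uses the common uniform variant; the abstract does not say which variant the authors use, so the function `smooth_labels` and the value `epsilon=0.1` are assumptions.

```python
import numpy as np

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Uniform label smoothing: mix one-hot targets with a uniform distribution.

    labels: integer class indices, shape (batch,).
    Returns an array of shape (batch, num_classes) whose rows sum to 1.
    """
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

targets = smooth_labels(np.array([0, 2]), num_classes=3, epsilon=0.1)
```

The correct class keeps most of the probability mass, while every other class receives a small, equal share.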
We address the problems of measuring geometric similarity between 3D scenes, represented through point clouds or range data frames, and associating them. Our approach leverages macro-scale 3D structural geometry - the relative configuration of arbitrary surfaces and relationships among structures that are potentially far apart. We express such discriminative information in a viewpoint-invariant feature space. These are subsequently encoded in a frame-level signature that can be utilized to measure geometric similarity. Such a characterization is robust to noise, incomplete and partially overlapping data besides viewpoint changes. We show how it can be employed to select a diverse set of data frames which have structurally similar content, and how to validate whether views with similar geometric content are from the same scene. The problem is formulated as one of general purpose retrieval from an unannotated, spatio-temporally unordered database. Empirical analysis indicates that the presented approach thoroughly outperforms baselines on depth / range data. Its depth-only performance is competitive with state-of-the-art approaches with RGB or RGB-D inputs, including ones based on deep learning. Experiments show retrieval performance to hold up well with much sparser databases, which is indicative of the approach's robustness. The approach generalized well - it did not require dataset specific training, and scaled up in our experiments. Finally, we also demonstrate how geometrically diverse selection of views can result in richer 3D reconstructions.

* Accepted in ICRA '18
The application of compressive sensing (CS) to structural health monitoring is an emerging research topic. The basic idea in CS is to use a specially-designed wireless sensor to sample signals that are sparse in some basis (e.g. wavelet basis) directly in a compressed form, and then to reconstruct (decompress) these signals accurately using some inversion algorithm after transmission to a central processing unit. However, most signals in structural health monitoring are only approximately sparse, i.e. only a relatively small number of the signal coefficients in some basis are significant, but the other coefficients are usually not exactly zero. In this case, perfect reconstruction from compressed measurements is not expected. A new Bayesian CS algorithm is proposed in which robust treatment of the uncertain parameters is explored, including integration over the prediction-error precision parameter to remove it as a "nuisance" parameter. The performance of the new CS algorithm is investigated using compressed data from accelerometers installed on a space-frame structure and on a cable-stayed bridge. Compared with other state-of-the-art CS methods including our previously-published Bayesian method which uses MAP (maximum a posteriori) estimation of the prediction-error precision parameter, the new algorithm shows superior performance in reconstruction robustness and posterior uncertainty quantification. Furthermore, our method can be utilized for recovery of lost data during wireless transmission, regardless of the level of sparseness in the signal.

* 41 pages, 18 figures
We study epidemic forecasting on real-world health data by a graph-structured recurrent neural network (GSRNN). We achieve state-of-the-art forecasting accuracy on the benchmark CDC dataset. To improve model efficiency, we sparsify the network weights via transformed-$\ell_1$ penalty and maintain prediction accuracy at the same level with 70% of the network weights being zero.
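For readers unfamiliar with the transformed-$\ell_1$ penalty, the sketch below uses its usual definition, $\rho_a(x) = (a+1)|x| / (a + |x|)$ with $a > 0$, summed over the weights. The function names and the choice of `a` are assumptions; the abstract does not give the authors' parameter setting or training procedure.

```python
import numpy as np

def transformed_l1(weights, a=1.0):
    """Transformed-l1 penalty: sum of (a + 1)|w| / (a + |w|) over all weights.

    Large a behaves like the l1 norm; a close to 0 approaches a count of
    the nonzero weights.
    """
    abs_w = np.abs(weights)
    return float(np.sum((a + 1.0) * abs_w / (a + abs_w)))

def sparsity(weights, tol=0.0):
    """Fraction of weights whose magnitude is at most tol."""
    return float(np.mean(np.abs(weights) <= tol))

w = np.array([0.0, 0.5, -2.0])
print(transformed_l1(w, a=1.0), sparsity(w))
```

In training, a penalty of this kind would be added to the forecasting loss with a weighting coefficient, which is a further assumption not spelled out in the abstract.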

Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and interact with objects it sees, we propose a "world-model" network that learns to predict the dynamic consequences of the agent's actions. Simultaneously, we train a separate explicit "self-model" that allows the agent to track the error map of its own world-model, and then uses the self-model to adversarially challenge the developing world-model. We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in complex novel physical environments.

* In NIPS 2018. 10 pages, 5 figures
Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to replicate some of these abilities with a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which the agent can move and interact with objects it sees, the agent learns a world model predicting the dynamic consequences of its actions. Simultaneously, the agent learns to take actions that adversarially challenge the developing world model, pushing the agent to explore novel and informative interactions with its environment. We demonstrate that this policy leads to the self-supervised emergence of a spectrum of complex behaviors, including ego motion prediction, object attention, and object gathering. Moreover, the world model that the agent learns supports improved performance on object dynamics prediction and localization tasks. Our results are a proof-of-principle that computational models of intrinsic motivation might account for key features of developmental visuomotor learning in infants.

* 6 pages, 5 figures
Eradicating hunger and malnutrition is a key development goal of the 21st century. We address the problem of optimally identifying seed varieties to reliably increase crop yield within a risk-sensitive decision-making framework. Specifically, we introduce a novel hierarchical machine learning mechanism for predicting crop yield (the yield of different seed varieties of the same crop). We integrate this prediction mechanism with a weather forecasting model, and propose three different approaches for decision making under uncertainty to select seed varieties for planting so as to balance yield maximization and risk. We apply our model to the problem of soybean variety selection given in the 2016 Syngenta Crop Challenge. Our prediction model achieves a median absolute error of 3.74 bushels per acre and thus provides good estimates for input into the decision models. Our decision models identify the selection of soybean varieties that appropriately balance yield and risk as a function of the farmer's risk aversion level. More generally, our models support farmers in decision making about which seed varieties to plant.

High-accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. To perform adaptation, we employ teacher/student (T/S) learning, in which the posterior probabilities generated by the source-domain model can be used in lieu of labels to train the target-domain model. We evaluate the proposed approach in two scenarios, adapting a clean acoustic model to noisy speech and adapting an adult speech acoustic model to children's speech. Significant improvements in accuracy are obtained, with reductions in word error rate of up to 44% over the original source model without the need for transcribed data in the target domain. Moreover, we show that increasing the amount of unlabeled data results in additional model robustness, which is particularly beneficial when using simulated training data in the target domain.
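The T/S idea of using teacher posteriors in lieu of labels can be written as a cross-entropy between the two models' output distributions. The sketch below is a minimal NumPy version under that reading; the function names and the toy logits are hypothetical, and a real system would compute this per frame inside a training loop.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def teacher_student_loss(teacher_logits, student_logits):
    """Cross-entropy of student posteriors against teacher posteriors.

    The teacher sees source-domain input and the student sees the paired
    target-domain input; no transcription is used. Minimizing this is
    equivalent to minimizing KL(teacher || student) because the teacher's
    entropy does not depend on the student.
    """
    p_teacher = np.exp(log_softmax(teacher_logits))
    return float(-(p_teacher * log_softmax(student_logits)).sum(axis=-1).mean())

# Hypothetical logits for two frames and three output classes.
teacher = np.array([[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]])
matched = teacher_student_loss(teacher, teacher)
mismatched = teacher_student_loss(teacher, np.zeros_like(teacher))
```

The loss is smallest when the student reproduces the teacher's posteriors, which is the behavior the adaptation relies on.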

Continuous state spaces and stochastic, switching dynamics characterize a number of rich, real-world domains, such as robot navigation across varying terrain. We describe a reinforcement-learning algorithm for learning in these domains and prove for certain environments the algorithm is probably approximately correct with a sample complexity that scales polynomially with the state-space dimension. Unfortunately, no optimal planning techniques exist in general for such problems; instead we use fitted value iteration to solve the learned MDP, and include the error due to approximate planning in our bounds. Finally, we report an experiment using a robotic car driving over varying terrain to demonstrate that these dynamics representations adequately capture real-world dynamics and that our algorithm can be used to efficiently solve such problems.

* Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)
We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.

* Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)
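The "sample several models, act optimistically" idea can be sketched for a single state as follows. Everything concrete here is an assumption for illustration: the Dirichlet posterior with a uniform prior, the function name `boss_backup`, and the toy counts. The resampling rule and the full planning step described in the abstract are not modelled.

```python
import numpy as np

def boss_backup(counts, rewards, values, k_samples=5, gamma=0.95, seed=0):
    """Optimistic one-step backup over transition models sampled from a posterior.

    counts[a]  : observed next-state counts for action a in the current state.
    rewards[a] : immediate reward of action a (assumed known here).
    values     : current value estimate for every next state.
    Each action is evaluated under k_samples sampled models and the best
    (action, model) pair determines the backed-up value.
    """
    rng = np.random.default_rng(seed)
    best = -np.inf
    for a, c in enumerate(counts):
        for _ in range(k_samples):
            # Dirichlet posterior under a uniform prior (an assumption).
            p = rng.dirichlet(np.asarray(c, dtype=float) + 1.0)
            best = max(best, rewards[a] + gamma * float(p @ values))
    return best

counts = [np.array([8.0, 1.0, 1.0]), np.array([1.0, 1.0, 8.0])]
rewards = [0.0, 0.5]
values = np.array([0.0, 1.0, 2.0])
backed_up = boss_backup(counts, rewards, values)
```

Taking the maximum over sampled models is what makes rarely tried actions look attractive while their posterior is still wide.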
There is often latent network structure in spatial and temporal data and the tools of network analysis can yield fascinating insights into such data. In this paper, we develop a nonparametric method for network reconstruction from spatiotemporal data sets using multivariate Hawkes processes. In contrast to prior work on network reconstruction with point-process models, which has often focused on exclusively temporal information, our approach uses both temporal and spatial information and does not assume a specific parametric form of network dynamics. This leads to an effective way of recovering an underlying network. We illustrate our approach using both synthetic networks and networks constructed from real-world data sets (a location-based social media network, a narrative of crime events, and violent gang crimes). Our results demonstrate that, in comparison to using only temporal data, our spatiotemporal approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis --- such as community structure and motif analysis --- of the reconstructed networks.
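For orientation, a generic spatiotemporal multivariate Hawkes process can be written with the conditional intensity below. The notation is an assumption, not taken from the paper: $\mu_u$ is the background rate of node $u$, $K_{u_i u}$ is the expected number of events on node $u$ triggered by an event on node $u_i$, and $g$ is a triggering kernel over time and space lags.

```latex
\lambda_u(t, x, y) \;=\; \mu_u(x, y)
  \;+\; \sum_{i \,:\, t_i < t} K_{u_i u}\, g\bigl(t - t_i,\; x - x_i,\; y - y_i\bigr)
```

In this form, a nonparametric approach would estimate $g$ and $\mu_u$ from the data instead of fixing their shape in advance, and the matrix $K$ would play the role of the reconstructed network.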
