Models, code, and papers for "Chao Liu":

Onto Word Segmentation of the Complete Tang Poems

Aug 28, 2019
Chao-Lin Liu

We aim at segmenting words in the Complete Tang Poems (CTP). Although some research on CTP is possible without full-scale word segmentation, we must move forward to word-level analysis of CTP to pursue advanced research topics. In November 2018, when we submitted the manuscript for DH 2019 (ADHO), we had collected only 2433 poems that were segmented by trained experts, and we used the segmented poems to evaluate a segmenter that considered domain knowledge of Chinese poetry. We estimated pointwise mutual information (PMI) between Chinese characters based on the CTP poems (excluding the 2433 poems, which were used exclusively for testing) and the domain knowledge. The segmenter relied on the PMI information to recover 85.7% of the words in the test poems. We could segment a poem completely correctly only 17.8% of the time, however. By the time we presented our work at DH 2019, we had annotated more than 20000 poems. With a much larger amount of data, we were able to apply biLSTM models to this word segmentation task, and we segmented a poem completely correctly more than 20% of the time. In contrast, human annotators completely agreed on their annotations about 40% of the time.

* 5 pages, 2 tables, presented at the 2019 International Conference on Digital Humanities (ADHO) 
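
A minimal sketch of how character-level PMI could drive segmentation is given below. It is not the authors' segmenter: the function names `char_pmi` and `segment`, the maximum-likelihood probability estimates, and the threshold rule are all assumptions made for illustration, and the domain knowledge mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


def char_pmi(lines):
    """Return PMI (base 2) for every adjacent character pair seen in `lines`."""
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        unigrams.update(line)
        bigrams.update(zip(line, line[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def segment(line, pmi, threshold=0.0):
    """Hypothetical rule: keep two adjacent characters in the same word
    when their PMI reaches `threshold`; otherwise start a new word."""
    if not line:
        return []
    words = [line[0]]
    for a, b in zip(line, line[1:]):
        if pmi.get((a, b), float("-inf")) >= threshold:
            words[-1] += b
        else:
            words.append(b)
    return words
```

In practice the threshold would be tuned on held-out segmented poems, in the same spirit as the 2433 expert-segmented poems that the abstract reserves for testing.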

Flexible Computing Services for Comparisons and Analyses of Classical Chinese Poetry

Sep 18, 2017
Chao-Lin Liu

We collect nine corpora of representative Chinese poetry spanning 1046 BCE to 1644 CE for studying the history of Chinese words, collocations, and patterns. By flexibly integrating our own tools, we are able to provide new perspectives for approaching our goals. We illustrate the ideas with two examples. The first example shows a new way to compare the word preferences of poets, and the second example demonstrates how we can utilize our corpora in historical studies of Chinese words. We show the viability of the tools for academic research, and we wish to make them helpful for enriching existing Chinese dictionaries as well.

* 6 pages, 2 tables, 1 figure, 2017 International Conference on Digital Humanities 

Quantitative Analyses of Chinese Poetry of Tang and Song Dynasties: Using Changing Colors and Innovative Terms as Examples

Aug 28, 2016
Chao-Lin Liu

The Tang (618-907 AD) and Song (960-1279 AD) dynasties are two very important periods in the development of Chinese literature. The most influential forms of poetry in Tang and Song were Shi and Ci, respectively. Tang Shi and Song Ci established crucial foundations of Chinese literature, and their influences on both literary works and the daily lives of Chinese communities last until today. We can analyze and compare the Complete Tang Shi and the Complete Song Ci from various viewpoints. In this presentation, we report our findings about the differences in their vocabularies. We identified interesting new words that started to appear in Song Ci and continue to be used in modern Chinese. Colors are an important ingredient of the imagery in poetry, and we discuss the most frequent color words that appeared in Tang Shi and Song Ci.

* 2016 International Conference on Digital Humanities 

Inner Attention Supported Adaptive Cooperation for Heterogeneous Multi Robots Teaming based on Multi-agent Reinforcement Learning

Feb 12, 2020
Chao Huang, Rui Liu

Humans can selectively focus on different information depending on task requirements and on other people's abilities and availability. They can therefore adapt quickly to completely different and complex environments. If robots could obtain the same ability, their adaptability to new and unexpected situations would increase greatly. Recent efforts in heterogeneous multi-robot teaming have tried to achieve this ability, for example with methods based on communication and multi-modal information fusion strategies. However, these methods not only suffer from exponential explosion as the number of robots increases but also need huge computational resources. To that end, we introduce an inner attention actor-critic method that replicates aspects of flexible human cooperation. By bringing the attention mechanism from computer vision and natural language processing into the realm of multi-robot cooperation, our attention method is able to dynamically select which robots to attend to. To test the effectiveness of the proposed method, we designed several simulation experiments. The results show that the inner attention mechanism can enable flexible cooperation and lower resource consumption in rescue tasks.

* arXiv admin note: text overlap with arXiv:1911.01774 by other authors 

Classical Chinese Sentence Segmentation for Tomb Biographies of Tang Dynasty

Aug 28, 2019
Chao-Lin Liu, Yi Chang

Tomb biographies of the Tang dynasty provide invaluable information about Chinese history. The original biographies are classical Chinese texts that contain neither word boundaries nor sentence boundaries. Relying on three published books of tomb biographies of the Tang dynasty, we investigated the effectiveness of machine-learning methods for algorithmically identifying the pauses and terminals of sentences in the biographies. We consider the segmentation task as a classification problem: Chinese characters that are followed by a punctuation mark and those that are not are classified into two categories. We applied a machine-learning-based mechanism, conditional random fields (CRF), to classify the characters (and words) in the texts, and we studied the contributions of selected types of lexical information to the resulting quality of the segmentation recommendations. This proposal, presented at the DH 2018 conference, discussed some of the basic experiments and their evaluations. By considering the contextual information and employing the heuristics provided by experts of Chinese literature, we achieved F1 measures that were better than 80%. More complex experiments that employ deep neural networks helped us further improve the results in recent work.

* 6 pages, 3 figures, 2 tables, presented at the 2019 International Conference on Digital Humanities (ADHO) 
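
The two-category formulation in the abstract can be sketched as a data-preparation step. The code below is an illustration under stated assumptions, not the authors' pipeline: the function name `label_characters`, the punctuation set, and the label values 0/1 are hypothetical, and the CRF training itself is not shown.

```python
# Assumed set of punctuation marks; the published books may use others.
PUNCTUATION = set("，。、；：？！")


def label_characters(punctuated_text):
    """Turn punctuated text into (character, label) pairs.

    Label 1 means the character was followed by a punctuation mark,
    label 0 means it was not. Punctuation marks themselves are dropped,
    so the characters alone reproduce the unpunctuated original.
    """
    pairs = []
    for ch in punctuated_text:
        if ch in PUNCTUATION:
            if pairs:  # mark the character just before this punctuation
                pairs[-1] = (pairs[-1][0], 1)
        else:
            pairs.append((ch, 0))
    return pairs
```

Sequences of such pairs are the kind of input a sequence labeler such as a CRF would be trained on, with contextual and lexical features attached to each character.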

On Inertial Navigation and Attitude Initialization in Polar Areas

Mar 29, 2019
Yuanxin Wu, Chao He, Gang Liu

Inertial navigation and attitude initialization in polar areas have become a hot topic in the navigation community in recent years, as the widely used local-level-frame navigation mechanization encounters an inherent singularity when the latitude approaches 90 degrees. Great endeavors have been devoted to devising novel navigation mechanizations such as the grid or transversal frames. This paper highlights the fact that the common Earth-frame mechanization handles the singularity problem in polar areas sufficiently well. Simulation results are reported to demonstrate the singularity problem and the effectiveness of the Earth-frame mechanization.

* 10 pages, 4 figures 

Matrix and Graph Operations for Relationship Inference: An Illustration with the Kinship Inference in the China Biographical Database

Sep 09, 2017
Chao-Lin Liu, Hongsu Wang

Biographical databases contain diverse information about individuals. Person names, birth information, career, friends, family and special achievements are some possible items in the record for an individual. The relationships between individuals, such as kinship and friendship, provide invaluable insights about hidden communities which are not directly recorded in databases. We show that some simple matrix and graph-based operations are effective for inferring relationships among individuals, and illustrate the main ideas with the China Biographical Database (CBDB).

* 3 pages, 3 figures, 2017 Annual Meeting of the Japanese Association for Digital Humanities 
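
As an illustration of the kind of matrix operation the abstract refers to, the sketch below composes a boolean "father of" relation with itself to infer grandfathers, and with its transpose to infer siblings. The four-person data set, the function names, and the choice of relations are assumptions for demonstration; they are not taken from CBDB or from the paper.

```python
def bool_matmul(a, b):
    """Boolean matrix product: result[i][j] is True if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    """Return the transpose of a square matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def related_pairs(matrix, names):
    """List (names[i], names[j]) for every True off-diagonal cell."""
    return [(names[i], names[j])
            for i in range(len(names))
            for j in range(len(names))
            if i != j and matrix[i][j]]


# Hypothetical records: A is the father of B and D; B is the father of C.
names = ["A", "B", "C", "D"]
father_of = [[False] * 4 for _ in range(4)]
father_of[0][1] = father_of[0][3] = father_of[1][2] = True

# Grandfather = father of a father.
grandfathers = related_pairs(bool_matmul(father_of, father_of), names)
# Siblings = two people sharing a father.
siblings = related_pairs(bool_matmul(transpose(father_of), father_of), names)
```

Neither inferred relation is stored explicitly in the toy records; both follow from composing the recorded one.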

TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings

Jun 18, 2012
Chao Liu, Yi-Min Wang

This paper revisits the problem of analyzing multiple ratings given by different judges. Different from previous work that focuses on distilling the true labels from noisy crowdsourcing ratings, we emphasize gaining diagnostic insights into our in-house well-trained judges. We generalize the well-known DawidSkene model (Dawid & Skene, 1979) to a spectrum of probabilistic models under the same "TrueLabel + Confusion" paradigm, and show that our proposed hierarchical Bayesian model, called HybridConfusion, consistently outperforms DawidSkene on both synthetic and real-world data sets.

* ICML2012 

NEW: A Generic Learning Model for Tie Strength Prediction in Networks

Jan 15, 2020
Zhen Liu, Hu li, Chao Wang

Tie strength prediction, sometimes called weight prediction, is vital in exploring the diversity of connectivity patterns that emerge in networks. Due to its fundamental significance, it has drawn much attention in the field of network analysis and mining. Related works in recent years have significantly advanced our understanding of how to predict strong and weak ties in social networks. However, most of the proposed approaches are scenario-aware methods that depend heavily on special contexts and are even used exclusively in social networks. As a result, they are less applicable to other kinds of networks. In contrast to prior studies, we propose a new computational framework called Neighborhood Estimating Weight (NEW), which is driven purely by the basic structural information of the network and has the flexibility to adapt to diverse types of networks. In NEW, we design a novel index, connection inclination, to generate representative features of the network that are capable of capturing the actual distribution of tie strength. To obtain optimized prediction results, we also propose a parameterized regression model that has approximately linear time complexity and thus extends readily to large-scale networks. The experimental results on six real-world networks demonstrate that our proposed predictive model outperforms the state-of-the-art methods and is powerful for predicting missing tie strengths when only part of the network's tie strength information is available.


Reinforcement Learning in Healthcare: A Survey

Aug 22, 2019
Chao Yu, Jiming Liu, Shamim Nemati

As a subfield of machine learning, reinforcement learning (RL) aims at empowering one's capabilities in behavioural decision making by using interaction experience with the world and evaluative feedback. Unlike traditional supervised learning methods that usually rely on one-shot, exhaustive and supervised reward signals, RL tackles sequential decision making problems with sampled, evaluative and delayed feedback simultaneously. Such distinctive features make RL techniques suitable candidates for developing powerful solutions in a variety of healthcare domains, where diagnosing decisions or treatment regimes are usually characterized by a prolonged and sequential procedure. This survey discusses the broad applications of RL techniques in healthcare domains, in order to provide the research community with a systematic understanding of theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. After briefly examining theoretical foundations and key techniques in RL research from efficient and representational directions, we provide an overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care and automated medical diagnosis from both unstructured and structured clinical data to many other control or scheduling domains that have infiltrated many aspects of a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research.


N-fold Superposition: Improving Neural Networks by Reducing the Noise in Feature Maps

May 03, 2018
Yang Liu, Qiang Qu, Chao Gao

Considering that the use of a Fully Connected (FC) layer limits the performance of Convolutional Neural Networks (CNNs), this paper develops a method to improve the coupling between the convolution layer and the FC layer by reducing the noise in Feature Maps (FMs). Our approach is divided into three steps. First, we separate all the FMs equally into n blocks. Then, the weighted summation of the FMs at the same position in all blocks constitutes a new block of FMs. Finally, we replicate this new block into n copies and concatenate them as the input to the FC layer. This sharing of FMs could noticeably reduce the noise in them and avert the impact of a particular FM on a specific part of the hidden-layer weights, hence preventing the network from overfitting to some extent. Using Fermat's lemma, we prove that this method could widen the value range of the global minima of the loss function, which makes it easier for neural networks to converge and accelerates the convergence process. This method does not significantly increase the number of network parameters (only a few more coefficients are added), and the experiments demonstrate that it could increase the convergence speed and improve the classification performance of neural networks.

* 7 pages, 5 figures, submitted to ICALIP 2018 
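
The three steps in the abstract can be sketched in plain Python as follows. This is a reading of the abstract, not the authors' implementation: each feature map is represented as a flat list of floats, the function name `n_fold_superposition` is hypothetical, and the per-block weights are passed in as fixed numbers, whereas in the paper they are presumably learned coefficients.

```python
def n_fold_superposition(feature_maps, n, weights):
    """Apply the split / weighted-sum / replicate steps to a list of FMs.

    feature_maps: list of m feature maps, each a flat list of floats.
    n:            number of blocks; m must be divisible by n.
    weights:      one coefficient per block (an assumption of this sketch).
    """
    m = len(feature_maps)
    if m % n != 0 or len(weights) != n:
        raise ValueError("need len(feature_maps) divisible by n and n weights")
    size = m // n

    # Step 1: separate the FMs equally into n blocks.
    blocks = [feature_maps[b * size:(b + 1) * size] for b in range(n)]

    # Step 2: weighted sum of the FMs at the same position in all blocks.
    new_block = []
    for pos in range(size):
        summed = [0.0] * len(blocks[0][pos])
        for w, block in zip(weights, blocks):
            summed = [s + w * v for s, v in zip(summed, block[pos])]
        new_block.append(summed)

    # Step 3: replicate the new block n times and concatenate.
    return [list(fm) for _ in range(n) for fm in new_block]
```

The output has the same number of feature maps as the input, so the FC layer that follows keeps its input size.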

Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts

Apr 05, 2018
Siyou Liu, Longyue Wang, Chao-Hong Liu

Although there are increasing and significant ties between China and Portuguese-speaking countries, there are few parallel corpora for the Chinese-Portuguese language pair. Both languages have very large speaker populations, with 1.2 billion native Chinese speakers and 279 million native Portuguese speakers. The language pair, however, could be considered low-resource in terms of available parallel corpora. In this paper, we describe our methods to curate Chinese-Portuguese parallel corpora and evaluate their quality. We extracted bilingual data from Macao government websites and proposed a hierarchical strategy to build a large parallel corpus. Experiments are conducted on existing corpora and our own using both Phrase-Based Machine Translation (PBMT) and state-of-the-art Neural Machine Translation (NMT) models. The results of this work can be used as a benchmark for future Chinese-Portuguese MT systems. The approach we used in this paper also serves as a good example of how to boost the performance of MT systems for low-resource language pairs.

* accepted by LREC 2018 

Model Trees for Identifying Exceptional Players in the NHL Draft

Feb 23, 2018
Oliver Schulte, Yejia Liu, Chao Li

Drafting strong players is crucial for team success. We describe a new data-driven interpretable approach for assessing draft prospects in the National Hockey League. Successful previous approaches have built a predictive model based on player features, or derived performance predictions from the observed performance of comparable players in a cohort. This paper develops model tree learning, which incorporates strengths of both model-based and cohort-based approaches. A model tree partitions the feature space according to the values of discrete features, or learned thresholds for continuous features. Each leaf node in the tree defines a group of players, easily described to hockey experts, with its own group regression model. Compared to a single model, the model tree forms an ensemble that increases predictive power. Compared to cohort-based approaches, the groups of comparables are discovered from the data, without requiring a similarity metric. The performance predictions of the model tree are competitive with state-of-the-art methods, which validates our model empirically. We show in case studies that the model tree player ranking can be used to highlight strong and weak points of players.

* 14 pages 

Tracking Words in Chinese Poetry of Tang and Song Dynasties with the China Biographical Database

Oct 29, 2017
Chao-Lin Liu, Kuo-Feng Luo

Large-scale comparisons between the poetry of the Tang and Song dynasties shed light on how words, collocations, and expressions were used and shared among the poets. That some words were used only in Tang poetry and some only in Song poetry could lead to interesting research in linguistics. That the most frequent colors differ between Tang and Song poetry provides a trace of the changing social circumstances in the dynasties. Results of the current work link to research topics in lexicography, semantics, and social transitions. We discuss our findings and present our algorithms for efficient comparisons among the poems, which are crucial for completing billions of comparisons within acceptable time.

* 9 pages, 3 figures, Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), 26th International Conference on Computational Linguistics (COLING) 

Exploring Lexical, Syntactic, and Semantic Features for Chinese Textual Entailment in NTCIR RITE Evaluation Tasks

Apr 08, 2015
Wei-Jie Huang, Chao-Lin Liu

We computed linguistic information at the lexical, syntactic, and semantic levels for Recognizing Inference in Text (RITE) tasks for both traditional and simplified Chinese in NTCIR-9 and NTCIR-10. Techniques for syntactic parsing, named-entity recognition, and near synonym recognition were employed, and features like counts of common words, statement lengths, negation words, and antonyms were considered to judge the entailment relationships of two statements, while we explored both heuristics-based functions and machine-learning approaches. The reported systems showed robustness by simultaneously achieving second positions in the binary-classification subtasks for both simplified and traditional Chinese in NTCIR-10 RITE-2. We conducted more experiments with the test data of NTCIR-9 RITE, with good results. We also extended our work to search for better configurations of our classifiers and investigated contributions of individual features. This extended work showed interesting results and should encourage further discussion.

* 20 pages, 1 figure, 26 tables, Journal article in Soft Computing (Springer). Soft Computing, online. Springer, Germany, 2015 

State-space Abstraction for Anytime Evaluation of Probabilistic Networks

Feb 27, 2013
Michael P. Wellman, Chao-Lin Liu

One important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners.

* Appears in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI1994) 

Using Qualitative Relationships for Bounding Probability Distributions

Jan 30, 2013
Chao-Lin Liu, Michael P. Wellman

We exploit qualitative probabilistic relationships among variables for computing bounds of conditional probability distributions of interest in Bayesian networks. Using the signs of qualitative relationships, we can implement abstraction operations that are guaranteed to bound the distributions of interest in the desired direction. By evaluating incrementally improved approximate networks, our algorithm obtains monotonically tightening bounds that converge to exact distributions. For supermodular utility functions, the tightening bounds monotonically reduce the set of admissible decision alternatives as well.

* Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998) 

Incremental Tradeoff Resolution in Qualitative Probabilistic Networks

Jan 30, 2013
Chao-Lin Liu, Michael P. Wellman

Qualitative probabilistic reasoning in a Bayesian network often reveals tradeoffs: relationships that are ambiguous due to competing qualitative influences. We present two techniques that combine qualitative and numeric probabilistic reasoning to resolve such tradeoffs, inferring the qualitative relationship between nodes in a Bayesian network. The first approach incrementally marginalizes nodes that contribute to the ambiguous qualitative relationships. The second approach evaluates approximate Bayesian networks for bounds of probability distributions, and uses these bounds to determine the qualitative relationships in question. This approach is also incremental in that the algorithm refines the state spaces of random variables for tighter bounds until the qualitative relationships are resolved. Both approaches provide systematic methods for tradeoff resolution at potentially lower computational cost than application of purely numeric methods.

* Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998) 

Salient Instance Segmentation via Subitizing and Clustering

Sep 29, 2019
Jialun Pei, He Tang, Chao Liu, Chuanbo Chen

The goal of salient region detection is to identify the regions of an image that attract the most attention. Many methods have achieved state-of-the-art performance levels on this task. Recently, salient instance segmentation has become an even more challenging task than traditional salient region detection; however, few of the existing methods have concentrated on this underexplored problem. Unlike the existing methods, which usually employ object proposals to roughly count and locate object instances, our method applies salient object subitizing to predict an accurate number of instances for salient instance segmentation. In this paper, we propose a multitask densely connected neural network (MDNN) to segment salient instances in an image. In contrast to existing approaches, our framework is proposal-free and category-independent. The MDNN contains two parallel branches: the first is a densely connected subitizing network (DSN) used for subitizing prediction; the second is a densely connected fully convolutional network (DFCN) used for salient region detection. The MDNN simultaneously outputs saliency maps and salient object subitizing. Then, an adaptive deep feature-based spectral clustering operation segments the salient regions into instances based on the subitizing and saliency maps. The experimental results on both salient region detection and salient instance segmentation datasets demonstrate the satisfactory performance of our framework. Notably, its APr@0.5 and APr@0.7 reach 73.46% and 60.14% on the salient instance dataset, substantially higher than the results achieved by the state-of-the-art algorithm.


Gradient Boost with Convolution Neural Network for Stock Forecast

Sep 19, 2019
Jialin Liu, Chih-Min Lin, Fei Chao

The market economy is closely connected to all walks of life. Stock forecasting is one of the tasks in studies of the market economy. However, market information contains a lot of noise and uncertainty, which makes economic forecasting a challenging task. Ensemble learning and deep learning are the most common methods for the stock forecast task. In this paper, we present a model that combines the advantages of the two methods to forecast changes in stock prices. The proposed method combines CNN and GBoost. The experimental results on six market indexes show that the proposed method performs better than current popular methods.

* UKCL2019, 11 pages 
