Pectoral muscle identification is often required for breast cancer risk analysis, for example when estimating breast density. Traditional methods are overwhelmingly based on manual visual assessment or on fitting a straight line to the pectoral muscle boundary; these are inefficient and inaccurate because the pectoral muscle in mammograms can have a curved boundary. This paper proposes a novel, automatic pectoral muscle identification algorithm for MLO view mammograms that is suitable for both scanned film and full field digital mammograms. The algorithm is demonstrated using the public domain software ImageJ. Validation on real-world data shows promising results.

* 11 pages, 6 figures
Automatic head frontal-view identification is challenging due to appearance variations caused by pose changes, especially without any training samples. In this paper, we present an unsupervised algorithm for identifying the frontal view among multiple facial images of the same person under various yaw poses. Our approach is based on Locally Linear Embedding (LLE), with the assumption that, with yaw pose being the only variable, the facial images should lie on a smooth, low-dimensional manifold. We horizontally flip the facial images and present two K-nearest neighbor protocols for the original images and the flipped images, respectively. In the proposed extended LLE, for any facial image (original or flipped), we search (1) the Ko nearest neighbors among the original facial images and (2) the Kf nearest neighbors among the flipped facial images to construct the same neighborhood graph. The extended LLE eliminates the differences (due to background, face position and scale in the whole image, and some left-right facial asymmetry) between the original facial image and the flipped facial image at the same yaw pose, so that the flipped facial images can be used effectively. Our approach does not need any training samples as prior information. The experimental results show that the frontal view of the head can be identified reliably around the lowest point of the pose manifold for multiple facial images, especially for cropped facial images (little background and a centered face).

We describe a class of systems theory based neural networks called "Network Of Recurrent neural networks" (NOR), which introduces a new structure level to RNN related models. In NOR, RNNs are viewed as high-level neurons and are used to build high-level layers. More specifically, we propose several methodologies to design different NOR topologies according to the theory of system evolution. We then carry out experiments on three different tasks to evaluate our implementations. Experimental results show that our models outperform simple RNNs remarkably under the same number of parameters, and sometimes achieve even better results than GRU and LSTM.

* Under review as a conference paper at AAAI 2018
To apply general knowledge to machine reading comprehension (MRC), we propose an innovative MRC approach, which consists of a WordNet-based data enrichment method and an MRC model named Knowledge Aided Reader (KAR). The data enrichment method uses the semantic relations of WordNet to extract semantic level inter-word connections from each passage-question pair in the MRC dataset, and allows us to control the amount of extraction results by setting a hyper-parameter. KAR uses the extraction results of the data enrichment method as explicit knowledge to assist the prediction of answer spans. According to the experimental results, the single model of KAR achieves an Exact Match (EM) of $72.4$ and an F1 Score of $81.1$ on the development set of SQuAD. More importantly, applying different settings in the data enrichment method to change the amount of extraction results produces a $2\%$ variation in the resulting performance of KAR, which implies that the explicit knowledge provided by the data enrichment method plays an effective role in the training of KAR.

As a generative model for building end-to-end dialogue systems, Hierarchical Recurrent Encoder-Decoder (HRED) consists of three layers of Gated Recurrent Unit (GRU), which from bottom to top are used as the word-level encoder, the sentence-level encoder, and the decoder, respectively. Despite performing well on dialogue corpora, HRED is computationally expensive to train due to its complexity. To improve the training efficiency of HRED, we propose a new model, Simplified HRED (SHRED), in which each layer of HRED except the top one is made simpler than the layer above it. On the one hand, we propose Scalar Gated Unit (SGU), a simplified variant of GRU, and use it as the sentence-level encoder. On the other hand, we use Fixed-size Ordinally-Forgetting Encoding (FOFE), which has no trainable parameters at all, as the word-level encoder. The experimental results show that, compared with HRED under the same word embedding size and the same hidden state size for each layer, SHRED reduces the number of trainable parameters by 25\%--35\% and the training time by more than 50\%, yet still achieves slightly better performance.
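
As an illustration of why FOFE needs no trainable parameters, the sketch below uses the recursion by which FOFE is commonly defined, z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot vector of the t-th token. This is a minimal sketch, not the SHRED implementation: the function name `fofe_encode`, the use of one-hot vectors rather than word embeddings, and the value `alpha=0.5` are assumptions made for the example.

```python
def fofe_encode(token_ids, vocab_size, alpha=0.5):
    """Encode a token sequence as one fixed-size vector.

    Applies z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot vector
    of the t-th token. alpha is a fixed forgetting factor (an assumed
    value here), so nothing in this encoder is learned.
    """
    z = [0.0] * vocab_size
    for tok in token_ids:
        # Decay the contribution of everything seen so far ...
        z = [alpha * v for v in z]
        # ... then add the one-hot vector of the current token.
        z[tok] += 1.0
    return z


# Hypothetical vocabulary of size 2; the sequence is (0, 1, 0).
code = fofe_encode([0, 1, 0], vocab_size=2, alpha=0.5)
print(code)
```

Because earlier tokens are decayed more often than later ones, the same bag of tokens in a different order yields a different vector, which is how the encoding retains word order without any weights.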

This paper addresses the nearest neighbor search problem under inner product similarity and introduces a compact code-based approach. The idea is to approximate a vector using the composition of several elements selected from a source dictionary and to represent this vector by a short code composed of the indices of the selected elements. The inner product between a query vector and a database vector is efficiently estimated from the query vector and the short code of the database vector. We show the superior performance of the proposed group $M$-selection algorithm that selects $M$ elements from $M$ source dictionaries for vector approximation in terms of search accuracy and efficiency for compact codes of the same length via theoretical and empirical analysis. Experimental results on large-scale datasets ($1M$ and $1B$ SIFT features, $1M$ linear models and Netflix) demonstrate the superiority of the proposed approach.
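
The inner product estimate described above can be sketched as follows. If a database vector is approximated by the sum of one element from each of $M$ dictionaries, its inner product with a query is the sum of $M$ precomputed query-element inner products, so only the short code is needed at search time. This is a minimal sketch under that assumption: the names `build_lookup` and `approx_inner_product` and the toy dictionaries are hypothetical, and the paper's group $M$-selection procedure for learning dictionaries and codes is not shown.

```python
def build_lookup(query, dictionaries):
    """table[m][k] = <query, dictionaries[m][k]>, computed once per query."""
    return [
        [sum(q * c for q, c in zip(query, element)) for element in dictionary]
        for dictionary in dictionaries
    ]


def approx_inner_product(table, code):
    """Estimate <query, x> for x ~ sum_m dictionaries[m][code[m]].

    Uses M table lookups and additions; the database vector itself
    is never reconstructed.
    """
    return sum(table[m][k] for m, k in enumerate(code))


# Toy data (hypothetical): M = 2 dictionaries with 2 elements each, in 2-D.
dictionaries = [
    [[1, 0], [0, 1]],
    [[2, 0], [0, 2]],
]
query = [3, 4]
code = (0, 1)  # selects [1, 0] and [0, 2], i.e. x is approximated by [1, 2]

table = build_lookup(query, dictionaries)
print(approx_inner_product(table, code))
```

The lookup table costs one pass over the dictionaries per query, after which each database vector costs only $M$ additions.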

* The approach presented in this paper (ECCV14 submission) is closely related to multi-stage vector quantization and residual quantization. We thank the reviewers (CVPR14 and ECCV14) for pointing out the relationship to these two algorithms. Related paper: http://sites.skoltech.ru/app/data/uploads/sites/2/2013/09/CVPR14.pdf, which also adopts the summation of vectors for vector approximation
Biographical databases contain diverse information about individuals. Person names, birth information, career, friends, family and special achievements are some possible items in the record for an individual. The relationships between individuals, such as kinship and friendship, provide invaluable insights about hidden communities which are not directly recorded in databases. We show that some simple matrix and graph-based operations are effective for inferring relationships among individuals, and illustrate the main ideas with the China Biographical Database (CBDB).

* 3 pages, 3 figures, 2017 Annual Meeting of the Japanese Association for Digital Humanities
This paper revisits the problem of analyzing multiple ratings given by different judges. Different from previous work that focuses on distilling the true labels from noisy crowdsourcing ratings, we emphasize gaining diagnostic insights into our in-house well-trained judges. We generalize the well-known DawidSkene model (Dawid & Skene, 1979) to a spectrum of probabilistic models under the same "TrueLabel + Confusion" paradigm, and show that our proposed hierarchical Bayesian model, called HybridConfusion, consistently outperforms DawidSkene on both synthetic and real-world data sets.

* ICML2012
The well-known Mori-Zwanzig theory tells us that model reduction leads to memory effects. For a long time, modeling the memory effect accurately and efficiently has been an important but nearly impossible task in developing a good reduced model. In this work, we explore a natural analogy between recurrent neural networks and the Mori-Zwanzig formalism to establish a systematic approach for developing reduced models with memory. Two training models, a direct training model and a dynamically coupled training model, are proposed and compared. We apply these methods to the Kuramoto-Sivashinsky equation and the Navier-Stokes equation. Numerical experiments show that the proposed method can produce reduced models with good performance on both short-term prediction and long-term statistical properties.

Although there are increasing and significant ties between China and Portuguese-speaking countries, there are few parallel corpora for the Chinese-Portuguese language pair. Both languages are widely spoken, with 1.2 billion native Chinese speakers and 279 million native Portuguese speakers; the language pair, however, can be considered low-resource in terms of available parallel corpora. In this paper, we describe our methods to curate Chinese-Portuguese parallel corpora and evaluate their quality. We extracted bilingual data from Macao government websites and proposed a hierarchical strategy to build a large parallel corpus. Experiments are conducted on existing corpora and on ours using both Phrase-Based Machine Translation (PBMT) and state-of-the-art Neural Machine Translation (NMT) models. The results of this work can be used as a benchmark for future Chinese-Portuguese MT systems. The approach used in this paper also serves as a good example of how to boost the performance of MT systems for low-resource language pairs.

* accepted by LREC 2018
The naturalness of warps is gaining extensive attention in image stitching. Recent warps such as SPHP and AANAP use global similarity warps to mitigate projective distortion (which enlarges regions); however, they necessarily introduce perspective distortion (which generates inconsistencies). In this paper, we propose a novel quasi-homography warp, which effectively balances the perspective distortion against the projective distortion in the non-overlapping region to create a more natural-looking panorama. Our approach formulates the warp as the solution of a bivariate system, where perspective distortion and projective distortion are characterized as slope preservation and scale linearization, respectively. Because our proposed warp relies only on a global homography, it is totally parameter-free. A comprehensive experiment shows that the quasi-homography warp outperforms some state-of-the-art warps in urban scenes, including homography, AutoStitch and SPHP. A user study demonstrates that it wins most users' favor compared with homography and SPHP.

* 10 pages, 9 figures
In this paper, we propose a robust change detection method for intelligent visual surveillance. This method, named M4CD, includes three major steps. Firstly, a sample-based background model that integrates color and texture cues is built and updated over time. Secondly, multiple heterogeneous features (including brightness variation, chromaticity variation, and texture variation) are extracted by comparing the input frame with the background model, and a multi-source learning strategy is designed to estimate online the probability distributions for both foreground and background. The three features are approximately conditionally independent, making multi-source learning feasible. Pixel-wise foreground posteriors are then estimated with Bayes' rule. Finally, Markov random field (MRF) optimization and heuristic post-processing techniques are applied sequentially to improve accuracy. In particular, a two-layer MRF model is constructed to represent pixel-based and superpixel-based contextual constraints compactly. Experimental results on the CDnet dataset indicate that M4CD is robust in complex environments and ranks among the top methods.
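
The role of conditional independence in the posterior step can be illustrated with a short sketch. If the three features are treated as independent given the class, the per-pixel likelihood factorizes into a product, and Bayes' rule gives the foreground posterior directly. This is a minimal sketch under that assumption only: the function name `foreground_posterior`, the prior, and the likelihood values are hypothetical, and how M4CD actually estimates the distributions online is not shown.

```python
def foreground_posterior(prior_fg, lik_fg, lik_bg):
    """Posterior P(foreground | features) for a single pixel.

    lik_fg[i] and lik_bg[i] are the likelihoods of feature i under the
    foreground and background models. Conditional independence lets the
    joint likelihood be written as a product over features.
    """
    joint_fg = prior_fg
    for p in lik_fg:
        joint_fg *= p
    joint_bg = 1.0 - prior_fg
    for p in lik_bg:
        joint_bg *= p
    evidence = joint_fg + joint_bg
    # Fall back to the prior if both models assign zero likelihood.
    return joint_fg / evidence if evidence > 0 else prior_fg


# Hypothetical values for (brightness, chromaticity, texture) variation.
posterior = foreground_posterior(
    prior_fg=0.5,
    lik_fg=[0.8, 0.5, 0.5],
    lik_bg=[0.2, 0.5, 0.5],
)
print(posterior)
```

In this toy case only the first feature discriminates between the classes, so it alone moves the posterior away from the prior.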

When we say "I know why he was late", we know not only the fact that he was late, but also an explanation of this fact. We propose a logical framework of "knowing why" inspired by the existing formal studies on why-questions, scientific explanation, and justification logic. We introduce the Ky_i operator into the language of epistemic logic to express "agent i knows why phi" and propose a Kripke-style semantics of such expressions in terms of knowing an explanation of phi. We obtain two sound and complete axiomatizations w.r.t. two different model classes depending on different assumptions about introspection.

* 34 pages, submitted, a new section added
Image stitching is challenging in consumer-level photography due to alignment difficulties in unconstrained shooting environments. Recent studies show that seam-cutting approaches can effectively relieve artifacts generated by local misalignment. Normally, seam-cutting is described in terms of energy minimization; however, few existing methods consider human perception in their energy functions, so a seam with minimum energy is sometimes not the least visible one in the overlapping region. In this paper, we propose a novel perception-based energy function in the seam-cutting framework, which considers the nonlinearity and the nonuniformity of human perception in energy minimization. Our perception-based approach adopts a sigmoid metric to characterize the perception of color discrimination, and a saliency weight to model the tendency of human eyes to pay more attention to salient objects. In addition, our seam-cutting composition can be easily integrated into other stitching pipelines. Experiments show that our method outperforms seam-cutting with the normal energy function, and a user study demonstrates that our composed results are more consistent with human perception.

* 5 pages, 6 figures
With the rapid development of deep learning, neural network algorithms have been widely used in various fields, e.g., image, video and voice processing. However, neural network models are getting larger and larger, which is reflected in the computation over model parameters. Although researchers currently devote a wealth of effort to improving computing performance on GPU platforms, dedicated hardware solutions are essential and are emerging to provide advantages over pure software solutions. In this paper, we systematically investigate FPGA-based neural network accelerators. Specifically, we review accelerators designed for specific problems, specific algorithms, algorithm features, and general templates. We also compare the design and implementation of FPGA-based accelerators across different devices and network models, and compare them with CPU and GPU versions. Finally, we discuss the advantages and disadvantages of accelerators on FPGA platforms and explore opportunities for future research.

We introduce the concept of a continuous transportation task to the context of multi-agent systems. A continuous transportation task is one in which a multi-agent team visits a number of fixed locations, picks up objects, and delivers them to a final destination. The goal is to maximize the rate of transportation while the objects are replenished over time. Examples of problems that need continuous transportation are foraging, area sweeping, and the first/last mile problem. Previous approaches typically neglect interference and are highly dependent on communications among agents. Some also incorporate an additional reconnaissance agent to gather information. In this paper, we present a hybrid of centralized and distributed approaches that minimizes the interference and communications in the multi-agent team without the need for a reconnaissance agent. We contribute two partitioning-transportation algorithms inspired by existing algorithms, and one novel online partitioning-transportation algorithm with information gathering in the multi-agent team. Our algorithms have been implemented and tested extensively in simulation. The results presented in this paper demonstrate the effectiveness of our algorithms, which outperform the existing algorithms even without any communications between the agents and without the presence of a reconnaissance agent.

* 2 pages, published in the proceedings of the 15th AAMAS conference
Learning and generating Chinese poems is a charming yet challenging task. Traditional approaches involve various language modeling and machine translation techniques; however, they perform less well when generating poems with complex pattern constraints, for example Song iambics, a famous type of poem that involves variable-length sentences and strict rhythmic patterns. This paper applies the attention-based sequence-to-sequence model to generate Chinese Song iambics. Specifically, we encode the cue sentences with a bi-directional Long Short-Term Memory (LSTM) model and then predict the entire iambic with the information provided by the encoder, in the form of an attention-based LSTM that can regularize the generation process by the fine structure of the input cues. Several techniques are investigated to improve the model, including global context integration, hybrid style training, character vector initialization and adaptation. Both the automatic and the subjective evaluation results show that our model can indeed learn the complex structural and rhythmic patterns of Song iambics, and that the generation is rather successful.

Designing and implementing efficient, provably correct parallel neural network processing is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. Moreover, the diversity and large scale of the data pose a significant challenge to constructing a flexible and high-performance implementation of deep learning neural networks. To improve performance and maintain scalability, we present CNNLab, a novel deep learning framework using GPU and FPGA-based accelerators. CNNLab provides a uniform programming model to users so that the hardware implementation and the scheduling are invisible to the programmers. At runtime, CNNLab leverages the trade-offs between GPU and FPGA before offloading the tasks to the accelerators. Experimental results on the state-of-the-art Nvidia K40 GPU and Altera DE5 FPGA board demonstrate that CNNLab can provide a universal framework with efficient support for diverse applications without increasing the burden on programmers. Moreover, we analyze the detailed quantitative performance, throughput, power, energy, and performance density for both approaches. The experimental results illustrate the trade-offs between GPU and FPGA and provide useful practical experience for the deep learning research community.

In this paper, we study the ratio of the $L_1$ and $L_2$ norms, denoted as $L_1/L_2$, to promote sparsity. Due to its non-convexity and non-linearity, this scale-invariant metric has received little attention. Compared to popular models in the literature such as the $L_p$ model for $p\in(0,1)$ and the transformed $L_1$ (TL1), this ratio model is parameter free. Theoretically, we present a weak null space property (wNSP) and prove that any sparse vector is a local minimizer of the $L_1/L_2$ model under this wNSP condition. Computationally, we focus on a constrained formulation that can be solved via the alternating direction method of multipliers (ADMM). Experiments show that the proposed approach is comparable to the state-of-the-art methods in sparse recovery. In addition, a variant of the $L_1/L_2$ model applied to the gradient is also discussed, with a proof-of-concept example of MRI reconstruction.
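
A small numerical sketch shows why the $L_1/L_2$ ratio is scale-invariant and why it favors sparse vectors. For a vector with $n$ nonzero entries the ratio lies between 1 (a single nonzero entry) and $\sqrt{n}$ (all entries of equal magnitude). The function name `l1_over_l2` and the test vectors are assumptions made for the example; the constrained ADMM formulation from the paper is not shown.

```python
import math


def l1_over_l2(x):
    """Return ||x||_1 / ||x||_2 for a nonzero vector x."""
    l2 = math.sqrt(sum(v * v for v in x))
    if l2 == 0:
        raise ValueError("the ratio is undefined for the zero vector")
    return sum(abs(v) for v in x) / l2


sparse = [0, 3, 0, 0]   # one nonzero entry: the ratio attains its minimum
dense = [1, 1, 1, 1]    # equal magnitudes: the ratio equals sqrt(4)

print(l1_over_l2(sparse), l1_over_l2(dense))

# Rescaling a vector leaves the ratio unchanged (up to rounding).
print(l1_over_l2([1, -2, 3]), l1_over_l2([10, -20, 30]))
```

Scale invariance means the ratio measures only how the magnitude is distributed across entries, which is why no additional parameter is needed to balance it against the vector's size.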

* 25 pages
Regularization plays a crucial role in supervised learning. Most existing methods enforce a global regularization in a structure agnostic manner. In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. In particular, our measurement of topological complexity incorporates the importance of topological features (e.g., connected components, handles, and so on) in a meaningful manner, and provides a direct control over spurious topological structures. We incorporate the new measurement as a topological penalty in training classifiers. We also propose an efficient algorithm to compute the gradient of such penalty. Our method provides a novel way to topologically simplify the global structure of the model, without having to sacrifice too much of the flexibility of the model. We demonstrate the effectiveness of our new topological regularizer on a range of synthetic and real-world datasets.
