Detecting a change point is a crucial task in statistics that has recently been extended to the quantum realm. A source state generator that emits a series of single photons in a default state suffers an alteration at some point and starts to emit photons in a mutated state. The problem consists in identifying the point where the change took place. In this work, we consider a learning agent that applies Bayesian inference on experimental data to solve this problem. This learning machine adjusts the measurement over each photon according to the past experimental results and finds the change position in an online fashion. Our results show that the local-detection success probability can be largely improved by using such a machine learning technique. This protocol provides a tool for improvement in many applications where a sequence of identical quantum states is required.

* Phys. Rev. A 98, 040301 (2018)
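As a purely classical illustration of the Bayesian update behind online change-point detection, the sketch below keeps a posterior over the change position and refines it after every measurement outcome. It assumes binary outcomes with known probabilities `p_before` and `p_after` of observing a 1 (hypothetical names and values, not taken from the paper), and it does not model the adaptive quantum measurements the abstract describes.

```python
def update_posterior(posterior, outcome, step, p_before, p_after):
    """One Bayesian update of the posterior over the change position.

    posterior[k] is the probability that the change happened at position k,
    meaning items 0..k-1 are in the default state and items k.. are mutated.
    """
    updated = []
    for k, prior in enumerate(posterior):
        # Item `step` is already mutated exactly when k <= step.
        p_one = p_after if k <= step else p_before
        likelihood = p_one if outcome == 1 else 1.0 - p_one
        updated.append(prior * likelihood)
    total = sum(updated)
    return [w / total for w in updated]


def locate_change(outcomes, p_before=0.1, p_after=0.9):
    """Process outcomes one at a time and return (most likely position, posterior)."""
    n = len(outcomes)
    posterior = [1.0 / n] * n  # uniform prior over change positions (assumption)
    for step, outcome in enumerate(outcomes):
        posterior = update_posterior(posterior, outcome, step, p_before, p_after)
    best = max(range(n), key=lambda k: posterior[k])
    return best, posterior


best, posterior = locate_change([0, 0, 0, 1, 1, 1])
```

Because the posterior is updated item by item, an estimate of the change position is available at every step, which is what "online" means here.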
In evidence-based medicine (EBM), structured medical questions are favored for efficient search of the best available evidence for treatments. PICO element detection is widely used to help structure clinical studies and questions by identifying the sentences in a given medical text that belong to one of the four components: Participants (P), Intervention (I), Comparison (C), and Outcome (O). In this work, we propose a hierarchical deep neural network (DNN) architecture that contains dual bi-directional long short-term memory (bi-LSTM) layers to automatically detect PICO elements in medical texts. Within the model, the lower bi-LSTM layer encodes each sentence, while the upper one contextualizes the encoded sentence representation vectors. In addition, we adopt adversarial and virtual adversarial training to regularize the model. Overall, we advance PICO element detection to new state-of-the-art performance, outperforming previous work by at least 4% in F1 score for all P/I/O categories.

* Submitted to NIPS ML4H 2018
Prevalent models based on artificial neural networks (ANNs) for sentence classification often classify sentences in isolation, without considering the context in which the sentences appear. This hampers the application of traditional sentence classification approaches to the problem of sequential sentence classification, where structured prediction is needed for better overall classification performance. In this work, we present a hierarchical sequential labeling network that uses the contextual information within surrounding sentences to help classify the current sentence. Our model outperforms the state-of-the-art results by 2%-3% on two benchmarking datasets for sequential sentence classification in medical scientific abstracts.

* Accepted by EMNLP 2018
Text style transfer seeks to learn how to automatically rewrite sentences from a source domain into a target domain with a different style, while preserving their semantic content. A major challenge in this task stems from the lack of parallel data that connects the source and target styles. Existing approaches try to disentangle content and style, but this is quite difficult and often results in poor content preservation and grammaticality. In contrast, we propose a novel approach that first constructs a pseudo-parallel resource aligning a subset of sentences with similar content between the source and target corpora. A standard sequence-to-sequence model can then be applied to learn the style transfer. Subsequently, we iteratively refine the learned style transfer function while improving upon the imperfections in our original alignment. Our method is applied to the tasks of sentiment modification and formality transfer, where it outperforms state-of-the-art systems by a large margin. As an auxiliary contribution, we produced a publicly available test set with human-generated style transfers for future community use.

A novel tag completion algorithm is proposed in this paper, which is designed with the following features: 1) Low-rank and error sparsity: the incomplete initial tagging matrix D is decomposed into the complete tagging matrix A and a sparse error matrix E. However, instead of minimizing its nuclear norm, A is further factorized into a basis matrix U and a sparse coefficient matrix V, i.e. D=UV+E. This low-rank formulation encapsulating sparse coding enables our algorithm to recover latent structures from noisy initial data and avoid performing too much denoising; 2) Local reconstruction structure consistency: to steer the completion of D, the local linear reconstruction structures in feature space and tag space are obtained and preserved by U and V respectively. Such a scheme could alleviate the negative effect of distances measured by low-level features and incomplete tags. Thus, we can seek a balance between exploiting as much information as possible and not being misled to suboptimal performance. Experiments conducted on the Corel5k dataset and the newly issued Flickr30Concepts dataset demonstrate the effectiveness and efficiency of the proposed method.

DNNs have been quickly and broadly exploited to improve the data analysis quality in many complex science and engineering applications. Today's DNNs are becoming deeper and wider because of the increasing demand for analysis quality and the increasingly complex applications to be solved. Wide and deep DNNs, however, require large amounts of resources, significantly restricting their utilization on resource-constrained systems. Although some network simplification methods have been proposed to address this issue, they suffer from either low compression ratios or high compression errors, which may introduce a costly retraining process to reach the target accuracy. In this paper, we propose DeepSZ: an accuracy-loss bounded neural network compression framework, which involves four key steps: network pruning, error bound assessment, optimization for error bound configuration, and compressed model generation, featuring a high compression ratio and low encoding time. The contribution is three-fold. (1) We develop an adaptive approach to select the feasible error bounds for each layer. (2) We build a model to estimate the overall loss of accuracy based on the accuracy degradation caused by individual decompressed layers. (3) We develop an efficient optimization algorithm to determine the best-fit configuration of error bounds in order to maximize the compression ratio under the user-set accuracy constraint. Experiments show that DeepSZ can compress AlexNet and VGG-16 on ImageNet by compression ratios of 46X and 116X, respectively, and compress LeNet-300-100 and LeNet-5 on MNIST by compression ratios of 57X and 56X, respectively, with at most 0.3% loss of accuracy. Compared with other state-of-the-art methods, DeepSZ can improve the compression ratio by up to 1.43X, the DNN encoding performance by up to 4.0X (with four Nvidia Tesla V100 GPUs), and the decoding performance by up to 6.2X.

* 12 pages, 6 figures, submitted to HPDC'19
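To make the notion of a per-layer error bound concrete, here is a minimal sketch of error-bounded quantization: each weight is mapped to an integer bin of width twice the bound, so the reconstructed value never deviates from the original by more than the bound. This is a generic illustration under that assumption, not the DeepSZ pipeline or the compressor it builds on; the function names and sample weights are hypothetical.

```python
def quantize(values, error_bound):
    """Map each value to the index of its bin; bins have width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    """Reconstruct each value as the center of its bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


# Hypothetical layer weights and a hypothetical absolute error bound.
weights = [0.0123, -0.4567, 0.25, 0.9999]
bound = 0.01

codes = quantize(weights, bound)
restored = dequantize(codes, bound)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

A looser bound yields fewer distinct integer codes, which compress better but perturb the weights more; choosing that bound per layer is the trade-off the abstract refers to.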
We combine a generative adversarial network (GAN) with light microscopy to achieve deep learning super-resolution under a large field of view (FOV). By appropriately adopting prior microscopy data in adversarial training, the neural network can recover a high-resolution, accurate image of a new specimen from a single low-resolution measurement. Its capacity has been broadly demonstrated by imaging various types of samples, such as a USAF resolution target, human pathological slides, fluorescence-labelled fibroblast cells, and deep tissues in transgenic mouse brain, with both wide-field and light-sheet microscopes. The gigapixel, multi-color reconstruction of these samples verifies a successful GAN-based single-image super-resolution procedure. We also propose an image degrading model to generate low-resolution images for training, making our approach free from complex image registration during training dataset preparation. Once a well-trained network has been created, this deep learning-based imaging approach is capable of recovering a large FOV (~95 mm²), high-resolution (~1.7 μm) image at high speed (within 1 second), without necessarily introducing any changes to the setup of existing microscopes.

* 21 pages, 9 figures and 1 table. Peng Fei and Di Jin conceived the idea and initiated the investigation. Hao Zhang, Di Jin and Peng Fei prepared the manuscript
A well-trained deep neural network is shown to be capable of simultaneously restoring two kinds of images that are completely destroyed by two distinct scattering media, respectively. The network, based on the U-net architecture, can be trained with a blended dataset of speckle-reference image pairs. We experimentally demonstrate the power of the network in reconstructing images that are strongly diffused by a glass diffuser or a multi-mode fiber. The learning model further shows good generalization ability, reconstructing images that are distinct from the training dataset. Our work facilitates the study of optical transmission and expands machine learning's application in optics.

* 8 pages, 6 figures
Input utterances with short duration are one of the most critical threats that degrade the performance of speaker verification systems. This study aimed to develop an integrated text-independent speaker verification system that takes as input utterances with short durations of 2.05 seconds. To this end, we propose an approach using a teacher-student learning framework that maximizes the cosine similarity of two speaker embeddings extracted from long and short utterances. In the proposed architecture, phonetic-level features, each of which represents a segment of 130 ms, are extracted using convolutional layers. The gated recurrent units extract an utterance-level speaker embedding from the phonetic-level features. Experiments were conducted on the VoxCeleb 1 dataset using deep neural networks that take raw waveforms as input and output speaker embeddings. The equal error rates without short utterance compensation are 8.72 % and 12.8 % for evaluation sets with durations of 3.59 s and 2.05 s, respectively. The proposed model with compensation exhibits an equal error rate of 10.08 % for 2.05 s utterances, which compensates more than 65 % of the performance degradation.

* 5 pages, 2 figures, submitted to ICASSP 2019 as a conference paper
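The two quantities in the short-utterance abstract above can be checked with a few lines: the cosine similarity that the teacher-student objective maximizes, and the share of the degradation that compensation recovers. This is a sketch using only the equal error rates quoted in the abstract; the function names are hypothetical and the embedding extraction itself is not modeled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compensated_fraction(eer_long, eer_short, eer_short_compensated):
    """Share of the long-to-short EER degradation that compensation recovers."""
    degradation = eer_short - eer_long
    recovered = eer_short - eer_short_compensated
    return recovered / degradation


# Equal error rates (%) quoted in the abstract.
fraction = compensated_fraction(8.72, 12.8, 10.08)
```

The degradation is 12.8 - 8.72 = 4.08 points and compensation recovers 12.8 - 10.08 = 2.72 points, so `fraction` is about two thirds, consistent with the "more than 65 %" claim.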
Background: Three-dimensional (3D) cephalometric analysis using computerized tomography data has been rapidly adopted for dysmorphosis and anthropometry. Several different approaches to automatic 3D annotation have been proposed to overcome the limitations of traditional cephalometry. The purpose of this study was to evaluate the accuracy of our newly-developed system using a deep learning algorithm for automatic 3D cephalometric annotation. Methods: To overcome current technical limitations, some measures were developed to directly annotate 3D human skull data. Our deep learning-based model system mainly consisted of a 3D convolutional neural network and image data resampling. Results: The discrepancies between the referenced and predicted coordinate values in three axes and in 3D distance were calculated to evaluate system accuracy. Our new model system yielded prediction errors of 3.26, 3.18, and 4.81 mm (for three axes) and 7.61 mm (for 3D). Moreover, there was no difference among the landmarks of the three groups, including the midsagittal plane, horizontal plane, and mandible (p>0.05). Conclusion: A new 3D convolutional neural network-based automatic annotation system for 3D cephalometry was developed. The strategies used to implement the system were detailed and measurement results were evaluated for accuracy. Further development of this system is planned for full clinical application of automatic 3D cephalometric annotation.

In this paper, we propose a replay attack spoofing detection system for automatic speaker verification using multi-task learning of noise classes. We define the noise that is caused by the replay attack as replay noise. We explore the effectiveness of training a deep neural network simultaneously for replay attack spoofing detection and replay noise classification. The multi-task learning includes classifying the noise of playback devices, recording environments, and recording devices, as well as the spoofing detection. Each of the three types of noise classes also includes a genuine class. The experimental results on the ASVspoof2017 datasets demonstrate that our proposed system achieves a relative performance improvement of 30% on the evaluation set.

* 5 pages, accepted by Technologies and Applications of Artificial Intelligence (TAAI)
In this paper, we propose a new fuzzy reasoning principle, the so-called Movement and Transformation Principle (MTP). This principle obtains a new fuzzy reasoning result by moving and transforming the consequent fuzzy set in response to the Movement, Transformation, and Movement-Transformation operations between the antecedent fuzzy set and the fuzzified observation information. We then present fuzzy modus ponens and fuzzy modus tollens based on MTP. We compare the proposed method with the Mamdani fuzzy system, the Sugeno fuzzy system, the Wang distance-type fuzzy reasoning method, and the Hellendoorn functional-type method. We then apply it to learning experiments with a fuzzy neural network based on MTP and compare it with the Sugeno method. In prediction experiments of the fuzzy neural network on precipitation data and security situation data, learning accuracy and time performance are clearly improved. Consequently, we show that our method based on MTP is computationally simple and does not involve nonlinear operations, so it is easy to handle mathematically.

In recent years, speaker verification has been primarily performed using deep neural networks that are trained to output embeddings from input features such as spectrograms or filterbank energies. Accordingly, studies have been conducted to design various loss functions, including metric learning, to train deep neural networks so that they are suitable for speaker verification. We propose end-to-end loss functions for speaker verification using speaker bases, which are trainable parameters. We expect that each speaker basis will represent the corresponding speaker in the process of training deep neural networks. Conventional loss functions can only consider the limited number of speakers that are included in a mini-batch. In contrast, as the proposed loss functions are based on speaker bases, each sample can be compared against all speakers regardless of mini-batch composition. Through a speaker verification experiment performed using the VoxCeleb 1 dataset, we confirmed that the proposed loss functions could increase between-speaker variations and perform hard negative mining for each mini-batch. In particular, the system trained with the proposed loss functions had an equal error rate of 5.55%. In addition, the proposed loss functions reduced errors by approximately 15% compared with the system trained with the conventional center loss function.

* 5 pages and 2 figures
We apply both distance-based (Jin and Matteson, 2017) and kernel-based (Pfister et al., 2016) mutual dependence measures to independent component analysis (ICA), and generalize dCovICA (Matteson and Tsay, 2017) to MDMICA, minimizing empirical dependence measures as an objective function in both deflation and parallel manners. To solve this minimization problem, we introduce Latin hypercube sampling (LHS) (McKay et al., 2000) and a global optimization method, Bayesian optimization (BO) (Mockus, 1994), to improve the initialization of the Newton-type local optimization method. The performance of MDMICA is evaluated in various simulation studies and an image data example. When the ICA model is correct, MDMICA achieves competitive results compared to existing approaches. When the ICA model is misspecified, the independent components estimated by MDMICA are less mutually dependent than the observed components, while those estimated by other approaches are prone to be even more mutually dependent than the observed components.

* 11 pages, 4 figures
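Latin hypercube sampling, mentioned in the ICA abstract above as a way to seed a local optimizer, can be sketched in a few lines: each dimension is split into as many equal strata as there are samples, and every stratum is used exactly once per dimension. The sketch below works on the unit cube and uses hypothetical names; how the paper maps such points to initial values for its optimizer is not stated in the abstract and is not modeled here.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in the unit cube, one per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        # Place one point uniformly at random inside each stratum.
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Five candidate starting points in two dimensions, with a fixed seed.
points = latin_hypercube(5, 2, random.Random(0))
```

Compared with plain uniform sampling, this spreads a small number of starting points evenly along every coordinate, which is why it suits the initialization of a local search.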
In recent years, hashing methods have proved efficient for large-scale Web media search. However, existing general hashing methods have limited discriminative power for describing fine-grained objects that share a similar overall appearance but have subtle differences. To solve this problem, we introduce, for the first time, an attention mechanism to the learning of hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and learns semantic-preserving hashing codes simultaneously. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. The saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on Oxford Flowers-17 and Stanford Dogs-120 demonstrate that DSaH performs the best on the fine-grained retrieval task and beats the existing best retrieval performance (DPSH) by approximately 12%. DSaH also outperforms several state-of-the-art hashing methods on general datasets, including CIFAR-10 and NUS-WIDE.
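A quantization loss in deep hashing typically penalizes the gap between the network's real-valued outputs and the binary codes obtained by taking their signs. The sketch below shows one common form of that idea together with Hamming-distance comparison of the resulting codes; the abstract does not give DSaH's exact formula, so the squared-error form and all names here are assumptions.

```python
def binarize(outputs):
    """Turn real-valued hash-layer outputs into a code with entries in {-1, +1}."""
    return [1 if v >= 0 else -1 for v in outputs]


def quantization_loss(outputs):
    """Squared distance between the outputs and their binarized code."""
    return sum((v - b) ** 2 for v, b in zip(outputs, binarize(outputs)))


def hamming_distance(code_a, code_b):
    """Number of positions in which two binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical hash-layer outputs for a query image and a database image.
query = binarize([0.9, -0.8, 0.1, -0.3])
item = binarize([0.7, -0.6, -0.2, -0.4])
distance = hamming_distance(query, item)
```

Outputs that already sit near -1 or +1 incur almost no loss, so minimizing it limits the information lost when codes are binarized for retrieval.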

A probabilistic query may not be estimable from observed data corrupted by missing values if the data are not missing at random (MAR). It is therefore of theoretical interest and practical importance to determine in principle whether a probabilistic query is estimable from missing data or not when the data are not MAR. We present an algorithm that systematically determines whether the joint probability is estimable from observed data with missing values, assuming that the data-generation model is represented as a Bayesian network containing unobserved latent variables that not only encodes the dependencies among the variables but also explicitly portrays the mechanisms responsible for the missingness process. The result significantly advances the existing work.

This paper concerns the assessment of the effects of actions from a combination of nonexperimental data and causal assumptions encoded in the form of a directed acyclic graph in which some variables are presumed to be unobserved. We provide a procedure that systematically identifies causal effects between two sets of variables conditioned on some other variables, in time polynomial in the number of variables in the graph. The identifiable conditional causal effects are expressed in terms of the observed joint distribution.

* Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)
Maximal ancestral graphs (MAGs) are used to encode conditional independence relations in DAG models with hidden variables. Different MAGs may represent the same set of conditional independences and are called Markov equivalent. This paper considers MAGs without undirected edges and shows conditions under which an arrow in a MAG can be reversed or interchanged with a bi-directed edge so as to yield a Markov equivalent MAG.

* Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)
This paper deals with the problem of identifying direct causal effects in recursive linear structural equation models. The paper establishes a sufficient criterion for identifying individual causal effects and provides a procedure for computing identified causal effects in terms of the observed covariance matrix.

* Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)
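What it means to express a direct effect "in terms of the observed covariance matrix" can be seen in the textbook instrumental-variable case, simulated below: Z affects X, X affects Y with coefficient `true_b`, and an unobserved U confounds X and Y. The direct effect then equals cov(Z, Y) / cov(Z, X), while the naive ratio cov(X, Y) / var(X) is biased. This is a standard illustration with hypothetical coefficients, not the criterion established in the paper.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n, a, b, rng):
    """Draw n samples from Z -> X -> Y with an unobserved confounder U of X and Y."""
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        u = rng.gauss(0, 1)  # never returned: unobserved
        x = a * z + u + rng.gauss(0, 1)
        y = b * x + u + rng.gauss(0, 1)
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


true_b = 0.5  # hypothetical direct effect of X on Y
zs, xs, ys = simulate(20000, 1.0, true_b, random.Random(0))

iv_estimate = covariance(zs, ys) / covariance(zs, xs)     # identified from covariances
naive_estimate = covariance(xs, ys) / covariance(xs, xs)  # biased by the confounder
```

With these coefficients the naive ratio converges to `true_b + 1/3`, whereas the instrumental-variable ratio converges to `true_b`, using observed covariances only.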
Combining models in appropriate ways to achieve high performance is common in machine learning today. Although a large number of combinatorial models have been created, little attention has been paid to the commonalities among different models and the connections between them. A general modelling technique is thus worth studying in order to understand model combination deeply and to shed light on creating new models. Prediction markets show promise of becoming such a generic, flexible combinatorial model. By reviewing several popular combinatorial models and prediction market models, this paper aims to show how the market models can generalise different combinatorial structures and how they implement these popular combinatorial models under specific conditions. In addition, we show that, among the different market models, Storkey's Machine Learning Markets provide more fundamental, generic modelling mechanisms than the others and have significant appeal for both theoretical study and application.
