With the rapid development of Deep Neural Networks (DNNs), various network models that show strong computing power and impressive expressive power are proposed. However, there is no comprehensive informational interpretation of DNNs from the perspective of information theory. Due to the nonlinear function and the uncertain number of layers and neural units used in the DNNs, the network structure shows nonlinearity and complexity. With the typical DNNs named Convolutional Arithmetic Circuits (ConvACs), the complex DNNs can be converted into mathematical formula. Thus, we can use rigorous mathematical theory especially the information theory to analyse the complicated DNNs. In this paper, we propose a novel information scaling law scheme that can interpret the network's inner organization by information theory. First, we show the informational interpretation of the activation function. Secondly, we prove that the information entropy increases when the information is transmitted through the ConvACs. Finally, we propose the information scaling law of ConvACs through making a reasonable assumption.

* 7 pages, 5 figures
Click to Read Paper
We propose PANDA, an AdaPtive Noise Augmentation technique to regularize estimating and constructing undirected graphical models (UGMs). PANDA iteratively solves MLEs given noise augmented data in the regression-based framework until convergence to achieve the designed regularization effects. The augmented noises can be designed to achieve various regularization effects on graph estimation, including the bridge, elastic net, adaptive lasso, and SCAD penalization; it can also offer group lasso and fused ridge when some nodes belong to the same group. We establish theoretically that the noise-augmented loss functions and its minimizer converge almost surely to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized regression coefficients through PANDA in GLMs, based on which, the inferences for the parameters can be obtained simultaneously with variable selection. Our empirical results suggest the inferences achieve nominal or near-nominal coverage and are far more efficient compared to some existing post-selection procedures. On the algorithm level, PANDA can be easily programmed in any standard software without resorting to complicated optimization techniques. We show the non-inferior performance of PANDA in constructing graphs of different types in simulation studies and also apply PANDA to the autism spectrum disorder data to construct a mixed-node graph.

Click to Read Paper
In many applications of deep learning, particularly those in image restoration, it is either very difficult, prohibitively expensive, or outright impossible to obtain paired training data precisely as in the real world. In such cases, one is forced to use synthesized paired data to train the deep convolutional neural network (DCNN). However, due to the unavoidable generalization error in statistical learning, the synthetically trained DCNN often performs poorly on real world data. To overcome this problem, we propose a new general training method that can compensate for, to a large extent, the generalization errors of synthetically trained DCNNs.

Click to Read Paper
Event extraction is of practical utility in natural language processing. In the real world, it is a common phenomenon that multiple events existing in the same sentence, where extracting them are more difficult than extracting a single event. Previous works on modeling the associations between events by sequential modeling methods suffer a lot from the low efficiency in capturing very long-range dependencies. In this paper, we propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information. The experiment results demonstrate that our proposed framework achieves competitive results compared with state-of-the-art methods.

* EMNLP. 1 (2018) 1247-1256
* accepted by EMNLP 2018
Click to Read Paper
Script event prediction requires a model to predict the subsequent event given an existing event context. Previous models based on event pairs or event chains cannot make full use of dense event connections, which may limit their capability of event prediction. To remedy this, we propose constructing an event graph to better utilize the event network information for script event prediction. In particular, we first extract narrative event chains from large quantities of news corpus, and then construct a narrative event evolutionary graph (NEEG) based on the extracted chains. NEEG can be seen as a knowledge base that describes event evolutionary principles and patterns. To solve the inference problem on NEEG, we present a scaled graph neural network (SGNN) to model event interactions and learn better event representations. Instead of computing the representations on the whole graph, SGNN processes only the concerned nodes each time, which makes our model feasible to large-scale graphs. By comparing the similarity between input context event representations and candidate event representations, we can choose the most reasonable subsequent event. Experimental results on widely used New York Times corpus demonstrate that our model significantly outperforms state-of-the-art baseline methods, by using standard multiple choice narrative cloze evaluation.

* This paper has been accepted by IJCAI 2018
Click to Read Paper
Taking photos of optoelectronic displays is a direct and spontaneous way of transferring data and keeping records, which is widely practiced. However, due to the analog signal interference between the pixel grids of the display screen and camera sensor array, objectionable moir\'e (alias) patterns appear in captured screen images. As the moir\'e patterns are structured and highly variant, they are difficult to be completely removed without affecting the underneath latent image. In this paper, we propose an approach of deep convolutional neural network for demoir\'eing screen photos. The proposed DCNN consists of a coarse-scale network and a fine-scale network. In the coarse-scale network, the input image is first downsampled and then processed by stacked residual blocks to remove the moir\'e artifacts. After that, the fine-scale network upsamples the demoir\'ed low-resolution image back to the original resolution. Extensive experimental results have demonstrated that the proposed technique can efficiently remove the moir\'e patterns for camera acquired screen images; the new technique outperforms the existing ones.

Click to Read Paper
All existing image enhancement methods, such as HDR tone mapping, cannot recover A/D quantization losses due to insufficient or excessive lighting, (underflow and overflow problems). The loss of image details due to A/D quantization is complete and it cannot be recovered by traditional image processing methods, but the modern data-driven machine learning approach offers a much needed cure to the problem. In this work we propose a novel approach to restore and enhance images acquired in low and uneven lighting. First, the ill illumination is algorithmically compensated by emulating the effects of artificial supplementary lighting. Then a DCNN trained using only synthetic data recovers the missing detail caused by quantization.

Click to Read Paper
This paper presents a generic pre-processor for expediting conventional template matching techniques. Instead of locating the best matched patch in the reference image to a query template via exhaustive search, the proposed algorithm rules out regions with no possible matches with minimum computational efforts. While working on simple patch features, such as mean, variance and gradient, the fast pre-screening is highly discriminative. Its computational efficiency is gained by using a novel octagonal-star-shaped template and the inclusion-exclusion principle to extract and compare patch features. Moreover, it can handle arbitrary rotation and scaling of reference images effectively. Extensive experiments demonstrate that the proposed algorithm greatly reduces the search space while never missing the best match.

Click to Read Paper
Point sets matching method is very important in computer vision, feature extraction, fingerprint matching, motion estimation and so on. This paper proposes a robust point sets matching method. We present an iterative algorithm that is robust to noise case. Firstly, we calculate all transformations between two points. Then similarity matrix are computed to measure the possibility that two transformation are both true. We iteratively update the matching score matrix by using the similarity matrix. By using matching algorithm on graph, we obtain the matching result. Experimental results obtained by our approach show robustness to outlier and jitter.

* 9 pages, 3 figures, 4 tables
Click to Read Paper
Symmetric nonnegative matrix factorization (NMF), a special but important class of the general NMF, is demonstrated to be useful for data analysis and in particular for various clustering tasks. Unfortunately, designing fast algorithms for Symmetric NMF is not as easy as for the nonsymmetric counterpart, the latter admitting the splitting property that allows efficient alternating-type algorithms. To overcome this issue, we transfer the symmetric NMF to a nonsymmetric one, then we can adopt the idea from the state-of-the-art algorithms for nonsymmetric NMF to design fast algorithms solving symmetric NMF. We rigorously establish that solving nonsymmetric reformulation returns a solution for symmetric NMF and then apply fast alternating based algorithms for the corresponding reformulated problem. Furthermore, we show these fast algorithms admit strong convergence guarantee in the sense that the generated sequence is convergent at least at a sublinear rate and it converges globally to a critical point of the symmetric NMF. We conduct experiments on both synthetic data and image clustering to support our result.

* Accepted in NIPS 2018
Click to Read Paper
In this work, we mainly study the mechanism of learning the steganographic algorithm as well as combining the learning process with adversarial learning to learn a good steganographic algorithm. To handle the problem of embedding secret messages into the specific medium, we design a novel adversarial modules to learn the steganographic algorithm, and simultaneously train three modules called generator, discriminator and steganalyzer. Different from existing methods, the three modules are formalized as a game to communicate with each other. In the game, the generator and discriminator attempt to communicate with each other using secret messages hidden in an image. While the steganalyzer attempts to analyze whether there is a transmission of confidential information. We show that through unsupervised adversarial training, the adversarial model can produce robust steganographic solutions, which act like an encryption. Furthermore, we propose to utilize supervised adversarial training method to train a robust steganalyzer, which is utilized to discriminate whether an image contains secret information. Numerous experiments are conducted on publicly available dataset to demonstrate the effectiveness of the proposed method.

* arXiv admin note: text overlap with arXiv:1703.00371 by other authors
Click to Read Paper
As efficient traffic-management platforms, public vehicle (PV) systems are envisioned to be a promising approach to solving traffic congestions and pollutions for future smart cities. PV systems provide online/dynamic peer-to-peer ride-sharing services with the goal of serving sufficient number of customers with minimum number of vehicles and lowest possible cost. A key component of the PV system is the online ride-sharing scheduling strategy. In this paper, we propose an efficient path planning strategy that focuses on a limited potential search area for each vehicle by filtering out the requests that violate passenger service quality level, so that the global search is reduced to local search. We analyze the performance of the proposed solution such as reduction ratio of computational complexity. Simulations based on the Manhattan taxi data set show that, the computing time is reduced by 22% compared with the exhaustive search method under the same service quality performance.

* 12 pages
Click to Read Paper
Gesture recognition is a hot topic in computer vision and pattern recognition, which plays a vitally important role in natural human-computer interface. Although great progress has been made recently, fast and robust hand gesture recognition remains an open problem, since the existing methods have not well balanced the performance and the efficiency simultaneously. To bridge it, this work combines image entropy and density clustering to exploit the key frames from hand gesture video for further feature extraction, which can improve the efficiency of recognition. Moreover, a feature fusion strategy is also proposed to further improve feature representation, which elevates the performance of recognition. To validate our approach in a "wild" environment, we also introduce two new datasets called HandGesture and Action3D datasets. Experiments consistently demonstrate that our strategy achieves competitive results on Northwestern University, Cambridge, HandGesture and Action3D hand gesture datasets. Our code and datasets will release at https://github.com/Ha0Tang/HandGestureRecognition.

* 11 pages, 3 figures, accepted to NeuroComputing
Click to Read Paper
In recent years, multi-agent epistemic planning has received attention from both dynamic logic and planning communities. Existing implementations of multi-agent epistemic planning are based on compilation into classical planning and suffer from various limitations, such as generating only linear plans, restriction to public actions, and incapability to handle disjunctive beliefs. In this paper, we propose a general representation language for multi-agent epistemic planning where the initial KB and the goal, the preconditions and effects of actions can be arbitrary multi-agent epistemic formulas, and the solution is an action tree branching on sensing results. To support efficient reasoning in the multi-agent KD45 logic, we make use of a normal form called alternating cover disjunctive formulas (ACDFs). We propose basic revision and update algorithms for ACDFs. We also handle static propositional common knowledge, which we call constraints. Based on our reasoning, revision and update algorithms, adapting the PrAO algorithm for contingent planning from the literature, we implemented a multi-agent epistemic planner called MEPK. Our experimental results show the viability of our approach.

* IJCAI. (2017) 1093-1101
* One of the authors think it's not appropriate to show this work there days. Then we discussed, we want submit a new work and this one together later
Click to Read Paper
A cooperative group optimization (CGO) system is presented to implement CGO cases by integrating the advantages of the cooperative group and low-level algorithm portfolio design. Following the nature-inspired paradigm of a cooperative group, the agents not only explore in a parallel way with their individual memory, but also cooperate with their peers through the group memory. Each agent holds a portfolio of (heterogeneous) embedded search heuristics (ESHs), in which each ESH can drive the group into a stand-alone CGO case, and hybrid CGO cases in an algorithmic space can be defined by low-level cooperative search among a portfolio of ESHs through customized memory sharing. The optimization process might also be facilitated by a passive group leader through encoding knowledge in the search landscape. Based on a concrete framework, CGO cases are defined by a script assembling over instances of algorithmic components in a toolbox. A multilayer design of the script, with the support of the inherent updatable graph in the memory protocol, enables a simple way to address the challenge of accumulating heterogeneous ESHs and defining customized portfolios without any additional code. The CGO system is implemented for solving the constrained optimization problem with some generic components and only a few domain-specific components. Guided by the insights from algorithm portfolio design, customized CGO cases based on basic search operators can achieve competitive performance over existing algorithms as compared on a set of commonly-used benchmark instances. This work might provide a basic step toward a user-oriented development framework, since the algorithmic space might be easily evolved by accumulating competent ESHs.

* Soft Computing 18(3): 469-495, 2014
Click to Read Paper
YouTube presents an unprecedented opportunity to explore how machine learning methods can improve healthcare information dissemination. We propose an interdisciplinary lens that synthesizes machine learning methods with healthcare informatics themes to address the critical issue of developing a scalable algorithmic solution to evaluate videos from a health literacy and patient education perspective. We develop a deep learning method to understand the level of medical knowledge encoded in YouTube videos. Preliminary results suggest that we can extract medical knowledge from YouTube videos and classify videos according to the embedded knowledge with satisfying performance. Deep learning methods show great promise in knowledge extraction, natural language understanding, and image classification, especially in an era of patient-centric care and precision medicine.

Click to Read Paper
In this paper, we aim at solving the multi-domain image-to-image translation problem by a single GAN-based model in an unsupervised manner. In the field of image-to-image translation, most previous works mainly focus on adopting a generative adversarial network, which contains three parts, i.e., encoder, decoder and discriminator. These three parts are trained to give the encoder and the decoder together as a translator. However, the discriminator that occupies a lot of parameters is abandoned after the training process, which is wasteful of computation and memory. To handle this problem, we integrate the discriminator and the encoder of the traditional framework into a single network, where the decoder in our framework translates the information encoded by the discriminator to the target image. As a result, our framework only contains two parts, i.e., decoder and discriminator, which effectively reduces the number of the parameters of the network and achieves more effective training. Then, we expand the traditional binary-class discriminator to the multi-classes discriminator, which solves the multi-domain image-to-image translation problem in traditional settings. At last, we propose the label encoder to transform the label vector to high-dimension representation automatically rather than designing a one-hot vector manually. We performed extensive experiments on many image-to-image translation tasks including style transfer, season transfer, face hallucination, etc. A unified model was trained to translate images sampled from 14 considerable different domains and the comparisons to several recently-proposed approaches demonstrate the superiority and novelty of our framework.

* 12 pages, 7 figures
Click to Read Paper
We describe a parallel bayesian online deep learning framework (PBODL) for click-through rate (CTR) prediction within today's Tencent advertising system, which provides quick and accurate learning of user preferences. We first explain the framework with a deep probit regression model, which is trained with probabilistic back-propagation in the mode of assumed Gaussian density filtering. Then we extend the model family to a variety of bayesian online models with increasing feature embedding capabilities, such as Sparse-MLP, FM-MLP and FFM-MLP. Finally, we implement a parallel training system based on a stream computing infrastructure and parameter servers. Experiments with public available datasets and Tencent industrial datasets show that models within our framework perform better than several common online models, such as AdPredictor, FTRL-Proximal and MatchBox. Online A/B test within Tencent advertising system further proves that our framework could achieve CTR and CPM lift by learning more quickly and accurately.

* 9 pages
Click to Read Paper
In the 'Big Data' era, many real-world applications like search involve the ranking problem for a large number of items. It is important to obtain effective ranking results and at the same time obtain the results efficiently in a timely manner for providing good user experience and saving computational costs. Valuable prior research has been conducted for learning to efficiently rank like the cascade ranking (learning) model, which uses a sequence of ranking functions to progressively filter some items and rank the remaining items. However, most existing research of learning to efficiently rank in search is studied in a relatively small computing environments with simulated user queries. This paper presents novel research and thorough study of designing and deploying a Cascade model in a Large-scale Operational E-commerce Search application (CLOES), which deals with hundreds of millions of user queries per day with hundreds of servers. The challenge of the real-world application provides new insights for research: 1). Real-world search applications often involve multiple factors of preferences or constraints with respect to user experience and computational costs such as search accuracy, search latency, size of search results and total CPU cost, while most existing search solutions only address one or two factors; 2). Effectiveness of e-commerce search involves multiple types of user behaviors such as click and purchase, while most existing cascade ranking in search only models the click behavior. Based on these observations, a novel cascade ranking model is designed and deployed in an operational e-commerce search application. An extensive set of experiments demonstrate the advantage of the proposed work to address multiple factors of effectiveness, efficiency and user experience in the real-world application.

Click to Read Paper
In this paper we study stochastic quasi-Newton methods for nonconvex stochastic optimization, where we assume that noisy information about the gradients of the objective function is available via a stochastic first-order oracle (SFO). We propose a general framework for such methods, for which we prove almost sure convergence to stationary points and analyze its worst-case iteration complexity. When a randomly chosen iterate is returned as the output of such an algorithm, we prove that in the worst-case, the SFO-calls complexity is $O(\epsilon^{-2})$ to ensure that the expectation of the squared norm of the gradient is smaller than the given accuracy tolerance $\epsilon$. We also propose a specific algorithm, namely a stochastic damped L-BFGS (SdLBFGS) method, that falls under the proposed framework. {Moreover, we incorporate the SVRG variance reduction technique into the proposed SdLBFGS method, and analyze its SFO-calls complexity. Numerical results on a nonconvex binary classification problem using SVM, and a multiclass classification problem using neural networks are reported.

* published in SIAM Journal on Optimization
Click to Read Paper