Models, code, and papers for "Kien A":

Theory and Evaluation Metrics for Learning Disentangled Representations

Aug 26, 2019
Kien Do, Truyen Tran

We make two theoretical contributions to disentanglement learning by (a) defining precise semantics of disentangled representations, and (b) establishing robust metrics for evaluation. First, we characterize the concept "disentangled representations" used in supervised and unsupervised methods along three dimensions-informativeness, separability and interpretability - which can be expressed and quantified explicitly using information-theoretic constructs. This helps explain the behaviors of several well-known disentanglement learning models. We then propose robust metrics for measuring informativeness, separability and interpretability. Through a comprehensive suite of experiments, we show that our metrics correctly characterize the representations learned by different methods and are consistent with qualitative (visual) results. Thus, the metrics allow disentanglement learning methods to be compared on a fair ground. We also empirically uncovered new interesting properties of VAE-based methods and interpreted them with our formulation. These findings are promising and hopefully will encourage the design of more theoretically driven models for learning disentangled representations.


  Click for Model/Code and Paper
Attention Based Neural Architecture for Rumor Detection with Author Context Awareness

Sep 19, 2019
Sansiri Tarnpradab, Kien A. Hua

The prevalence of social media has made information sharing possible across the globe. The downside, unfortunately, is the wide spread of misinformation. Methods applied in most previous rumor classifiers give an equal weight, or attention, to words in the microblog, and do not take the context beyond microblog contents into account; therefore, the accuracy becomes plateaued. In this research, we propose an ensemble neural architecture to detect rumor on Twitter. The architecture incorporates word attention and context from the author to enhance the classification performance. In particular, the word-level attention mechanism enables the architecture to put more emphasis on important words when constructing the text representation. To derive further context, microblog posts composed by individual authors are exploited since they can reflect style and characteristics in spreading information, which are significant cues to help classify whether the shared content is rumor or legitimate news. The experiment on the real-world Twitter dataset collected from two well-known rumor tracking websites demonstrates promising results.


  Click for Model/Code and Paper
Style-aware Neural Model with Application in Authorship Attribution

Sep 12, 2019
Fereshteh Jafariakinabad, Kien A. Hua

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

* arXiv admin note: text overlap with arXiv:1902.09723 

  Click for Model/Code and Paper
Graph Transformation Policy Network for Chemical Reaction Prediction

Dec 22, 2018
Kien Do, Truyen Tran, Svetha Venkatesh

We address a fundamental problem in chemistry known as chemical reaction product prediction. Our main insight is that the input reactant and reagent molecules can be jointly represented as a graph, and the process of generating product molecules from reactant molecules can be formulated as a sequence of graph transformations. To this end, we propose Graph Transformation Policy Network (GTPN) -- a novel generic method that combines the strengths of graph neural networks and reinforcement learning to learn the reactions directly from data with minimal chemical knowledge. Compared to previous methods, GTPN has some appealing properties such as: end-to-end learning, and making no assumption about the length or the order of graph transformations. In order to guide model search through the complex discrete space of sets of bond changes effectively, we extend the standard policy gradient loss by adding useful constraints. Evaluation results show that GTPN improves the top-1 accuracy over the current state-of-the-art method by about 3% on the large USPTO dataset. Our model's performances and prediction errors are also analyzed carefully in the paper.


  Click for Model/Code and Paper
Multi-task Learning of Hierarchical Vision-Language Representation

Dec 03, 2018
Duy-Kien Nguyen, Takayuki Okatani

It is still challenging to build an AI system that can perform tasks that involve vision and language at human level. So far, researchers have singled out individual tasks separately, for each of which they have designed networks and trained them on its dedicated datasets. Although this approach has seen a certain degree of success, it comes with difficulties of understanding relations among different tasks and transferring the knowledge learned for a task to others. We propose a multi-task learning approach that enables to learn vision-language representation that is shared by many tasks from their diverse datasets. The representation is hierarchical, and prediction for each task is computed from the representation at its corresponding level of the hierarchy. We show through experiments that our method consistently outperforms previous single-task-learning methods on image caption retrieval, visual question answering, and visual grounding. We also analyze the learned hierarchical representation by visualizing attention maps generated in our network.


  Click for Model/Code and Paper
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

Apr 03, 2018
Duy-Kien Nguyen, Takayuki Okatani

A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities contributes to boost accuracy of prediction of answers. Specifically, we present a simple architecture that is fully symmetric between visual and language representations, in which each question word attends on image regions and each image region attends on question words. It can be stacked to form a hierarchy for multi-step interactions between an image-question pair. We show through experiments that the proposed architecture achieves a new state-of-the-art on VQA and VQA 2.0 despite its small size. We also present qualitative evaluation, demonstrating how the proposed attention mechanism can generate reasonable attention maps on images and questions, which leads to the correct answer prediction.


  Click for Model/Code and Paper
Learning Deep Matrix Representations

Feb 05, 2018
Kien Do, Truyen Tran, Svetha Venkatesh

We present a new distributed representation in deep neural nets wherein the information is represented in native form as a matrix. This differs from current neural architectures that rely on vector representations. We consider matrices as central to the architecture and they compose the input, hidden and output layers. The model representation is more compact and elegant -- the number of parameters grows only with the largest dimension of the incoming layer rather than the number of hidden units. We derive several new deep networks: (i) feed-forward nets that map an input matrix into an output matrix, (ii) recurrent nets which map a sequence of input matrices into a sequence of output matrices. We also reinterpret existing models for (iii) memory-augmented networks and (iv) graphs using matrix notations. For graphs we demonstrate how the new notations lead to simple but effective extensions with multiple attentions. Extensive experiments on handwritten digits recognition, face reconstruction, sequence to sequence learning, EEG classification, and graph-based node classification demonstrate the efficacy and compactness of the matrix architectures.


  Click for Model/Code and Paper
Knowledge Graph Embedding with Multiple Relation Projections

Jan 26, 2018
Kien Do, Truyen Tran, Svetha Venkatesh

Knowledge graphs contain rich relational structures of the world, and thus complement data-driven machine learning in heterogeneous data. One of the most effective methods in representing knowledge graphs is to embed symbolic relations and entities into continuous spaces, where relations are approximately linear translation between projected images of entities in the relation space. However, state-of-the-art relation projection methods such as TransR, TransD or TransSparse do not model the correlation between relations, and thus are not scalable to complex knowledge graphs with thousands of relations, both in computational demand and in statistical robustness. To this end we introduce TransF, a novel translation-based method which mitigates the burden of relation projection by explicitly modeling the basis subspaces of projection matrices. As a result, TransF is far more light weight than the existing projection methods, and is robust when facing a high number of relations. Experimental results on the canonical link prediction task show that our proposed model outperforms competing rivals by a large margin and achieves state-of-the-art performance. Especially, TransF improves by 9%/5% in the head/tail entity prediction task for N-to-1/1-to-N relations over the best performing translation-based method.

* 6 pages 

  Click for Model/Code and Paper
Multilevel Anomaly Detection for Mixed Data

Oct 20, 2016
Kien Do, Truyen Tran, Svetha Venkatesh

Anomalies are those deviating from the norm. Unsupervised anomaly detection often translates to identifying low density regions. Major problems arise when data is high-dimensional and mixed of discrete and continuous attributes. We propose MIXMAD, which stands for MIXed data Multilevel Anomaly Detection, an ensemble method that estimates the sparse regions across multiple levels of abstraction of mixed data. The hypothesis is for domains where multiple data abstractions exist, a data point may be anomalous with respect to the raw representation or more abstract representations. To this end, our method sequentially constructs an ensemble of Deep Belief Nets (DBNs) with varying depths. Each DBN is an energy-based detector at a predefined abstraction level. At the bottom level of each DBN, there is a Mixed-variate Restricted Boltzmann Machine that models the density of mixed data. Predictions across the ensemble are finally combined via rank aggregation. The proposed MIXMAD is evaluated on high-dimensional realworld datasets of different characteristics. The results demonstrate that for anomaly detection, (a) multilevel abstraction of high-dimensional and mixed data is a sensible strategy, and (b) empirically, MIXMAD is superior to popular unsupervised detection methods for both homogeneous and mixed data.

* 9 pages 

  Click for Model/Code and Paper
Constrained Design of Deep Iris Networks

May 23, 2019
Kien Nguyen, Clinton Fookes, Sridha Sridharan

Despite the promise of recent deep neural networks in the iris recognition setting, there are vital properties of the classic IrisCode which are almost unable to be achieved with current deep iris networks: the compactness of model and the small number of computing operations (FLOPs). This paper re-models the iris network design process as a constrained optimization problem which takes model size and computation into account as learning criteria. On one hand, this allows us to fully automate the network design process to search for the best iris network confined to the computation and model compactness constraints. On the other hand, it allows us to investigate the optimality of the classic IrisCode and recent iris networks. It also allows us to learn an optimal iris network and demonstrate state-of-the-art performance with less computation and memory requirements.


  Click for Model/Code and Paper
Syntactic Recurrent Neural Network for Authorship Attribution

Feb 27, 2019
Fereshteh Jafariakinabad, Sansiri Tarnpradab, Kien A. Hua

Writing style is a combination of consistent decisions at different levels of language production including lexical, syntactic, and structural associated to a specific author (or author groups). While lexical-based models have been widely explored in style-based text classification, relying on content makes the model less scalable when dealing with heterogeneous data comprised of various topics. On the other hand, syntactic models which are content-independent, are more robust against topic variance. In this paper, we introduce a syntactic recurrent neural network to encode the syntactic patterns of a document in a hierarchical structure. The model first learns the syntactic representation of sentences from the sequence of part-of-speech tags. For this purpose, we exploit both convolutional filters and long short-term memories to investigate the short-term and long-term dependencies of part-of-speech tags in the sentences. Subsequently, the syntactic representations of sentences are aggregated into document representation using recurrent neural networks. Our experimental results on PAN 2012 dataset for authorship attribution task shows that syntactic recurrent neural network outperforms the lexical model with the identical architecture by approximately 14% in terms of accuracy.


  Click for Model/Code and Paper
Deep Segment Hash Learning for Music Generation

May 30, 2018
Kevin Joslyn, Naifan Zhuang, Kien A. Hua

Music generation research has grown in popularity over the past decade, thanks to the deep learning revolution that has redefined the landscape of artificial intelligence. In this paper, we propose a novel approach to music generation inspired by musical segment concatenation methods and hash learning algorithms. Given a segment of music, we use a deep recurrent neural network and ranking-based hash learning to assign a forward hash code to the segment to retrieve candidate segments for continuation with matching backward hash codes. The proposed method is thus called Deep Segment Hash Learning (DSHL). To the best of our knowledge, DSHL is the first end-to-end segment hash learning method for music generation, and the first to use pair-wise training with segments of music. We demonstrate that this method is capable of generating music which is both original and enjoyable, and that DSHL offers a promising new direction for music generation research.

* 16 pages, 4 figures 

  Click for Model/Code and Paper
Toward Extractive Summarization of Online Forum Discussions via Hierarchical Attention Networks

May 25, 2018
Sansiri Tarnpradab, Fei Liu, Kien A. Hua

Forum threads are lengthy and rich in content. Concise thread summaries will benefit both newcomers seeking information and those who participate in the discussion. Few studies, however, have examined the task of forum thread summarization. In this work we make the first attempt to adapt the hierarchical attention networks for thread summarization. The model draws on the recent development of neural attention mechanisms to build sentence and thread representations and use them for summarization. Our results indicate that the proposed approach can outperform a range of competitive baselines. Further, a redundancy removal step is crucial for achieving outstanding results.

* 5 pages 

  Click for Model/Code and Paper
Fast RodFIter for Attitude Reconstruction from Inertial Measurements

Aug 11, 2018
Yuanxin Wu, Qi Cai, Tnieu-Kien Truong

Attitude computation is of vital importance for a variety of applications. Based on the functional iteration of the Rodrigues vector integration equation, the RodFIter method can be advantageously applied to analytically reconstruct the attitude from discrete gyroscope measurements over the time interval of interest. It is promising to produce ultra-accurate attitude reconstruction. However, the RodFIter method imposes high computational load and does not lend itself to onboard implementation. In this paper, a fast approach to significantly reduce RodFIter's computation complexity is presented while maintaining almost the same accuracy of attitude reconstruction. It reformulates the Rodrigues vector iterative integration in terms of the Chebyshev polynomial iteration. Due to the excellent property of Chebyshev polynomials, the fast RodFIter is achieved by means of appropriate truncation of Chebyshev polynomials, with provably guaranteed convergence. Moreover, simulation results validate the speed and accuracy of the proposed method.

* IEEE T-AES, 2018 
* 7 figures 

  Click for Model/Code and Paper
Attentional Multilabel Learning over Graphs: A Message Passing Approach

Apr 11, 2018
Kien Do, Truyen Tran, Thin Nguyen, Svetha Venkatesh

We address a largely open problem of multilabel classification over graphs. Unlike traditional vector input, a graph has rich variable-size substructures which are related to the labels in some ways. We believe that uncovering these relations might hold the key to classification performance and explainability. We introduce GAML (Graph Attentional Multi-Label learning), a novel graph neural network that can handle this problem effectively. GAML regards labels as auxiliary nodes and models them in conjunction with the input graph. By applying message passing and attention mechanisms to both the label nodes and the input nodes iteratively, GAML can capture the relations between the labels and the input subgraphs at various resolution scales. Moreover, our model can take advantage of explicit label dependencies. It also scales linearly with the number of labels and graph size thanks to our proposed hierarchical attention. We evaluate GAML on an extensive set of experiments with both graph-structured inputs and classical unstructured inputs. The results show that GAML significantly outperforms other competing methods. Importantly, GAML enables intuitive visualizations for better understanding of the label-substructure relations and explanation of the model behaviors.


  Click for Model/Code and Paper
Outlier Detection on Mixed-Type Data: An Energy-based Approach

Aug 17, 2016
Kien Do, Truyen Tran, Dinh Phung, Svetha Venkatesh

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for single data type such as continuous or discrete. However, real world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use \emph{free-energy} derived from Mv.RBM as outlier score to detect outliers as those data points lying in low density regions. The method is fast to learn and compute, is scalable to massive datasets. At the same time, the outlier score is identical to data negative log-density up-to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) a proper handling mixed-types is necessary in outlier detection, and (b) free-energy of Mv.RBM is a powerful and efficient outlier scoring method, which is highly competitive against state-of-the-arts.


  Click for Model/Code and Paper
Fictitious Play with Time-Invariant Frequency Update for Network Security

Jun 17, 2010
Kien C. Nguyen, Tansu Alpcan, Tamer Başar

We study two-player security games which can be viewed as sequences of nonzero-sum matrix games played by an Attacker and a Defender. The evolution of the game is based on a stochastic fictitious play process, where players do not have access to each other's payoff matrix. Each has to observe the other's actions up to present and plays the action generated based on the best response to these observations. In a regular fictitious play process, each player makes a maximum likelihood estimate of her opponent's mixed strategy, which results in a time-varying update based on the previous estimate and current action. In this paper, we explore an alternative scheme for frequency update, whose mean dynamic is instead time-invariant. We examine convergence properties of the mean dynamic of the fictitious play process with such an update scheme, and establish local stability of the equilibrium point when both players are restricted to two actions. We also propose an adaptive algorithm based on this time-invariant frequency update.

* Proceedings of the 2010 IEEE Multi-Conference on Systems and Control (MSC10), September 2010, Yokohama, Japan 

  Click for Model/Code and Paper
Semantic Correspondence: A Hierarchical Approach

Jun 10, 2018
Akila Pemasiri, Kien Nguyen, Sridha Sridhara, and Clinton Fookes

Establishing semantic correspondence across images when the objects in the images have undergone complex deformations remains a challenging task in the field of computer vision. In this paper, we propose a hierarchical method to tackle this problem by first semantically targeting the foreground objects to localize the search space and then looking deeply into multiple levels of the feature representation to search for point-level correspondence. In contrast to existing approaches, which typically penalize large discrepancies, our approach allows for significant displacements, with the aim to accommodate large deformations of the objects in scene. Localizing the search space by semantically matching object-level correspondence, our method robustly handles large deformations of objects. Representing the target region by concatenated hypercolumn features which take into account the hierarchical levels of the surrounding context, helps to clear the ambiguity to further improve the accuracy. By conducting multiple experiments across scenes with non-rigid objects, we validate the proposed approach, and show that it outperforms the state of the art methods for semantic correspondence establishment.


  Click for Model/Code and Paper
Sparse Over-complete Patch Matching

Jun 09, 2018
Akila Pemasiri, Kien Nguyen, Sridha Sridharan, Clinton Fookes

Image patch matching, which is the process of identifying corresponding patches across images, has been used as a subroutine for many computer vision and image processing tasks. State -of-the-art patch matching techniques take image patches as input to a convolutional neural network to extract the patch features and evaluate their similarity. Our aim in this paper is to improve on the state of the art patch matching techniques by observing the fact that a sparse-overcomplete representation of an image posses statistical properties of natural visual scenes which can be exploited for patch matching. We propose a new paradigm which encodes image patch details by encoding the patch and subsequently using this sparse representation as input to a neural network to compare the patches. As sparse coding is based on a generative model of natural image patches, it can represent the patch in terms of the fundamental visual components from which it has been composed of, leading to similar sparse codes for patches which are built from similar components. Once the sparse coded features are extracted, we employ a fully-connected neural network, which captures the non-linear relationships between features, for comparison. We have evaluated our approach using the Liberty and Notredame subsets of the popular UBC patch dataset and set a new benchmark outperforming all state-of-the-art patch matching techniques for these datasets.


  Click for Model/Code and Paper
Rank Subspace Learning for Compact Hash Codes

Mar 19, 2015
Kai Li, Guojun Qi, Jun Ye, Kien A. Hua

The era of Big Data has spawned unprecedented interests in developing hashing algorithms for efficient storage and fast nearest neighbor search. Most existing work learn hash functions that are numeric quantizations of feature values in projected feature space. In this work, we propose a novel hash learning framework that encodes feature's rank orders instead of numeric values in a number of optimal low-dimensional ranking subspaces. We formulate the ranking subspace learning problem as the optimization of a piece-wise linear convex-concave function and present two versions of our algorithm: one with independent optimization of each hash bit and the other exploiting a sequential learning framework. Our work is a generalization of the Winner-Take-All (WTA) hash family and naturally enjoys all the numeric stability benefits of rank correlation measures while being optimized to achieve high precision at very short code length. We compare with several state-of-the-art hashing algorithms in both supervised and unsupervised domain, showing superior performance in a number of data sets.

* 10 pages 

  Click for Model/Code and Paper