Generative Adversarial networks (GANs) have obtained remarkable success in many unsupervised learning tasks and unarguably, clustering is an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering specific loss, we are able to achieve clustering in the latent space. Our results show a remarkable phenomenon that GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets.

* GANs, Clustering, Latent Space, Interpolation
Click to Read Paper
In this paper we first present a class of algorithms for training multi-level neural networks with a quadratic cost function one layer at a time starting from the input layer. The algorithm is based on the fact that for any layer to be trained, the effect of a direct connection to an optimized linear output layer can be computed without the connection being made. Thus, starting from the input layer, we can train each layer in succession in isolation from the other layers. Once trained, the weights are kept fixed and the outputs of the trained layer then serve as the inputs to the next layer to be trained. The result is a very fast algorithm. The simplicity of this training arrangement allows the activation function and step size in weight adjustment to be adaptive and self-adjusting. Furthermore, the stability of the training process allows relatively large steps to be taken and thereby achieving in even greater speeds. Finally, in our context configuring the network means determining the number of outputs for each layer. By decomposing the overall cost function into separate components related to approximation and estimation, we obtain an optimization formula for determining the number of outputs for each layer. With the ability to self-configure and set parameters, we now have more than a fast training algorithm, but the ability to build automatically a fully trained deep neural network starting with nothing more than data.

Click to Read Paper
We describe a method for estimating human head pose in a color image that contains enough of information to locate the head silhouette and detect non-trivial color edges of individual facial features. The method works by spotting the human head on an arbitrary background, extracting the head outline, and locating facial features necessary to describe the head orientation in the 3D space. It is robust enough to work with both color and gray-level images featuring quasi-frontal views of a human head under variable lighting conditions.

* This is a master's thesis completed at UMCP in 1998, being published here given enough of the demand on its contents from the Computer Vision R&D community
Click to Read Paper
In general, the best explanation for a given observation makes no promises on how good it is with respect to other alternative explanations. A major deficiency of message-passing schemes for belief revision in Bayesian networks is their inability to generate alternatives beyond the second best. In this paper, we present a general approach based on linear constraint systems that naturally generates alternative explanations in an orderly and highly efficient manner. This approach is then applied to cost-based abduction problems as well as belief revision in Bayesian net works.

* Appears in Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI1991)
Click to Read Paper
Millions of users share their experiences on social media sites, such as Twitter, which in turn generate valuable data for public health monitoring, digital epidemiology, and other analyses of population health at global scale. The first, critical, task for these applications is classifying whether a personal health event was mentioned, which we call the (PHM) problem. This task is challenging for many reasons, including typically short length of social media posts, inventive spelling and lexicons, and figurative language, including hyperbole using diseases like "heart attack" or "cancer" for emphasis, and not as a health self-report. This problem is even more challenging for rarely reported, or frequent but ambiguously expressed conditions, such as "stroke". To address this problem, we propose a general, robust method for detecting PHMs in social media, which we call WESPAD, that combines lexical, syntactic, word embedding-based, and context-based features. WESPAD is able to generalize from few examples by automatically distorting the word embedding space to most effectively detect the true health mentions. Unlike previously proposed state-of-the-art supervised and deep-learning techniques, WESPAD requires relatively little training data, which makes it possible to adapt, with minimal effort, to each new disease and condition. We evaluate WESPAD on both an established publicly available Flu detection benchmark, and on a new dataset that we have constructed with mentions of multiple health conditions. Our experiments show that WESPAD outperforms the baselines and state-of-the-art methods, especially in cases when the number and proportion of true health mentions in the training data is small.

* WWW 2018
Click to Read Paper
One of the roots of evolutionary computation was the idea of Turing about unorganized machines. The goal of this work is the development of foundations for evolutionary computations, connecting Turing's ideas and the contemporary state of art in evolutionary computations. To achieve this goal, we develop a general approach to evolutionary processes in the computational context, building mathematical models of computational systems, functioning of which is based on evolutionary processes, and studying properties of such systems. Operations with evolutionary machines are described and it is explored when definite classes of evolutionary machines are closed with respect to basic operations with these machines. We also study such properties as linguistic and functional equivalence of evolutionary machines and their classes, as well as computational power of evolutionary machines and their classes, comparing of evolutionary machines to conventional automata, such as finite automata or Turing machines.

Click to Read Paper
Generating semantic lexicons semi-automatically could be a great time saver, relative to creating them by hand. In this paper, we present an algorithm for extracting potential entries for a category from an on-line corpus, based upon a small set of exemplars. Our algorithm finds more correct terms and fewer incorrect ones than previous work in this area. Additionally, the entries that are generated potentially provide broader coverage of the category than would occur to an individual coding them by hand. Our algorithm finds many terms not included within Wordnet (many more than previous algorithms), and could be viewed as an ``enhancer'' of existing broad-coverage resources.

* Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL), 1998, pages 1110-1116
* 7 pages, 1 figure, 5 tables
Click to Read Paper
The early detection, diagnosis and monitoring of liver cancer progression can be achieved with the precise delineation of metastatic tumours. However, accurate automated segmentation remains challenging due to the presence of noise, inhomogeneity and the high appearance variability of malignant tissue. In this paper, we propose an unsupervised metastatic liver tumour segmentation framework using a machine learning approach based on discriminant Grassmannian manifolds which learns the appearance of tumours with respect to normal tissue. First, the framework learns within-class and between-class similarity distributions from a training set of images to discover the optimal manifold discrimination between normal and pathological tissue in the liver. Second, a conditional optimisation scheme computes nonlocal pairwise as well as pattern-based clique potentials from the manifold subspace to recognise regions with similar labelings and to incorporate global consistency in the segmentation process. The proposed framework was validated on a clinical database of 43 CT images from patients with metastatic liver cancer. Compared to state-of-the-art methods, our method achieves a better performance on two separate datasets of metastatic liver tumours from different clinical sites, yielding an overall mean Dice similarity coefficient of 90.7 +/- 2.4 in over 50 tumours with an average volume of 27.3 mm3.

* Physics in Medicine and Biology 60 (2015)
Click to Read Paper
We present a new algorithm for finding maximum a-posterior) (MAP) assignments of values to belief networks. The belief network is compiled into a network consisting only of nodes with boolean (i.e. only 0 or 1) conditional probabilities. The MAP assignment is then found using a best-first search on the resulting network. We argue that, as one would anticipate, the algorithm is exponential for the general case, but only linear in the size of the network for poly trees.

* Appears in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI1990)
Click to Read Paper
A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the network before updating. We consider and analyze a training procedure, Decoupled Greedy Learning (DGL), that addresses this problem more effectively and at scales beyond those of previous solutions. It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing for layers or modules in networks to be trained with a potentially linear parallelization in layers. We show theoretically and empirically that this approach converges. In addition, we empirically find that it can lead to better generalization than sequential greedy optimization and even standard end-to-end back-propagation. We show that an extension of this approach to asynchronous settings, where modules can operate with large communication delays, is possible with the use of a replay buffer. We demonstrate the effectiveness of DGL on the CIFAR-10 datasets against alternatives and on the large-scale ImageNet dataset, where we are able to effectively train VGG and ResNet-152 models.

* 14 pages
Click to Read Paper
In natural language processing tasks performance of the models is often measured with some non-differentiable metric, such as BLEU score. To use efficient gradient-based methods for optimization, it is a common workaround to optimize some surrogate loss function. This approach is effective if optimization of such loss also results in improving target metric. The corresponding problem is referred to as loss-evaluation mismatch. In the present work we propose a method for calculation of differentiable lower bound of expected BLEU score that does not involve computationally expensive sampling procedure such as the one required when using REINFORCE rule from reinforcement learning (RL) framework.

* Presented at NIPS 2017 Workshop on Conversational AI: Today's Practice and Tomorrow's Potential
Click to Read Paper
In this paper, we explore the ways to improve POS-tagging using various types of auxiliary losses and different word representations. As a baseline, we utilized a BiLSTM tagger, which is able to achieve state-of-the-art results on the sequence labelling tasks. We developed a new method for character-level word representation using feedforward neural network. Such representation gave us better results in terms of speed and performance of the model. We also applied a novel technique of pretraining such word representations with existing word vectors. Finally, we designed a new variant of auxiliary loss for sequence labelling tasks: an additional prediction of the neighbour labels. Such loss forces a model to learn the dependencies in-side a sequence of labels and accelerates the process of training. We test these methods on English and Russian languages.

* Computational Linguistics and Intellectual Technologies, Papers from the Annual International Conference "Dialogue" (2018) Issue 17, 14-27
Click to Read Paper
We use the scattering network as a generic and fixed ini-tialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to-date with pre-defined representations while being competitive with Deep CNNs. Using a shallow cascade of 1 x 1 convolutions, which encodes scattering coefficients that correspond to spatial windows of very small sizes, permits to obtain AlexNet accuracy on the imagenet ILSVRC2012. We demonstrate that this local encoding explicitly learns invariance w.r.t. rotations. Combining scattering networks with a modern ResNet, we achieve a single-crop top 5 error of 11.4% on imagenet ILSVRC2012, comparable to the Resnet-18 architecture, while utilizing only 10 layers. We also find that hybrid architectures can yield excellent performance in the small sample regime, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. We demonstrate this on subsets of the CIFAR-10 dataset and on the STL-10 dataset.

Click to Read Paper
Independence-based (IB) assignments to Bayesian belief networks were originally proposed as abductive explanations. IB assignments assign fewer variables in abductive explanations than do schemes assigning values to all evidentially supported variables. We use IB assignments to approximate marginal probabilities in Bayesian belief networks. Recent work in belief updating for Bayes networks attempts to approximate posterior probabilities by finding a small number of the highest probability complete (or perhaps evidentially supported) assignments. Under certain assumptions, the probability mass in the union of these assignments is sufficient to obtain a good approximation. Such methods are especially useful for highly-connected networks, where the maximum clique size or the cutset size make the standard algorithms intractable. Since IB assignments contain fewer assigned variables, the probability mass in each assignment is greater than in the respective complete assignment. Thus, fewer IB assignments are sufficient, and a good approximation can be obtained more efficiently. IB assignments can be used for efficiently approximating posterior node probabilities even in cases which do not obey the rather strict skewness assumptions used in previous research. Two algorithms for finding the high probability IB assignments are suggested: one by doing a best-first heuristic search, and another by special-purpose integer linear programming. Experimental results show that this approach is feasible for highly connected belief networks.

* Appears in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI1994)
Click to Read Paper
We generalize the word analogy task across languages, to provide a new intrinsic evaluation method for cross-lingual semantic spaces. We experiment with six languages within different language families, including English, German, Spanish, Italian, Czech, and Croatian. State-of-the-art monolingual semantic spaces are transformed into a shared space using dictionaries of word translations. We compare several linear transformations and rank them for experiments with monolingual (no transformation), bilingual (one semantic space is transformed to another), and multilingual (all semantic spaces are transformed onto English space) versions of semantic spaces. We show that tested linear transformations preserve relationships between words (word analogies) and lead to impressive results. We achieve average accuracy of 51.1%, 43.1%, and 38.2% for monolingual, bilingual, and multilingual semantic spaces, respectively.

* 11 pages. arXiv admin note: text overlap with arXiv:1807.04172
Click to Read Paper
Background modeling techniques are used for moving object detection in video. Many algorithms exist in the field of object detection with different purposes. In this paper, we propose an improvement of moving object detection based on codebook segmentation. We associate the original codebook algorithm with an edge detection algorithm. Our goal is to prove the efficiency of using an edge detection algorithm with a background modeling algorithm. Throughout our study, we compared the quality of the moving object detection when codebook segmentation algorithm is associated with some standard edge detectors. In each case, we use frame-based metrics for the evaluation of the detection. The different results are presented and analyzed.

* to appear in the 10th International Conference on Signal Image Technology & Internet Based Systems, 2014
Click to Read Paper
In high dimensional regression settings, sparsity enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules propose to ignore some variables in the optimization leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to discard variables wrongly the rules are said to be safe. In this work, we propose a unifying framework for generalized linear models regularized with standard sparsity enforcing penalties such as $\ell_1$ or $\ell_1/\ell_2$ norms. Our technique allows to discard safely more variables than previously considered safe rules, particularly for low regularization parameters. Our proposed Gap Safe rules (so called because they rely on duality gap computation) can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods. Applied to many standard learning tasks, Lasso, Sparse-Group Lasso, multi-task Lasso, binary and multinomial logistic regression, etc., we report significant speed-ups compared to previously proposed safe rules on all tested data sets.

Click to Read Paper
This paper proposes a novel framework for detecting redundancy in supervised sentence categorisation. Unlike traditional singleton neural network, our model incorporates character-aware convolutional neural network (Char-CNN) with character-aware recurrent neural network (Char-RNN) to form a convolutional recurrent neural network (CRNN). Our model benefits from Char-CNN in that only salient features are selected and fed into the integrated Char-RNN. Char-RNN effectively learns long sequence semantics via sophisticated update mechanism. We compare our framework against the state-of-the-art text classification algorithms on four popular benchmarking corpus. For instance, our model achieves competing precision rate, recall ratio, and F1 score on the Google-news data-set. For twenty-news-groups data stream, our algorithm obtains the optimum on precision rate, recall ratio, and F1 score. For Brown Corpus, our framework obtains the best F1 score and almost equivalent precision rate and recall ratio over the top competitor. For the question classification collection, CRNN produces the optimal recall rate and F1 score and comparable precision rate. We also analyse three different RNN hidden recurrent cells' impact on performance and their runtime efficiency. We observe that MGU achieves the optimal runtime and comparable performance against GRU and LSTM. For TFIDF based algorithms, we experiment with word2vec, GloVe, and sent2vec embeddings and report their performance differences.

* Conference paper accepted at IEEE SMARTCOMP 2017, Hong Kong
Click to Read Paper
Novelty detection in news events has long been a difficult problem. A number of models performed well on specific data streams but certain issues are far from being solved, particularly in large data streams from the WWW where unpredictability of new terms requires adaptation in the vector space model. We present a novel event detection system based on the Incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting incorporated with Locality Sensitive Hashing (LSH). Our system could efficiently and effectively adapt to the changes within the data streams of any new terms with continual updates to the vector space model. Regarding miss probability, our proposed novelty detection framework outperforms a recognised baseline system by approximately 16% when evaluating a benchmark dataset from Google News.

Click to Read Paper
In high dimensional settings, sparse structures are crucial for efficiency, either in term of memory, computation or performance. In some contexts, it is natural to handle more refined structures than pure sparsity, such as for instance group sparsity. Sparse-Group Lasso has recently been introduced in the context of linear regression to enforce sparsity both at the feature level and at the group level. We adapt to the case of Sparse-Group Lasso recent safe screening rules that discard early in the solver irrelevant features/groups. Such rules have led to important speed-ups for a wide range of iterative methods. Thanks to dual gap computations, we provide new safe screening rules for Sparse-Group Lasso and show significant gains in term of computing time for a coordinate descent implementation.

Click to Read Paper