Research papers and code for "Eugene Ch":
In this paper we first present a class of algorithms for training multi-level neural networks with a quadratic cost function, one layer at a time, starting from the input layer. The algorithm is based on the fact that for any layer to be trained, the effect of a direct connection to an optimized linear output layer can be computed without the connection being made. Thus, starting from the input layer, we can train each layer in succession, in isolation from the other layers. Once trained, the weights are kept fixed, and the outputs of the trained layer then serve as the inputs to the next layer to be trained. The result is a very fast algorithm. The simplicity of this training arrangement allows the activation function and the step size in weight adjustment to be adaptive and self-adjusting. Furthermore, the stability of the training process allows relatively large steps to be taken, thereby achieving even greater speed. Finally, in our context, configuring the network means determining the number of outputs for each layer. By decomposing the overall cost function into separate components related to approximation and estimation, we obtain an optimization formula for determining the number of outputs for each layer. With the ability to self-configure and set parameters, we have not merely a fast training algorithm but the ability to automatically build a fully trained deep neural network starting with nothing more than data.
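
As an illustration of the scheme, here is a minimal NumPy sketch: each candidate layer is scored by the error of an optimal linear output layer fitted to its activations via a least-squares solve, so the output connection never has to be made. The perturbation-based weight update below is a placeholder of ours, not the paper's adaptive rule.

```python
import numpy as np

def linear_readout_error(H, Y):
    """Least-squares fit of a linear output layer on activations H;
    scores a layer without ever adding the output connection."""
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])        # append bias column
    W, *_ = np.linalg.lstsq(H1, Y, rcond=None)
    return np.mean((H1 @ W - Y) ** 2)

def train_layer(X, Y, width, steps=200, lr=0.05, seed=0):
    """Fit one hidden layer by improving its linear-readout error.
    The random-perturbation update is a stand-in for the paper's rule."""
    rng = np.random.default_rng(seed)
    best_W = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], width))
    best_err = linear_readout_error(np.tanh(X @ best_W), Y)
    for _ in range(steps):
        cand = best_W + lr * rng.normal(size=best_W.shape)
        err = linear_readout_error(np.tanh(X @ cand), Y)
        if err < best_err:
            best_W, best_err = cand, err
    return best_W, best_err

# Build the network greedily: train a layer, freeze it, and feed its
# outputs to the next layer.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 8))
Y = np.sin(X.sum(axis=1, keepdims=True))
layer_input = X
for width in (16, 16):
    W, err = train_layer(layer_input, Y, width)
    layer_input = np.tanh(layer_input @ W)               # frozen layer's outputs
    print(f"trained layer of width {width}: readout MSE = {err:.4f}")
```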

The purpose of this report is to examine the generalization performance of Support Vector Machines (SVMs) as a tool for pattern recognition and object classification. The work is motivated by the growing popularity of the method, which is claimed to guarantee good generalization performance for the task at hand. The method is implemented in MATLAB, and SVMs based on various kernels are tested on classification data from several domains.
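
The original experiments were implemented in MATLAB; purely as a modern stand-in for their shape, a scikit-learn sketch comparing kernels on a synthetic task might look like this (dataset and settings are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary classification problem standing in for the report's data.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:8s} mean CV accuracy = {score:.3f}")
```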

* A short (6-page) report on the evaluation of SVMs as a pattern classification tool, as of 1999
This report explores the latest advances in the field of digital document recognition. With a focus on printed document imagery, we discuss the major developments in optical character recognition (OCR) and document image enhancement/restoration as applied to Latin and non-Latin scripts. In addition, we review and discuss the available technologies for handwritten document recognition. We also provide some company-accumulated benchmark results on available OCR engines.

* Technical report surveying OCR/ICR and document understanding methods as of 2004. It contains 38 pages, numerous figures, and 93 references, and provides a table of contents
In general, the best explanation for a given observation carries no guarantee about how good it is relative to alternative explanations. A major deficiency of message-passing schemes for belief revision in Bayesian networks is their inability to generate alternatives beyond the second best. In this paper, we present a general approach based on linear constraint systems that naturally generates alternative explanations in an orderly and highly efficient manner. This approach is then applied to cost-based abduction problems as well as to belief revision in Bayesian networks.
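
As a toy illustration of the ordered-alternatives idea: in the paper, each solution found is excluded by adding linear constraints before solving again; in this deliberately brute-force stand-in, an exclusion set plays the role of those constraints, and the hypothesis costs and covering rule are invented.

```python
from itertools import product

# Toy cost-based abduction: each hypothesis carries an assumption cost, and
# an explanation is any hypothesis set that accounts for the observation.
costs = {"h1": 2.0, "h2": 3.0, "h3": 1.5}

def explains(assign):
    # Invented domain rule: the observation is covered by h1 alone,
    # or by h2 and h3 together.
    return assign["h1"] or (assign["h2"] and assign["h3"])

def next_best(excluded):
    """Return the cheapest explanation not yet excluded."""
    best = None
    for values in product([False, True], repeat=len(costs)):
        assign = dict(zip(costs, values))
        if values in excluded or not explains(assign):
            continue
        cost = sum(c for h, c in costs.items() if assign[h])
        if best is None or cost < best[0]:
            best = (cost, values, assign)
    return best

# Each solution found is ruled out before the next query, mirroring the
# extra linear constraint added per solution in the paper's formulation.
excluded = set()
for rank in (1, 2, 3):
    cost, values, assign = next_best(excluded)
    excluded.add(values)
    print(rank, cost, sorted(h for h in costs if assign[h]))
```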

* Appears in Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI1991)
A plethora of words are used to describe the spectrum of human emotions, but how many emotions are there really, and how do they interact? Over the past few decades, several theories of emotion have been proposed, each based around the existence of a set of 'basic emotions', and each supported by an extensive variety of research including studies in facial expression, ethology, neurology and physiology. Here we present research based on a theory that people transmit their understanding of emotions through the language they use surrounding emotion keywords. Using a labelled corpus of over 21,000 tweets, six of the basic emotion sets proposed in the existing literature were analysed using Latent Semantic Clustering (LSC), evaluating the distinctiveness of the semantic meaning attached to each emotional label. We hypothesise that the more distinct the language used to express a certain emotion, the more distinct the perception (including proprioception) of that emotion, and thus the more 'basic' the emotion. This allows us to select the dimensions that best represent the entire spectrum of emotion. We find that Ekman's set, arguably the most frequently used for classifying emotions, is in fact the most semantically distinct overall. Next, taking all analysed (that is, previously proposed) emotion terms into account, we determine the optimal semantically irreducible basic emotion set using an iterative LSC algorithm. Our newly derived set (Accepting, Ashamed, Contempt, Interested, Joyful, Pleased, Sleepy, Stressed) yields a 6.1% increase in distinctiveness over Ekman's set (Angry, Disgusted, Joyful, Sad, Scared). We also demonstrate how LSC data can help visualise emotions. We introduce the concept of an Emotion Profile and briefly analyse compound emotions both visually and mathematically.
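
The dissertation's LSC procedure is not a standard library routine, so the following is only a loose sketch of the kind of measurement involved: project labelled texts into a latent semantic space and score an emotion set by how separated its per-label centroids are. The corpus, labels, and scoring rule below are all invented.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tweets = ["feeling joyful and happy today", "that was disgusting and gross",
          "so scared and terrified right now", "sad and gloomy all day"]
labels = ["joyful", "disgusted", "scared", "sad"]

# Latent semantic space via truncated SVD of TF-IDF vectors.
Z = TruncatedSVD(n_components=2).fit_transform(
    TfidfVectorizer().fit_transform(tweets))
names = sorted(set(labels))
centroids = np.array([Z[[i for i, l in enumerate(labels) if l == n]].mean(axis=0)
                      for n in names])
sim = cosine_similarity(centroids)
off_diag = sim[~np.eye(len(names), dtype=bool)]
print(f"mean inter-label similarity = {off_diag.mean():.3f} (lower = more distinct)")
```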

* University of Bath BSc(Hons) Computer Science Dissertation, 105 Pages
Differential privacy (DP) is a popular mechanism for training machine learning models with bounded leakage about the presence of specific points in the training data. The cost of differential privacy is a reduction in the model's accuracy. We demonstrate that this cost is not borne equally: the accuracy of DP models drops much more for underrepresented classes and subgroups. For example, a DP gender classification model exhibits much lower accuracy for black faces than for white faces. Critically, this gap is bigger in the DP model than in the non-DP model, i.e., if the original model is unfair, the unfairness becomes worse once DP is applied. We demonstrate this effect for a variety of tasks and models, including sentiment analysis of text and image classification. We then explain why DP training mechanisms such as gradient clipping and noise addition have a disproportionate effect on underrepresented and more complex subgroups, resulting in a disparate reduction of model accuracy.
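
The two mechanisms the paper implicates, per-example gradient clipping and noise addition, are easy to see in isolation. A minimal sketch of one DP-SGD update (hyperparameters arbitrary):

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, clip=1.0, noise_mult=1.1, lr=0.1, rng=None):
    """One DP-SGD update: clip each example's gradient to a fixed norm,
    average, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip / len(per_example_grads),
                       size=mean.shape)
    # Rare or complex subgroups tend to produce the large gradients, so
    # clipping removes proportionally more of their signal, while the noise
    # floor is the same for everyone -- the source of the disparate impact.
    return w - lr * (mean + noise)

w = np.zeros(3)
grads = [np.array([3.0, 0.0, 0.0]), np.array([0.1, 0.1, 0.0])]  # rare vs. common
print(dp_sgd_step(w, grads))
```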

The conventional prior for a Variational Auto-Encoder (VAE) is a Gaussian distribution. Recent work has demonstrated that the choice of prior distribution affects the learning capacity of VAE models. We propose a general technique (the embedding-reparameterization procedure, or ER) for introducing arbitrary manifold-valued variables into a VAE model. We compare our technique with a conventional VAE on a toy benchmark problem. This is work in progress.
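
The ER procedure itself is only outlined in the abstract, so the following is one plausible reading rather than the authors' construction: reparameterize a Gaussian in an ambient embedding space, then map the sample onto the manifold, here a unit hypersphere.

```python
import torch

def sample_manifold_latent(mu, log_sigma):
    """Hypothetical sketch of an embedding-reparameterization step:
    reparameterize in the ambient space, then project onto the manifold."""
    eps = torch.randn_like(mu)
    z = mu + torch.exp(log_sigma) * eps            # standard reparameterization
    return z / z.norm(dim=-1, keepdim=True)        # project onto S^{n-1}

z = sample_manifold_latent(torch.zeros(4, 3), torch.zeros(4, 3))
print(z.norm(dim=-1))   # all ones: samples lie on the sphere
```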

* Presented at Bayesian Deep Learning workshop (NeurIPS 2018)
Millions of users share their experiences on social media sites, such as Twitter, which in turn generate valuable data for public health monitoring, digital epidemiology, and other analyses of population health at global scale. The first, critical, task for these applications is classifying whether a personal health event was mentioned, which we call the personal health mention (PHM) problem. This task is challenging for many reasons, including the typically short length of social media posts, inventive spelling and lexicons, and figurative language, including hyperbole that uses diseases like "heart attack" or "cancer" for emphasis rather than as a health self-report. The problem is even more challenging for rarely reported, or frequent but ambiguously expressed, conditions such as "stroke". To address this problem, we propose a general, robust method for detecting PHMs in social media, which we call WESPAD, that combines lexical, syntactic, word embedding-based, and context-based features. WESPAD is able to generalize from a few examples by automatically distorting the word embedding space to most effectively detect true health mentions. Unlike previously proposed state-of-the-art supervised and deep-learning techniques, WESPAD requires relatively little training data, which makes it possible to adapt, with minimal effort, to each new disease and condition. We evaluate WESPAD on both an established publicly available Flu detection benchmark, and on a new dataset that we have constructed with mentions of multiple health conditions. Our experiments show that WESPAD outperforms the baselines and state-of-the-art methods, especially in cases where the number and proportion of true health mentions in the training data are small.
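
WESPAD's feature set and its learned embedding distortion are specific to the paper, so this is only a bare-bones stand-in for a feature-combination classifier in the same spirit: word n-grams for the lexical features, character n-grams loosely standing in for the rest, and no embedding distortion. The toy posts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Toy data: a figurative use and a genuine personal health mention.
posts = ["that exam gave me a heart attack lol",
         "recovering in hospital after my heart attack last week"]
labels = [0, 1]

model = Pipeline([
    ("features", FeatureUnion([
        ("word", TfidfVectorizer(ngram_range=(1, 2))),                  # lexical
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])),
    ("clf", LogisticRegression()),
])
model.fit(posts, labels)
print(model.predict(["i think i am having a heart attack"]))
```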

* WWW 2018
One of the roots of evolutionary computation was Turing's idea of unorganized machines. The goal of this work is to develop foundations for evolutionary computation, connecting Turing's ideas with the contemporary state of the art in evolutionary computation. To achieve this goal, we develop a general approach to evolutionary processes in a computational context, building mathematical models of computational systems whose functioning is based on evolutionary processes, and studying the properties of such systems. Operations on evolutionary machines are described, and we explore when particular classes of evolutionary machines are closed under these basic operations. We also study properties such as the linguistic and functional equivalence of evolutionary machines and their classes, as well as the computational power of evolutionary machines and their classes, comparing evolutionary machines to conventional automata such as finite automata and Turing machines.

Useless paths are a chronic problem for marker-passing techniques. We use a probabilistic analysis to justify a method for quickly identifying and rejecting useless paths. Using the same analysis, we identify key conditions and assumptions necessary for marker-passing to perform well.

* Appears in Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI1991)
Evolutionary processes have proved very useful for solving optimization problems. In this work, we build a formalization of the notions of cooperation and competition among multiple systems working toward a common optimization goal of a population, using evolutionary computation techniques. We justify the claim that evolutionary algorithms are more expressive than conventional recursive algorithms. Three subclasses of evolutionary algorithms are proposed: bounded finite, unbounded finite, and infinite types. Some results on completeness, optimality, and search decidability for these classes are presented. A natural extension of the Evolutionary Turing Machine model developed in this paper allows one to mathematically represent and study properties of cooperation and competition in a population of optimized species.

Locating the source of an odor in a turbulent environment, a common behavior for living organisms, is non-trivial because of the random nature of mixing. Here we analyze the statistical physics aspects of the problem and propose an efficient strategy for olfactory search that can work in turbulent plumes. The algorithm combines maximum-likelihood inference of the source position with an active search. Our approach provides the theoretical basis for the design of olfactory robots and the quantitative tools for the analysis of the observed olfactory search behavior of living creatures (e.g., the odor-modulated optomotor anemotaxis of moths).
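
As a cartoon of the strategy, the sketch below maintains a log-likelihood map over candidate source positions, updates it after each detection or non-detection, and steps toward the current maximum-likelihood estimate. The detection model and the greedy move rule are invented for illustration; the paper's strategy is more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)
N, source, agent = 21, np.array([5, 15]), np.array([18, 2])

def hit_prob(pos, src):
    """Invented detection model: odor hits become rarer with distance."""
    return np.clip(np.exp(-np.linalg.norm(pos - src) / 4.0), 1e-6, 1 - 1e-6)

loglik = np.zeros((N, N))        # log-likelihood of each candidate source cell
for step in range(60):
    hit = rng.random() < hit_prob(agent, source)
    for i in range(N):
        for j in range(N):
            p = hit_prob(agent, np.array([i, j]))
            loglik[i, j] += np.log(p if hit else 1.0 - p)
    # Active search: step toward the current maximum-likelihood estimate.
    est = np.array(np.unravel_index(loglik.argmax(), loglik.shape))
    agent = np.clip(agent + np.sign(est - agent), 0, N - 1)

print("ML estimate:", est, "true source:", source)
```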

* 6 pages, 4 figures
Very little attention has been paid to comparing the efficiency of high-accuracy statistical parsers. This paper proposes a machine-independent metric that is general enough to allow comparisons across very different parsing architectures. This metric, which we call "events considered", measures the number of "events", however they are defined for a particular parser, for which a probability must be calculated in order to find the parse. It is applicable to single-pass or multi-stage parsers. We discuss the advantages of the metric and demonstrate its usefulness by using it to compare two parsers which differ in several fundamental ways.
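
The metric is easy to instrument: wrap the parser's probability model so that every probability computation increments a counter. All class and method names below are hypothetical; a real parser would hook its own model object.

```python
class CountingModel:
    """Wraps a probability model and counts 'events considered'."""
    def __init__(self, model):
        self.model = model
        self.events_considered = 0

    def prob(self, event):
        self.events_considered += 1          # one more event considered
        return self.model.prob(event)

class UniformModel:                          # trivial stand-in model
    def prob(self, event):
        return 0.5

m = CountingModel(UniformModel())
for event in ("NP -> DT NN", "VP -> V NP", "S -> NP VP"):
    m.prob(event)
print(m.events_considered)                   # 3 events considered
```

Comparing the final counter values across parsers gives the machine-independent comparison the paper argues for, independent of hardware or implementation tuning.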

* Proceedings of the COLING 2000 Workshop on Efficiency in Large-Scale Parsing Systems, 2000, pages 29-36
* 8 pages, 4 figures, 2 tables
Generating semantic lexicons semi-automatically could be a great time saver relative to creating them by hand. In this paper, we present an algorithm for extracting potential entries for a category from an on-line corpus, based upon a small set of exemplars. Our algorithm finds more correct terms and fewer incorrect ones than previous work in this area. Additionally, the entries that are generated potentially provide broader coverage of the category than would occur to an individual coding them by hand. Our algorithm finds many terms not included in WordNet (many more than previous algorithms) and could be viewed as an "enhancer" of existing broad-coverage resources.
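
In the same spirit, though far simpler than the paper's algorithm, a bootstrapping sketch: score each corpus word by how often it co-occurs in a sentence with the seed exemplars, and propose the top scorers as candidate entries. The seeds and corpus are invented.

```python
from collections import Counter

seeds = {"rifle", "pistol"}                      # exemplars for a WEAPON category
corpus = ["the rifle and the shotgun were seized",
          "a pistol and a revolver were found nearby",
          "the report and the budget were filed"]

scores = Counter()
for sentence in corpus:
    words = set(sentence.split())
    if words & seeds:                            # sentence mentions a seed
        for w in words - seeds:
            scores[w] += 1                       # credit co-occurring words

print(scores.most_common(5))
```

Stop words score high in this naive version; suppressing exactly that kind of noise is where the real algorithm earns its improvements.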

* Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL), 1998, pages 1110-1116
* 7 pages, 1 figure, 5 tables
The early detection, diagnosis and monitoring of liver cancer progression can be achieved with the precise delineation of metastatic tumours. However, accurate automated segmentation remains challenging due to the presence of noise, inhomogeneity and the high appearance variability of malignant tissue. In this paper, we propose an unsupervised metastatic liver tumour segmentation framework using a machine learning approach based on discriminant Grassmannian manifolds which learns the appearance of tumours with respect to normal tissue. First, the framework learns within-class and between-class similarity distributions from a training set of images to discover the optimal manifold discrimination between normal and pathological tissue in the liver. Second, a conditional optimisation scheme computes nonlocal pairwise as well as pattern-based clique potentials from the manifold subspace to recognise regions with similar labelings and to incorporate global consistency in the segmentation process. The proposed framework was validated on a clinical database of 43 CT images from patients with metastatic liver cancer. Compared to state-of-the-art methods, our method achieves better performance on two separate datasets of metastatic liver tumours from different clinical sites, yielding an overall mean Dice similarity coefficient of 90.7 +/- 2.4 in over 50 tumours with an average volume of 27.3 mm3.

* Physics in Medicine and Biology 60 (2015)
Plan recognition does not work the same way in stories and in "real life" (people tend to jump to conclusions more in stories). We present a theory of this, for the particular case of how objects in stories (or in life) influence plan recognition decisions. We provide a Bayesian network formalization of a simple first-order theory of plans, and show how a particular network parameter seems to govern the difference between "life-like" and "story-like" response. We then show why this parameter would be influenced (in the desired way) by a model of speaker (or author) topic selection which assumes that facts in stories are typically "relevant".

* Appears in Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence (UAI1989)
We present a new algorithm for finding maximum a-posteriori (MAP) assignments of values to belief networks. The belief network is compiled into a network consisting only of nodes with boolean (i.e., only 0 or 1) conditional probabilities. The MAP assignment is then found using a best-first search on the resulting network. We argue that, as one would anticipate, the algorithm is exponential in the general case, but only linear in the size of the network for polytrees.
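
The second stage is a standard best-first search; here is a generic sketch over binary variables, with the compilation step omitted and the toy scoring function invented. Because the score of a partial assignment upper-bounds all of its completions, the first complete assignment popped is the MAP assignment.

```python
import heapq
from itertools import count

def map_assignment(variables, score):
    """Best-first search for the MAP assignment; `score(partial)` must be
    an upper bound on the probability of any completion of `partial`."""
    tiebreak = count()
    heap = [(-score({}), next(tiebreak), {})]
    while heap:
        neg, _, partial = heapq.heappop(heap)
        if len(partial) == len(variables):
            return partial, -neg               # first complete pop is optimal
        var = variables[len(partial)]
        for val in (0, 1):
            child = {**partial, var: val}
            heapq.heappush(heap, (-score(child), next(tiebreak), child))

# Toy network: two independent binary variables with the given probabilities.
probs = {"a": (0.3, 0.7), "b": (0.8, 0.2)}
def score(partial):
    s = 1.0
    for v, (p0, p1) in probs.items():
        s *= (p0, p1)[partial[v]] if v in partial else max(p0, p1)
    return s

print(map_assignment(list(probs), score))      # ({'a': 1, 'b': 0}, 0.56)
```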

* Appears in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI1990)
We describe a method for incrementally constructing belief networks. We have developed a network-construction language similar to a forward-chaining language using data dependencies, but with additional features for specifying distributions. Using this language, we can define parameterized classes of probabilistic models. These parameterized models make it possible to apply probabilistic reasoning to problems for which it is impractical to have a single large static model.

* Appears in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI1990)
A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the network before updating. We consider and analyze a training procedure, Decoupled Greedy Learning (DGL), that addresses this problem more effectively and at scales beyond those of previous solutions. It is based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification. We consider an optimization of this objective that permits us to decouple the layer training, allowing layers or modules in the network to be trained with a potentially linear parallelization across layers. We show theoretically and empirically that this approach converges. In addition, we empirically find that it can lead to better generalization than sequential greedy optimization and even standard end-to-end back-propagation. We show that an extension of this approach to asynchronous settings, where modules can operate with large communication delays, is possible with the use of a replay buffer. We demonstrate the effectiveness of DGL on the CIFAR-10 dataset against alternatives and on the large-scale ImageNet dataset, where we are able to effectively train VGG and ResNet-152 models.
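
A compact sketch of the decoupled scheme with placeholder sizes and a naive buffer policy (the paper's architecture and buffer management differ): each module trains against its own auxiliary head and reads possibly stale batches from a replay buffer, so no module ever waits on a global backward pass.

```python
import random
import torch
import torch.nn as nn

torch.manual_seed(0)
modules = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)]
heads = [nn.Linear(32, 10) for _ in modules]       # local auxiliary classifiers
opts = [torch.optim.SGD(list(m.parameters()) + list(h.parameters()), lr=0.1)
        for m, h in zip(modules, heads)]
buffers = [[] for _ in modules]                    # one replay buffer per module

def local_step(i, x, y):
    """Update module i against its own head only; no gradient ever crosses
    a module boundary, which is what removes the update locking."""
    h = modules[i](x)
    loss = nn.functional.cross_entropy(heads[i](h), y)
    opts[i].zero_grad(); loss.backward(); opts[i].step()
    return h.detach()

for step in range(100):
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    for i in range(len(modules)):
        buffers[i].append((x, y))
        buffers[i] = buffers[i][-50:]              # bounded buffer
        bx, by = random.choice(buffers[i])         # possibly stale batch
        x, y = local_step(i, bx, by), by           # output feeds the next module
```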

* 14 pages
Shallow supervised 1-hidden-layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize than their deep counterparts, but lack their representational power. Here we use 1-hidden-layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. Contrary to previous approaches using shallow networks, we focus on problems where deep learning is reported as critical for success. We thus study CNNs on image recognition tasks using the large-scale ImageNet dataset and the CIFAR-10 dataset. Using a simple set of ideas for architecture and training, we find that solving sequential 1-hidden-layer auxiliary problems leads to a CNN that exceeds AlexNet performance on ImageNet. Extending our training methodology to construct individual layers by solving 2- and 3-hidden-layer auxiliary problems, we obtain an 11-layer network that exceeds VGG-11 on ImageNet, obtaining 89.8% top-5 single-crop accuracy. To our knowledge, this is the first competitive alternative to end-to-end training of CNNs that can scale to ImageNet. We conduct a wide range of experiments to study the properties this induces in the intermediate layers.
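
The sequential construction can be sketched as follows, with sizes, optimizer, and depth as placeholders rather than the paper's recipe: each new block is trained against its own 1-hidden-layer auxiliary classifier, then frozen, and its outputs become the next block's training data.

```python
import torch
import torch.nn as nn

def train_block(block, aux, data, epochs=2):
    """Solve one auxiliary problem: the block plus a small MLP head,
    trained in isolation from the rest of the network."""
    opt = torch.optim.SGD(list(block.parameters()) + list(aux.parameters()),
                          lr=0.1)
    for _ in range(epochs):
        for x, y in data:
            loss = nn.functional.cross_entropy(aux(block(x)), y)
            opt.zero_grad(); loss.backward(); opt.step()

torch.manual_seed(0)
data = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
        for _ in range(4)]                         # toy stand-in for real images
blocks, in_ch = [], 3
for out_ch in (16, 32):
    block = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
    aux = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                        nn.Linear(out_ch, 64), nn.ReLU(), nn.Linear(64, 10))
    train_block(block, aux, data)                  # 1-hidden-layer auxiliary problem
    for p in block.parameters():
        p.requires_grad_(False)                    # freeze before moving on
    data = [(block(x).detach(), y) for x, y in data]
    blocks.append(block)
    in_ch = out_ch
```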
