Models, code, and papers for "David R":

Categorical distributions are ubiquitous in machine learning, e.g., in classification, language models, and recommendation systems. However, when the number of possible outcomes is very large, using categorical distributions becomes computationally expensive, as the complexity scales linearly with the number of outcomes. To address this problem, we propose augment and reduce (A&R), a method to alleviate the computational complexity. A&R uses two ideas: latent variable augmentation and stochastic variational inference. It maximizes a lower bound on the marginal likelihood of the data. Unlike existing methods which are specific to softmax, A&R is more general and is amenable to other categorical models, such as multinomial probit. On several large-scale classification problems, we show that A&R provides a tighter bound on the marginal likelihood and has better predictive performance than existing approaches.

The topic of this paper is a novel Bayesian continuous-basis field representation and inference framework. Within this paper several problems are solved: The maximally informative inference of continuous-basis fields, that is where the basis for the field is itself a continuous object and not representable in a finite manner; the tradeoff between accuracy of representation in terms of information learned, and memory or storage capacity in bits; the approximation of probability distributions so that a maximal amount of information about the object being inferred is preserved; an information theoretic justification for multigrid methodology. The maximally informative field inference framework is described in full generality and denoted the Generalized Kalman Filter. The Generalized Kalman Filter allows the update of field knowledge from previous knowledge at any scale, and new data, to new knowledge at any other scale. An application example instance, the inference of continuous surfaces from measurements (for example, camera image data), is presented.

In this work, we propose to detect the iris and periocular regions simultaneously using coarse annotations and two well-known object detectors: YOLOv2 and Faster R-CNN. We believe coarse annotations can be used in recognition systems based on the iris and periocular regions, given the much smaller engineering effort required to manually annotate the training images. We manually made coarse annotations of the iris and periocular regions (122K images from the visible (VIS) spectrum and 38K images from the near-infrared (NIR) spectrum). The iris annotations in the NIR databases were generated semi-automatically by first applying an iris segmentation CNN and then performing a manual inspection. These annotations were made for 11 well-known public databases (3 NIR and 8 VIS) designed for the iris-based recognition problem and are publicly available to the research community. Experimenting our proposal on these databases, we highlight two results. First, the Faster R-CNN + Feature Pyramid Network (FPN) model reported an Intersection over Union (IoU) higher than YOLOv2 (91.86% vs 85.30%). Second, the detection of the iris and periocular regions being performed simultaneously is as accurate as performed separately, but with a lower computational cost, i.e., two tasks were carried out at the cost of one.

WhyWhere is a new ecological niche modeling (ENM) algorithm for mapping and explaining the distribution of species. The algorithm uses image processing methods to efficiently sift through large amounts of data to find the few variables that best predict species occurrence. The purpose of this paper is to describe and justify the main parameterizations and to show preliminary success at rapidly providing accurate, scalable, and simple ENMs. Preliminary results for 6 species of plants and animals in different regions indicate a significant (p<0.01) 14% increase in accuracy over the GARP algorithm using models with few, typically two, variables. The increase is attributed to access to additional data, particularly monthly vs. annual climate averages. WhyWhere is also 6 times faster than GARP on large data sets. A data mining based approach with transparent access to remote data archives is a new paradigm for ENM, particularly suited to finding correlates in large databases of fine resolution surfaces. Software for WhyWhere is freely available, both as a service and in a desktop downloadable form from the web site http://biodi.sdsc.edu/ww_home.html.

Given a collection of images and spoken audio captions, we present a method for discovering word-like acoustic units in the continuous speech signal and grounding them to semantically relevant image regions. For example, our model is able to detect spoken instances of the word 'lighthouse' within an utterance and associate them with image regions containing lighthouses. We do not use any form of conventional automatic speech recognition, nor do we use any text transcriptions or conventional linguistic annotations. Our model effectively implements a form of spoken language acquisition, in which the computer learns not only to recognize word categories by sound, but also to enrich the words it learns with semantics by grounding them in images.

This paper focuses on managing the cost of deliberation before action. In many problems, the overall quality of the solution reflects costs incurred and resources consumed in deliberation as well as the cost and benefit of execution, when both the resource consumption in deliberation phase, and the costs in deliberation and execution are uncertain and may be described by probability distribution functions. A feasible (in terms of resource consumption) strategy that minimizes the expected total cost is termed computationally-optimal. For a situation with several independent, uninterruptible methods to solve the problem, we develop a pseudopolynomial-time algorithm to construct generate-and-test computationally optimal strategy. We show this strategy-construction problem to be NP-complete, and apply Bellman's Optimality Principle to solve it efficiently.

This paper studies the problem of learning clusters which are consistently present in different (continuously valued) representations of observed data. Our setup differs slightly from the standard approach of (co-) clustering as we use the fact that some form of `labeling' becomes available in this setup: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented, (ii) a practical kernel-based algorithm is derived exploiting the inherent relation to Canonical Correlation Analysis (CCA), as well as its extension to multiple views. A content based information retrieval (CBIR) case study is presented on the multi-lingual aligned Europal document dataset which supports the above findings.

We consider the problem of optimising functions in the Reproducing kernel Hilbert space (RKHS) of a Mat\'ern family kernel with parameter $\nu$ over the domain $[0,1]^d$ under noisy bandit feedback. Our contribution, the $\pi$-GP-UCB algorithm, is the first practical approach with guaranteed sublinear regret for all $\nu>1$ and $d \geq 1$. Empirical validation suggests better performance and drastically improved computational scalablity compared with its predecessor, Improved GP-UCB.

We present a novel method for solving Canonical Correlation Analysis (CCA) in a sparse convex framework using a least squares approach. The presented method focuses on the scenario when one is interested in (or limited to) a primal representation for the first view while having a dual representation for the second view. Sparse CCA (SCCA) minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. The method is demonstrated on two paired corpuses of English-French and English-Spanish for mate-retrieval. We are able to observe, in the mate-retreival, that when the number of the original features is large SCCA outperforms Kernel CCA (KCCA), learning the common semantic space from a sparse set of features.

We show that in modeling social interaction, particularly dialogue, the attitude of obligation can be a useful adjunct to the popularly considered attitudes of belief, goal, and intention and their mutual and shared counterparts. In particular, we show how discourse obligations can be used to account in a natural manner for the connection between a question and its answer in dialogue and how obligations can be used along with other parts of the discourse context to extend the coverage of a dialogue system.

We present a new video compression framework (ViSTRA2) which exploits adaptation of spatial resolution and effective bit depth, down-sampling these parameters at the encoder based on perceptual criteria, and up-sampling at the decoder using a deep convolution neural network. ViSTRA2 has been integrated with the reference software of both the HEVC (HM 16.20) and VVC (VTM 4.01), and evaluated under the Joint Video Exploration Team Common Test Conditions using the Random Access configuration. Our results show consistent and significant compression gains against HM and VVC based on Bj{\o}negaard Delta measurements, with average BD-rate savings of 12.6% (PSNR) and 19.5% (VMAF) over HM and 5.5% (PSNR) and 8.6% (VMAF) over VTM.

Recent works have highlighted the strengths of the Transformer architecture for dealing with sequence tasks. At the same time, neural architecture search has advanced to the point where it can outperform human-designed models. The goal of this work is to use architecture search to find a better Transformer architecture. We first construct a large search space inspired by the recent advances in feed-forward sequential models and then run evolutionary architecture search, seeding our initial population with the Transformer. To effectively run this search on the computationally expensive WMT 2014 English-German translation task, we develop the progressive dynamic hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments - the Evolved Transformer - demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At big model size, the Evolved Transformer is twice as efficient as the Transformer in FLOPS without loss in quality. At a much smaller - mobile-friendly - model size of ~7M parameters, the Evolved Transformer outperforms the Transformer by 0.7 BLEU on WMT'14 English-German.

Accurately predicting the future health of batteries is necessary to ensure reliable operation, minimise maintenance costs, and calculate the value of energy storage investments. The complex nature of degradation renders data-driven approaches a promising alternative to mechanistic modelling. This study predicts the changes in battery capacity over time using a Bayesian non-parametric approach based on Gaussian process regression. These changes can be integrated against an arbitrary input sequence to predict capacity fade in a variety of usage scenarios, forming a generalised health model. The approach naturally incorporates varying current, voltage and temperature inputs, crucial for enabling real world application. A key innovation is the feature selection step, where arbitrary length current, voltage and temperature measurement vectors are mapped to fixed size feature vectors, enabling them to be efficiently used as exogenous variables. The approach is demonstrated on the open-source NASA Randomised Battery Usage Dataset, with data of 26 cells aged under randomized operational conditions. Using half of the cells for training, and half for validation, the method is shown to accurately predict non-linear capacity fade, with a best case normalised root mean square error of 4.3%, including accurate estimation of prediction uncertainty.

Accurately predicting the future capacity and remaining useful life of batteries is necessary to ensure reliable system operation and to minimise maintenance costs. The complex nature of battery degradation has meant that mechanistic modelling of capacity fade has thus far remained intractable; however, with the advent of cloud-connected devices, data from cells in various applications is becoming increasingly available, and the feasibility of data-driven methods for battery prognostics is increasing. Here we propose Gaussian process (GP) regression for forecasting battery state of health, and highlight various advantages of GPs over other data-driven and mechanistic approaches. GPs are a type of Bayesian non-parametric method, and hence can model complex systems whilst handling uncertainty in a principled manner. Prior information can be exploited by GPs in a variety of ways: explicit mean functions can be used if the functional form of the underlying degradation model is available, and multiple-output GPs can effectively exploit correlations between data from different cells. We demonstrate the predictive capability of GPs for short-term and long-term (remaining useful life) forecasting on a selection of capacity vs. cycle datasets from lithium-ion cells.

This work presents a novel objective function for the unsupervised training of neural network sentence encoders. It exploits signals from paragraph-level discourse coherence to train these models to understand text. Our objective is purely discriminative, allowing us to train models many times faster than was possible under prior methods, and it yields models which perform well in extrinsic evaluations.

We present a hybrid continuum-atomistic scheme which combines molecular dynamics (MD) simulations with on-the-fly machine learning techniques for the accurate and efficient prediction of multiscale fluidic systems. By using a Gaussian process as a surrogate model for the computationally expensive MD simulations, we use Bayesian inference to predict the system behaviour at the atomistic scale, purely by consideration of the macroscopic inputs and outputs. Whenever the uncertainty of this prediction is greater than a predetermined acceptable threshold, a new MD simulation is performed to continually augment the database, which is never required to be complete. This provides a substantial enhancement to the current generation of hybrid methods, which often require many similar atomistic simulations to be performed, discarding information after it is used once. We apply our hybrid scheme to nano-confined unsteady flow through a high-aspect-ratio converging-diverging channel, and make comparisons between the new scheme and full MD simulations for a range of uncertainty thresholds and initial databases. For low thresholds, our hybrid solution is highly accurate\,---\,within the thermal noise of a full MD simulation. As the uncertainty threshold is raised, the accuracy of our scheme decreases and the computational speed-up increases (relative to a full MD simulation), enabling the compromise between precision and efficiency to be tuned. The speed-up of our hybrid solution ranges from an order of magnitude, with no initial database, to cases where an extensive initial database ensures no new MD simulations are required.

After experimentation with other designs, the major search engines converged on the weighted, generalized second-price auction (wGSP) for selling keyword advertisements. Notably, this convergence occurred before position auctions were well understood (or, indeed, widely studied) theoretically. While much progress has been made since, theoretical analysis is still not able to settle the question of why search engines found wGSP preferable to other position auctions. We approach this question in a new way, adopting a new analytical paradigm we dub "computational mechanism analysis." By sampling position auction games from a given distribution, encoding them in a computationally efficient representation language, computing their Nash equilibria, and then calculating economic quantities of interest, we can quantitatively answer questions that theoretical methods have not. We considered seven widely studied valuation models from the literature and three position auction variants (generalized first price, unweighted generalized second price, and wGSP). We found that wGSP consistently showed the best ads of any position auction, measured both by social welfare and by relevance (expected number of clicks). Even in models where wGSP was already known to have bad worse-case efficiency, we found that it almost always performed well on average. In contrast, we found that revenue was extremely variable across auction mechanisms, and was highly sensitive to equilibrium selection, the preference model, and the valuation distribution.

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all such systems must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in an appropriate manner, e.g. majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly outperforms majority voting and, in fact, is optimal through comparison to an oracle that knows the reliability of every worker. Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks. By adaptively deciding which questions to ask to the next arriving worker, one might hope to reduce uncertainty more efficiently. We show that, perhaps surprisingly, the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and non-adaptive scenarios. Hence, our non-adaptive approach is order-optimal under both scenarios. This strongly relies on the fact that workers are fleeting and can not be exploited. Therefore, architecturally, our results suggest that building a reliable worker-reputation system is essential to fully harnessing the potential of adaptive designs.

We present a Gaussian Variational Inference (GVI) technique that can be applied to large-scale nonlinear batch state estimation problems. The main contribution is to show how to fit the best Gaussian to the posterior efficiently by exploiting factorization of the joint likelihood of the state and data, as is common in practical problems. The proposed Exactly Sparse Gaussian Variational Inference (ESGVI) technique stores the inverse covariance matrix, which is typically very sparse (e.g., block-tridiagonal for classic state estimation). We show that the only blocks of the (dense) covariance matrix that are required during the calculations correspond to the non-zero blocks of the inverse covariance matrix, and further show how to calculate these blocks efficiently in the general GVI problem. ESGVI operates iteratively, and while we can use analytical derivatives at each iteration, Gaussian cubature can be substituted, thereby producing an efficient derivative-free batch formulation. ESGVI simplifies to precisely the Rauch-Tung-Striebel (RTS) smoother in the batch linear estimation case, but goes beyond the 'extended' RTS smoother in the nonlinear case since it finds the best-fit Gaussian, not the Maximum A Posteriori (MAP) point solution. We demonstrate the technique on controlled simulation problems and a batch nonlinear Simultaneous Localization and Mapping (SLAM) problem with an experimental dataset.