Models, code, and papers for "Peter C. Ma":

Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractical due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an efficient end-to-end method based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, including Inception-v3, AmoebaNet, Transformer-XL, and WaveNet, our method on average achieves 16% improvement over human experts and 9.2% improvement over the prior art with 15 times faster convergence. To further reduce the computation cost, we pre-train the policy network on a set of dataflow graphs and use a superposition network to fine-tune it on each individual graph, achieving state-of-the-art performance on large hold-out graphs with over 50k nodes, such as an 8-layer GNMT.

This article describes a multivariate polynomial regression method in which the uncertainties of the input parameters are approximated with Gaussian distributions, derived from the central limit theorem for large weighted sums, directly from the training sample. The estimated uncertainties can be propagated into the optimal fit function, as an alternative to the statistical bootstrap method. This uncertainty can be propagated further into a loss-function-like quantity, with which it is possible to calculate the expected loss function and to select the optimal polynomial degree with statistical significance. Combined with simple phase space splitting methods, it is possible to model most features of the training data even with low degree polynomials or constants.

The Hodrick-Prescott (HP) filter is one of the most widely used econometric methods in applied macroeconomic research. The technique is nonparametric and seeks to decompose a time series into a trend and a cyclical component unaided by economic theory or prior trend specification. Like all nonparametric methods, the HP filter depends critically on a tuning parameter that controls the degree of smoothing. Yet in contrast to modern nonparametric methods and applied work with these procedures, empirical practice with the HP filter almost universally relies on standard settings for the tuning parameter that have been suggested largely by experimentation with macroeconomic data and heuristic reasoning about the form of economic cycles and trends. As recent research has shown, standard settings may not be adequate in removing trends, particularly stochastic trends, in economic data. This paper proposes an easy-to-implement practical procedure of iterating the HP smoother that is intended to make the filter a smarter smoothing device for trend estimation and trend elimination. We call this iterated HP technique the boosted HP filter in view of its connection to L2-boosting in machine learning. The paper develops limit theory to show that the boosted HP filter asymptotically recovers trend mechanisms that involve unit root processes, deterministic polynomial drifts, and polynomial drifts with structural breaks -- the most common trends that appear in macroeconomic data and current modeling methodology. A stopping criterion is used to automate the iterative HP algorithm, making it a data-determined method that is ready for modern data-rich environments in economic research. The methodology is illustrated using three real data examples that highlight the differences between simple HP filtering, the data-determined boosted filter, and an alternative autoregressive approach.
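
The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the textbook HP smoother written as the solution of (I + lam * D'D) trend = y with D the second-difference operator, it uses a fixed number of iterations in place of the paper's data-determined stopping criterion, and the names `hp_filter` and `boosted_hp` are hypothetical.

```python
def hp_system_matrix(n, lam):
    """Build A = I + lam * D'D, where D takes second differences (assumed form)."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coeffs = (1.0, -2.0, 1.0)  # one row of the second-difference operator
    for r in range(n - 2):
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * coeffs[a] * coeffs[b]
    return A


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def hp_filter(y, lam):
    """Return the HP trend of the series y for smoothing parameter lam."""
    return solve(hp_system_matrix(len(y), lam), list(y))


def boosted_hp(y, lam, iterations):
    """Repeatedly apply the HP smoother to the current residual (L2-boosting style).

    The fixed `iterations` count is an assumption made for this sketch.
    """
    trend = [0.0] * len(y)
    for _ in range(iterations):
        residual = [yi - ti for yi, ti in zip(y, trend)]
        fit = hp_filter(residual, lam)
        trend = [ti + fi for ti, fi in zip(trend, fit)]
    return trend
```

With `iterations=1` the function reduces to the ordinary HP filter; each further pass moves whatever trend is left in the residual into the trend estimate. The dense solver keeps the sketch dependency-free and is only suitable for short series.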

This paper summarizes the method used in our submission to Task 1 of the International Skin Imaging Collaboration's (ISIC) Skin Lesion Analysis Towards Melanoma Detection challenge held in 2018. We used a fully automated method to accurately segment lesion boundaries from dermoscopic images. A U-net deep learning network is trained on publicly available data from ISIC. We introduce the use of intensity, color, and texture enhancement operations as pre-processing steps and morphological operations and contour identification as post-processing steps.

This paper addresses the problem of multi-agent inverse reinforcement learning (MIRL) in a two-player general-sum stochastic game framework. Five variants of MIRL are considered: uCS-MIRL, advE-MIRL, cooE-MIRL, uCE-MIRL, and uNE-MIRL, each distinguished by its solution concept. Problem uCS-MIRL is a cooperative game in which the agents employ cooperative strategies that aim to maximize the total game value. In problem uCE-MIRL, agents are assumed to follow strategies that constitute a correlated equilibrium while maximizing total game value. Problem uNE-MIRL is similar to uCE-MIRL in total game value maximization, but it is assumed that the agents are playing a Nash equilibrium. Problems advE-MIRL and cooE-MIRL assume agents are playing an adversarial equilibrium and a coordination equilibrium, respectively. We propose novel approaches to address these five problems under the assumption that the game observer either knows or is able to accurately estimate the policies and solution concepts for players. For uCS-MIRL, we first develop a characteristic set of solutions ensuring that the observed bi-policy is a uCS and then apply a Bayesian inverse learning method. For uCE-MIRL, we develop a linear programming problem subject to constraints that define necessary and sufficient conditions for the observed policies to be correlated equilibria. The objective is to choose a solution that not only minimizes the total game value difference between the observed bi-policy and a local uCS, but also maximizes the scale of the solution. We apply a similar treatment to the problem of uNE-MIRL. The remaining two problems can be solved efficiently by taking advantage of solution uniqueness and setting up a convex optimization problem. Results are validated on various benchmark grid-world games.

Quantum-enhanced metrology aims to estimate an unknown parameter such that the precision scales better than the shot-noise bound. Single-shot adaptive quantum-enhanced metrology (AQEM) is a promising approach that uses feedback to tweak the quantum process according to previous measurement outcomes. Techniques and formalism for the adaptive case are quite different from the usual non-adaptive quantum metrology approach due to the causal relationship between measurements and outcomes. We construct a formal framework for AQEM by modeling the procedure as a decision-making process, and we derive the imprecision and the Cramér-Rao lower bound with explicit dependence on the feedback policy. We also explain the reinforcement learning approach for generating quantum control policies, which is adopted because the optimal policy is non-trivial to devise. Applying a learning algorithm based on differential evolution enables us to attain imprecision for adaptive interferometric phase estimation, which turns out to be the standard quantum limit (SQL) when non-entangled particles are used in the scheme.

Recently, there has been a growing interest in developing Computer Aided Diagnostic (CAD) systems for improving the reliability and consistency of pathology test results. This paper describes a novel CAD system for the Anti-Nuclear Antibody (ANA) test via Indirect Immunofluorescence protocol on Human Epithelial Type 2 (HEp-2) cells. While prior works have primarily focused on classifying cell images extracted from ANA specimen images, this work takes a further step by focussing on the specimen image classification problem itself. Our system is able to efficiently classify specimen images as well as produce meaningful descriptions of the ANA pattern class, which helps physicians to understand the differences between various ANA patterns. We achieve this goal by designing a specimen-level image descriptor that: (1) is highly discriminative; (2) has a small descriptor length; and (3) is semantically meaningful at the cell level. In our work, a specimen image descriptor is represented by its overall cell attribute descriptors. As such, we propose two max-margin based learning schemes to discover cell attributes whilst still maintaining the discrimination of the specimen image descriptor. Our learning schemes differ from the existing discriminative attribute learning approaches as they primarily focus on discovering image-level attributes. Comparative evaluations were undertaken to contrast the proposed approach to various state-of-the-art approaches on a novel HEp-2 cell dataset which was specifically proposed for the specimen-level classification. Finally, we showcase the ability of the proposed approach to provide textual descriptions to explain ANA patterns.

We show that for many classes of symmetric two-player games, the simple decision rule "imitate-the-best" can hardly be beaten by any other decision rule. We provide necessary and sufficient conditions for imitation to be unbeatable and show that it can only be beaten by much in games that are of the rock-scissors-paper variety. Thus, in many interesting examples, like 2x2 games, Cournot duopoly, price competition, rent seeking, public goods games, common pool resource games, minimum effort coordination games, arms race, search, bargaining, etc., imitation cannot be beaten by much even by a very clever opponent.
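
To make the decision rule concrete, the sketch below plays "imitate-the-best" in rock-scissors-paper, the kind of game in which the abstract says imitation can be beaten by much. The opponent is a hypothetical exploiter that knows the imitator's rule and its first action; the tie-breaking choice (keep your own action when payoffs are equal) is an assumption of this sketch, not something stated in the abstract.

```python
# BEATS[a] is the action that beats a.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def rps_payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    return 1 if BEATS[b] == a else -1


def imitate_the_best(own_action, own_payoff, other_action, other_payoff):
    """Copy the other player's last action only if it earned strictly more."""
    return other_action if other_payoff > own_payoff else own_action


def play(rounds, first_action="rock"):
    """Return the imitator's total payoff against an exploiting opponent."""
    imitator_action = first_action
    imitator_total = 0
    for _ in range(rounds):
        # The exploiter anticipates the imitator's action and beats it.
        exploiter_action = BEATS[imitator_action]
        imitator_payoff = rps_payoff(imitator_action, exploiter_action)
        exploiter_payoff = -imitator_payoff  # zero-sum game
        imitator_total += imitator_payoff
        imitator_action = imitate_the_best(
            imitator_action, imitator_payoff, exploiter_action, exploiter_payoff
        )
    return imitator_total
```

In this setup the imitator always copies the action that just won, the exploiter always plays what beats that action, and the imitator therefore loses every round, so `play(10)` returns `-10`.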

Deep convolutional neural networks trained for image object categorization have shown remarkable similarities with representations found across the primate ventral visual stream. Yet, artificial and biological networks still exhibit important differences. Here we investigate one such property: increasing invariance to identity-preserving image transformations found along the ventral stream. Despite theoretical evidence that invariance should emerge naturally from the optimization process, we present empirical evidence that the activations of convolutional neural networks trained for object categorization are not robust to identity-preserving image transformations commonly used in data augmentation. As a solution, we propose data augmentation invariance, an unsupervised learning objective which improves the robustness of the learned representations by promoting the similarity between the activations of augmented image samples. Our results show that this approach is a simple, yet effective and efficient (10% increase in training time) way of increasing the invariance of the models while obtaining similar categorization performance.
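
One way to picture an objective that promotes similar activations for augmented versions of the same image is sketched below. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the loss is taken to be the mean squared distance between activations of views of the same image, divided by the mean squared distance over all activations in the batch, and `invariance_loss` is a hypothetical name.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two activation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_pairwise_distance(vectors):
    """Average squared distance over all unordered pairs (0.0 if fewer than two)."""
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    total = sum(squared_distance(vectors[i], vectors[j]) for i, j in pairs)
    return total / len(pairs)


def invariance_loss(groups, eps=1e-12):
    """Assumed loss: within-image spread relative to overall spread.

    groups[k] holds the activation vectors of the augmented views of image k.
    A value near 0 means the views of each image share almost the same activation.
    """
    within = sum(mean_pairwise_distance(g) for g in groups) / len(groups)
    everything = [v for g in groups for v in g]
    overall = mean_pairwise_distance(everything)
    return within / (overall + eps)
```

Normalizing by the overall spread is a design choice of this sketch: it prevents the trivial solution of shrinking all activations toward a single point.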

We analyze general model selection procedures using penalized empirical loss minimization under computational constraints. While classical model selection approaches do not consider computational aspects of performing model selection, we argue that any practical model selection procedure must not only trade off estimation and approximation error, but also the computational effort required to compute empirical minimizers for different function classes. We provide a framework for analyzing such problems, and we give algorithms for model selection under a computational budget. These algorithms satisfy oracle inequalities that show that the risk of the selected model is not much worse than if we had devoted all of our computational budget to the optimal function class.

The representation of independence relations generally builds upon the well-known semigraphoid axioms of independence. Recently, a representation has been proposed that captures a set of dominant statements of an independence relation from which any other statement can be generated by means of the axioms; the cardinality of this set is taken to indicate the complexity of the relation. Building upon the idea of dominance, we introduce the concept of stability to provide for a more compact representation of independence. We give an associated algorithm for establishing such a representation. We show that, with our concept of stability, many independence relations are found to be of lower complexity than with existing representations.

With the aid of the concept of stable independence we can construct, in an efficient way, a compact representation of a semi-graphoid independence relation. We show that this representation provides a new necessary condition for the existence of a directed perfect map for the relation. The test for this condition is based to a large extent on the transitivity property of a special form of d-separation. The complexity of the test is linear in the size of the representation. The test, moreover, brings the additional benefit that it can be used to guide the early stages of network construction.

This paper introduces a probability density estimator based on Green's function identities. A density model is constructed under the sole assumption that the probability density is differentiable. The method is implemented as a binary likelihood estimator for classification purposes, so issues such as mis-modeling and overtraining are also discussed. The identity behind the density estimator can be interpreted as a real-valued, non-scalar kernel method which is able to reconstruct differentiable density functions.

We present a novel approach to estimate the time delay between light curves of multiple images in a gravitationally lensed system, based on kernel methods in the context of machine learning. We perform various experiments with artificially generated irregularly-sampled data sets to study the effect of the various levels of noise and the presence of gaps of various sizes in the monitoring data. We compare the performance of our method with various other popular methods of estimating the time delay and conclude, from experiments with artificial data, that our method is least vulnerable to missing data and irregular sampling, within reasonable bounds of Gaussian noise. Thereafter, we use our method to determine the time delays between the two images of quasar Q0957+561 from radio monitoring data at 4 cm and 6 cm, and conclude that if only the observations at epochs common to both wavelengths are used, the time delay gives consistent estimates, which can be combined to yield 408 ± 12 days. The full 6 cm dataset, which covers a longer monitoring period, yields a value which is 10% larger, but this can be attributed to differences in sampling and missing data.

Recurrent neural networks (RNNs) have been extraordinarily successful for prediction with sequential data. To tackle highly variable and noisy real-world data, we introduce Particle Filter Recurrent Neural Networks (PF-RNNs), a new RNN family that explicitly models uncertainty in its internal structure: while an RNN relies on a long, deterministic latent state vector, a PF-RNN maintains a latent state distribution, approximated as a set of particles. For effective learning, we provide a fully differentiable particle filter algorithm that updates the PF-RNN latent state distribution according to the Bayes rule. Experiments demonstrate that the proposed PF-RNNs outperform the corresponding standard gated RNNs on a synthetic robot localization dataset and 10 real-world sequence prediction datasets for text classification, stock price prediction, etc.
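
For readers unfamiliar with particle filters, the sketch below shows a classical (non-differentiable) bootstrap particle filter step on a one-dimensional state. It is not the PF-RNN update from the paper: the Gaussian motion and observation models, the parameter values, and all function names are assumptions chosen to illustrate how a set of weighted particles approximates a belief and is reweighted by Bayes' rule.

```python
import math
import random


def bayes_update(particles, weights, observation, obs_sigma):
    """Reweight particles by a Gaussian observation likelihood, then normalize.

    Assumes at least one particle has non-negligible likelihood; otherwise
    the normalizing sum underflows to zero.
    """
    likelihoods = [
        math.exp(-0.5 * ((observation - p) / obs_sigma) ** 2) for p in particles
    ]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def resample(particles, weights, rng):
    """Draw a new, equally weighted particle set in proportion to the weights."""
    chosen = rng.choices(particles, weights=weights, k=len(particles))
    uniform = [1.0 / len(particles)] * len(particles)
    return chosen, uniform


def step(particles, weights, observation, rng, motion_sigma=0.1, obs_sigma=0.5):
    """One filter step: propagate, reweight with Bayes' rule, estimate, resample."""
    moved = [p + rng.gauss(0.0, motion_sigma) for p in particles]
    weights = bayes_update(moved, weights, observation, obs_sigma)
    estimate = sum(w * p for w, p in zip(weights, moved))
    particles, weights = resample(moved, weights, rng)
    return particles, weights, estimate
```

A typical call is `step(particles, weights, observation, random.Random(0))`. The hard resampling step used here is not differentiable, which is the part a learned model such as the one in the abstract has to handle differently.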

We consider parallel global optimization of derivative-free expensive-to-evaluate functions, and propose an efficient method based on stochastic approximation for implementing a conceptual Bayesian optimization algorithm proposed by Ginsbourger et al. (2007). To accomplish this, we use infinitesimal perturbation analysis (IPA) to construct a stochastic gradient estimator and show that this estimator is unbiased. We also show that the stochastic gradient ascent algorithm using the constructed gradient estimator converges to a stationary point of the q-EI surface, and therefore, as the number of multiple starts of the gradient ascent algorithm and the number of steps for each start grow large, the one-step Bayes optimal set of points is recovered. We show in numerical experiments that our method for maximizing the q-EI is faster than methods based on closed-form evaluation using high-dimensional integration, when considering many parallel function evaluations, and is comparable in speed when considering few. We also show that the resulting one-step Bayes optimal algorithm for parallel global optimization finds high quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI. A high quality open source implementation of this algorithm is available in the open source Metrics Optimization Engine (MOE).
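
The q-EI quantity itself can be illustrated with a plain Monte Carlo estimate. The sketch below is not the MOE implementation and does not include the IPA gradient estimator; it assumes a maximization convention, a Gaussian posterior over the q candidate points given by a mean vector and a lower-triangular Cholesky factor, and hypothetical function names. The closed-form single-point expected improvement is included only as a sanity check for q = 1.

```python
import math
import random


def expected_improvement(mu, sigma, best):
    """Closed-form EI for one point with posterior N(mu, sigma^2), maximization."""
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def q_ei_monte_carlo(mean, chol, best, samples, rng):
    """Estimate E[max(max_i Y_i - best, 0)] for Y ~ N(mean, chol * chol^T).

    `chol` is a lower-triangular Cholesky factor given as a list of rows.
    """
    q = len(mean)
    total = 0.0
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        y = [
            mean[i] + sum(chol[i][j] * z[j] for j in range(i + 1))
            for i in range(q)
        ]
        total += max(max(y) - best, 0.0)
    return total / samples
```

For q = 1 the estimate should agree with `expected_improvement` up to Monte Carlo error, and adding a second candidate point cannot decrease the true q-EI, since the improvement is a maximum over the evaluated points.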

Beneath the uncertain primitive visual features of face images are the primitive intrinsic structural patterns (PISP) essential for characterizing a sample face's discriminative attributes. It is on this basis that this paper presents a simple yet effective facial descriptor formed from derivatives of Gaussian and Gabor wavelets. The new descriptor is coined the local edge gradient Gabor magnitude (LEGGM) pattern. LEGGM first uncovers the PISP locked in every pixel by determining the pixel gradient in relation to its neighbors using the derivatives of Gaussians. Then, the resulting output is embedded into the global appearance of the face, which is further processed using Gabor wavelets in order to express its frequency characteristics. Additionally, we adopt various subspace models for dimensionality reduction in order to ascertain the best-fit model for reporting a more effective representation of the LEGGM patterns. The proposed descriptor-based face recognition method is evaluated on three databases: Plastic surgery, LFW, and GT face databases. Through experiments, using a base classifier, the efficacy of the proposed method is demonstrated, especially in the case of the plastic surgery database. The heterogeneous database, which we created to typify a real-world scenario, shows that the proposed method is to an extent insensitive to image formation factors, with impressive recognition performance.

We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates of stochastic optimization procedures, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based rates for non-smooth optimization. We give several applications of our results to statistical estimation problems, and provide experimental results that demonstrate the effectiveness of the proposed algorithms. We also describe how a combination of our algorithm with recent work on decentralized optimization yields a distributed stochastic optimization algorithm that is order-optimal.
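
The randomized smoothing idea can be shown on the simplest non-smooth convex function, f(x) = |x|. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: it uses uniform perturbations on [-u, u] (one possible smoothing distribution), omits the accelerated gradient method entirely, and the function names are hypothetical. Averaging subgradients at randomly perturbed points estimates the gradient of the smoothed function E[f(x + uZ)], which for |x| with Z uniform on [-1, 1] equals x/u clipped to [-1, 1].

```python
import random


def subgradient_abs(x):
    """A subgradient of f(x) = |x| (0.0 is chosen at the kink)."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def smoothed_gradient(subgradient, x, u, samples, rng):
    """Average subgradients at x + u*Z, Z ~ Uniform[-1, 1].

    This estimates the gradient of the smoothed function E[f(x + u*Z)];
    more samples reduce the variance of the estimate.
    """
    total = 0.0
    for _ in range(samples):
        z = rng.uniform(-1.0, 1.0)
        total += subgradient(x + u * z)
    return total / samples
```

For example, `smoothed_gradient(subgradient_abs, 0.5, 1.0, 20000, random.Random(0))` should be close to 0.5, whereas the raw subgradient at 0.5 is 1.0. The smoothed function has a continuous gradient across the kink at zero, which is what makes accelerated methods applicable.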

Using an interactive theorem prover to reason about programs involves a sequence of interactions where the user challenges the theorem prover with conjectures. Invariably, many of the conjectures posed are in fact false, and users often spend considerable effort examining the theorem prover's output before realizing this. We present a synergistic integration of testing with theorem proving, implemented in the ACL2 Sedan (ACL2s), for automatically generating concrete counterexamples. Our method uses the full power of the theorem prover and associated libraries to simplify conjectures; this simplification can transform conjectures for which finding counterexamples is hard into conjectures where finding counterexamples is trivial. In fact, our approach even leads to better theorem proving, e.g. if testing shows that a generalization step leads to a false conjecture, we force the theorem prover to backtrack, allowing it to pursue more fruitful options that may yield a proof. The focus of the paper is on the engineering of a synergistic integration of testing with interactive theorem proving; this includes extending ACL2 with new functionality that we expect to be of general interest. We also discuss our experience in using ACL2s to teach freshman students how to reason about their programs.

Current algorithms for deep learning probably cannot run in the brain because they rely on weight transport, where forward-path neurons transmit their synaptic weights to a feedback path, in a way that is likely impossible biologically. An algorithm called feedback alignment achieves deep learning without weight transport by using random feedback weights, but it performs poorly on hard visual-recognition tasks. Here we describe two mechanisms - a neural circuit called a weight mirror and a modification of an algorithm proposed by Kolen and Pollack in 1994 - both of which let the feedback path learn appropriate synaptic weights quickly and accurately even in large networks, without weight transport or complex wiring. Tested on the ImageNet visual-recognition task, these mechanisms outperform both feedback alignment and the newer sign-symmetry method, and nearly match backprop, the standard algorithm of deep learning, which uses weight transport.
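
The basic Kolen-Pollack idea, as it is commonly described, can be sketched in a few lines: the forward matrix W and the feedback matrix B receive the same outer-product adjustment (transposed for B) together with weight decay, so their initial difference decays geometrically without W ever being copied into B. This is a minimal sketch of that general idea, not the modification proposed in the paper; the random inputs and error signals, the parameter values, and the function names are assumptions.

```python
import random


def kolen_pollack_step(W, B, x, delta, lr, decay):
    """Apply the same update (transposed for B) plus weight decay, in place.

    W has shape m x n (forward path), B has shape n x m (feedback path).
    """
    for i in range(len(W)):
        for j in range(len(W[0])):
            update = lr * delta[i] * x[j]
            W[i][j] = (1.0 - decay) * W[i][j] - update
            B[j][i] = (1.0 - decay) * B[j][i] - update


def mismatch(W, B):
    """Largest absolute difference between W and the transpose of B."""
    return max(
        abs(W[i][j] - B[j][i]) for i in range(len(W)) for j in range(len(W[0]))
    )


def demo(steps=50, lr=0.1, decay=0.1, seed=0):
    """Return the mismatch before and after `steps` updates with random signals."""
    rng = random.Random(seed)
    m, n = 3, 4
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
    B = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
    start = mismatch(W, B)
    for _ in range(steps):
        x = [rng.uniform(-1, 1) for _ in range(n)]      # stand-in activations
        delta = [rng.uniform(-1, 1) for _ in range(m)]  # stand-in error signals
        kolen_pollack_step(W, B, x, delta, lr, decay)
    return start, mismatch(W, B)
```

Because both matrices receive identical adjustments, the difference between W and the transpose of B is multiplied by `1 - decay` at every step, regardless of the inputs. With the defaults above the mismatch shrinks by a factor of 0.9 to the power 50, roughly 0.005.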