Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

May 09, 2019

Max Simchowitz, Kevin Jamieson

This paper establishes that optimistic algorithms attain gap-dependent, non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not depend on diameter-like quantities or ergodicity, and they smoothly interpolate between the gap-dependent logarithmic regret and the $\widetilde{\mathcal{O}}(\sqrt{HSAT})$ minimax rate. The key technique in our analysis is a novel "clipped" regret decomposition that applies to a broad family of recent optimistic algorithms for episodic MDPs.
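For reference, clipping operators in this style of gap-dependent analysis truncate small surplus terms to zero; a common form (our notation, not quoted from the paper) is:

```latex
\operatorname{clip}\left[\, x \mid \epsilon \,\right] \;:=\; x \cdot \mathbb{1}\{x \ge \epsilon\}
```

so that surpluses below the threshold $\epsilon$ contribute nothing to the regret decomposition.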

A Bandit Approach to Multiple Testing with False Discovery Control

Sep 06, 2018

Kevin Jamieson, Lalit Jain

We propose an adaptive sampling approach for multiple testing which aims to maximize statistical power while ensuring anytime false discovery control. We consider $n$ distributions whose means are partitioned by whether they are below or equal to a baseline (nulls) versus above the baseline (actual positives). In addition, each distribution can be sequentially and repeatedly sampled. Inspired by the multi-armed bandit literature, we provide an algorithm that takes as few samples as possible to exceed a target true positive proportion (i.e. the proportion of actual positives discovered) while giving anytime control of the false discovery proportion (the proportion of nulls predicted as actual positives). Our sample complexity results match known information-theoretic lower bounds, and through simulations we show a substantial performance improvement over uniform sampling and an adaptive elimination-style algorithm. Given the simplicity and sample efficiency of the approach, the method has promise for wide adoption in the biological sciences, clinical testing for drug discovery, and online A/B/n testing.
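As a rough illustration of the adaptive-sampling idea, here is a minimal sketch, not the paper's FDR-controlling procedure; the function names and the confidence radius are illustrative choices:

```python
import math
import random

def adaptive_testing(means, baseline=0.0, delta=0.05, budget=20000):
    """Toy adaptive sampler: repeatedly pull the most promising undecided
    arm and declare a discovery once its lower confidence bound clears the
    baseline. Illustrative only; not the procedure analyzed in the paper."""
    n = len(means)
    counts, sums = [0] * n, [0.0] * n
    discovered = set()
    rng = random.Random(0)

    def radius(i):
        # anytime Hoeffding-style confidence radius (illustrative choice)
        t = counts[i]
        return math.sqrt(2.0 * math.log(4.0 * n * t * t / delta) / t)

    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # force one initial pull of every arm
        return sums[i] / counts[i] + radius(i)

    for _ in range(budget):
        undecided = [i for i in range(n) if i not in discovered]
        if not undecided:
            break
        i = max(undecided, key=ucb)                # optimism under uncertainty
        counts[i] += 1
        sums[i] += means[i] + rng.gauss(0.0, 1.0)  # noisy Gaussian observation
        if sums[i] / counts[i] - radius(i) > baseline:
            discovered.add(i)                      # lower bound clears baseline
    return sorted(discovered)

# arms 3 and 4 are the actual positives (means above the baseline of 0)
found = adaptive_testing([0.0, 0.0, 0.0, 2.0, 2.5])
```

With the baseline at zero, the two arms whose means exceed it are discovered quickly, while the nulls keep being sampled without their lower confidence bounds ever clearing the baseline.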

Non-stochastic Best Arm Identification and Hyperparameter Optimization

Feb 27, 2015

Kevin Jamieson, Ameet Talwalkar

Motivated by the task of hyperparameter optimization, we introduce the non-stochastic best-arm identification problem. Within the multi-armed bandit literature, the cumulative regret objective enjoys algorithms and analyses for both the non-stochastic and stochastic settings, while, to the best of our knowledge, the best-arm identification framework has only been considered in the stochastic setting. We introduce the non-stochastic setting under this framework, identify a known algorithm that is well suited for it, and analyze its behavior. Next, by leveraging the iterative nature of standard machine learning algorithms, we cast hyperparameter optimization as an instance of non-stochastic best-arm identification and empirically evaluate our proposed algorithm on this task. Our empirical results show that, by allocating more resources to promising hyperparameter settings, we typically achieve comparable test accuracies an order of magnitude faster than baseline methods.
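The resource-allocation principle can be sketched with a successive-halving loop in the spirit of the paper; the interface and budget split below are illustrative assumptions, not the exact algorithm analyzed:

```python
import math

def successive_halving(losses_fn, n_configs, budget):
    """Successive-halving sketch: train all surviving configurations for more
    iterations, then discard the worse half each round.
    `losses_fn(config, iters)` returns the validation loss of `config` after
    `iters` total iterations of training (illustrative interface)."""
    survivors = list(range(n_configs))
    rounds = int(math.ceil(math.log2(n_configs)))
    iters_per_round = budget // max(rounds * n_configs, 1)  # crude even split
    total_iters = {c: 0 for c in survivors}
    while len(survivors) > 1:
        # give every survivor another slice of the budget
        total_iters = {c: total_iters[c] + iters_per_round for c in survivors}
        scores = {c: losses_fn(c, total_iters[c]) for c in survivors}
        # keep the better half (lower loss is better)
        survivors = sorted(survivors, key=lambda c: scores[c])[: max(len(survivors) // 2, 1)]
    return survivors[0]

# toy example: config c's loss decays toward an asymptote of c/10,
# so config 0 is best once enough iterations reveal the asymptotes
best = successive_halving(lambda c, t: c / 10 + 1.0 / (1 + t), n_configs=8, budget=800)
```

Poor configurations are eliminated after only a few iterations, which is exactly how the adaptive allocation beats uniformly training every configuration to completion.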

The True Sample Complexity of Identifying Good Arms

Jun 15, 2019

Julian Katz-Samuels, Kevin Jamieson


Pure-Exploration for Infinite-Armed Bandits with General Arm Reservoirs

Nov 15, 2018

Maryam Aziz, Kevin Jamieson, Javed Aslam


Open Loop Hyperparameter Optimization and Determinantal Point Processes

Feb 14, 2018

Jesse Dodge, Kevin Jamieson, Noah A. Smith


The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime

Feb 16, 2017

Max Simchowitz, Kevin Jamieson, Benjamin Recht


Finite Sample Prediction and Recovery Bounds for Ordinal Embedding

Jun 22, 2016

Lalit Jain, Kevin Jamieson, Robert Nowak


On the Detection of Mixture Distributions with applications to the Most Biased Coin Problem

Mar 25, 2016

Kevin Jamieson, Daniel Haas, Ben Recht


Sequential Experimental Design for Transductive Linear Bandits

Jun 20, 2019

Tanner Fiez, Lalit Jain, Kevin Jamieson, Lillian Ratliff

In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\text{argmax}_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the set of standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, it generalizes combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited by factors such as availability or cost. For example, in drug discovery, the compounds and dosages $\mathcal{X}$ that a practitioner is willing to evaluate in the lab in vitro, due to cost or safety constraints, may differ vastly from the compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in a recommender system for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well-known best-sellers even though the goal is to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches them up to logarithmic factors, and an empirical evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
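The problem setting itself can be illustrated with a toy sketch (not the paper's adaptive experimental design): sample measurement directions from $\mathcal{X}$, estimate $\theta^\ast$ by ordinary least squares, and report the maximizing item in $\mathcal{Z}$.

```python
import random

def solve(A, b):
    """Solve A theta = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    theta = [0.0] * d
    for r in reversed(range(d)):
        theta[r] = (M[r][d] - sum(M[r][c] * theta[c] for c in range(r + 1, d))) / M[r][r]
    return theta

def transductive_guess(X, Z, theta_star, n_samples=2000, noise=0.1, seed=0):
    """Toy transductive setup: measure only along directions in X, estimate
    theta by least squares, and return the index of the item in Z with the
    largest predicted value. Illustrates the setting, not the algorithm."""
    rng = random.Random(seed)
    d = len(theta_star)
    # accumulate A = sum x x^T and b = sum y x for the normal equations
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for _ in range(n_samples):
        x = X[rng.randrange(len(X))]  # non-adaptive: uniform over X
        y = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, noise)
        for i in range(d):
            b[i] += x[i] * y
            for j in range(d):
                A[i][j] += x[i] * x[j]
    theta_hat = solve(A, b)
    return max(range(len(Z)), key=lambda k: sum(zi * ti for zi, ti in zip(Z[k], theta_hat)))

X = [[1.0, 0.0], [0.0, 1.0]]                # measurement vectors: standard basis
Z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]  # items to rank, none measurable directly
best_item = transductive_guess(X, Z, theta_star=[0.5, 0.3])
```

Note that no item in $\mathcal{Z}$ can be measured directly here; the point of the paper's algorithm is to choose the measurements in $\mathcal{X}$ adaptively so far fewer samples suffice than this uniform allocation.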

Exploiting Reuse in Pipeline-Aware Hyperparameter Tuning

Mar 12, 2019

Liam Li, Evan Sparks, Kevin Jamieson, Ameet Talwalkar

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

Dec 27, 2013

Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck


On Finding the Largest Mean Among Many

Jun 17, 2013

Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck


Query Complexity of Derivative-Free Optimization

Sep 11, 2012

Kevin G. Jamieson, Robert D. Nowak, Benjamin Recht

This paper provides lower bounds on the convergence rate of Derivative-Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access only to function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
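A one-dimensional caricature of the comparison-only setting: a shrinking-interval search that never sees a function value, only the Boolean outcome of comparing two points. Comparisons are noiseless here for brevity; the paper's near-optimal algorithm handles noisy comparisons and general strongly convex objectives.

```python
def comparison_search(cmp, lo, hi, tol=1e-6):
    """Minimize a convex 1-D function using only Boolean comparisons.
    `cmp(a, b)` returns True iff f(a) < f(b). This is plain ternary
    search driven by comparisons; an illustrative sketch only."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cmp(m1, m2):
            hi = m2  # minimizer lies to the left of m2
        else:
            lo = m1  # minimizer lies at or to the right of m1
    return (lo + hi) / 2

# minimize (x - 1.3)^2 without ever observing a function value
x_star = comparison_search(lambda a, b: (a - 1.3) ** 2 < (b - 1.3) ** 2, -10.0, 10.0)
```

Each comparison shrinks the interval by a constant factor, which is why, as the abstract notes, moving from function values to paired comparisons costs nothing in the convergence rate.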

A framework for Multi-A(rmed)/B(andit) testing with online FDR control

Nov 18, 2017

Fanny Yang, Aaditya Ramdas, Kevin Jamieson, Martin J. Wainwright

* Published as a conference paper at NIPS 2017

Adaptive Sampling for Convex Regression

Aug 26, 2018

Max Simchowitz, Kevin Jamieson, Jordan W. Suchow, Thomas L. Griffiths


Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Jun 18, 2018

Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar

* Journal of Machine Learning Research 18 (2018) 1-52
