Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, choosing which policy to actually deploy in the real world requires testing the candidates under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identifies high-quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
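As a rough illustration of the $k$-of-$N$ robustness objective (a sketch only, not the RPOSST algorithm, whose game-theoretic optimization is specified in the paper), the following Python snippet scores a candidate test-case subset by its average error over the worst $k$ of $N$ sampled test weightings and grows the subset greedily. The function names, the greedy loop, and the renormalization of weights are all illustrative assumptions.

```python
import numpy as np

def k_of_n_error(E, subset, weight_samples, k):
    """k-of-N robust error of a test-case subset vs. the full pool.

    E: (num_policies, num_tests) matrix of sampled evaluation outcomes.
    subset: indices of the selected test cases.
    weight_samples: (N, num_tests) candidate weightings over test cases
        (assumed strictly positive on the subset).
    k: number of worst weightings (out of N) to average over.
    """
    errors = []
    for w in weight_samples:
        full = E @ w                                  # scores under all tests
        w_sub = w[subset] / w[subset].sum()           # renormalize on subset
        small = E[:, subset] @ w_sub                  # scores under the subset
        errors.append(np.max(np.abs(full - small)))  # worst policy-score gap
    errors = np.sort(errors)[::-1]
    return errors[:k].mean()                          # k-of-N robust objective

def greedy_select(E, num_select, weight_samples, k):
    """Greedily grow a test subset that minimizes the k-of-N error."""
    subset, remaining = [], list(range(E.shape[1]))
    for _ in range(num_select):
        best = min(remaining,
                   key=lambda j: k_of_n_error(E, subset + [j], weight_samples, k))
        subset.append(best)
        remaining.remove(best)
    return subset
```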
Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods offer too few tools for gaining insight into the exact cause. In this paper, we show how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process. Value decomposition separates a reward function into distinct components and learns value estimates for each. These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems. As a demonstration, we introduce SAC-D, a variant of soft actor-critic (SAC) adapted for value decomposition. SAC-D maintains similar performance to SAC, while learning a larger set of value predictions. We also introduce decomposition-based tools that exploit this information, including a new reward influence metric, which measures each reward component's effect on agent decision-making. Using these tools, we provide several demonstrations of decomposition's use in identifying and addressing problems in the design of both environments and agents. Value decomposition is broadly applicable and easy to incorporate into existing algorithms and workflows, making it a powerful tool in an RL practitioner's toolbox.
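As a minimal sketch of the value-decomposition idea (assuming PyTorch; this is not the SAC-D implementation, whose exact losses appear in the paper), a decomposed critic keeps one Q-value head per reward component and sums them into the scalar Q expected by the base algorithm. Each head can then be regressed against its own component target $r_i + \gamma Q_i'$, and the per-component outputs are what make diagnostics such as a reward influence metric possible.

```python
import torch
import torch.nn as nn

class DecomposedCritic(nn.Module):
    """Critic with one Q-value head per reward component.

    The total Q-value is the sum of the per-component estimates, so the
    decomposed critic can drop into any algorithm that expects a scalar Q.
    """
    def __init__(self, obs_dim, act_dim, num_components, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.Linear(hidden, num_components)  # one Q per component

    def forward(self, obs, act):
        q_components = self.heads(self.body(torch.cat([obs, act], dim=-1)))
        return q_components.sum(dim=-1), q_components
```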
This paper presents an overview of the inaugural Amazon Picking Challenge along with a summary of a survey conducted among the 26 participating teams. The challenge goal was to design an autonomous robot to pick items from a warehouse shelf. This task is currently performed by human workers, and there is hope that robots can someday help increase efficiency and throughput while lowering cost. We report on a 28-question survey posed to the teams to learn about each team's background, mechanism design, perception apparatus, and planning and control approach. We identify trends in this data, correlate it with each team's success in the competition, and discuss observations and lessons learned based on the survey results and the authors' personal experiences during the challenge.
We examine a standard factory scheduling problem with stochastic processing and setup times, minimizing the expected weighted number of tardy jobs. Because the costs of operators in the schedule are stochastic and sequence-dependent, standard dynamic programming algorithms such as A* may fail to find the optimal schedule. The SDA* (Stochastic Dominance A*) algorithm remedies this difficulty by relaxing the pruning condition. We present an improved state-space search formulation for these problems and discuss the conditions under which stochastic scheduling problems can be solved optimally using SDA*. In empirical testing on randomly generated problems, we found that in 70% of the instances, the expected cost of the optimal stochastic solution is lower than that of the solution derived using a deterministic approximation, with comparable search effort.
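As an illustration of the dominance test that motivates SDA*'s relaxed pruning rule (a sketch only; the grid representation of cost distributions and the function name are assumptions, not the paper's formulation), a search node may be pruned only when another node's cost distribution first-order stochastically dominates it; comparing expected costs alone, as plain A* pruning would, is unsound when costs are stochastic and sequence-dependent.

```python
import numpy as np

def stochastically_dominates(cdf_a, cdf_b):
    """First-order stochastic dominance over a shared grid of cost values.

    cdf_a and cdf_b give each state's cumulative probability of completing
    within each grid cost. A dominates B when A's CDF lies on or above B's
    everywhere, i.e., A is at least as likely to meet any deadline.
    """
    return bool(np.all(cdf_a >= cdf_b))

# Toy usage: uniform completion-time distributions on [0, 5] and [0, 8].
grid = np.linspace(0.0, 10.0, 101)
cdf_a = np.minimum(grid / 5.0, 1.0)
cdf_b = np.minimum(grid / 8.0, 1.0)
assert stochastically_dominates(cdf_a, cdf_b)       # the state behind cdf_b may be pruned
assert not stochastically_dominates(cdf_b, cdf_a)   # but not the other way around
```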