Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yun-Ching Liu

Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes

May 08, 2016
Yun-Ching Liu, Yoshimasa Tsuruoka

Figure 1 for Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes

Figure 2 for Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes

Figure 3 for Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes

Figure 4 for Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes

The combination of multi-armed bandit (MAB) algorithms with Monte-Carlo tree search (MCTS) has made a significant impact in various research fields. The UCT algorithm, which combines the UCB bandit algorithm with MCTS, is a good example of the success of this combination. The recent breakthrough made by AlphaGo, which incorporates convolutional neural networks with bandit algorithms in MCTS, also highlights the necessity of bandit algorithms in MCTS. However, despite the various investigations carried out on MCTS, nearly all of them still follow the paradigm of treating every node as an independent instance of the MAB problem, and applying the same bandit algorithm and heuristics on every node. As a result, this paradigm may leave some properties of the game tree unexploited. In this work, we propose that max nodes and min nodes have different concerns regarding their value estimation, and different bandit algorithms should be applied accordingly. We develop the Asymmetric-MCTS algorithm, which is an MCTS variant that applies a simple regret algorithm on max nodes, and the UCB algorithm on min nodes. We will demonstrate the performance of the Asymmetric-MCTS algorithm on the game of $9\times 9$ Go, $9\times 9$ NoGo, and Othello.

* submitted to the 2016 IEEE Computational Intelligence and Games Conference

Via

Access Paper or Ask Questions

Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search

May 11, 2015
Yun-Ching Liu, Yoshimasa Tsuruoka

Figure 1 for Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search

Figure 2 for Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search

The UCT algorithm, which combines the UCB algorithm and Monte-Carlo Tree Search (MCTS), is currently the most widely used variant of MCTS. Recently, a number of investigations into applying other bandit algorithms to MCTS have produced interesting results. In this research, we will investigate the possibility of combining the improved UCB algorithm, proposed by Auer et al. (2010), with MCTS. However, various characteristics and properties of the improved UCB algorithm may not be ideal for a direct application to MCTS. Therefore, some modifications were made to the improved UCB algorithm, making it more suitable for the task of game tree search. The Mi-UCT algorithm is the application of the modified UCB algorithm applied to trees. The performance of Mi-UCT is demonstrated on the games of $9\times 9$ Go and $9\times 9$ NoGo, and has shown to outperform the plain UCT algorithm when only a small number of playouts are given, and rougly on the same level when more playouts are available.

* To appear in the 14th International Conference on Advances in Computer Games (ACG 2015)

Via

Access Paper or Ask Questions