Learning through Probing: a decentralized reinforcement learning architecture for social dilemmas

Sep 26, 2018
Nicolas Anastassacos, Mirco Musolesi

Multi-agent reinforcement learning has received significant interest in recent years, notably because advances in deep reinforcement learning have enabled the development of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), in which agents use a probing mechanism to account for how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences, accounting for changes to the opponent's behavior. We introduce a parameter eta to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma (IPD), LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance against the static agents present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and society-like interactions, and analyze the training of Q-learning agents in makeshift societies. The aim is to emphasize how cooperation may emerge in societies, which we demonstrate using environments where interactions with opponents are determined through a random-encounter format of the IPD.
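
To make the setting concrete, the following is a minimal Python sketch of an Iterated Prisoner's Dilemma between two static agents of the kind found in Axelrod tournaments. It is not the authors' code. The payoff values (3/3, 0/5, 5/0, 1/1) are the conventional Axelrod ones and are an assumption here, as the abstract does not state the payoffs used. The function `adjusted_reward` and its `opponent_shift` argument are hypothetical: they only illustrate the idea of a parameter eta weighting future changes in opponent behavior, and should not be read as the paper's actual reward adjustment.

```python
from typing import Callable, List, Tuple

C, D = 0, 1  # actions: cooperate, defect

# Assumed payoffs (player A, player B); conventional Axelrod values,
# not taken from the paper.
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

# A static strategy maps the opponent's past moves to the next action.
Strategy = Callable[[List[int]], int]


def tit_for_tat(opp_history: List[int]) -> int:
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opp_history else opp_history[-1]


def always_defect(opp_history: List[int]) -> int:
    """Defect regardless of what the opponent has done."""
    return D


def play_ipd(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play `rounds` rounds of the IPD and return cumulative rewards."""
    hist_a: List[int] = []
    hist_b: List[int] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both agents choose simultaneously, seeing only past moves.
        move_a = a(hist_b)
        move_b = b(hist_a)
        reward_a, reward_b = PAYOFFS[(move_a, move_b)]
        score_a += reward_a
        score_b += reward_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def adjusted_reward(immediate: float, opponent_shift: float, eta: float) -> float:
    """Hypothetical illustration only (not the paper's formula):
    blend the immediate reward with an estimate of how the opponent's
    later behavior changed, weighted by eta."""
    return immediate + eta * opponent_shift
```

With these assumed payoffs, `play_ipd(tit_for_tat, tit_for_tat)` yields mutual cooperation throughout, while `play_ipd(tit_for_tat, always_defect)` shows the defector gaining only in the first round before both settle into mutual defection. In the hypothetical `adjusted_reward`, setting `eta` to 0 recovers the plain immediate reward.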

* 9 pages, 4 figures
