Another promising area making significant strides is multi-agent reinforcement learning. In contrast to the problems we've seen so far, where a single agent makes decisions, this setting involves multiple agents making decisions simultaneously and cooperatively in order to achieve a common objective. One of the most significant works in this area has been OpenAI's Dota2-playing system, OpenAI Five. Dota2 is one of the world's most popular multiplayer online battle arena (MOBA) games. Compared to traditional RL benchmarks such as Go and Atari, Dota2 is more complex for the following reasons:
- Multiple agents: Dota2 games involve two teams of five players, each fighting to destroy the other team's base. Hence there are multiple agents, not just one, making decisions simultaneously.
- Observability: The screen only shows the area around the agent's character rather than the whole map. This means that the full game state, including the locations of opponents and what they are doing, is not observable. In reinforcement learning, we call this a partially observable state.
- High dimensionality: A Dota2 agent's observation can include around 20,000 values, which encode what a human player would observe on the screen, including health, the location of the controlled character, the locations of enemies, and any attacks. Go, on the other hand, requires far fewer values to construct an observation (a 19 x 19 board plus past moves). Hence, Dota2 observations are high-dimensional and complex. The same goes for decisions: a Dota2 AI's action space consists of roughly 170,000 possibilities, covering movement, casting spells, and using items.
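The three properties above can be made concrete with a toy environment. The following sketch is purely illustrative and is not OpenAI Five's actual interface: the grid size, vision radius, and action set are invented for the example. It shows several agents acting simultaneously on a shared map, each observing only a local window of that map (partial observability), with each observation flattened into a feature vector, which is how high-dimensional game states are typically encoded:

```python
import numpy as np

class ToyTeamGridWorld:
    """Toy multi-agent gridworld; an illustrative sketch, not Dota2's
    real environment (grid size, vision radius, and actions are made up)."""

    ACTIONS = ["up", "down", "left", "right", "stay"]  # tiny action space

    def __init__(self, size=16, n_agents=5, vision=2, seed=0):
        self.size = size
        self.vision = vision
        rng = np.random.default_rng(seed)
        # Random starting cells for each agent on the team.
        self.positions = rng.integers(0, size, size=(n_agents, 2))

    def observe(self, agent_idx):
        """Partial observability: the agent sees only a local window
        centered on itself, not the full grid. The window is flattened
        into a feature vector, mirroring how a complex game state is
        encoded as a long list of numbers."""
        r, c = self.positions[agent_idx]
        side = 2 * self.vision + 1
        window = np.zeros((side, side))
        for orow, ocol in self.positions:
            dr, dc = orow - r, ocol - c
            if abs(dr) <= self.vision and abs(dc) <= self.vision:
                window[dr + self.vision, dc + self.vision] = 1.0  # visible agent
        return window.flatten()

    def step(self, joint_action):
        """Multiple agents: all agents act simultaneously; the
        environment advances once per joint action. A cooperative task
        would also compute a shared team reward here."""
        moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1),
                 "right": (0, 1), "stay": (0, 0)}
        for idx, action in enumerate(joint_action):
            self.positions[idx] = np.clip(
                self.positions[idx] + moves[action], 0, self.size - 1)
        return [self.observe(i) for i in range(len(self.positions))]

env = ToyTeamGridWorld()
obs = env.step(["up", "stay", "left", "right", "down"])
print(len(obs), obs[0].shape)  # 5 agents, each with a 25-dimensional local view
```

In this toy setting each agent's observation is only 25 numbers and there are only 5 actions; Dota2 scales these same ideas up to roughly 20,000 observation values and 170,000 possible actions.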
Moreover, by scaling up and extending traditional reinforcement learning algorithms, the agents in OpenAI Five learned to cooperate with one another in order to reach the common objective of destroying the enemy's base. They even learned several team strategies that experienced human players employ. The following is a screenshot from a game between a team of human Dota2 players and OpenAI Five:
Despite its extreme resource requirements (240 GPUs, 120,000 CPU cores, and roughly 200 human years of gameplay experience per day), this project demonstrates that current AI algorithms are indeed able to cooperate with one another to reach a common objective in a vastly complex environment. This work represents another significant advance in AI and RL research and demonstrates what current technology is capable of.