Reinforcement Learning in Pokemon Red: Teaching AI to Master Classic Games

Written By: Aniqa Batool
Last Updated On: October 19, 2023

Reinforcement learning was used to train an AI to play Pokemon Red. The AI started with no understanding of the game and could only press random buttons. Over the course of five years of game time, it learned to capture Pokemon, train them, and even beat a gym leader. A reward mechanism guided this learning process, motivating the AI to complete tasks and learn through trial and error.

Many applications, productivity hacks, automations, and other ways of using AI to improve outcomes and skills have been discussed before. One use that has not been covered is using AI to play Pokemon Red. This guide explains how an AI model was trained to do so with reinforcement learning. It covers the AI’s progress in the game, its achievements and mistakes, the technical specifics of its development, tactics for running experiments effectively, future enhancements, and how to run the program on your own computer if you are interested.

The AI was encouraged to traverse the game map and look for new screens, with a reward granted for each one it found. This learning process did not come without difficulties. Because of the novelty reward system, the AI at times became fixated on certain places. The reward system was adjusted to change this behavior, for example by raising the threshold for novelty rewards to encourage the exploration of new regions. Additional rewards were added to encourage the AI to battle and level up its Pokemon.
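The novelty reward described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the class name `NoveltyReward`, the pixel-count comparison, the `level_reward` helper, and all numeric values are assumptions made for the example. A frame earns a reward only if it differs from every previously stored frame by more than `threshold` pixels.

```python
from typing import List, Sequence, Tuple

# A frame is a flattened list of pixel values; a stand-in for a real game screen.
Frame = Sequence[int]


def pixel_difference(a: Frame, b: Frame) -> int:
    """Count the pixels that differ between two equally sized frames."""
    return sum(1 for x, y in zip(a, b) if x != y)


class NoveltyReward:
    """Hypothetical novelty reward: pay out when a frame is unlike all stored frames."""

    def __init__(self, threshold: int, reward_per_screen: float = 1.0) -> None:
        self.threshold = threshold              # minimum pixel difference to count as new
        self.reward_per_screen = reward_per_screen
        self.seen: List[Tuple[int, ...]] = []   # frames that have already earned a reward

    def step(self, frame: Frame) -> float:
        for old in self.seen:
            if pixel_difference(frame, old) <= self.threshold:
                return 0.0                      # too similar to a screen seen before
        self.seen.append(tuple(frame))
        return self.reward_per_screen


def level_reward(previous_levels: Sequence[int],
                 current_levels: Sequence[int],
                 scale: float = 0.5) -> float:
    """Hypothetical extra reward for any increase in the party's total level."""
    gain = sum(current_levels) - sum(previous_levels)
    return scale * max(gain, 0)
```

With a low threshold, a frame that differs by a single pixel still counts as a new screen, so an agent could keep collecting rewards without moving anywhere. Raising the threshold makes such small changes worthless, which is the kind of adjustment the paragraph above describes.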

The AI learned to navigate the game world, fight battles, and even exploit the game’s random number generator. Its behavior was analyzed and visualized to better understand its training and decision-making.

The AI was trained using proximal policy optimization (PPO), a type of reinforcement learning algorithm. This algorithm was chosen for its ability to handle Pokemon Red’s complex and ever-changing environment. However, training the AI was not without difficulties. Backtracking in the game, the cost of running the training, and the careful design of the reward function all had to be considered.
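For readers unfamiliar with proximal policy optimization, the core idea is a "clipped" objective that stops any single update from moving the policy too far from the one that collected the data. The sketch below computes the standard clipped surrogate objective in plain Python. It is only an illustration of the general algorithm: the function name, the argument layout, and the clip value of 0.2 are assumptions, and the project's actual implementation and hyperparameters are not described here.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_log_probs: Sequence[float],
                       old_log_probs: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_log_probs: log-probability of each action under the policy being updated
    old_log_probs: log-probability of the same action under the data-collecting policy
    advantages:    estimate of how much better each action was than expected
    clip_eps:      assumed clip range; 0.2 is a commonly used value
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)                      # probability ratio
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)               # pessimistic of the two
    return total / len(advantages)
```

Training maximizes this value. When an action looked good (positive advantage), the benefit of making it more likely is capped once the ratio exceeds 1 + clip_eps, so the policy changes in small, stable steps.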

The AI can be trained on a personal machine, with instructions provided in the project’s GitHub repository. Keep in mind, however, that each game automatically ends after 32K steps, or around one hour. This limit can be raised by changing the ep_length setting, but doing so consumes more RAM. By default, training can consume up to 100 GB of RAM. Memory use can be reduced by lowering num_cpu or ep_length, although this may affect the results. Furthermore, the model’s behavior may become degenerate for roughly the first 50 iterations of training before improving.
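To get a rough feel for how num_cpu and ep_length trade off against memory, the sketch below scales a known baseline by the number of steps collected per iteration. The linear scaling is an assumption made for illustration, not a measured property of the project, and both helper functions are hypothetical. The baseline values in the usage example are placeholders, so substitute the defaults from the repository.

```python
def steps_per_iteration(num_cpu: int, ep_length: int) -> int:
    """Total environment steps gathered per iteration across all parallel games."""
    return num_cpu * ep_length


def scaled_ram_estimate_gb(baseline_gb: float,
                           baseline_num_cpu: int,
                           baseline_ep_length: int,
                           num_cpu: int,
                           ep_length: int) -> float:
    """Estimate RAM use, ASSUMING it grows linearly with steps per iteration."""
    scale = (steps_per_iteration(num_cpu, ep_length)
             / steps_per_iteration(baseline_num_cpu, baseline_ep_length))
    return baseline_gb * scale


if __name__ == "__main__":
    # Placeholder baseline: these are NOT the project's real defaults.
    estimate = scaled_ram_estimate_gb(
        baseline_gb=100.0, baseline_num_cpu=8, baseline_ep_length=32_000,
        num_cpu=4, ep_length=32_000,
    )
    print(f"Estimated RAM with half the parallel games: {estimate:.0f} GB")
```

Under this assumption, halving either setting halves the estimate. Treat the result as a starting point and check actual memory use on your own machine.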