How to use ML-Agents Toolkit to create intelligent agents for your games

If you are interested in creating games that can learn from their own actions and adapt to different situations, you might want to check out the ML-Agents Toolkit, an open-source project for the Unity editor that lets you train and embed intelligent agents using reinforcement learning, imitation learning, neuroevolution, and other machine learning methods.

In this post, I will show you how to install ML-Agents Toolkit, create a simple agent that can balance a ball on a platform, and train it using reinforcement learning. You will also learn how to use TensorBoard to visualize the training process and evaluate the agent’s performance.

What is ML-Agents Toolkit?

ML-Agents Toolkit is a tool that allows you to create agents that can learn from their own experiences and interact with complex environments. You can use ML-Agents Toolkit to:

  • Enhance your existing games with intelligent behaviors
  • Prototype new game ideas and mechanics
  • Try out several machine learning methods and algorithms
  • Generate data for testing and debugging purposes

ML-Agents Toolkit uses Unity as the simulation engine and Python as the interface for training and inference. You can train agents with the built-in trainers or, through the Python API, with a machine learning framework of your choice such as PyTorch or TensorFlow. You can also export your trained models as ONNX files and run them on any platform that supports Unity.

How to install ML-Agents Toolkit?

To install ML-Agents Toolkit, you need to have Unity 2018.4 or later and Python 3.6 or later installed on your computer. You also need to install a few Python packages: mlagents-envs, mlagents (which provides the mlagents-learn command), and optionally gym-unity and tensorboard.

You can follow these steps to install ML-Agents Toolkit:

  1. Clone or download the ML-Agents Toolkit repository from GitHub: https://github.com/Unity-Technologies/ml-agents
  2. Open a terminal window and navigate to the ml-agents folder
  3. Run pip3 install -e ./ml-agents-envs to install the mlagents-envs package
  4. Run pip3 install -e ./ml-agents to install the mlagents package, which provides the mlagents-learn command
  5. Run pip3 install torch if you want to use PyTorch as your machine learning framework (optional)
  6. Run pip3 install gym_unity if you want to use OpenAI Gym interface for your environments (optional)
  7. Run pip3 install tensorboard if you want to use TensorBoard for visualization (optional)
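
To check that everything installed correctly, you can ask the trainer for its help text; a minimal sanity check, assuming the packages above installed without errors:

```bash
# If the installation succeeded, this prints mlagents-learn's usage and options
mlagents-learn --help
```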

How to create a simple agent?

To create a simple agent that can balance a ball on a platform, you need two things: an environment and an agent script.

The environment is where the agent lives and interacts with its surroundings. It consists of objects such as platforms, balls, and walls, whose properties (position, rotation, velocity, and so on) can be manipulated by scripts.

The agent script is where you define how the agent perceives its environment (observations), how it decides what actions to take (policy), how it executes those actions (actuator), and how it learns from its experiences (reward).

You can use one of the example environments provided by ML-Agents Toolkit or create your own custom environment in the Unity editor.

For this tutorial, we will use the Ball Balance environment (named 3DBall in the repository's examples) that ships with the ML-Agents Toolkit repository.

To open this environment in the Unity editor:

  1. Launch Unity Hub, click Add (or Open), and select the Project folder inside the cloned ml-agents repository
  2. Open the project with a supported version of Unity
  3. In the Project window, navigate to Assets/ML-Agents/Examples/3DBall/Scenes
  4. Double-click the 3DBall scene to open it in the Scene view

The Ball Balance environment consists of a platform that can tilt in two directions (x and z axes) and a ball that can roll on top of it. The goal of the agent is to keep the ball balanced on the platform as long as possible.

The agent script is attached to the Agent object (the platform) of each 3DBall prefab in the Hierarchy window. If you select it, you can see its components in the Inspector window.

The agent object has three main components: Agent, Decision Requester and Behavior Parameters.

The Agent component is where you define how the agent interacts with the environment and learns from its experiences. Its main parameter is Max Step, the maximum number of steps per episode before the agent is reset. Rewards themselves are assigned in code, and reward signal settings live in the training configuration file rather than on this component.

The Decision Requester component is where you define how often the agent requests a decision from its policy. You can set parameters such as Decision Period (the number of steps between each decision request) and Take Actions Between Decisions (whether the agent repeats its last action between decision requests).

The Behavior Parameters component is where you define how the agent perceives its environment and decides what actions to take. You can set parameters such as Behavior Name (the name that links the agent to a training configuration or trained model), Vector Observation Space Size (the size of the observation vector), Actions (the type and dimension of the action space), Model (a trained model asset used for inference), and Inference Device (the device used for inference: CPU or GPU).

For this tutorial, we will use the default settings for these components.

How to train an agent using reinforcement learning?

To train an agent using reinforcement learning, you need to define a reward function that specifies what kind of behavior you want to encourage or discourage. In an agent script, two overridable methods do most of the work: CollectObservations(), which gathers the agent's view of the world, and OnActionReceived(), which applies actions and assigns rewards (see the code sketch later in this section).

The CollectObservations() method is where you collect information about the state of the environment and pass it to your policy as observations. The observations are usually numerical values that represent features such as position, velocity, angle, etc.

The OnActionReceived() method is where you execute actions based on your policy’s output and assign rewards based on the outcome of your actions. The rewards are usually numerical values that represent how well or poorly the agent is performing its task.

For example, in the Ball Balance environment, you can collect observations such as:

  • The position and velocity of the ball
  • The rotation and angular velocity of the platform
  • The distance between the ball and the center of the platform

And you can assign rewards such as:

  • A positive reward for keeping the ball on the platform
  • A negative reward for letting the ball fall off the platform
  • A small penalty for tilting the platform too much

These methods are already implemented in the Ball3DAgent.cs script that is attached to the agent object. You can modify them according to your own preferences and objectives.
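
The sketch below condenses those two methods into a minimal agent, using the ML-Agents C# API (Agent, VectorSensor, ActionBuffers). The field names, reward values, and tilt logic here are illustrative assumptions rather than a copy of Ball3DAgent.cs:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

// A simplified ball-balancing agent, attached to the platform object.
public class BalanceAgentSketch : Agent
{
    public GameObject ball;       // assigned in the Inspector (hypothetical field)
    Rigidbody ballRigidbody;

    public override void Initialize()
    {
        ballRigidbody = ball.GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observations: platform tilt, ball position relative to the
        // platform, and ball velocity (8 float values in total).
        sensor.AddObservation(gameObject.transform.rotation.z);
        sensor.AddObservation(gameObject.transform.rotation.x);
        sensor.AddObservation(ball.transform.position - gameObject.transform.position);
        sensor.AddObservation(ballRigidbody.velocity);
    }

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        // Two continuous actions tilt the platform around the z and x axes.
        float tiltZ = 2f * Mathf.Clamp(actionBuffers.ContinuousActions[0], -1f, 1f);
        float tiltX = 2f * Mathf.Clamp(actionBuffers.ContinuousActions[1], -1f, 1f);
        gameObject.transform.Rotate(new Vector3(0f, 0f, 1f), tiltZ);
        gameObject.transform.Rotate(new Vector3(1f, 0f, 0f), tiltX);

        if (ball.transform.position.y - gameObject.transform.position.y < -2f)
        {
            // The ball fell off: penalize and start a new episode.
            SetReward(-1f);
            EndEpisode();
        }
        else
        {
            // Small positive reward for every step the ball stays balanced.
            SetReward(0.1f);
        }
    }
}
```

One design note: SetReward() replaces the reward for the current step, while AddReward() accumulates into it. Small per-step rewards like the 0.1 above make the cumulative reward a direct measure of how long the ball stayed balanced.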

To train an agent using reinforcement learning, you need to use a machine learning algorithm that can learn from trial and error. ML-Agents Toolkit provides several trainers that you can choose from, such as Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), as well as imitation learning methods such as Behavioral Cloning and GAIL.

For this tutorial, we will use PPO, a popular algorithm for training agents in complex environments. PPO works by collecting experience from agents running in one or more parallel environments, updating a neural network policy on that data, and repeating the process until the policy converges.
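
Under the hood, PPO maximizes a clipped surrogate objective. For reference, a compact statement of it, where $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping range (the epsilon parameter in the configuration file discussed below):

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\right)\right]$$

The clipping keeps each update close to the previous policy, which is a big part of why PPO is stable without much hyperparameter tuning.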

To use PPO with ML-Agents Toolkit, you need to create a configuration file that specifies various training parameters such as learning rate, batch size, and number of epochs. You can find example configuration files in the ml-agents/config/ppo folder.

For this tutorial, we will use the 3DBall.yaml file, which contains parameters for training an agent on the Ball Balance environment using PPO. You can open this file with any text editor and modify it according to your own preferences and objectives.
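
The file follows the toolkit's trainer configuration format. The sketch below shows its overall shape, with illustrative values rather than an exact copy of the shipped file:

```yaml
behaviors:
  3DBall:                      # must match the Behavior Name in Behavior Parameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 3.0e-4
      beta: 0.001              # entropy regularization strength
      epsilon: 0.2             # PPO clipping range
      num_epoch: 3
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99            # discount factor
        strength: 1.0
    max_steps: 500000
    time_horizon: 1000
    summary_freq: 12000
```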

To start training an agent using PPO with ML-Agents Toolkit, you run the mlagents-learn script, which starts the PPO trainer and waits for a Unity environment to connect to it over the gRPC protocol. The simplest setup is to train directly in the Unity editor; for faster training you can instead build the environment as a standalone executable and run several copies of it in parallel (called workers), as shown after the steps below.

You can follow these steps to start training an agent using PPO with ML-Agents Toolkit:

  1. Open a terminal window and navigate to the ml-agents folder
  2. Run mlagents-learn config/ppo/3DBall.yaml --run-id=balanceball to launch the PPO trainer
  3. When the trainer reports that it is listening for a connection, press the Play button in the Unity editor to connect the environment and start training
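
If you have built the environment as a standalone player, you can train against several instances of it in parallel instead of the editor. The build path below is an example, so point --env at your own executable:

```bash
# Train against a standalone build with 12 environment instances in parallel
mlagents-learn config/ppo/3DBall.yaml --run-id=balanceball \
    --env=Builds/3DBall/3DBall --num-envs=12 --no-graphics
```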


The Python script will print information about the training process, such as episode length, cumulative reward, entropy, value loss, and policy loss. It will also periodically save checkpoints of your trained model under the results/balanceball folder.

You can stop training at any time by pressing Ctrl+C in your terminal window.

How to use TensorBoard to visualize the training process?

TensorBoard is a tool that allows you to visualize various metrics and graphs related to your machine learning experiments. You can use TensorBoard to monitor your agent’s performance during training and compare different runs or models.

To use TensorBoard with ML-Agents Toolkit, you need to follow these steps:

  1. Open a terminal window and navigate to the folder where you ran mlagents-learn
  2. Run tensorboard --logdir results --port 6006 to launch the TensorBoard server
  3. Open a browser window and navigate to localhost:6006


The TensorBoard dashboard shows various tabs that contain different types of visualizations, such as Scalars, Images, Graphs, Distributions, and Histograms.

For example, you can use the Scalars tab to see how your agent’s cumulative reward changes over time during training.

You can also use the Images tab to see what your agent’s visual observations look like at different steps.

You can use the Graphs tab to see how your agent’s neural network policy is structured and what inputs and outputs it has.

You can use the Distributions and Histograms tabs to see how your agent’s action values are distributed and how they change over time.

You can also compare different runs or models by selecting them from the left panel and seeing their metrics side by side on the same plot.

TensorBoard is a powerful tool that can help you understand and improve your agent’s performance during training. You can explore more features and options of TensorBoard by reading its documentation.

How to test your trained model?

After you finish training your agent using PPO with ML-Agents Toolkit, you can test your trained model by running it in inference mode. Inference mode is where you use your trained model to control your agent without any further learning or updating.

To test your trained model in inference mode, you can follow these steps:

  1. Open a terminal window and navigate to the ml-agents folder
  2. Run mlagents-learn config/ppo/3DBall.yaml --run-id=balanceball --resume --inference to load your trained model without any further training
  3. Press the Play button in the Unity editor to connect the environment


The Python script will print information about the loaded run, such as the checkpoint path, step count, episode length, and cumulative reward.

The Unity editor will show how your agent behaves in the Ball Balance environment using your trained model. Alternatively, you can drag the trained .onnx model file from the results/balanceball folder onto the Model field of the Behavior Parameters component and press Play; the agent will then run entirely inside Unity, with no Python process required.

You can stop testing at any time by pressing Ctrl+C in your terminal window.

Conclusion

In this tutorial, we have learned how to use ML-Agents Toolkit to create intelligent agents for our games using reinforcement learning. We have seen how to set up a simple environment (Ball Balance), how to define a reward function for our agent (Ball3DAgent), how to train the agent with the PPO algorithm (mlagents-learn), how to visualize the training process with TensorBoard, and how to test the trained model in inference mode (mlagents-learn --resume --inference).