TinyRL: Can AI Learn to Swing Up a Real Pendulum? | DigiKey
Reinforcement learning (RL) is a form of machine learning that involves training agents to interact with an environment in order to maximize cumulative rewards. In this video, we teach an AI to swing up a pendulum using real hardware and RL.
A write-up of the project can be found here: https://www.digikey.com/en/maker/projects/tiny-reinforcement-learning-tinyrl-for-robotics/4ccd2c84fe1d42e68da16a069a12a48f
An RL agent learns to interact with its environment using trial and error. Shawn creates an interface in Arduino that can control a stepper motor and read the position of an encoder attached to the pendulum. The goal is to train an agent to learn to swing up the pendulum on its own.
Intro to Reinforcement Learning video: https://www.youtube.com/watch?v=3av8vozEczU
Hyperparameter Optimization video:
To accomplish this, the Arduino is connected to a computer running the Farama Gymnasium and Stable Baselines3 frameworks. On each step, these frameworks take in the observations from the Arduino, have the agent choose an action, and tell the Arduino which action to take. The agent is updated using the proximal policy optimization (PPO) algorithm found in Stable Baselines3.
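The observe-act loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the project: `FakePendulumLink` is a hypothetical simulated stand-in for the serial link to the Arduino, the "dynamics" are a toy, the reward is a placeholder, and a random policy takes the place of the PPO agent. The `reset()` and `step()` return shapes mirror the Gymnasium environment convention without importing the library.

```python
import math
import random


class FakePendulumLink:
    """Hypothetical, simulated stand-in for the serial link to the Arduino."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Assumed convention: 0 deg = upright, 180 deg = hanging straight down.
        self.pendulum_deg = 180.0
        self.stepper_deg = 0.0

    def send_action(self, delta_deg):
        # Toy dynamics for illustration only; the real hardware obeys physics.
        self.stepper_deg += delta_deg
        self.pendulum_deg = (self.pendulum_deg + 0.5 * delta_deg) % 360.0

    def read_observation(self):
        return (self.pendulum_deg, self.stepper_deg)


class PendulumEnvSketch:
    """Mirrors the shape of a Gymnasium environment's reset() and step()."""

    def __init__(self, link, max_steps=50):
        self.link = link
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.link.reset()
        self.steps = 0
        return self.link.read_observation(), {}

    def step(self, delta_deg):
        self.link.send_action(delta_deg)
        obs = self.link.read_observation()
        # Placeholder reward (an assumption): +1 when upright, -1 when hanging.
        reward = math.cos(math.radians(obs[0]))
        self.steps += 1
        terminated = False  # no success/crash logic in this sketch
        truncated = self.steps >= self.max_steps
        return obs, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode: observe, pick an action, send it, collect the reward."""
    obs, _info = env.reset()
    total_reward, steps = 0.0, 0
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            return total_reward, steps


rng = random.Random(0)
env = PendulumEnvSketch(FakePendulumLink(), max_steps=50)
# A random policy stands in for the trained agent here.
total_reward, steps = run_episode(env, lambda obs: rng.choice((10.0, 0.0, -10.0)))
print(steps, round(total_reward, 3))
```

In a real setup, the link class would exchange messages with the Arduino over a serial port, and the policy would be the agent being trained.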
Initially, Shawn tried to perform a full swing-up and balance with a continuous action set. However, this proved too difficult for the agent: the round-trip time to and from the Arduino, along with model updates, took too long to successfully balance the pendulum. To reduce the scope, the action set was made into a discrete set (+10 deg, 0 deg, -10 deg), and the episode ended when the pendulum reached the top under a particular speed. If the pendulum moved too fast near the top, it was considered to have “crashed,” and a penalty was applied.
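As a rough sketch of the reduced-scope rules, the snippet below maps a discrete action index to one of the three moves and classifies the pendulum state near the top. The three action values come from the description above; the angle window, the speed limit, the penalty size, and the 0-degrees-is-upright convention are all hypothetical values chosen for illustration, and whether a crash also ends the episode is left open here.

```python
# Discrete action set from the description: +10 deg, 0 deg, -10 deg.
ACTIONS_DEG = (10.0, 0.0, -10.0)

# Hypothetical thresholds, picked only for this example.
TOP_WINDOW_DEG = 10.0       # how close to upright counts as "near the top"
MAX_TOP_SPEED_DEG_S = 100.0  # speed limit for a successful arrival
CRASH_PENALTY = -10.0        # penalty applied when arriving too fast


def action_to_degrees(action_index):
    """Translate the agent's discrete action (0, 1, or 2) into a move in degrees."""
    return ACTIONS_DEG[action_index]


def angle_from_top(angle_deg):
    """Wrap an angle to [-180, 180), assuming 0 deg means upright."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def check_top(angle_deg, speed_deg_s):
    """Return (status, reward_adjustment) for the current pendulum state."""
    if abs(angle_from_top(angle_deg)) > TOP_WINDOW_DEG:
        return "running", 0.0
    if abs(speed_deg_s) <= MAX_TOP_SPEED_DEG_S:
        return "success", 0.0  # reached the top slowly enough: episode ends
    return "crash", CRASH_PENALTY  # too fast near the top


print(check_top(355.0, 20.0))
print(check_top(355.0, 500.0))
print(check_top(180.0, 0.0))
```

Keeping the success and crash checks in one function makes it easy to tune the thresholds without touching the rest of the environment.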
Once the agent successfully learned how to perform the swing-up, it was deployed to the Arduino. To perform the deployment, the critic portion of the actor-critic model in the PPO agent was stripped away, and the remaining actor model (3-layer dense neural network) was optimized using Edge Impulse. The model was then deployed to an ESP32S3 to perform the swing-up without any input from the computer.
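To illustrate why the critic can be dropped at deployment time, here is a minimal sketch of actor-only inference: a forward pass through a small three-layer dense network that turns an observation into a discrete action. Everything specific in it is an assumption made for the example, including the observation size, the hidden layer width, the tanh activations, and the random weights, which stand in for the trained actor parameters. It does not reproduce the Edge Impulse workflow or the actual model.

```python
import math
import random

# Hypothetical sizes; the real model's dimensions are not given in the description.
OBS_SIZE = 3
HIDDEN_SIZE = 8
N_ACTIONS = 3  # matches the three discrete moves


def random_layer(rng, n_in, n_out):
    """Create placeholder weights; a deployed model would load trained values."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
LAYERS = [
    random_layer(rng, OBS_SIZE, HIDDEN_SIZE),
    random_layer(rng, HIDDEN_SIZE, HIDDEN_SIZE),
    random_layer(rng, HIDDEN_SIZE, N_ACTIONS),
]


def actor_action(obs):
    """Actor-only inference: no value estimate (critic) is needed to act."""
    h = dense(obs, LAYERS[0][0], LAYERS[0][1], math.tanh)
    h = dense(h, LAYERS[1][0], LAYERS[1][1], math.tanh)
    logits = dense(h, LAYERS[2][0], LAYERS[2][1], lambda z: z)
    # Pick the highest-scoring action deterministically.
    return max(range(N_ACTIONS), key=lambda i: logits[i])


# Hypothetical observation: (angle, angular velocity, stepper position), normalized.
action = actor_action([0.5, -0.2, 0.1])
print(action)
```

The critic only estimates how good a state is, which PPO uses during training updates; choosing an action at run time needs just the actor's forward pass, which is small enough for a microcontroller.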
Product Links:
STEVAL-EDUKIT01 - https://www.digikey.com/en/products/detail/stmicroelectronics/STEVAL-EDUKIT01/11696333
Seeed Studio XIAO ESP32S3 - https://www.digikey.com/en/products/detail/seeed-technology-co-ltd/113991114/19285530
Related Videos:
https://www.youtube.com/watch?v=3av8vozEczU
https://www.youtube.com/watch?v=wwPhkF_2I0w
Related Project Links: https://www.digikey.com/en/maker/projects/intro-to-reinforcement-learning-using-gymnasium-and-stable-baselines3/28c6602f5d1e4ce1b5a90642a1ac7efc
https://www.digikey.com/en/maker/projects/teach-an-ai-to-play-qwop/ce7e360e67ae4017809be3576385ae5e
Learn more:
Maker.io - https://www.digikey.com/en/maker
DigiKey’s Blog - TheCircuit - https://www.digikey.com/en/blog
Connect with DigiKey on Facebook - https://www.facebook.com/digikey.electronics/
And follow us on X (formerly Twitter) - https://twitter.com/digikey
00:00 - Introduction
01:10 - Hardware overview
03:00 - Modifying the pendulum tower
04:20 - Arduino communication interface
04:49 - Overview of reinforcement learning
06:17 - Reward function
08:32 - Agent actor-critic deep neural network
09:33 - Hyperparameter optimization overview
09:51 - Agent training with Python
14:57 - Troubleshooting an agent that does not learn
16:46 - Reduce scope to just swing up and use discrete action space
18:03 - Train simpler agent
18:22 - Deploy agent to ESP32
19:56 - Test agent on the pendulum
20:46 - Conclusion and further areas of research

