Actuated Pendulum with Propeller

A robot designed for reinforcement learning and control experiments with real hardware.

Video

Repository Map

  • train.py: training script for the robot.
  • hardware/: 3D-printable models for the structure.
  • firmware/: Arduino code for the ESP32 microcontroller.
  • test_robot.py: testing script for the robot.
  • env.py: Gymnasium environment definition.
  • wrappers.py: environment wrappers.
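
For orientation, the pieces are meant to compose roughly as in the sketch below. This is hypothetical: PendulumEnv and HistoryWrapper are assumed class names, not confirmed from the sources.

    # Hypothetical usage sketch; actual names in env.py / wrappers.py may differ.
    from env import PendulumEnv          # assumed class name
    from wrappers import HistoryWrapper  # assumed class name

    env = HistoryWrapper(PendulumEnv())
    obs, info = env.reset()
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())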

Getting Started

Hardware

Bill of Materials

  • ESP32 (Dev Kit C): microcontroller. Several board sizes are available on the market; we use the 25.70 × 53.40 mm version, but you may need to adapt the design for your board.
  • L9110H + propeller kit: motor, propeller, and motor driver.
  • AS5600: rotary encoder.
  • 3D-printed structure: printed support structure.
  • 2 × M3 locking nuts: fasteners.
  • 2 × M3×25 screws: fasteners.
  • 623Z: bearing.
  • Flexible 4-wire cable: electrical connection.
  • Wooden or cardboard base: approximately 120 × 100 mm.
  • Counterweight: it must still fall under gravity on its own, but it helps the actuator lift the pendulum. An M6 bolt, washer, and nut were used in our case.

Assembly

  • Connect the VCC pins of the L9110H and AS5600 to the 3.3 V pin on the ESP32.
  • Connect all GND pins together, including the GND pin on the ESP32.
  • Connect the signal pins to the appropriate GPIOs on the ESP32 (as specified in firmware.ino).
  • Glue the encoder magnet to the end of the screw that acts as the shaft.
  • Some boards may require a small amount of glue to remain securely in place.

Assembled Robot

Firmware

Upload the firmware to the ESP32 using the Arduino IDE (install the ESP32 board support package and select your board first).

Usage

  1. Install the required Python packages:

    pip install -r requirements.txt
  2. Verify that everything is working by running:

    python test_robot.py

    The robot should move in a somewhat random manner.

  3. Start training with:

    python train.py

Reinforcement Learning

History Wrapper

A history wrapper is used to maintain the last observations and actions taken by the agent. This provides the agent with short-term memory and context, effectively restoring the Markov property of the environment. This is necessary because certain state variables (such as $\dot{\theta}$ or the propeller speed) are not directly observable from a single timestep.
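
As an illustration, a minimal history wrapper might look like the sketch below. This shows the general technique, not the exact implementation in wrappers.py; the class name and the horizon of 4 are illustrative, and a flat 1-D Box observation space is assumed.

    # Minimal sketch of a history wrapper for a Gymnasium-style env.
    from collections import deque

    import gymnasium as gym
    import numpy as np


    class HistoryWrapper(gym.Wrapper):
        """Stack the last `horizon` observations and actions into one vector."""

        def __init__(self, env, horizon=4):
            super().__init__(env)
            self.horizon = horizon
            self.obs_buf = deque(maxlen=horizon)
            self.act_buf = deque(maxlen=horizon)
            obs_dim = int(np.prod(env.observation_space.shape))
            act_dim = int(np.prod(env.action_space.shape))
            self.observation_space = gym.spaces.Box(
                low=-np.inf, high=np.inf,
                shape=(horizon * (obs_dim + act_dim),), dtype=np.float32,
            )

        def _stacked(self):
            # Flatten and concatenate the buffered observations and actions.
            parts = [np.ravel(o) for o in self.obs_buf] + \
                    [np.ravel(a) for a in self.act_buf]
            return np.concatenate(parts).astype(np.float32)

        def reset(self, **kwargs):
            obs, info = self.env.reset(**kwargs)
            # Pre-fill the buffers so the stacked shape is valid from step 0.
            for _ in range(self.horizon):
                self.obs_buf.append(obs)
                self.act_buf.append(np.zeros(self.env.action_space.shape))
            return self._stacked(), info

        def step(self, action):
            obs, reward, terminated, truncated, info = self.env.step(action)
            self.obs_buf.append(obs)
            self.act_buf.append(np.asarray(action))
            return self._stacked(), reward, terminated, truncated, info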

State-Space Representation

If we assume the propeller responds quickly enough that its thrust F(u) can be treated as directly controllable (not entirely realistic), the system can be represented by the state:

$$ \mathbf{x} = \begin{bmatrix} \theta \\ \dot\theta \end{bmatrix} $$

Dynamics:

$$ \dot{\mathbf{x}} = \begin{bmatrix} \dot\theta \\ \displaystyle \frac{1}{J}\Big( l\,F(u) - m g l \sin\theta - b\,\dot\theta \Big) \end{bmatrix} $$

where $J$ is the moment of inertia, $m$ the mass, $l$ the lever arm, $b$ a viscous damping coefficient, and $F(u)$ the propeller thrust as a function of the voltage command $u$.
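
As a sanity check, this model can be simulated in a few lines. The sketch below uses made-up parameter values (J, m, l, b are placeholders, not measured from the robot); F is any callable mapping voltage to thrust, e.g. the interpolated model built from the table below.

    import numpy as np

    def pendulum_dynamics(x, u, F, J=1e-3, m=0.05, l=0.1, b=1e-4, g=9.81):
        """Return xdot for state x = [theta, theta_dot]; parameters are placeholders."""
        theta, theta_dot = x
        theta_ddot = (l * F(u) - m * g * l * np.sin(theta) - b * theta_dot) / J
        return np.array([theta_dot, theta_ddot])

    def euler_step(x, u, F, dt=0.01):
        """One explicit-Euler integration step of the state."""
        return x + dt * pendulum_dynamics(x, u, F)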

Empirical Measurements of F(u)

u [V]    F [N]    I [A]
8.4      0.111    0.340
7.3      0.087
6.0      0.066    0.222
5.0      0.046    0.170
4.0      0.031    0.120
2.8      0.016
0.0      0.000    0.000
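
One simple way to turn these measurements into a usable thrust model is linear interpolation, sketched below (np.interp clamps to the endpoint values outside the measured range).

    import numpy as np

    # Measured points from the table above, sorted by increasing voltage.
    U_MEAS = np.array([0.0, 2.8, 4.0, 5.0, 6.0, 7.3, 8.4])                # voltage [V]
    F_MEAS = np.array([0.000, 0.016, 0.031, 0.046, 0.066, 0.087, 0.111])  # thrust [N]

    def F(u):
        """Interpolated propeller thrust [N] for a voltage command u [V]."""
        return np.interp(u, U_MEAS, F_MEAS)

Combined with euler_step above, x = euler_step(x, 6.0, F) rolls the model forward one step.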
