A robot designed for reinforcement learning and control experiments with real hardware.
- train.py: training script for the robot.
- hardware/: 3D-printable models for the structure.
- firmware/: Arduino code for the ESP32 microcontroller.
- test_robot.py: testing script for the robot.
- env.py: Gymnasium environment definition.
- wrappers.py: environment wrappers.
| Item | Description |
|---|---|
| ESP32 (Dev Kit C) | Microcontroller. Several board sizes are available on the market. We use the 25.70 × 53.40 mm version, but you may need to adapt the design for your board. |
| L9110H + propeller kit | Motor, propeller, and controller. |
| AS5600 | Rotary encoder. |
| 3D-printed structure | Printed support structure. |
| 2 × M3 locking nuts | Fasteners. |
| 2 × M3×25 screws | Fasteners. |
| 623z | Bearing. |
| Flexible 4-wire cable | Electrical connection. |
| Wooden or cardboard base | Base, approximately 120 × 100 mm. |
| Counterweight | Helps the actuator lift the pendulum. It must be light enough that the pendulum still falls when left alone. We used an M6 bolt, washer, and nut. |
- Connect the VCC pins of the L9110H and AS5600 to the 3.3 V pin on the ESP32.
- Connect all GND pins together, and w
- Connect the signal pins to the appropriate GPIOs on the ESP32 (as specified in firmware.ino).
- Glue the encoder magnet to the end of the screw that acts as the shaft.
- Some boards may require a small amount of glue to remain securely in place.
- Upload the firmware to the ESP32 using the Arduino IDE.
- Install the required Python packages: `pip install -r requirements.txt`
- Verify that everything is working by running `python test_robot.py`. The robot should move in a somewhat random manner.
- Start training with `python train.py`.
A history wrapper is used to maintain the last observations and actions taken by the agent.
This provides the agent with short-term memory and context, effectively restoring the Markov property of the environment. This is necessary because certain state variables (such as
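
The history mechanism described above can be sketched in a few lines. This is only an illustration, not the code in wrappers.py: the class name `HistoryWrapper`, the parameters `history_len` and `action_dim`, and the `DummyEnv` stand-in are all hypothetical, and the sketch assumes an environment that follows the Gymnasium `reset()`/`step()` return conventions with list-like observations and actions.

```python
from collections import deque


class DummyEnv:
    """Hypothetical stand-in environment with a 1-D observation (a step counter)."""

    def reset(self):
        self.t = 0.0
        return [self.t], {}

    def step(self, action):
        self.t += 1.0
        # Gymnasium-style return: obs, reward, terminated, truncated, info
        return [self.t], 0.0, False, False, {}


class HistoryWrapper:
    """Illustrative sketch: exposes the last `history_len` observations and actions."""

    def __init__(self, env, history_len=4, action_dim=1):
        self.env = env
        self.history_len = history_len
        self.action_dim = action_dim
        # deque(maxlen=...) drops the oldest entry automatically on append
        self.obs_hist = deque(maxlen=history_len)
        self.act_hist = deque(maxlen=history_len)

    def _stacked(self):
        # Flatten oldest-to-newest; each observation is followed by the
        # action that led to it.
        flat = []
        for obs, act in zip(self.obs_hist, self.act_hist):
            flat.extend(obs)
            flat.extend(act)
        return flat

    def reset(self):
        obs, info = self.env.reset()
        self.obs_hist.clear()
        self.act_hist.clear()
        # Assumption: pad the history with the first observation and zero actions
        for _ in range(self.history_len):
            self.obs_hist.append(list(obs))
            self.act_hist.append([0.0] * self.action_dim)
        return self._stacked(), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.obs_hist.append(list(obs))
        self.act_hist.append(list(action))
        return self._stacked(), reward, terminated, truncated, info
```

With `history_len=3`, the agent sees a flat vector of six values instead of one, so quantities that depend on recent changes can in principle be inferred from consecutive entries.
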
If we assume the propeller force is first order, thus directly controllable (not entirely realistic), the system can be represented as:
Dynamics:
Empirical measurements of F(u)
| u [V] | F [N] | I [A] |
|---|---|---|
| 8.4 | 0.111 | 0.340 |
| 7.3 | 0.087 | — |
| 6.0 | 0.066 | 0.222 |
| 5.0 | 0.046 | 0.170 |
| 4.0 | 0.031 | 0.120 |
| 2.8 | 0.016 | — |
| 0.0 | 0.000 | 0.000 |
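
One way to use these measurements in a simulation is to interpolate linearly between the measured points. The sketch below assumes nothing about the shape of F(u) beyond the table itself. The function name `thrust_from_voltage` is hypothetical, and clamping inputs to the measured range of 0 to 8.4 V is an assumption made here, not something the measurements establish.

```python
from bisect import bisect_right

# (u [V], F [N]) pairs copied from the table above, sorted by voltage
MEASURED = [
    (0.0, 0.000),
    (2.8, 0.016),
    (4.0, 0.031),
    (5.0, 0.046),
    (6.0, 0.066),
    (7.3, 0.087),
    (8.4, 0.111),
]


def thrust_from_voltage(u):
    """Estimate the propeller force in newtons by piecewise-linear interpolation."""
    volts = [v for v, _ in MEASURED]
    # Assumption: clamp to the measured range instead of extrapolating
    u = min(max(u, volts[0]), volts[-1])
    i = bisect_right(volts, u)
    if i == len(volts):
        # u equals the highest measured voltage
        return MEASURED[-1][1]
    (u0, f0), (u1, f1) = MEASURED[i - 1], MEASURED[i]
    return f0 + (f1 - f0) * (u - u0) / (u1 - u0)
```

For example, `thrust_from_voltage(4.5)` falls halfway between the 4.0 V and 5.0 V rows and gives roughly 0.0385 N.
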

