Racing simulation control using proximal policy optimization (PPO) and spiking neural networks (SNNs).

Quick YouTube Video

The YouTube video (1 min) showcases the results of the project, including:

SNN vs ANN comparison
Network visualization
Evaluation on robotics (Gazebo/ROS)

https://www.youtube.com/watch?v=a3TkSxlVKyY

Quickstart

Installation

Clone and install the project requirements with the following:

git clone https://github.com/NotEnoughSnow/kart-simulator.git
cd kart-simulator
pip install -r requirements.txt

Running

main.py connects the project's modules, which can be run by specifying the desired mode.

Manually Testing the Simulation:

A good way to check that the base software works is to set args.mode to "play" at the very bottom of main.py:

args.mode = "play"

Then in main() set the environment type and map with:

env_name = steer_gazebo

track_type = "loader"
track_name = "big_S"

This lets you control the agent with the W, A, S, and D keys in the steering-movement environment on the large map.

To train a Model:

Change args.mode to "train"
Choose an environment type (grid_env, steer_env)
Set the saving directory with save_config
Set the training settings in "KartSimulator/runners/train", including saving options, seed, number of timesteps, network type, and hyperparameters.
Launch main.py.

To visualize a Training Session:

Change args.mode to "replay"
Keep the environment type (grid_env, steer_env) the same as the run.
Make sure "project_name" in main.py is set to the training session's project name.
Set the replay file in "KartSimulator/runners/replay" under run_name (e.g., run_name = "sample-ANN-2").
Select the replay mode (batch, all) to view the training batch by batch or to visualize the whole training session at once.
Launch main.py.

The available simulation types, modes, and tracks are listed below.

About the Project

This project was submitted as part of a thesis research project during my Master's degree in Artificial Intelligence at ELTE University, Budapest. The earliest version of the simulation started as a course project during the fall semester of 2023.

The primary objective of this research is to investigate the viability of Spiking Neural Networks (SNNs) within the context of Reinforcement Learning (RL). Specifically, it seeks to evaluate whether SNNs can perform effectively in RL settings and to explore their potential advantages over traditional Artificial Neural Networks (ANNs). Conducting this study in a simulated environment provides the necessary control for experimentation and analysis while also addressing challenges associated with real-world applications.

The project aims to:

Compare SNNs to ANNs in the context of RL and robotics in order to explore their advantages and limitations.
Explore various SOTA methodologies in relation to SNNs and computational neuroscience within the outlined context.
Explore the sim2real gap by deploying trained models into robotics.
Explore the efficiency and challenges of neuromorphic hardware.

Project Structure

core : Contains modules for PPO training and evaluation
runners : Different modules used in main.py (e.g., training, evaluating)
sim : A directory collecting the different simulations and their tools

Simulations and Environments

Gymnasium Environments

The project includes three top-down pygame/gymnasium racing environments, listed in order of increasing difficulty:

Grid-Env : an environment with 4-directional movement. The agent can accelerate freely in all directions.
Gazebo-Env : an environment with sequential-style movement. The agent can control either its linear acceleration or its rotational acceleration, but not both at the same time. This environment was made specifically to match the ROS/Gazebo setup, and targets a two-wheeled, differential-drive robot architecture.
Steer-Env : an environment with car-like acceleration and steering. This version is more challenging to train but brings more complexity and authenticity to the driving/racing control problem.

Environment Features

Ray-casting 2d vision
A pymunk physics implementation to set up the player and track dynamics
UI for tracking important information
Gym env structure; step, reset, render methods
Flexible methods for observations, actions, etc.

Observations:

The agents mainly use LIDAR scans to navigate the environment. They also receive kinematic information.

Example observation: [LIDAR scans, position, angle, velocity].

Reward Function (adjustable):

Passive penalty
Out of bounds penalty
Stand still or desert penalty
Finish track reward
Action-distance reward/penalty
Steer reward/penalty (depends on env type)

Proximal Policy Optimization:

Spiking Network:

The spiking network builds on the ANN by incorporating spiking neurons, together with methods that convert inputs to spike trains and outputs back from spike trains (the data format used by the spiking network, instead of float vectors/tensors). A spike train is an array of binary events over time: each timestep in the spike train equals 1 if the neuron spiked, and 0 otherwise.

The input float vector from the environment is processed into spike trains by a rate encoding function:

$$P(R_{ij}=1)=X_{ij}=1-P(R_{ij}=0)$$
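
A minimal, dependency-free Python sketch of this rate encoding, assuming the inputs are already normalized to [0, 1]. The function name rate_encode and the num_steps parameter are illustrative and are not taken from the project code:

```python
import random


def rate_encode(x, num_steps, rng=None):
    """Encode values in [0, 1] as Bernoulli spike trains.

    Returns num_steps rows; row t holds one 0/1 spike per input feature,
    where feature i spikes with probability x[i] at every timestep.
    """
    if rng is None:
        rng = random.Random()
    for value in x:
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be normalized to [0, 1]")
    return [[1 if rng.random() < value else 0 for value in x]
            for _ in range(num_steps)]


# Hypothetical normalized observation with three features.
features = [0.0, 0.25, 1.0]
spikes = rate_encode(features, num_steps=1000, rng=random.Random(0))

# The empirical firing rate of each feature approximates its input value.
rates = [sum(step[i] for step in spikes) / len(spikes)
         for i in range(len(features))]
print(rates)
```

Over many timesteps the fraction of ones for each feature approaches the corresponding input value, so the information is carried by the firing rate rather than by any single spike.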

The network contains two fully connected layers of leaky integrate-and-fire (LIF) spiking neurons with the following internal mechanism:

Decay: the neuron (specifically, its membrane) loses charge exponentially over time.
Integrate: the neuron accumulates charge over time when exposed to an input stimulus (current).
Fire: once the membrane potential passes a certain threshold, the neuron fires, with the possibility of resetting and/or inhibiting further input for a short time window.

The spiking neurons follow this discretized update equation:

$$U[t+1] = \underbrace{\beta U[t]}_\text{decay} + \underbrace{WX[t+1]}_\text{input} - \underbrace{S[t]U_{\rm thr}}_\text{reset}$$

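Where $\beta$ is the membrane potential decay rate, $W$ is the weight matrix, $X$ is the input, $S[t]$ is the spike (0 or 1) emitted at timestep $t$, and $U_{\rm thr}$ is the firing threshold.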
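
The update above can be sketched in plain Python as follows. This illustrates the equation only and is not the project's implementation: the function name lif_step, the default beta and threshold values, and the list-based weight layout are all assumptions.

```python
def lif_step(u, x, weights, beta=0.9, threshold=1.0):
    """Advance one layer of LIF neurons by a single timestep.

    u:       membrane potentials U[t], one per neuron
    x:       inputs X[t+1], one per input channel
    weights: weights[j][i] connects input i to neuron j
    Returns (U[t+1], S[t+1]).
    """
    new_u = []
    for u_j, w_j in zip(u, weights):
        spiked = 1 if u_j > threshold else 0              # S[t]
        current = sum(w * x_i for w, x_i in zip(w_j, x))  # W X[t+1]
        # decay + input - reset-by-subtraction
        new_u.append(beta * u_j + current - spiked * threshold)
    spikes = [1 if u_j > threshold else 0 for u_j in new_u]
    return new_u, spikes


# One neuron driven by a constant input of 1.0 through a weight of 0.6.
u = [0.0]
weights = [[0.6]]
spike_history = []
for _ in range(5):
    u, s = lif_step(u, [1.0], weights, beta=0.5, threshold=1.0)
    spike_history.append(s[0])
print(spike_history)  # [0, 0, 1, 0, 0]
```

In this toy run the potential climbs 0.6, 0.9, 1.05, crosses the threshold on the third step, and is pulled back down by the reset term on the fourth.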

Because the spike function is non-differentiable, fast-sigmoid surrogate gradients are used so that the network can be trained with backpropagation:

$$\begin{split}S&\approx\frac{U}{1 + k|U|} \\ \frac{\partial S}{\partial U}&=\frac{1}{(1+k|U|)^2}\end{split}$$

Where k represents the steepness of the function and defaults to 25.
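
As a rough illustration, the surrogate and its derivative can be written as below. The sketch assumes U is measured relative to the firing threshold, so the forward pass is a step function at zero; the function names are hypothetical. During training, the step function would be used in the forward pass and the smooth derivative substituted in the backward pass.

```python
def spike(u):
    """Forward pass: non-differentiable step function (assumes u = U - U_thr)."""
    return 1.0 if u > 0.0 else 0.0


def fast_sigmoid(u, k=25.0):
    """Smooth stand-in for the step function: U / (1 + k|U|)."""
    return u / (1.0 + k * abs(u))


def fast_sigmoid_grad(u, k=25.0):
    """Surrogate gradient used in the backward pass: 1 / (1 + k|U|)^2."""
    return 1.0 / (1.0 + k * abs(u)) ** 2


for u in (-0.2, 0.0, 0.04, 0.2):
    print(f"U={u:+.2f}  spike={spike(u):.0f}  "
          f"surrogate grad={fast_sigmoid_grad(u):.4f}")
```

The gradient peaks at 1 when U is at the threshold and falls off quickly on either side; a larger k makes that peak narrower.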

Finally, the output spikes are decoded by averaging each output spike train over its $N$ timesteps and passing the averages through a learnable linear readout layer:

$$Avg\_Spikes_{i} = \frac{1}{N}\sum^{N}_{t=1}Spike\_Train_{i,t}$$

$$Output = W_{readout}\times Avg\_Spikes+b_{readout}$$
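
The two readout equations can be sketched in plain Python as below. The function names, the weight layout, and the example numbers are assumptions chosen for illustration, not values from the project.

```python
def average_spikes(spike_train):
    """Average each neuron's 0/1 spikes over the N timesteps.

    spike_train: N rows (timesteps), each with one spike per neuron.
    """
    num_steps = len(spike_train)
    num_neurons = len(spike_train[0])
    return [sum(step[i] for step in spike_train) / num_steps
            for i in range(num_neurons)]


def linear_readout(avg_spikes, weights, bias):
    """Compute W_readout x Avg_Spikes + b_readout, one value per output."""
    return [sum(w * a for w, a in zip(row, avg_spikes)) + b
            for row, b in zip(weights, bias)]


# Four timesteps, two spiking neurons, two outputs (all values hypothetical).
spike_train = [[1, 0], [1, 1], [0, 0], [1, 1]]
averages = average_spikes(spike_train)
outputs = linear_readout(averages,
                         weights=[[2.0, -1.0], [0.0, 4.0]],
                         bias=[0.5, -1.0])
print(averages)  # [0.75, 0.5]
print(outputs)   # [1.5, 1.0]
```

Averaging turns the binary spike trains back into continuous values, so the readout can produce real-valued outputs for the policy.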

Results:

SteerEnv ANN vs SNN visual comparison:

Performance of SNN policies compared to ANNs across all environments:

Environment-Model	First Finish	300 Finishes	% of ANN
LatEnv-ANN	141k	402k	-
LatEnv-SNN	175k	406k	99.0%
GazeboEnv-ANN	82k	431k	-
GazeboEnv-SNN	22k	507k	85.0%
SteerEnv-ANN	156k	724k	-
SteerEnv-SNN	441k	957k	75.7%
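
The source does not define the "% of ANN" column, but the reported values are consistent with dividing the ANN's step count for 300 finishes by the SNN's. The short check below reproduces the column under that assumption; the dictionary layout and function name are illustrative.

```python
# Training steps needed to reach 300 finishes, taken from the table above.
steps_to_300_finishes = {
    "LatEnv": {"ANN": 402_000, "SNN": 406_000},
    "GazeboEnv": {"ANN": 431_000, "SNN": 507_000},
    "SteerEnv": {"ANN": 724_000, "SNN": 957_000},
}


def percent_of_ann(ann_steps, snn_steps):
    """Assumed definition: ANN steps as a percentage of SNN steps."""
    return 100.0 * ann_steps / snn_steps


for env, steps in steps_to_300_finishes.items():
    print(f"{env}: {percent_of_ann(steps['ANN'], steps['SNN']):.1f}%")
```

Under this reading, a value near 100% means the SNN needed about as many training steps as the ANN to reach 300 finishes.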

Cumulative Successful Episodes of SNN policies across all environments:

Where:

X-axis: the number of training steps the policy has been trained for (e.g., 300k).
Y-axis: the total number of times the policy has reached the final goal.
Each color shows the best trained spiking policy for one environment.
The graph shows how many times each policy finished the track over the course of its training session.

Configurations

Available modes:

play: Manually test the simulation.
train: Train a NN model with PPO either given the optimized starting weights or from scratch.
eval: Test and evaluate the performance of the NN.
replay: Visualize a training session.

Available maps:

big_S
small_S

Available environment types:

grid_env
gazebo_env
steer_env

Future development

Collect and compare data from different methodologies
Migrate the simulation from Pymunk to Box2D
Improve the pipeline by using better-suited tools and environments such as IsaacGym, which has GPU support. This would cut the SNN training time, which is significant.
Improve the study by testing on real-life robots.
Include neuromorphic hardware in the evaluation, which would also make it possible to measure energy consumption.

