A reinforcement-learning-powered autonomous drone capable of navigating GPS-denied forest environments using onboard camera vision, PPO, and embedded AI compute.
The Background
This project began as our Computer Engineering thesis at Lund University, where my thesis partner and I explored one of the most challenging problems in robotics and autonomous systems: enabling drones to navigate forests without GPS.
Traditional drones depend heavily on GPS signals for localisation and navigation. In dense forests, those signals become unreliable or disappear entirely due to heavy canopy coverage. Existing solutions often rely on expensive LiDAR systems, pre-mapped routes, or manual piloting.
We wanted to explore whether reinforcement learning and computer vision alone could solve this problem using only onboard camera input and embedded compute.
The Goal
Our objective was to build a fully autonomous drone capable of navigating complex forest environments without GPS, external infrastructure, or pre-mapped paths.
The drone needed to perceive the environment through a monocular RGB camera, make real-time navigation decisions onboard, avoid collisions, and generalise to forests it had never seen before.
The Stack
The system was built around Proximal Policy Optimization (PPO), a policy-gradient reinforcement learning algorithm known for stable training on continuous control tasks. We trained our agents using Stable-Baselines3 and PyTorch.
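To give a feel for the training side, here is a minimal Stable-Baselines3 setup. It is an illustrative sketch, not our exact configuration: the `ForestNavEnv` name and the hyperparameters are placeholders.

```python
# Minimal PPO training sketch with Stable-Baselines3.
# ForestNavEnv and the hyperparameters are illustrative, not the thesis values.
from stable_baselines3 import PPO

env = ForestNavEnv()  # hypothetical Gym environment wrapping the Gazebo simulator
model = PPO(
    "CnnPolicy",       # convolutional policy for image observations
    env,
    learning_rate=3e-4,
    n_steps=2048,      # rollout length per policy update
    batch_size=64,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("ppo_forest_nav")
```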
A Convolutional Neural Network (CNN) processed visual camera input and extracted spatial features used by the PPO policy network to generate navigation commands in real time.
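In Stable-Baselines3 this kind of vision front-end is expressed by subclassing `BaseFeaturesExtractor` and wiring it in through `policy_kwargs`. The layer sizes below are illustrative rather than our exact architecture.

```python
# Custom CNN feature extractor for SB3; layer sizes are illustrative.
import torch
import torch.nn as nn
from gymnasium import spaces
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class ForestCNN(BaseFeaturesExtractor):
    def __init__(self, observation_space: spaces.Box, features_dim: int = 256):
        super().__init__(observation_space, features_dim)
        n_channels = observation_space.shape[0]  # SB3 passes channel-first images
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with a dummy forward pass
        with torch.no_grad():
            sample = torch.as_tensor(observation_space.sample()[None]).float()
            n_flat = self.cnn(sample).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flat, features_dim), nn.ReLU())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # SB3 normalises image observations to [0, 1] before this call
        return self.linear(self.cnn(obs))

policy_kwargs = dict(features_extractor_class=ForestCNN,
                     features_extractor_kwargs=dict(features_dim=256))
# Used as: PPO("CnnPolicy", env, policy_kwargs=policy_kwargs, ...)
```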
The flight stack was built on the PX4 autopilot, with Gazebo simulation providing a safe and scalable training environment. For onboard deployment, inference ran on an NVIDIA Jetson module mounted directly on the drone.
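The PX4 side can be scripted against the simulator with MAVSDK-Python. The offboard velocity loop below is a minimal sketch assuming PX4 SITL listening on its default UDP port, not our full flight stack.

```python
# Minimal MAVSDK-Python offboard sketch against PX4 SITL in Gazebo.
# Assumes the simulator is listening on the default UDP port 14540.
import asyncio
from mavsdk import System
from mavsdk.offboard import VelocityBodyYawspeed

async def main():
    drone = System()
    await drone.connect(system_address="udp://:14540")
    async for state in drone.core.connection_state():
        if state.is_connected:
            break
    await drone.action.arm()
    # PX4 requires a setpoint before offboard mode can be engaged
    await drone.offboard.set_velocity_body(VelocityBodyYawspeed(0.0, 0.0, 0.0, 0.0))
    await drone.offboard.start()
    # Climb for a few seconds (NED convention: negative "down" is up)...
    await drone.offboard.set_velocity_body(VelocityBodyYawspeed(0.0, 0.0, -1.0, 0.0))
    await asyncio.sleep(3)
    # ...then fly forward: this is where the policy's commands would go
    await drone.offboard.set_velocity_body(VelocityBodyYawspeed(1.0, 0.0, 0.0, 0.0))
    await asyncio.sleep(5)
    await drone.offboard.stop()
    await drone.action.land()

asyncio.run(main())
```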
The Challenge
One of the hardest problems was sim-to-real transfer. Reinforcement learning agents often perform extremely well in simulation but fail in real-world environments, where lighting variation, sensor noise, wind, and motion blur appear that no simulator's physics captures perfectly.
Our first agents ignored the camera input entirely and exploited weaknesses in the reward system: rather than learning to navigate, the drone simply hovered within safe height bounds.
To solve this, we redesigned the reward structure, introduced curriculum learning, applied domain randomization, and progressively increased environment complexity during training.
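The exact reward terms are thesis-specific, but the sketch below illustrates the shape of the fix with hypothetical weights: pay for forward progress rather than survival, charge a small time penalty so hovering is never optimal, and penalise proximity and collisions.

```python
# Illustrative reward shaping; the terms and weights are hypothetical,
# not the values used in the thesis.
def shaped_reward(progress_m: float, min_clearance_m: float, collided: bool) -> float:
    reward = 2.0 * progress_m          # progress toward the goal this step
    reward -= 0.05                     # time penalty: hovering now costs reward
    if min_clearance_m < 1.0:          # soft penalty for skimming obstacles
        reward -= 0.5 * (1.0 - min_clearance_m)
    if collided:                       # hard terminal penalty
        reward -= 50.0
    return reward
```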
Architecture
The system follows a modular architecture designed around reliability and real-time performance.
A monocular camera continuously captures visual input, which is processed by a CNN feature extractor. The extracted representation is passed into the PPO policy network, which predicts movement commands such as forward velocity, turning direction, and altitude adjustment.
These commands are then translated into flight control instructions through PX4 autopilot integration. The full pipeline runs onboard the drone using Jetson embedded compute with inference latency below 100 milliseconds.
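Put together, the onboard loop is conceptually small. The sketch below assumes the trained Stable-Baselines3 policy is loaded on the Jetson; `grab_frame` and `send_velocity_command` are placeholders for the real camera driver and PX4 offboard interface.

```python
# Onboard inference loop sketch; grab_frame() and send_velocity_command()
# are placeholders for the camera driver and the PX4 offboard interface.
import time
from stable_baselines3 import PPO

model = PPO.load("ppo_forest_nav", device="cuda")  # Jetson GPU

while True:
    start = time.perf_counter()
    frame = grab_frame()                               # HxWxC RGB image
    action, _ = model.predict(frame, deterministic=True)
    send_velocity_command(action)                      # forward, yaw, altitude
    elapsed = time.perf_counter() - start              # must stay under ~100 ms
    time.sleep(max(0.0, 0.1 - elapsed))                # pace the loop at ~10 Hz
```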
Features
👾 Autonomous forest navigation without GPS.
👾 Reinforcement learning with PPO and Stable-Baselines3.
👾 CNN-based vision processing from monocular RGB camera input.
👾 Real-time onboard inference using embedded Jetson hardware.
👾 Curriculum learning and domain randomization for sim-to-real transfer.
👾 Gazebo simulation environment integrated with PX4 autopilot.
👾 Real-world forest validation flights with autonomous obstacle avoidance.
Results
The final system achieved an 89% navigation success rate in simulation and maintained approximately 72% success in real-world forest environments without additional fine-tuning.
The drone successfully navigated routes exceeding 400 meters, adapted to unseen lighting conditions, and maintained stable real-time decision making at 10 Hz.
Most importantly, the project validated that low-cost, vision-based reinforcement learning systems can realistically solve autonomous navigation problems traditionally dominated by expensive sensor-heavy approaches.
What I Learned
This project taught me how reinforcement learning behaves in real engineering systems outside controlled academic benchmarks. It pushed my understanding of embedded systems, robotics, computer vision, simulation pipelines, and AI deployment on constrained hardware.
It also reinforced the importance of systems thinking. Building autonomous robotics is not just about machine learning models — it is about architecture, reliability, safety layers, hardware constraints, and designing systems that continue working when reality becomes unpredictable.