Term of Award

Fall 2019

Degree Name

Master of Science, Mechanical Engineering

Document Type and Release Option

Thesis (restricted to Georgia Southern)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


Department of Mechanical Engineering

Committee Chair

Samanta Biswanath

Committee Member 1

Choi JungHun

Committee Member 2

Kim Jinki


A study is presented on visual navigation of wheeled mobile robots (WMRs) in unknown and dynamic environments using deep reinforcement learning (DRL). Two DRL algorithms were considered: the value-learning-based deep Q-network (DQN) and the policy-gradient-based asynchronous advantage actor-critic (A3C). Both algorithms were implemented with RGB and depth images as inputs to generate outputs for autonomous navigation of the WMR, both in simulation and in real time. The initial DRL networks were generated and trained progressively in simulation environments using OpenAI Gym Gazebo within the Robot Operating System (ROS) framework for a popular experimental WMR, the Kobuki TurtleBot2 with an Asus Xtion depth camera. The real-time implementation of the trained DRL networks in the ROS framework was achieved on an onboard edge computing platform, the NVIDIA Jetson TX2, using the TensorFlow software framework. For object detection, classification, and target identification, a pre-trained deep neural network, ResNet50, was used after further training with reduced classification categories for target-driven, mapless visual navigation of the TurtleBot2 through DRL. The simulation-based training of the DQN and A3C networks was successfully transferred, with online learning, to real-time navigation of the TurtleBot2 in physical environments. The performance of A3C was simulated with multiple computation threads (4, 6, and 8) on a desktop. The simulated navigation performance, in terms of the minimum, average, and maximum rewards and the completion time, was compared for the DQN and A3C networks in three simulation environments. As expected, A3C with multiple threads (4, 6, and 8) performed better than DQN, and its performance improved with the number of threads. The real-time implementation results of A3C with 8 threads in unknown and dynamic environments with target objects were promising. Details of the methodology and the simulation and real-time implementation results are presented, and recommendations for future work are outlined.
