Autonomous navigation of mobile robot in a dynamic environment using deep reinforcement learning
Date: 2023
Abstract
Autonomous navigation of a mobile robot in a dynamic environment is a highly challenging application because the path to the goal frequently changes due to the unpredictable movements of humans walking at different velocities. Deep reinforcement learning trains a model entirely in a simulator through trial and error, exploring and collecting the required data automatically and cheaply from a customized environment. This research develops a customized environment with robot and pedestrian models in OpenAI Gym, replicating real-world human dimensions and motion patterns. A mathematical model was developed to encapsulate human navigation norms and teach the robot socially compliant routes through a reward function, in order to smooth the robot's navigation in a dynamic environment. A recently evolved algorithm, H-PPO, was selected to train the model because it handles the agent's hybrid action space, which consists of discrete actions parameterized by continuous values.
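To make the setup concrete, the sketch below shows a minimal custom OpenAI Gym environment of the kind the abstract describes: a hybrid action space (a discrete maneuver parameterized by a continuous speed) and a reward that combines goal progress, a collision penalty, and a personal-space penalty as a proxy for social norms. All class names, dimensions, and reward weights are illustrative assumptions; the thesis's actual environment code is not published with this abstract.

```python
import numpy as np
import gym
from gym import spaces

class CrowdNavEnv(gym.Env):
    """Hypothetical robot-pedestrian navigation environment (classic Gym API)."""

    def __init__(self, n_pedestrians=3):
        super().__init__()
        self.n_pedestrians = n_pedestrians
        # Hybrid action: discrete maneuver (turn left / straight / turn right)
        # parameterized by a continuous linear speed in [0, 1] m/s.
        self.action_space = spaces.Tuple((
            spaces.Discrete(3),
            spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
        ))
        # Observation: robot pose (x, y, heading) plus each pedestrian's
        # position and velocity (x, y, vx, vy).
        obs_dim = 3 + 4 * n_pedestrians
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32)
        self.goal = np.array([4.0, 4.0])

    def reset(self):
        self.robot = np.zeros(3, dtype=np.float32)
        # Randomize pedestrian starts and velocities each episode
        # (the task randomization the abstract reports was necessary).
        self.peds = np.random.uniform(-4, 4, (self.n_pedestrians, 2))
        self.ped_vel = np.random.uniform(-0.5, 0.5, (self.n_pedestrians, 2))
        return self._obs()

    def _obs(self):
        ped_state = np.concatenate([self.peds, self.ped_vel], axis=1).ravel()
        return np.concatenate([self.robot, ped_state]).astype(np.float32)

    def step(self, action):
        turn, speed = action
        self.robot[2] += (turn - 1) * 0.2                  # steer
        self.robot[0] += speed[0] * np.cos(self.robot[2]) * 0.1
        self.robot[1] += speed[0] * np.sin(self.robot[2]) * 0.1
        self.peds += self.ped_vel * 0.1                    # pedestrians drift

        dist_goal = np.linalg.norm(self.robot[:2] - self.goal)
        dist_ped = np.linalg.norm(self.peds - self.robot[:2], axis=1).min()

        # Illustrative socially-compliant reward: goal progress, a hard
        # collision penalty, and a softer penalty for intruding on a
        # pedestrian's personal space.
        reward, done = -0.01 * dist_goal, False
        if dist_ped < 0.3:                                 # collision
            reward, done = -10.0, True
        elif dist_ped < 0.8:                               # discomfort zone
            reward -= 0.5 * (0.8 - dist_ped)
        if dist_goal < 0.3:                                # goal reached
            reward, done = 10.0, True
        return self._obs(), reward, done, {}
```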
First, the model failed to learn due to overfitting in a simple environment; it then learned once the task was randomized. Various approaches were investigated to enhance the model's generalizability as much as possible in the simulator. Finally, the agent was trained in each environment separately. Although this research did not initially consider the complex scenario of randomizing the whole environment, the developed model made it possible to scrutinize the performance of the recently evolved H-PPO algorithm in obstacle-avoidance applications, and the model can learn obstacle avoidance in a dynamic environment while respecting social norms over long-range motion for laboratory applications. Moreover, the success rate of the model later trained in a fully randomized three-pedestrian environment was 86.67% (26 of 30 test episodes), which is higher than that of previous research [1]. Further investigation should be carried out in future work by adding a memory capability to the model in order to enhance performance, reduce training time, and mitigate performance collapse during the learning phase.
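For reference, H-PPO handles the hybrid action space with parallel actor heads over a shared encoder: one head outputs discrete-action logits, the other the continuous parameters, and both are updated with PPO's clipped surrogate objective. The sketch below illustrates such an actor in PyTorch; layer sizes and the parameterization are assumptions for illustration, not the thesis's implementation.

```python
import torch
import torch.nn as nn

class HybridActor(nn.Module):
    """Illustrative H-PPO-style actor: parallel discrete and continuous
    heads over a shared state encoder (sizes are assumed, not from [1])."""

    def __init__(self, obs_dim, n_discrete, n_params):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
        )
        self.discrete_head = nn.Linear(128, n_discrete)  # maneuver logits
        self.param_mu = nn.Linear(128, n_params)         # continuous means
        self.param_log_std = nn.Parameter(torch.zeros(n_params))

    def forward(self, obs):
        h = self.encoder(obs)
        dist_d = torch.distributions.Categorical(logits=self.discrete_head(h))
        dist_c = torch.distributions.Normal(
            self.param_mu(h), self.param_log_std.exp())
        return dist_d, dist_c

# Usage: sample a hybrid action; its joint log-probability (used in the
# PPO ratio) is the sum of the discrete and continuous log-probs.
actor = HybridActor(obs_dim=15, n_discrete=3, n_params=1)
dist_d, dist_c = actor(torch.randn(1, 15))
a_d, a_c = dist_d.sample(), dist_c.sample()
logp = dist_d.log_prob(a_d) + dist_c.log_prob(a_c).sum(-1)
```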