Abstract:
This thesis addresses the problem of modeling agent navigation in a benign environment using reinforcement learning. A learning agent is created that has no prior knowledge of the environment in which it is placed. The agent explores the environment and, from the rewards and penalties it perceives, learns how to act so as to maximize the reward it receives. The learning process is implemented with two temporal-difference learning algorithms, Q-learning and Sarsa. The system takes a user-defined source and goal as input and outputs the optimal navigation path from the source to the goal, which the agent finds by learning the environment itself. As it learns, the agent updates its knowledge and derives a policy from the reward function and the value function. The main objective of this thesis is to present an approach in which the agent learns without any human intervention: at initialization the agent knows nothing about the environment, yet it eventually learns how to navigate it. To evaluate the system, the agent's behavior is checked against a human knowledge mapping of navigation.
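As a rough illustration of the learning loop described above, the following is a minimal sketch of tabular Q-learning on a small square grid. It is not the thesis implementation: the grid world, the reward values (+10 on reaching the goal, -1 for every other step), the hyperparameters, and the function names `step`, `train_q_learning`, and `greedy_path` are all assumptions made for this example. Sarsa would differ only in the update target, which would use the value of the action actually chosen in the next state rather than the maximum.

```python
import random

# Assumed action set for the illustration: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, size, goal):
    """Apply an action; moving into a wall leaves the agent where it is."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    nxt = (row, col)
    # Assumed reward scheme: +10 at the goal, -1 penalty otherwise.
    reward = 10.0 if nxt == goal else -1.0
    return nxt, reward


def train_q_learning(size, source, goal, episodes=2000,
                     alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration, starting from zeros."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(size) for c in range(size)}
    for _episode in range(episodes):
        state = source
        for _t in range(200):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            nxt, reward = step(state, ACTIONS[a], size, goal)
            # Q-learning target: reward plus discounted best value of the
            # next state; the goal is terminal, so nothing is bootstrapped.
            target = reward if nxt == goal else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if state == goal:
                break
    return q


def greedy_path(q, size, source, goal):
    """Follow the highest-valued action from source until the goal is reached."""
    path = [source]
    state = source
    for _t in range(size * size):  # guard against cycles in an untrained table
        if state == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _reward = step(state, ACTIONS[a], size, goal)
        path.append(state)
    return path


if __name__ == "__main__":
    table = train_q_learning(4, (0, 0), (3, 3))
    print(greedy_path(table, 4, (0, 0), (3, 3)))
```

In this sketch the agent starts with an all-zero Q-table, which corresponds to having no initial knowledge of the environment, and the path is read off afterwards by following the greedy action from the source.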