DESIGN OF A DEEP REINFORCEMENT LEARNING BASED OPTIMAL PH CONTROLLER FOR NITRIFICATION BIOREACTORS IN AQUAPONICS SYSTEMS

Pin Chathushka Parami De Silva
(168658R)

Dissertation submitted in partial fulfillment of the requirements for the degree Master of Science in Industrial Automation

Department of Electrical Engineering
University of Moratuwa
Sri Lanka

May 2019

DECLARATION

I declare that this is my own work and this dissertation does not incorporate without acknowledgement any material previously submitted for a Degree or Diploma in any other University or institute of higher learning and to the best of my knowledge and belief it does not contain any material previously published or written by another person except where the acknowledgement is made in the text.

Also, I hereby grant to University of Moratuwa the non-exclusive right to reproduce and distribute my dissertation, in whole or in part in print, electronic or other medium. I retain the right to use this content in whole or part in future works (such as articles or books).

Signature:
Date:

The above candidate has carried out research for the Masters dissertation under my supervision.

Name of the supervisor: Dr. A.G.B.P Jayesekera
Signature of the supervisor:
Date:

DEDICATION

To all the people who feed us….

ACKNOWLEDGEMENTS

This research would have remained just a dream without the support of the following people.

First of all, I wish to express my sincere thanks to my supervisor, Dr Buddhika Jayasekara, for guiding me in the research process and providing feedback and advice over the years.

Next, I wish to thank Dr. D.P. Chamdima and Prof. K.T.M.U. Hemapala, whose feedback was extremely valuable in asking the right research questions and steering the study in the correct direction.

I wish to thank Eng. B.S. Samarasiri for mentoring me through the years and introducing me to research and product development.

I would also like to give my sincere thanks to Mr. Jayasiri Kumarasinghe for all those industrial outings and for the valuable, practical industrial experience. He has taken me all over the country to see various industries and introduced me to industrial instrumentation and automation. This exposure was significant in identifying the research problem studied here.

Finally, I would like to thank my wife, my mother, my father and my brother for all the love and support they have given me and for persevering with me during the entire period of the research.

ABSTRACT

Recent advances in deep reinforcement learning have produced state-of-the-art algorithms with improved training stability, convergence and computational performance. In this study, a state-of-the-art deep reinforcement learning algorithm is used to implement a self-learning, model-free, non-linear controller that regulates the pH of an aquaponic system. Aquaponics is a soil-less farming system in which effluent water from a fish tank supplies nutrients for growing plants. Maintaining the pH of an aquaponic system provides the optimal conditions for the micro-organisms that convert the ammonia-rich fish effluent to nitrates, which are easily absorbed by the plants. To optimize this conversion process, known as nitrification, the pH is maintained at its optimal value within an intermediate stage known as the nitrification bioreactor. The implementation of a deep reinforcement learning (DRL) based controller is studied in detail, and its performance is evaluated against that of a classic PID-based controller in an aquaponic system. The results show that DRL-based controllers are better suited to controlling a dynamic, stochastic pH process and are capable of learning complex plant models and tuning themselves based on the learnt model.
The outcomes of this research can be applied to the design of optimal controllers that learn purely from experience to optimize various industrial processes. Controllers of this type are ideal for Industry 4.0 based applications.

Keywords: Deep Reinforcement Learning, Artificial Intelligence, Aquaponics, Nitrification, Process Control

TABLE OF CONTENTS

Declaration
Dedication
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
List of Appendices
Chapter 1 Introduction
  1.1 Objectives
  1.2 Thesis Outline
  1.3 Limitations of the Study
Chapter 2 Literature Review
  2.1 Introduction
  2.2 Related works in Aquaponics and Limitations
    2.2.1 Biomass Balance Equation
    2.2.2 Substrate Balance Equation
  2.3 Related works in pH control
    2.3.1 PID based pH controllers
    2.3.2 Fuzzy logic based pH control
    2.3.3 Adaptive Neuro Fuzzy Inference Systems
    2.3.4 pH controllers based on optimal control
  2.4 Deep Reinforcement Learning Techniques
    2.4.1 Dynamic Programming in the Context of Reinforcement Learning
    2.4.2 Summary of Dynamic Programming Methods
    2.4.3 Monte Carlo Learning
    2.4.4 Temporal Difference Learning
    2.4.5 Policy Gradient Methods
    2.4.6 Actor Critic Methods
    2.4.7 Curse of Dimensionality
    2.4.8 The Deathly Triads
    2.4.9 Activation Functions for neural networks
  2.5 Summary
Chapter 3 Methodology and Controller Design
  3.1 Introduction
  3.2 Methodology
  3.3 Development of the Deep Reinforcement Learning based controller
    3.3.1 Specification of Inputs and Outputs to the controller
    3.3.2 Design of Critic
    3.3.3 Design of Policy Network
    3.3.4 Determination of Learning Rate
    3.3.5 Learning and Gradient Descent based update
    3.3.6 Selection of deep reinforcement algorithm
    3.3.7 Designing the Reward Function
    3.3.8 Overall Architecture of the DRL controller
    3.3.9 Results based on empirical work
    3.3.10 Tool chains and Development tools
  3.4 Development of the PID controller
  3.5 Hardware Design
Chapter 4 Experimental Results and Analysis
  4.1 Introduction
  4.2 Evaluating controller performance under a static deterministic system
    4.2.1 Hardware Setup
    4.2.2 Experimental procedure
    4.2.3 Results
    4.2.4 Analysis
  4.3 Evaluating controller performance under a dynamic stochastic system
    4.3.1 Hardware Setup
    4.3.2 Experimental procedure
    4.3.3 Results
    4.3.4 Analysis
Chapter 5 Conclusions
  5.1 Conclusion on Objectives
  5.2 Conclusion on Research Questions
  5.3 Further works
    5.3.1 Internet of Things use case
    5.3.2 DRL controllers in SCADA systems
REFERENCES
Appendix A: Modelling and Controlling Techniques for Aquaponic Systems
Appendix B: DRL Controller Implementation
Appendix C: Device Driver For Hardware Interfacing
Appendix D: Digital PID Controller Implementation
Appendix E: List of Algorithms

LIST OF FIGURES

Figure 1.1 Operation of an Aquaponics System
Figure 1.2 Relationship between the research problem, research questions and discipline
Figure 2.1 Ammonia ionization capability based on pH
Figure 2.2 System Boundary used in Mass Balance Equation
Figure 2.3 Block diagram of a typical PID controller used in process automation
Figure 2.4 Fuzzy controller overview (top left), output membership function (top right) & input membership function of a fuzzy based pH controller designed using Simulink
Figure 2.5 ANFIS architecture
Figure 2.6 Graphical representation of a Markovian Decision Process
Figure 2.7 Graphical representation of a Partially Observable Markovian Decision Process
Figure 2.8 Agent environment interaction in a reinforcement learning problem
Figure 2.9 Pictorial representation of policy evaluation & improvement (right) and value iteration (left)
Figure 2.10 Comparison of activation functions
Figure 3.1 Relationship between the input and output of the DRL controller
Figure 3.2 Recurrent Neural Network that approximates the critic
Figure 3.3 Neural network that approximates the actor/policy network
Figure 3.4 Overall system architecture of DRL controller and its peripherals
Figure 3.5 UML diagram of the DRL controller implementing the A3C algorithm
Figure 3.6 Visual Representation of the implemented DRL controller using Tensorboard
Figure 3.7 Internal networks of the DRL controller represented using Tensorboard
Figure 3.8 Training losses of the actor network and critic network plotted at each training step
Figure 3.9 Total moving reward generated at two different epochs
Figure 3.10 Simulink model of a nitrification bioreactor in an aquaponics system
Figure 3.11 Hardware Interfacing
Figure 4.1 Setup to study the performance in a static system
Figure 4.2 Setup to study the performance in a static system
Figure 4.3 Transient responses of the controllers in the static system
Figure 4.4 Setup used to study the performance in a dynamic system
Figure 4.5 The aquaponics system used to determine the response of the DRL controller under dynamic stochastic conditions
Figure 4.6 Steady state response of the aquaponics system. This setup is a stochastic system and the pH should be maintained at a set point of 7.2 for extended durations.
Figure 5.1 Implementation of the DRL controller in IoT/Industry 4.0 based applications
Figure 5.2 Implementation of the DRL controller in a SCADA scenario

LIST OF TABLES

Table 1: Summary of Dynamic Programming Methods
Table 2: Summary of literature review and identified research gaps
Table 3: Comparison of different DRL algorithms
Table 4: Comparison of different software frameworks for implementing DRL controller
Table 5: Gain values obtained from Simulink model
Table 6: Rise times of DRL & PID controller in static system
Table 7: Rise times of DRL & PID controller in static system
Table 8: Rise times of DRL & PID controller in static system
Table 9: Results of ANOVA test on the static case results
Table 10: Comparison of steady state value of the two controllers in the static case
Table 11: Comparison of steady state value of the two controllers in the dynamic case

LIST OF ABBREVIATIONS

Abbreviation   Description
AI             Artificial Intelligence
RL             Reinforcement Learning
DRL            Deep Reinforcement Learning
MDP            Markovian Decision Process
POMDP          Partially Observable Markovian Decision Process
SISO           Single Input Single Output
MIMO           Multiple Input Multiple Output
DP             Dynamic Programming
ADP            Asynchronous Dynamic Programming
GPI            Generalized Policy Iteration
RPi            Raspberry Pi
I2C            Inter-Integrated Circuit Protocol
ReLu           Rectified Linear Unit
IoT            Internet of Things
ANOVA          Analysis of Variance

LIST OF APPENDICES

Appendix     Description
Appendix A   Modeling and Controlling Techniques for Aquaponic Systems
Appendix B   DRL controller Implementation
Appendix C   Device Driver for Hardware Interfacing
Appendix D   Digital PID Controller Implementation
Appendix E   List of DRL Algorithms