DESIGN OF A DEEP REINFORCEMENT LEARNING 
BASED OPTIMAL PH CONTROLLER FOR 
NITRIFICATION BIOREACTORS IN AQUAPONICS 
SYSTEMS 
 
 
Pin Chathushka Parami De Silva 
 
(168658R) 
 
Dissertation submitted in partial fulfillment of the requirements for the degree of
Master of Science in Industrial Automation
 
  
Department of Electrical Engineering 
 
University of Moratuwa 
Sri Lanka 
 
May 2019
 
DECLARATION 
 
I declare that this is my own work and this dissertation does not incorporate without 
acknowledgement any material previously submitted for a Degree or Diploma in 
any other University or institute of higher learning and to the best of my knowledge 
and belief it does not contain any material previously published or written by 
another person except where the acknowledgement is made in the text. 
 
Also, I hereby grant to University of Moratuwa the non-exclusive right to 
reproduce and distribute my dissertation, in whole or in part in print, electronic or 
other medium. I retain the right to use this content in whole or part in future works 
(such as articles or books).  
 
 
Signature:        Date: 
 
 
The above candidate has carried out research for the Master's dissertation under my
supervision.
 
 
Name of the supervisor: Dr. A.G.B.P. Jayasekara
 
 
Signature of the supervisor:      Date: 
 
 
 
  
 
 
 
 
DEDICATION 
 
 
 
 
 
 
To all the people who feed us…. 
 
 
 
 
 
 
 
 
 
 
 
 
 
ACKNOWLEDGEMENTS 
 
This research would have remained a dream without the support of the following
people.
 
First of all, I wish to express my sincere thanks to my supervisor, Dr. Buddhika
Jayasekara, for guiding me through the research process and for providing feedback
and advice over the years.
Next, I wish to thank Dr. D.P. Chandima and Prof. K.T.M.U. Hemapala, whose feedback
was extremely valuable in asking the right research questions and steering the study
in the correct direction.
I wish to thank Eng. B.S. Samarasiri for mentoring me through the years and 
introducing me to research and product development. I would also like to give my 
sincere thanks to Mr. Jayasiri Kumarasinghe for all those industrial outings and for the 
valuable and practical industrial experience. He has taken me all over the country to 
see various industries and introduced me to industrial instrumentation and automation. 
This exposure was significant in finding the currently studied research problem.  
Finally, I would like to thank my wife, my mother, my father and my brother for all
the love and support they have given me and for persevering with me during the entire
period of the research.
 
 
 
ABSTRACT 
 
Recent advances in deep reinforcement learning have produced state-of-the-art
algorithms with better training stability, convergence and computational performance
than their predecessors.
In this study, a state-of-the-art deep reinforcement learning algorithm is used to
implement a self-learning, model-free, non-linear controller that regulates the pH of
an aquaponic system. Aquaponics is a soil-less farming system in which effluent water
from a fish tank supplies nutrients for growing plants. Maintaining the pH of an
aquaponic system provides the optimal conditions for the micro-organisms that convert
the ammonia-rich fish effluent to nitrates, which are easily absorbed by the plants.
To optimize this conversion process, known as nitrification, the pH is held at its
optimal value within an intermediate stage known as the nitrification bioreactor.
The implementation of a deep reinforcement learning based controller is studied in
detail, and the performance of the resulting pH controller is evaluated against that
of a classic PID-based controller in an aquaponic system.
The results show that DRL-based controllers are better suited to controlling a
dynamic, stochastic pH process, and are capable of learning complex plant models and
tuning themselves based on the learnt model. The outcomes of this research can be
applied to the design of optimal controllers that learn purely from experience to
optimize various industrial processes. Controllers of this type are ideal for
Industry 4.0 based applications.
 
Keywords: Deep Reinforcement Learning, Artificial Intelligence, Aquaponics, 
Nitrification, Process Control 
 
 
 
TABLE OF CONTENTS
 
Declaration i 
Dedication ii 
Acknowledgements iii 
Abstract iv 
Table of Contents v
List of Figures viii 
List of Tables x 
List of Abbreviations xi 
List of Appendices xii 
Chapter 1 Introduction 1 
1.1 Objectives 3 
1.2 Thesis Outline 5 
1.3 Limitations of the Study 6 
Chapter 2 Literature Review 7 
2.1 Introduction 7 
2.2 Related works in Aquaponics and Limitations 7 
2.2.1 Biomass Balance Equation 9 
2.2.2 Substrate Balance Equation 10 
2.3 Related works in pH control 11 
2.3.1 PID based pH controllers 12 
2.3.2 Fuzzy logic based pH control 13 
2.3.3 Adaptive Neuro Fuzzy Inference Systems 16 
2.3.4 pH controllers based on optimal control 17 
2.4 Deep Reinforcement Learning Techniques 20 
2.4.1 Dynamic Programming in the Context of Reinforcement Learning 20 
2.4.2 Summary of Dynamic Programming Methods 24 
2.4.3 Monte Carlo Learning 25 
2.4.4 Temporal Difference Learning 27 
2.4.5 Policy Gradient Methods 28 
2.4.6 Actor Critic Methods 29 
 
2.4.7 Curse of Dimensionality 30 
2.4.8 The Deathly Triads 30 
2.4.9 Activation Functions for neural networks 31 
2.5 Summary 33 
Chapter 3 Methodology and Controller Design 35 
3.1 Introduction 35 
3.2 Methodology 35 
3.3 Development of the Deep Reinforcement Learning based controller 36 
3.3.1 Specification of Inputs and Outputs to the controller 37 
3.3.2 Design of Critic 38 
3.3.3 Design of Policy Network 39 
3.3.4 Determination of Learning Rate 40 
3.3.5 Learning and Gradient Descent based update 40 
3.3.6 Selection of deep reinforcement algorithm 41 
3.3.7 Designing the Reward Function 42 
3.3.8 Overall Architecture of the DRL controller 43 
3.3.9 Results based on empirical work 43 
3.3.10 Tool chains and Development tools 47 
3.4 Development of the PID controller 48 
3.5 Hardware Design 49 
Chapter 4 Experimental Results and Analysis 51 
4.1 Introduction 51 
4.2 Evaluating controller performance under a static deterministic system 51 
4.2.1 Hardware Setup 52 
4.2.2 Experimental procedure 52 
4.2.3 Results 53 
4.2.4 Analysis 54 
4.3 Evaluating controller performance under a dynamic stochastic system 57 
4.3.1 Hardware Setup 58 
4.3.2 Experimental procedure 59 
4.3.3 Results 60 
4.3.4 Analysis 61 
Chapter 5 Conclusions 62 
 
5.1 Conclusion on Objectives 62 
5.2 Conclusion on Research Questions 63 
5.3 Further works 64 
5.3.1 Internet of Things use case 64 
5.3.2 DRL controllers in SCADA systems 65 
REFERENCES 66 
Appendix A: Modelling and Controlling Techniques for Aquaponic Systems 70 
Appendix B: DRL Controller Implementation 76 
Appendix C: Device Driver For Hardware Interfacing 83 
Appendix D: Digital PID Controller Implementation 85 
Appendix E: List of Algorithms 87 
 
 
LIST OF FIGURES 
 
 
Figure 1.1 Operation of an Aquaponics System 2 
Figure 1.2 Relationship between the research problem, research questions and 
discipline 4 
Figure 2.1 Ammonia ionization capability based on pH 8
Figure 2.2 System Boundary used in Mass Balance Equation 9
Figure 2.3 Block diagram of a typical PID controller used in process automation 12 
Figure 2.4 Fuzzy controller overview (top left), output membership function (top
right) & input membership function of a fuzzy based pH controller
designed using Simulink 15
Figure 2.5 ANFIS architecture 16 
Figure 2.6 Graphical representation of a Markovian Decision Process 18 
Figure 2.7 Graphical representation of a Partially Observable Markovian Decision 
Process 19 
Figure 2.8 Agent environment interaction in a reinforcement learning problem 20 
Figure 2.9 Pictorial representation of policy evaluation & improvement (right) and 
value iteration (left) 23 
Figure 2.10 Comparison of activation functions 31 
Figure 3.1 Relationship between the input and output of the DRL controller 37 
Figure 3.2 Recurrent Neural Network that approximates the critic 38 
Figure 3.3 Neural network that approximates the actor/policy network 39 
Figure 3.4 Overall system architecture of DRL controller and its peripherals 43 
Figure 3.5 UML diagram of the DRL controller implementing the A3C algorithm 44 
Figure 3.6 Visual Representation of the implemented DRL controller using 
Tensorboard 45 
Figure 3.7 Internal networks of the DRL controller represented using Tensorboard 45 
Figure 3.8 Training losses of the actor network and critic network plotted at each
training step 46
 
Figure 3.9 Total moving reward generated at two different epochs. 47 
Figure 3.10 Simulink model of a nitrification bioreactor in an aquaponics system 48 
Figure 3.11 Hardware Interfacing 50 
Figure 4.1 Setup to study the performance in a static system 51 
Figure 4.2 Setup to study the performance in a static system 52 
Figure 4.3 Transient responses of the controllers in the static system 53 
Figure 4.4 Setup used to study the performance in a dynamic system 57 
Figure 4.5 The aquaponics system used to determine the response of the DRL
controller under dynamic stochastic conditions 58
Figure 4.6 Steady state response of the aquaponics system. This setup is a stochastic 
system and the pH should be maintained at a set point of 7.2 for extended 
durations. 60 
Figure 5.1 Implementation of the DRL controller in IoT/Industry 4.0 based 
applications 64 
Figure 5.2 Implementation of the DRL controller in a SCADA scenario 65 
 
LIST OF TABLES 
 
Table 1: Summary of Dynamic Programming Methods 25
Table 2: Summary of literature review and identified research gaps 33
Table 3: Comparison of different DRL algorithms 41
Table 4: Comparison of different software frameworks for implementing DRL
controller 47
Table 5: Gain values obtained from Simulink model 49
Table 6: Rise times of DRL & PID controller in static system 49
Table 7: Rise times of DRL & PID controller in static system 54
Table 8: Rise times of DRL & PID controller in static system 55
Table 9: Results of ANOVA test on the static case results 56
Table 10: Comparison of steady state value of the two controllers in the static case 56
Table 11: Comparison of steady state value of the two controllers in the dynamic
case 61
 
 
 
LIST OF ABBREVIATIONS 
 
 
Abbreviation Description 
AI Artificial Intelligence 
RL Reinforcement Learning 
DRL Deep Reinforcement Learning 
MDP Markovian Decision Process 
POMDP Partially Observable Markovian Decision Process 
SISO Single Input Single Output 
MIMO Multiple Input Multiple Output 
DP Dynamic Programming 
ADP Asynchronous Dynamic Programming 
GPI Generalized Policy Iteration 
RPi Raspberry Pi 
I2C Inter-Integrated Circuit Protocol
ReLU Rectified Linear Unit
IoT Internet of Things 
ANOVA Analysis of Variance 
 
  
 
 
 
LIST OF APPENDICES  
 
 
Appendix Description Page 
Appendix A Modeling and Controlling Techniques for Aquaponic Systems 73
Appendix B DRL controller Implementation 79 
Appendix C Device Driver for Hardware Interfacing 86 
Appendix D Digital PID Controller Implementation 88 
Appendix E List of DRL Algorithms 90