Master of Science in Computer Science and Engineering

Permanent URI for this collection: http://192.248.9.226/handle/123/50


Recent Submissions

Now showing 1 - 20 of 250
  • item: Thesis-Abstract
    Microservice-based architecture for transport network planning
    (2023) Chathumadusha, S; Perera, I
    In today's digital era, industries across the board are transitioning to digital platforms to optimize resources, reduce costs, enhance customer value, and improve productivity and efficiency. This research focuses on software solutions tailored to the telecom industry's needs in adapting to the digital world. Within the communication industry, data transmission stands as a critical area that demands increased attention. Customer satisfaction relies heavily on the bandwidth capacity that can be efficiently carried through the network. Consequently, engineers must concentrate on this area and upgrade it based on demand, considering the limitations posed by high equipment and resource costs. Traditional manual approaches to telecom operations are inefficient, time-consuming, and expensive. For instance, expanding fiber connections to new locations requires physical surveys to determine distances, which may not always result in the shortest path. Dijkstra's algorithm offers a solution to this technological challenge, allowing for more optimized routing. This research delves into how transmission planning for wireless and wired networks can be simplified through the application of such algorithms and calculations. To develop the proposed microservice-based model, the research leverages the latest technologies and incorporates user-friendly dashboards. The chosen implementation model is a hybrid one combining Model-View-Controller (MVC) and microservices (MS). The hybrid model provides flexibility, as specific modules can be reused for other tasks or applications when necessary. By implementing microservices, each problem is addressed individually, enabling a more modular approach. Additionally, the integration capabilities incorporated into the application facilitate the onboarding of third-party systems and automate the planning process. This research serves as a valuable resource for the telecom industry, offering insights into software solutions tailored to the digital world. The incorporation of advanced algorithms and a hybrid implementation model empowers engineers to overcome technological challenges efficiently, optimize resource utilization, and streamline transmission planning processes. Ultimately, the findings presented in this research contribute to the ongoing digital transformation within the telecom industry.
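To make the routing idea above concrete, the following is a minimal, self-contained Dijkstra sketch in Python over a small hypothetical fiber-segment graph; the site names and distances are illustrative and not taken from the thesis.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source` over a weighted adjacency dict.

    `graph` maps each node to a list of (neighbor, distance_km) pairs.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    return dist, prev

# Hypothetical fiber segments between sites (distances in km).
network = {
    "CentralOffice": [("SiteA", 4.2), ("SiteB", 7.5)],
    "SiteA": [("SiteB", 2.1), ("NewSite", 9.0)],
    "SiteB": [("NewSite", 3.3)],
    "NewSite": [],
}
dist, prev = dijkstra(network, "CentralOffice")
print(dist["NewSite"])  # shortest fiber distance to the new location
```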
  • item: Thesis-Abstract
    Text-to-SQL generation using schema item classifier and encoder-decoder architecture
    (2023) Rushdy, MSA; Uthayasanker, T
    The objective of the text-to-SQL task is to convert natural language queries into SQL queries. However, the presence of extensive text-to-SQL datasets spanning multiple domains, such as Spider, introduces the challenge of effectively generalizing to unseen data. Existing semantic parsing models have struggled to achieve notable performance improvements on these cross-domain datasets. As a result, recent advancements have focused on leveraging pre-trained language models to address this issue and enhance performance in text-to-SQL tasks. These approaches represent the latest and most promising attempts to tackle the challenges associated with generalization and performance improvement in this field. I propose an approach that evaluates and uses a Seq2Seq model by giving the most relevant schema items as input to the encoder, and generates accurate and valid cross-domain SQL queries with the decoder by understanding the skeleton of the target SQL query. The proposed approach is evaluated on the Spider dataset, a well-known benchmark for the text-to-SQL task, and obtains promising results: Exact Match accuracy and Execution accuracy are boosted to 72.7% and 80.2% respectively compared to the other best related approaches. Keywords: Text-to-SQL, Seq2Seq model, BERT, RoBERTa, T5-Base
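As an illustration of the encoder-decoder setup described above, the sketch below serializes a question together with the schema items that a (separate) schema item classifier would have ranked as relevant, and decodes a SQL string with a T5 model from the Hugging Face transformers library. The checkpoint name is a placeholder; in practice a model fine-tuned on Spider would be loaded.

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

# "t5-base" is a placeholder; assume a checkpoint already fine-tuned for text-to-SQL.
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

question = "How many singers are older than 30?"
# Only the schema items ranked as most relevant by the schema item classifier
# are serialized into the encoder input (serialization format is illustrative).
relevant_schema = "singer : singer_id , name , age"
encoder_input = f"question: {question} schema: {relevant_schema}"

inputs = tokenizer(encoder_input, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```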
  • item: Thesis-Abstract
    Performance improvements in MLOps pipeline
    (2023) Kasthururaajan, R; Perera, I
    In the modern world, most enterprises are willing to leverage machine learning models in their applications. Due to the high demand for machine learning models in production and the need to bring models from research to production in minimal time, MLOps has emerged as an unavoidable practice. The broad scope of MLOps opens many doors for research, and it is an emerging topic among researchers. Many people with various roles are involved in the entire machine learning life cycle. Similar to DevOps, MLOps is a culture that should be practiced by all parties, in their different roles, who are involved in the entire process in order to get a better outcome. MLOps adopts many practices from DevOps and has its own set of practices as well. Even though many tools and technologies have been developed to build MLOps pipelines, there is still room for further studies to improve their performance. The machine learning process has many phases, such as data handling, model training, model evaluation, hyperparameter tuning, model deployment, model versioning, and model monitoring. For an MLOps pipeline to perform well, all of these phases should be automated as much as possible. Performance improvements in the MLOps pipeline can be achieved in terms of ease of use, time, and cost. In this study we took a simple machine learning problem, stock price prediction for Google stock prices using an LSTM, and analysed many tools that can be used in an MLOps pipeline. Finally, we implemented an end-to-end MLOps pipeline with open-source tools and technologies for the selected machine learning problem. Our final solution is implemented using DVC, MLflow, Evidently, and GitHub Actions. We compared our final solution with other solutions available in the market and analysed the pros and cons. Our solution is very flexible to use and has no vendor lock-in; if any modifications or extensions of tools are needed, they can easily be plugged into the proposed architecture. We have automated almost all the phases in the MLOps pipeline, which reduces the time taken to bring machine learning models from research to production. Since we have mostly used free and open-source tools, it is also very cost-effective. We found that our final solution improves the performance of the MLOps pipeline in terms of ease of use, time, and cost. Keywords: MLOps, Machine Learning, Pipeline, DevOps, Data Version Control, Continuous Integration (CI), Continuous Deployment (CD), Continuous Training (CT), Workflow
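One small piece of such a pipeline, experiment tracking, can be sketched with MLflow as below; the experiment name, parameters, and metric value are placeholders rather than the thesis' actual configuration.

```python
import mlflow

# Hypothetical values standing in for the LSTM training step of the pipeline.
params = {"window_size": 60, "epochs": 25, "units": 50}
rmse_on_holdout = 12.34

mlflow.set_experiment("google-stock-lstm")
with mlflow.start_run():
    for name, value in params.items():
        mlflow.log_param(name, value)
    mlflow.log_metric("rmse", rmse_on_holdout)
    # The trained model file (versioned with DVC in the described setup) could
    # also be attached to the run as an artifact:
    # mlflow.log_artifact("models/lstm_google.h5")
```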
  • item: Thesis-Abstract
    DRDP: dynamically re-configurable data pipeline in the edge network
    (2023) Nuwanthilaka, M.G.I.M; Perera, I
    Pipelines are a highly discussed topic in today's technological world, and they come in different variations: data science pipelines, DevOps pipelines, DevSecOps pipelines, etc. A data science pipeline usually comes with a fixed architecture, which can be problematic in a fast-growing tech industry. Traditional data science pipelines may struggle to handle the volume, velocity, and variety of data at the edge, necessitating more dynamic and adaptable approaches. Many advancements are bringing the technology to the edge due to the substantial data points generated in sensor networks at the edge, from factory floors to log streams. In this thesis we first discuss the existing literature in the data pipeline domain under three main topics: data pipeline challenges, data pipeline architectures, and data pipeline security. Then we propose a methodology for a dynamically re-configurable data pipeline architecture in the edge network. This way we expect to achieve more efficiency, controllability, and scalability of data across networks. The emerging field of edge architecture presents opportunities for innovative approaches to data pipelines, enabling organizations to harness the full potential of edge data for advanced analytics, machine learning, and real-time decision-making. Further, we propose a prototype with Raspberry Pi-based programs to discuss the effectiveness of this novel method. Using the proposed architecture we have evaluated the results and discussed how this benefits current and future data pipeline implementations. We hope this contributes to the emerging edge architecture subject area. Keywords: data science, pipeline, architecture, edge
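A toy interpretation of a dynamically re-configurable pipeline is sketched below: stages are looked up by name, so an edge node can be handed a new ordered stage list at runtime without redeploying code. The stage names and logic are hypothetical, not the prototype's actual stages.

```python
from typing import Callable, Dict, List

# Registry of pluggable stages; the active configuration is just an ordered
# list of stage names, which can be changed while the node keeps running.
STAGES: Dict[str, Callable[[dict], dict]] = {
    "clean": lambda r: {**r, "value": max(r["value"], 0)},
    "scale": lambda r: {**r, "value": r["value"] / 100.0},
    "tag":   lambda r: {**r, "node": "raspberry-pi-01"},
}

def run_pipeline(record: dict, config: List[str]) -> dict:
    for stage_name in config:
        record = STAGES[stage_name](record)
    return record

print(run_pipeline({"value": 250}, ["clean", "scale"]))
# Re-configure without redeploying: just change the ordered stage list.
print(run_pipeline({"value": -5}, ["clean", "tag"]))
```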
  • item: Thesis-Abstract
    A Forecasting toolkit for epidemic spreading
    (2023) Dandeniya, D; Perera, I
    The study introduces a novel approach to predict the presence or absence of COVID-19 without the use of laboratory tests, kits, or equipment. Instead, the method relies on the symptoms experienced by a person, using machine learning algorithms to make predictions. To achieve the best possible performance, the study applied seven supervised machine learning methodologies: Naive Bayes, Logistic Regression, Random Forest, KNN, Gradient Boosting Classifier, Decision Tree, and Support Vector Machines. The algorithms were tested on the COVID-19 Symptoms and Presence dataset on Kaggle, and hyperparameter optimization was then used to improve their performance. The study found that the Gradient Boosting Classifier was the most effective algorithm, achieving an accuracy of 97.4%. The proposed method has the capacity to accurately detect the presence or absence of COVID-19 without requiring any devices or laboratory tests. This suggests that the method may offer a convenient and efficient way to quickly identify COVID-19 cases without relying on traditional laboratory-based testing methods. The research suggests that machine learning algorithms can be useful tools for disease detection, even in the absence of laboratory tests. The proposed approach can help overcome the challenges of limited access to laboratory tests and kits, making disease detection more accessible and efficient.
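A minimal sketch of the best-performing setup, a Gradient Boosting Classifier with grid-search hyperparameter optimization, is shown below using scikit-learn; synthetic data stands in for the Kaggle symptoms dataset, and the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in for the "COVID-19 Symptoms and Presence" dataset: feature vectors
# per person and a COVID/no-COVID label.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter optimization as described: grid search over boosting settings.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "learning_rate": [0.05, 0.1], "max_depth": [2, 3]},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
print("test accuracy:", accuracy_score(y_test, grid.best_estimator_.predict(X_test)))
```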
  • item: Thesis-Abstract
    An Analysis of GoF software design patterns on software maintainability in microservices
    (2023) Rathnayake, RMHSB; Perera, I
    The purpose of this paper is to identify how Gang of Four (GoF) design patterns impact the maintainability of microservice-based systems. Design patterns were introduced as solutions to common problems that occur in programming and are supposed to improve the maintainability of a system by improving code quality. But with modern programming languages, frameworks, and integrated development environments, whether these patterns serve their purpose is a question that has not been fully addressed. This paper proposes a tool that can be used to identify whether the use of a specific design pattern by a specific developer for a particular microservice-based project improves its maintainability or not. To do this, a model has been created by analyzing enterprise microservice-based applications and gathering data from developers who were involved in the development of those projects. These data are used to create models for the maintainability metrics coupling, lack of cohesion, duplication, and cyclomatic complexity. This tool helps decide whether the system is more maintainable with or without the use of selected design patterns, supporting better decisions about how to write new code or refactor existing code. Results of this research have shown that lack of cohesion is not affected by developer experience, design patterns, or the language used in enterprise microservice-based applications. Cyclomatic complexity was only affected by the language used. Also, use of certain design patterns decreased the coupling in the system, but some of the design patterns caused duplication to increase. So, the results showed that use of design patterns can have either a negative or a positive impact on the maintainability of a microservice depending on the design pattern used. This research also emphasizes the importance of the code review process and code quality analysis automation. Keywords: GOF design patterns, micro services, software maintainability
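One of the four maintainability metrics mentioned above, cyclomatic complexity, can be computed directly from source code as in the sketch below, here using the third-party radon package (not necessarily the tooling used in the study).

```python
# Cyclomatic complexity per function, computed with the `radon` package.
from radon.complexity import cc_visit

source = '''
def dispatch(order):
    if order.priority == "high":
        return "express"
    elif order.priority == "low":
        return "bulk"
    else:
        return "standard"
'''

for block in cc_visit(source):
    print(block.name, block.complexity)
```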
  • item: Thesis-Abstract
    Finding compiler bugs via code mutation: a case study
    (2023) Abeygunawardana, CS; Perera, I
    Compiler errors can cause a variety of problems for software systems, including unexpected program behavior, security flaws, and system failures. These defects can be brought on by a number of things, including improper data type handling, poor code creation, and wrong code optimization. Compiler defects can be difficult to spot due to their complexity and, if ignored, can have severe effects. So, identifying compiler defects is a crucial and difficult undertaking because it is difficult to produce reliable test programs. One of the most used software testing techniques for finding bugs and vulnerabilities is fuzzing: the process of generating numerous inputs to a target application while keeping an eye out for any anomalies. Among fuzzing techniques, the most recent and promising methods for compiler validation are test program generation and mutation. Both methods have proven effective in identifying numerous problems in a variety of compilers, although they are still constrained by the valid code creation and mutation methodologies they use. Code mutation is a method that has grown in favor recently since it can find bugs that standard testing can miss. This technique involves performing minor adjustments to a program's source code to create variants of the original code, which are then compiled and evaluated to see if they deliver the desired outcomes. If the output of the altered code differs from the output of the original code, it is a sign that there might be a compiler issue. Current mutation-based fuzzers randomly alter a program's input without comprehending its underlying grammar or semantics. In this study, we propose a novel mutation technique that mutates an existing program while automatically understanding the syntax and semantic rules. Any type of compiler can be verified using the suggested method without regard to the semantics of the language, so the approach can be used to test various other compilers without depending on the syntax of each language. We focus on evaluating the Ballerina compiler against the language's syntax and semantics because Ballerina is a relatively new programming language. In this work we initially construct a test suite from the existing test cases of the language, develop a syntax tree generator that can identify the syntax of the language, and then develop a semantic generator that can identify its semantics. With these we are able to mutate the existing test cases using our generator. Furthermore, we have analyzed the performance of our model against the number of test cases used to train it and the number of tokens in the generated file. Keywords: Compiler testing, random testing, random program generation, Automated testing
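The core differential check behind mutation-based compiler testing can be sketched as below: compile and run the original program and a semantics-preserving mutant, and flag a potential compiler bug when their observable behavior diverges. The `bal run` invocation and file names are assumptions for illustration, not the thesis' actual harness.

```python
import subprocess

def run_program(compiler_cmd, source_path):
    """Compile and run a test program, returning (exit_code, stdout)."""
    result = subprocess.run(
        compiler_cmd + [source_path], capture_output=True, text=True, timeout=60
    )
    return result.returncode, result.stdout

def differential_check(compiler_cmd, original, mutant):
    """Flag a potential compiler bug when a semantics-preserving mutant
    produces different observable behavior than the original program."""
    return run_program(compiler_cmd, original) != run_program(compiler_cmd, mutant)

# Hypothetical invocation; "bal run" stands in for the compiler under test.
if differential_check(["bal", "run"], "original.bal", "mutant_001.bal"):
    print("possible compiler bug: outputs diverge for mutant_001.bal")
```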
  • item: Thesis-Abstract
    Oblivious multi-cloud file storage
    (2023) Pushpakumara, ERTD; Rathnayake, S
    Cloud storage facilities are now predominantly used to store ever-growing data. Information availability, improved performance, and trustworthiness are the key factors that data owners mainly focus on when storing data with a third party. With the multi-tenant concept in cloud computing, security threats have evolved, as the trustworthiness of neighbors has become doubtful. A malicious user could monitor the traffic between the client and the CSP; by analyzing the traffic, an attacker can get a clear picture of what kind of data is being sent or retrieved by the client, which calls the privacy of the stored data into question. Critical, highly sensitive, and Personally Identifiable Information (PII) used in government organizations such as the Defense Ministry, Persons' Registration, Motor Traffic Department, and Immigration and Emigration systems, among others, requires data privacy, integrity, and confidentiality, which discourages these organizations from storing such highly sensitive data on cloud storage. But these organizations handle thousands of data records, adding more day by day, and physical storage expansion has become a huge challenge given the investment in infrastructure. The proposed solution addresses both of these challenges. The major security concerns the proposed solution focuses on are data privacy, integrity, and confidentiality. In this research we propose a novel approach to obfuscate the data distribution patterns in a multi-cloud environment. The solution is implemented at the client side based on the system's business requirements, so that a unified interface can be provided for storing and retrieving data on several cloud platforms. The uploaded file is encrypted with a public key, its hash value is calculated, and it is divided into several small chunks. The chunks are then scattered randomly across storage accounts created on several CSPs, so that confidentiality, integrity, and privacy of the data can be achieved. The proposed solution consists of a central component through which all communication between the client and the CSPs takes place; the technology used within this central component is related to the ORAM concept. Further, this facilitates dynamic scaling up of cloud storage.
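A compressed sketch of the upload path described above (encrypt, hash, chunk, and scatter across providers) is shown below. For brevity it uses symmetric Fernet encryption from the `cryptography` package instead of the public-key scheme described in the abstract, and the provider names and chunk size are placeholders.

```python
import hashlib
import random
from cryptography.fernet import Fernet

CHUNK_SIZE = 256 * 1024  # 256 KiB chunks (illustrative)
PROVIDERS = ["aws-account-1", "azure-account-2", "gcp-account-3"]  # hypothetical storage accounts

def prepare_upload(path: str):
    """Encrypt a file, record its hash, split it into chunks, and pick a random
    provider for each chunk (symmetric Fernet stands in for the public-key
    encryption described in the thesis)."""
    key = Fernet.generate_key()
    with open(path, "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    digest = hashlib.sha256(ciphertext).hexdigest()  # integrity check value
    chunks = [ciphertext[i:i + CHUNK_SIZE] for i in range(0, len(ciphertext), CHUNK_SIZE)]
    placement = [(idx, random.choice(PROVIDERS)) for idx in range(len(chunks))]
    return key, digest, chunks, placement
```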
  • item: Thesis-Abstract
    Vision-based forward collision warning application for vehicles
    (2023) Rajakaruna, PNSA; Chitraranjan, C
    Driver Assistance Systems (DAS) have become an important part of vehicles, and there is a considerable amount of research in this area. Most accidents happen due to driver inattention caused by driver distraction and drowsiness. Driver Assistance Systems aim to minimize these conditions and increase road safety. Vision-based driver assistance plays a major role in DAS, where camera-based collision warning stands out as one of the most effective and accurate types. Our implementation is a collision warning system that utilizes a single monocular camera and performs 3D vehicle detection for better accuracy and performance. It is a low-cost, near real-time collision warning system that can be installed on both new and old vehicles. For 2D vehicle detection, we employ YOLO, and then we estimate 3D bounding boxes based on the 2D bounding boxes. To track the vehicles, we use the Deep SORT algorithm. The application generates a Bird's Eye View (BEV) graph based on the 3D bounding box estimation; this BEV graph represents a much more accurate position and orientation for vehicles in a 3D plane. Based on these data, the collision prediction algorithm determines the possibility of a collision and outputs a warning signal. The collision prediction algorithm relies on the distance between the camera-equipped vehicle and the other vehicles in each frame.
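The distance-based collision check can be illustrated with a small time-to-collision calculation over two consecutive BEV distance estimates; the threshold and numbers below are illustrative, not the thesis' calibrated values.

```python
def collision_warning(prev_distance_m, curr_distance_m, dt_s, ttc_threshold_s=2.0):
    """Estimate closing speed from two consecutive BEV distance estimates and
    warn when the implied time-to-collision falls below a threshold."""
    closing_speed = (prev_distance_m - curr_distance_m) / dt_s  # m/s, >0 means approaching
    if closing_speed <= 0:
        return False
    time_to_collision = curr_distance_m / closing_speed
    return time_to_collision < ttc_threshold_s

# Example: lead vehicle 18 m ahead, 15 m one frame (0.1 s) later -> TTC = 0.5 s.
print(collision_warning(18.0, 15.0, 0.1))  # True -> issue warning
```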
  • item: Thesis-Abstract
    Cost optimized scheduling for microservices in Kubernetes
    (2023) Arunan, S; Perera, GIUS
    The use of container orchestration platforms like Kubernetes for running microservice applications is increasing. In a particular application, not all microservices have the same priority, so it is costly to allocate the same resources to both high- and low-priority services. Spot instances are an attractive option for running low-priority services due to their significantly lower cost compared to On-Demand instances. Spot instances are available for use when cloud service providers have excess capacity and can be bid on at a much lower price than the On-Demand rate, but they can be revoked at any time by the cloud provider, which affects the availability of the services. This research aims to utilize Spot instances to run low-priority services with the intention of reducing cloud cost while providing overall high availability to the application. A thorough literature review has been conducted on existing research that utilizes Spot instances to save cost while maintaining high availability. This study builds upon previous work and proposes a new approach to run low-priority microservices at lower cost. A service called KubeEconomy is proposed to monitor and manage Kubernetes worker nodes and efficiently schedule the microservices. Three functionalities of the KubeEconomy service, which contribute to the cost optimization, are explained. The KubeEconomy service utilizes cloud APIs and Kubernetes APIs to promptly scale and reschedule pods within different nodes. Two experiments were conducted to show the effectiveness of the KubeEconomy service. In the first experiment, the KubeEconomy service was deployed on the Azure cloud to manage a Kubernetes cluster; the experiment showed that the service was able to dynamically provision and deprovision Spot instances based on the workload demand and Spot evictions, resulting in significant cost savings while maintaining high availability of the microservices. In the second experiment, a simulation was conducted using the parameters gathered from the first experiment to calculate the cost savings of long-running workloads. It is shown that it is possible to reduce cloud cost by up to 80% while maintaining 99% availability for the microservices under optimal conditions.
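One way the rescheduling step could look is sketched below with the official Kubernetes Python client: a low-priority deployment is patched with a node selector and toleration so its pods land on Spot nodes. The label and taint keys follow Azure AKS conventions, the deployment name is hypothetical, and this is not necessarily how KubeEconomy itself is implemented.

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Steer a low-priority microservice onto AKS Spot nodes by patching its
# deployment with the Spot node selector and toleration (verify the exact
# label/taint keys for your cluster).
patch = {
    "spec": {
        "template": {
            "spec": {
                "nodeSelector": {"kubernetes.azure.com/scalesetpriority": "spot"},
                "tolerations": [{
                    "key": "kubernetes.azure.com/scalesetpriority",
                    "operator": "Equal",
                    "value": "spot",
                    "effect": "NoSchedule",
                }],
            }
        }
    }
}
apps.patch_namespaced_deployment(
    name="reporting-service",   # hypothetical low-priority service
    namespace="default",
    body=patch,
)
```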
  • item: Thesis-Abstract
    An Architecture for EEG-based mental state recognition and monitoring
    (2023) Chandreswaran, Y; Perera, I
    Humans invent technologies to make today's life easy, and every human expects a healthy long life. A healthy life includes good physical health and stable mental health. Multiple causes, such as a busy lifestyle, stress, sadness, anger, and fear, can affect human mental health. There are several approaches to overcoming mental illness, but the challenge is to monitor and measure the effectiveness of the treatments a person follows. Therefore, a solution is proposed as a real-time, non-invasive BCI system that helps predict the mental state and tracks the progress of improvement. This research work aims to predict human brain states using EEG-based signals and classify them in real time. The features and classification methods help categorize the brainwave patterns. EEG signals are communicated to the BCI through the NeuroSky headset, which has four built-in sensors. We have generated sample data sets for training and testing using the NeuroSky headset. The system has been tested with multiple feature extraction methods and feature pattern classification models to build the prediction solution. The final solution contains a human-facing mobile web app, which reads the EEG signals from the NeuroSky headset. In addition, the system contains a prediction component, a backend API component, and system management dashboard components.
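A typical feature extraction step for such a system, band-power features from a raw EEG window, can be sketched with SciPy as below; the 512 Hz sampling rate is an assumption about the headset, and the signal is a random placeholder.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, band):
    """Power of `signal` within a frequency band (Hz), via Welch's PSD."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), fs * 2))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.trapz(psd[mask], freqs[mask])

fs = 512                              # assumed sampling rate of the headset
eeg_window = np.random.randn(fs * 2)  # placeholder 2-second raw EEG window

features = {
    "delta": band_power(eeg_window, fs, (0.5, 4)),
    "theta": band_power(eeg_window, fs, (4, 8)),
    "alpha": band_power(eeg_window, fs, (8, 13)),
    "beta":  band_power(eeg_window, fs, (13, 30)),
}
print(features)  # feature vector passed on to the mental-state classifier
```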
  • item: Thesis-Abstract
    LogLearn: predicting computer node failures using continuous machine learning
    (2023) Kabilesh, K; Perera, I
    Ensuring reliability, availability, and fault tolerance is crucial in modern computer systems. Despite the substantial efforts put into development, testing, and operation, failures still occur during runtime, leading to significant consequences. To address this issue, a proactive approach is necessary to predict and prevent failures before they happen. System and software logs provide essential data for monitoring systems and their performance during runtime. However, processing this information in real time poses a unique challenge for machine learning because of the properties of streaming big data such as logs. Therefore, this study utilizes the continuous machine learning paradigm to develop a failure prediction model called LogLearn, which uses system log data. The design and development of LogLearn consider the drawbacks and limitations of current continuous machine learning models to provide a more efficient and accurate approach to predicting computer node failures and their potential root cause with a high lead time. The LogLearn model is implemented with an online failure prediction method, which is evaluated using multiple algorithms; logistic regression showed the best prediction performance. The LogLearn model outperformed the models of previous studies in terms of accuracy, precision, recall, and F1-score. Additionally, an online time-series prediction model using the SNARIMAX algorithm was implemented to forecast the potential time of failure. Although previous studies have shown promising results, their lead times were insufficient to fix the underlying cause of failure in advance. Thus, LogLearn provides a viable alternative approach for failure prediction in computer systems.
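The continuous-learning idea can be sketched with the `river` online machine learning library (which also provides SNARIMAX): a logistic regression that learns one log-derived feature vector at a time and is evaluated prequentially. The feature names and stream below are hypothetical, not the thesis' actual feature set.

```python
from river import compose, linear_model, metrics, preprocessing

# Online pipeline: scale features incrementally, then an online logistic regression.
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LogisticRegression(),
)
f1 = metrics.F1()

# Hypothetical stream of (log-derived features, node_failed) pairs.
stream = [
    ({"error_rate": 0.02, "disk_warnings": 0, "mem_pressure": 0.3}, 0),
    ({"error_rate": 0.40, "disk_warnings": 7, "mem_pressure": 0.9}, 1),
    ({"error_rate": 0.05, "disk_warnings": 1, "mem_pressure": 0.4}, 0),
]
for features, node_failed in stream:
    prediction = model.predict_one(features)  # predict before learning (prequential)
    f1.update(node_failed, prediction)
    model.learn_one(features, node_failed)

print(f1)
```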
  • item: Thesis-Abstract
    Sentiment analysis of financial stock market news using pre-trained language models
    (2022) Kaushalya WAS; Ranathunga S
    Sentiment analysis helps data analysts find public opinion and the actual meaning of a given text (positive, neutral, or negative), conduct market research, monitor brand and product reputation, and understand customer experiences of newly introduced items or services. Stock news sentiment analysis is a useful task in the financial domain. However, it differs from customer feedback for a product or brand, movie reviews, and customer support reviews; this difference arises from the domain-specific language in stock markets and the lack of labeled data. This research implements a stock news sentiment analysis system using the latest transformer-based pre-trained language models in NLP. In this research, the transformer-based pre-trained language models achieved higher sentiment classification results than the traditional classification models. The transfer learning method also reduced classification bias for particular stock market-specific words, and a correlation between stock news sentiment and the stock price change percentage was established. The proposed model can predict the percentage change in a stock's price when a news item is received.
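A minimal usage sketch with the Hugging Face transformers pipeline is shown below; the default checkpoint is a generic sentiment model, so in practice a financial-domain fine-tuned model would be substituted.

```python
from transformers import pipeline

# Generic sentiment checkpoint, used only for illustration; swap in a
# financial-domain fine-tuned model for stock news.
classifier = pipeline("sentiment-analysis")
headline = "Company X beats quarterly earnings expectations and raises guidance"
print(classifier(headline))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```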
  • item: Thesis-Abstract
    Intelligent deception detection for online interviews
    (2022) Fernando MLM; De Silva C
    When it comes to human communication, lying is a common practice. Recently, the detection of lies has become an important focus of the judiciary, law enforcement, security, interviews, etc. [1] Due to the COVID-19 pandemic, interviews are being conducted online, and a main problem is that a person may give false information, especially in the visa application process. Nonverbal behavior is constantly being transmitted by humans alongside spoken language, where visual and auditory cues like facial expressions, postures, gestures, and nonverbal vocal sounds can be used to detect deception intelligently. These human signals are known as deception indicators, and they are primarily associated with deceptive communication. The hiring of unskilled workers can eventually lead to a company's demise if an online interviewee exaggerates or fabricates his or her abilities.
  • item: Thesis-Abstract
    Transfer learning approach for detecting COVID-19 using chest X-ray images
    (2022) Muthunayake MNA; Chitranjan C
    Due to the COVID-19 coronavirus, the entire world is undergoing a pandemic; coronavirus 2 produces severe acute respiratory illness. The virus was discovered in December 2019 in Wuhan, China. As we are experiencing, the number of affected patients is expanding at a rapid rate. The World Health Organization (WHO) has recommended that testing be done as much as possible to recognize those who are affected and those who are carriers of this disease. However, the main issue here is the scarcity of COVID-19 testing kits and trained people to perform the testing in a pandemic situation. A lot of research has therefore sought workaround solutions for detecting COVID-19, and as a result a few papers have been published on detecting COVID-19 based on chest X-ray scan images. Most of this research has used a vanilla CNN, which makes the test more reliable and convenient, but there are practical issues in applying a traditional CNN. Basically, a CNN is a supervised learning method, it takes more time for the learning process, and in general it works well for larger datasets. However, since chest X-ray images are limited in practice, we propose combining transfer learning and ensemble learning techniques to achieve excellent accuracy while spending the least amount of time possible on the entire learning process. This study mainly focuses on CNN-based pre-trained models such as DenseNet201, EfficientNetB7, and VGG16 for increasing the accuracy of the model, which makes the test reliable and more trustworthy.
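A condensed Keras sketch of combining transfer learning with a simple averaging ensemble is shown below, using two of the named backbones; the classification head, training call, and averaging rule are illustrative choices rather than the study's exact configuration.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201, VGG16

def build_classifier(backbone_cls):
    """Frozen ImageNet backbone + small binary head (COVID vs. normal)."""
    backbone = backbone_cls(include_top=False, weights="imagenet",
                            input_shape=(224, 224, 3), pooling="avg")
    backbone.trainable = False
    return models.Sequential([backbone, layers.Dense(1, activation="sigmoid")])

members = [build_classifier(DenseNet201), build_classifier(VGG16)]
for m in members:
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # m.fit(train_ds, validation_data=val_ds, epochs=5)  # chest X-ray dataset goes here

# Simple ensemble: average the member probabilities (one illustrative option).
x = np.random.rand(1, 224, 224, 3).astype("float32")  # placeholder image batch
ensemble_prob = np.mean([m.predict(x, verbose=0) for m in members], axis=0)
print(float(ensemble_prob))
```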
  • item: Thesis-Abstract
    Cross-domain recommendation system for improving accuracy by focusing on diversity
    (2022) Herath AAHWS; Ahangama S
    With the rapidly developing technology world, recommender systems are also improving day by day, since customer expectations keep shifting and creating new business trends. As a result, enterprise-level recommender systems require modifications and new improvements to achieve a high level of user satisfaction. Currently, most commercial recommender systems seem to struggle with low recommendation quality, which decreases user trust and expectations. On the other hand, accuracy alone is not sufficient to measure recommender quality. Within the broader domain of recommender systems, the cross-domain recommender system (CDRS) is one of the less-explored areas, and it needs more research focused on subjective metrics like diversity rather than accuracy. With the purpose of improving accuracy by focusing on diversity in CDRS, I have built a matrix factorization-based collaborative filtering cross-domain recommender system using explicit user feedback with the MovieLens 100k research dataset. When it comes to cross-domain recommender systems, the most frequent approach is to measure and evaluate their relevance using standard predictive accuracy metrics such as root mean squared error (RMSE), mean absolute error (MAE), and so on. Since maintaining high-quality recommendations requires more than accuracy, we need to pay attention to a few specific areas beyond accuracy, like diversity and novelty. We measured our CDRS model's performance via RMSE, MSE, MAE, FCP, hit ratio, and Precision@k, and in all cases CDRS achieved better performance than the general CF model. Moreover, we measured the CDRS model's diversity and novelty and observed that both increase as top-N increases. These findings will be valuable when implementing enterprise-level cross-domain recommender systems in the future, helping modern business use cases succeed while enhancing user satisfaction.
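A single-domain matrix factorization baseline of the kind the CDRS builds on can be sketched with the `surprise` library, which also exposes the RMSE, MAE, and FCP metrics mentioned above; hyperparameters below are illustrative.

```python
from surprise import SVD, Dataset, accuracy
from surprise.model_selection import train_test_split

# MovieLens 100k is downloaded on first use by the library.
data = Dataset.load_builtin("ml-100k")
trainset, testset = train_test_split(data, test_size=0.2, random_state=7)

algo = SVD(n_factors=100, random_state=7)  # matrix factorization collaborative filtering
algo.fit(trainset)
predictions = algo.test(testset)

accuracy.rmse(predictions)
accuracy.mae(predictions)
accuracy.fcp(predictions)
```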
  • item: Thesis-Abstract
    Guaranteeing service level agreements for triangle counting via observation-based admission control algorithm
    (2022) Weerakkody WACR; Jayasena S; Dayarathna M
    Increasingly, large graph processing applications adopt the approach of partitioning and then distributed processing. However, maintaining a guaranteed Service Level Agreement (SLA) on distributed graph processing for concurrent query execution is challenging because graph processing is by nature an unbalanced problem. We investigate maintaining predefined service level agreements for commonly found graph processing workload mixtures. We develop a Graph Query Scheduler Mechanism (GQSM) which maintains a guaranteed service level agreement in terms of overall latency. The proposed GQSM model is implemented using queueing theory. The main component of GQSM is a job scheduler which is responsible for listening to an incoming job queue and scheduling the jobs received. The proposed model has a calibration phase in which the service level agreement data, the load average curve data, and the maximum load average that can be handled by the hosts participating in the cluster without violating the SLA are captured for the graphs in the system. After completing the calibration phase, the job scheduler is capable of predicting the load average curve for incoming job requests. The scheduler checks whether the maximum load average extracted from the predicted load average curve exceeds the load average threshold values captured in the calibration phase, and based on the result it accepts or rejects the job requests received. Results show that the SLA is successfully maintained when the total number of users is less than 6 in a JasmineGraph cluster deployed on a single host; for distributed clusters the number of users can go up to 10 without violating the SLA. The proposed model is scalable and can be applied to a distributed environment as well. As future work, the proposed model can be extended to work with fewer initial calibration steps, and the scheduling algorithm can be improved with intelligent workload management among hosts for more efficient resource consumption.
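The accept/reject decision can be reduced to the small sketch below: a predictor estimates a job's peak load average and the scheduler admits the job only if that estimate stays under the threshold captured during calibration. The predictor and numbers are purely illustrative, not GQSM's actual calibration data.

```python
def admit(job, calibrated_threshold, predict_peak_load):
    """Observation-based admission control: accept a job only if its
    predicted peak load average stays under the calibrated threshold."""
    return predict_peak_load(job) <= calibrated_threshold

# Hypothetical calibration output and a toy predictor based on graph size.
threshold = 8.0
predictor = lambda job: 0.000002 * job["edges"]  # illustrative linear model

print(admit({"edges": 3_000_000}, threshold, predictor))  # True  -> schedule now
print(admit({"edges": 9_000_000}, threshold, predictor))  # False -> reject / defer
```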
  • item: Thesis-Abstract
    Cervical cancer prediction system using machine learning
    (2022) Prabodhani APKC; Karunarathne B
    Machine learning has become a vital tool in everyday life, as well as a potent tool for automating most of the industries we want to automate. Machine learning is a method of developing algorithms that learn from data, which might be labelled or unlabelled, or that learn from the environment. Machine learning is employed in a variety of industries, including health care, where it provides much greater benefits through proper decision and prediction processes. Because machine learning in health care is scientific research, we must store, retrieve, and properly use information and data, as well as provide knowledge about the difficulties that face the healthcare industry and support proper decision-making. Over the years, these technologies have resulted in significant advancements in the health-care sector. Medical experts employ machine learning tools and techniques to analyse medical data in order to identify hazards and provide accurate diagnosis and treatment. This paper aims to build a web application and put a trained machine learning model into production using a Flask API. Cervical cancer data are used to predict cervical cancer using machine learning; the project therefore demonstrates how machine learning models can be made available to end users and other systems.
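A minimal sketch of serving such a trained model through a Flask API might look like the following; the model file name and feature payload are hypothetical.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("cervical_cancer_model.pkl")  # hypothetical trained model file

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [age, num_pregnancies, ...]}
    features = request.get_json()["features"]
    risk = model.predict([features])[0]
    return jsonify({"cervical_cancer_risk": int(risk)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```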
  • item: Thesis-Abstract
    An Analytical study of pre-trained models for sentiment analysis of Sinhala news comments
    (2022) Dissanayake MLS; Thayasivam U
    In the area of natural language processing, sentiment analysis has become a prevalent topic due to the availability of large-scale text data. Sentiment analysis is a text classification task that mainly focuses on classifying recommendations and reviews as positive or negative. Earlier methods for this classification task mostly require product reviews to be collected and labelled; a classifier is then trained on these reviews with their relevant labels. This training procedure needs a huge amount of labeled data to train classification models for each product, considering that the distribution of reviews can differ between domains and that the performance of these classification models must be enhanced. Nevertheless, the procedure of labeling the data is very expensive and time-consuming. For low-resource languages like Sinhala, annotated data is limited compared to languages like English, so applying classification algorithms to perform sentiment classification for Sinhala is challenging. Apart from applying traditional algorithms to analyze sentiment, this work uses pre-trained models (PTMs) and examines whether they outperform the traditional methods. PTMs play an important role in natural language processing, since they pave the way for downstream tasks. Therefore, this research takes the step of applying PTMs such as BERT and XLNet to classify sentiment. Experiments have been done using two approaches on the BERT model: fine-tuning the BERT model and a feature-based approach. The existing RoBERTa-based Sinhala models, named SinBERT-small and SinBERT-large, which are available on the official Hugging Face site and have been trained on a large Sinhala language corpus, are also used.
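Loading one of the mentioned Sinhala models for fine-tuning could look roughly like the sketch below; the hub repository id is a placeholder assumption and must be replaced with the actual SinBERT model name from the Hugging Face hub.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder repository id; substitute the actual SinBERT model name.
model_id = "path-to/SinBERT-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# A single Sinhala news comment, tokenized and scored by the classification head.
batch = tokenizer(["මෙය හොඳ ප්‍රවෘත්තියක්"], padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch)
print(outputs.logits)  # positive/negative scores before fine-tuning
```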
  • item: Thesis-Abstract
    Improving the performance of real-time data analytics applications by optimising the database aggregations
    (2022) Samaranayake TDMP; Perera I
    Organisations must make the best decision at the appropriate time to obtain a competitive advantage in a fast-changing market. To accomplish this, it is critical to make faster and more efficient judgments based on near-real-time data analysis. In these real-time streaming data analysis systems, the performance of the database has a huge impact, as such applications are required to achieve data availability and continuous processing for a large volume of data without delay. With streaming data, data warehousing is even more challenging, so performance improvements must be considered in all the steps of the Extraction, Transformation, Load (ETL) process and at the database architecture level. Therefore, the proposed approach is to improve the performance of the system by optimising the ETL process and real-time data warehousing. In this approach, an optimised aggregation algorithm is introduced. Apart from that, the hardware, storage schemas, and query optimization of the data warehouse are also considered, and this study evaluates the performance of a centralised architecture for the real-time data warehouse.
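One common way to realize optimised aggregation for streaming loads is to maintain incremental aggregates that are updated per event instead of recomputed per query, as in the toy sketch below (not the thesis' exact algorithm; names are illustrative).

```python
from collections import defaultdict

# Running sum/count per key are updated as each event streams in, so dashboard
# queries read precomputed aggregates instead of re-scanning the fact table.
running = defaultdict(lambda: {"count": 0, "total": 0.0})

def ingest(event):
    agg = running[event["product"]]
    agg["count"] += 1
    agg["total"] += event["amount"]

def average(product):
    agg = running[product]
    return agg["total"] / agg["count"] if agg["count"] else None

for e in [{"product": "A", "amount": 10.0}, {"product": "A", "amount": 20.0}]:
    ingest(e)
print(average("A"))  # 15.0
```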