Efficiently run a deep learning model on a distributed CPU system

dc.contributor.advisor: Fernando, S
dc.contributor.author: Gnanasena, DDKK
dc.date.accept: 2024
dc.date.accessioned: 2025-09-26T09:49:19Z
dc.date.issued: 2024
dc.description.abstract: In distributed deep learning, data transfer between machines can become a bottleneck that prevents a system from scaling to its full capacity. Techniques such as quantization and fine-tuning address this, but they work well only on GPUs and TPUs, which are costly infrastructure. If deep learning models could instead run adequately on a CPU-based distributed system, the cost savings would be a significant advantage. In this research, such a system was built using TensorFlow's MultiWorkerMirroredStrategy and evaluated to determine whether it can compete with GPUs. Three workloads were run: the MNIST dataset on a linear model, the CIFAR-10 dataset on a Convolutional Neural Network (CNN), and a quantized Llama 2 model used to summarize and translate an essay. Each was evaluated on accuracy and training time against the same workload on a Google Colab T4 GPU; for the Llama model, BERTScore and ROUGE were used for summarization, and BERTScore and BLEU for translation. The MNIST results showed that the system is inefficient for small models: training on two machines took 200 times as long as on a single machine. For the CIFAR-10 CNN, accuracy decreased as the number of nodes and the batch size increased, while training time fell with larger batch sizes but rose again beyond a certain batch size, exposing the communication bottleneck. For the Llama model, the gap in time taken relative to the GPU narrowed as the dataset size grew. None of the models beat the GPU's training time in every case, but some Llama accuracy scores surpassed the GPU's. As future work, model parallelism should be tested to develop this CPU-based distributed system further.
dc.identifier.accno: TH5765
dc.identifier.citation: Gnanasena, D.D.K.K. (2024). Efficiently run a deep learning model on a distributed CPU system [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24230
dc.identifier.degree: MSc in Artificial Intelligence
dc.identifier.department: Department of Computational Mathematics
dc.identifier.faculty: IT
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/24230
dc.language.iso: en
dc.subject: MULTI WORKER MIRRORED STRATEGY
dc.subject: DISTRIBUTED DEEP LEARNING
dc.subject: GPU MEMORY
dc.subject: ARTIFICIAL INTELLIGENCE-Dissertation
dc.subject: COMPUTATIONAL MATHEMATICS-Dissertation
dc.subject: MSc in Artificial Intelligence
dc.title: Efficiently run a deep learning model on a distributed CPU system
dc.type: Thesis-Full-text
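
The abstract reports results from TensorFlow's MultiWorkerMirroredStrategy running on a cluster of CPU machines. Below is a minimal sketch of how such a run is typically configured; the host addresses, port, and hyperparameters are illustrative assumptions, not the thesis's actual setup.

```python
import json
import os

import tensorflow as tf

# Hypothetical two-worker cluster; the addresses and port are placeholders.
# Every machine sets the same "cluster" but its own "task" index.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["10.0.0.1:12345", "10.0.0.2:12345"]},
    "task": {"type": "worker", "index": 0},
})

# The strategy reads TF_CONFIG and synchronizes gradients across workers
# over the network, which is the communication cost the abstract identifies.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

with strategy.scope():
    # A linear classifier on MNIST, matching the smallest workload studied.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# The batch size is a global batch split across workers, so each training
# step incurs an all-reduce over the network.
model.fit(x_train, y_train, epochs=5, batch_size=128)
```

The same script runs on every machine with only the task index changed; each added worker adds all-reduce traffic per step, which is consistent with the training-time blow-up the abstract reports for small models.

For the Llama 2 experiments, the abstract names BERTScore and ROUGE for summarization and BERTScore and BLEU for translation. The following sketch computes those metrics assuming the bert-score, rouge-score, and sacrebleu Python packages; the thesis may have used other implementations, and the example texts are invented.

```python
import sacrebleu                            # pip install sacrebleu
from bert_score import score as bert_score  # pip install bert-score
from rouge_score import rouge_scorer        # pip install rouge-score

# Invented model output and reference for the summarization task.
summary_hyp = ["Distributed CPU training can reduce infrastructure cost."]
summary_ref = ["Training on distributed CPUs lowers infrastructure cost."]

# BERTScore returns per-sentence precision, recall, and F1 tensors.
P, R, F1 = bert_score(summary_hyp, summary_ref, lang="en")
print("BERTScore F1:", float(F1.mean()))

# ROUGE-1 and ROUGE-L between a reference and a candidate summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print("ROUGE:", scorer.score(summary_ref[0], summary_hyp[0]))

# Corpus-level BLEU for the translation task (one reference per hypothesis).
translation_hyp = ["This is a sample sentence."]
translation_ref = [["This is an example sentence."]]
print("BLEU:", sacrebleu.corpus_bleu(translation_hyp, translation_ref).score)
```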

Files

Original bundle (3 files)

Name: TH5765-1.pdf
Size: 175.98 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH5765-2.pdf
Size: 83.25 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH5765.pdf
Size: 2.1 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle (1 file)

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission