Diffusion based virtual try on : DiMVTON

dc.contributor.advisorSilva, ATP
dc.contributor.authorDe Zoysa, RSN
dc.date.accept2025
dc.date.accessioned2025-12-08T05:09:32Z
dc.date.issued2025
dc.description.abstractDiffusion models have recently set new standards for realism in virtual try-on tasks, yet most existing systems are burdened by additional modules such as Reference Networks, complex image/text encoders, and heavy preprocessing pipelines. These extra components substantially increase the number of trainable parameters, GPU memory consumption, and overall computational cost. In this paper, we introduce DiMVTON, a highly efficient diffusion-based framework for virtual try-on that rethinks this complexity. Instead of relying on external conditioning networks, DiMVTON simply concatenates the person and garment inputs along the spatial dimension and feeds them directly into a streamlined denoising UNet. Our approach is driven by three main efficiency goals: (1) Compact architecture - DiMVTON uses only a VAE and a minimal UNet without cross-attention or external encoders, for a total model size of 894.29 million parameters. (2) Selective fine-tuning - comprehensive studies reveal that the UNet's self-attention layers are the critical elements for aligning garments onto individuals. Fine-tuning only these layers enables strong performance with just 0.39 million trainable parameters (around 0.04% of the backbone), and the approach is further optimized through Low-Rank Adaptation (LoRA). (3) Minimal inference overhead - unlike other diffusion-based models that require auxiliary information such as human parsing maps, pose annotations, or textual descriptions, DiMVTON needs only a person image, a garment reference, and a simple mask, cutting memory usage by over 90%. Despite being trained on a relatively small dataset of 13,000 samples, DiMVTON achieves competitive qualitative and quantitative results and generalizes well to real-world scenarios. Our findings suggest that high-quality virtual try-on is possible without complex architectures, provided that fine-tuning is applied strategically to key network components.
dc.identifier.accnoTH5924
dc.identifier.citationDe Zoysa, R.S.N. (2025). Diffusion based virtual try on : DiMVTON [Master’s thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24527
dc.identifier.degreeMSc in Artificial Intelligence
dc.identifier.departmentDepartment of Computational Mathematics
dc.identifier.facultyIT
dc.identifier.urihttps://dl.lib.uom.lk/handle/123/24527
dc.language.isoen
dc.subjectFASHION
dc.subjectONLINE-Virtual Try-On
dc.subjectONLINE-Diffusion Models
dc.subjectVIRTUAL TRY-ON
dc.subject-Efficient Training
dc.subjectDiMVTON-Self-Attention layers
dc.subjectDiMVTON-LoRA Fine-Tuning
dc.subjectARTIFICIAL INTELLIGENCE-Dissertation
dc.subjectCOMPUTATIONAL MATHEMATICS-Dissertation
dc.subjectMSc in Artificial Intelligence
dc.titleDiffusion based virtual try on : DiMVTON
dc.typeThesis-Abstract
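
The abstract's three ideas (spatial concatenation of person and garment inputs, LoRA on self-attention layers, and a trainable budget of about 0.04% of the backbone) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the choice of the width axis for concatenation, and the 320-wide layer with rank 4 are all assumptions made for illustration. Only the 0.39 million and 894.29 million parameter counts come from the abstract.

```python
# Illustrative sketch only; names, axis choice, and layer sizes are assumptions.

def concat_along_width(person, garment):
    """Join two H x W grids (lists of rows) side by side into one H x 2W grid.

    The abstract says the inputs are concatenated "along the spatial
    dimension"; using the width axis here is an assumption.
    """
    if len(person) != len(garment):
        raise ValueError("person and garment must have the same height")
    return [p_row + g_row for p_row, g_row in zip(person, garment)]


def lora_param_count(d_in, d_out, rank):
    """Parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors, A (d_in x rank) and B (rank x d_out),
    so the added parameter count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def trainable_fraction(trainable, total):
    """Share of parameters that are trainable, as a fraction of the total."""
    return trainable / total


# Tiny 2 x 2 stand-ins for the person and garment latents.
joined = concat_along_width([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(joined)

# Hypothetical 320 x 320 projection adapted at rank 4 (sizes are assumed).
print(lora_param_count(320, 320, 4))

# Figures from the abstract: 0.39M trainable out of 894.29M total.
print(f"{trainable_fraction(0.39e6, 894.29e6) * 100:.4f}%")
```

The last line reproduces the abstract's "around 0.04%" figure from the two parameter counts it reports.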

Files

Original bundle

Name: TH5924-1.pdf; Size: 655.98 KB; Format: Adobe Portable Document Format; Description: Pre-text
Name: TH5924-2.pdf; Size: 124.28 KB; Format: Adobe Portable Document Format; Description: Post-text
Name: TH5924.pdf; Size: 13.01 MB; Format: Adobe Portable Document Format; Description: Full-thesis
