Diffusion based virtual try on : DiMVTON
| dc.contributor.advisor | Silva, ATP | |
| dc.contributor.author | De Zoysa, RSN | |
| dc.date.accept | 2025 | |
| dc.date.accessioned | 2025-12-08T05:09:32Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | Diffusion models have recently set new standards for realism in virtual try-on tasks, yet most existing systems are burdened by the need for additional modules such as Reference Networks, complex image/text encoders, and heavy preprocessing pipelines. These extra components substantially increase the number of trainable parameters, GPU memory consumption, and overall computational cost. In this paper, we introduce DiMVTON, a highly efficient diffusion-based framework for virtual try-on that rethinks this complexity. Instead of relying on external conditioning networks, DiMVTON simply concatenates person and garment inputs along the spatial dimension and feeds them directly into a streamlined denoising UNet. Our approach is driven by three main efficiency goals: (1) Compact architecture - DiMVTON uses only a VAE and a minimal UNet without cross-attention or external encoders, achieving a total model size of 894.29 million parameters. (2) Selective fine-tuning - comprehensive studies reveal that the UNet's self-attention layers are the critical elements for aligning garments onto individuals. Fine-tuning only these layers, further optimized through Low-Rank Adaptation (LoRA), enables strong performance with just 0.39 million trainable parameters (around 0.04% of the backbone). (3) Minimal inference overhead - unlike other diffusion-based models that require auxiliary information such as human parsing maps, pose annotations, or textual descriptions, DiMVTON needs only a person image, a garment reference, and a simple mask, cutting memory usage by over 90%. Despite being trained on a relatively small dataset of 13,000 samples, DiMVTON achieves competitive qualitative and quantitative results and shows strong generalization in real-world scenarios. Our findings suggest that high-quality virtual try-on is possible without complex architectures, provided that fine-tuning is applied strategically to key network components. | |
| dc.identifier.accno | TH5924 | |
| dc.identifier.citation | De Zoysa, R.S.N. (2025). Diffusion based virtual try on : DiMVTON [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24527 | |
| dc.identifier.degree | MSc in Artificial Intelligence | |
| dc.identifier.department | Department of Computational Mathematics | |
| dc.identifier.faculty | IT | |
| dc.identifier.uri | https://dl.lib.uom.lk/handle/123/24527 | |
| dc.language.iso | en | |
| dc.subject | FASHION | |
| dc.subject | ONLINE-Virtual Try-On | |
| dc.subject | ONLINE-Diffusion Models | |
| dc.subject | VIRTUAL TRY-ON | |
| dc.subject | Efficient Training | |
| dc.subject | DiMVTON-Self-Attention layers | |
| dc.subject | DiMVTON-LoRA Fine-Tuning | |
| dc.subject | ARTIFICIAL INTELLIGENCE-Dissertation | |
| dc.subject | COMPUTATIONAL MATHEMATICS-Dissertation | |
| dc.subject | MSc in Artificial Intelligence | |
| dc.title | Diffusion based virtual try on : DiMVTON | |
| dc.type | Thesis-Abstract |
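The abstract's second efficiency goal (fine-tuning only self-attention weights via LoRA, leaving the backbone frozen) can be illustrated with a minimal sketch. This is an assumption-laden toy in NumPy, not the thesis code: `LoRALinear`, its dimensions, and its rank are hypothetical, chosen only to show how a frozen weight `W` plus a trainable low-rank update `B @ A` keeps the trainable parameter count a small fraction of the total.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update scale * (B @ A)."""

    def __init__(self, d_out, d_in, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen backbone weight
        self.A = rng.standard_normal((rank, d_in)) * 0.02   # trainable, rank x d_in
        self.B = np.zeros((d_out, rank))                    # trainable, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Effective weight is W + scale * B @ A; since B starts at zero,
        # the layer initially behaves exactly like the frozen base layer.
        return x @ (self.W + self.scale * (self.B @ self.A)).T

    @property
    def trainable_params(self):
        # Only the low-rank factors A and B would receive gradients.
        return self.A.size + self.B.size

layer = LoRALinear(d_out=64, d_in=64, rank=4)
x = np.ones((1, 64))
assert np.allclose(layer(x), x @ layer.W.T)  # zero-init B: identity update
print(layer.trainable_params)  # 2 * 4 * 64 = 512 trainable vs 64*64 = 4096 frozen
```

In a real diffusion UNet this adapter would be attached only to the self-attention projection matrices, which is how a trainable footprint on the order of 0.04% of the backbone, as reported in the abstract, becomes possible.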
Files

Original bundle (3 of 3)

| Name | Size | Format | Description |
| TH5924-1.pdf | 655.98 KB | Adobe Portable Document Format | Pre-text |
| TH5924-2.pdf | 124.28 KB | Adobe Portable Document Format | Post-text |
| TH5924.pdf | 13.01 MB | Adobe Portable Document Format | Full-thesis |

License bundle (1 of 1)

| Name | Size | Description |
| license.txt | 1.71 KB | Item-specific license agreed upon to submission |