Enhancing video generation based on text-to-image diffusion models using a multimodal approach

dc.contributor.author: Sanjula Appuhamy, KD
dc.contributor.author: Miriya Thanthrige, USKP
dc.date.accessioned: 2026-01-14T06:56:37Z
dc.date.issued: 2025
dc.description.abstract: The rapid evolution of Artificial Intelligence (AI) has revolutionized multimedia creation, particularly through Text-to-Image (T2I) diffusion models, which synthesize images from textual descriptions with impressive capabilities. Building on this, enhancing video GIF generation has become a promising frontier. However, Text-to-Video (T2V) GIF generation remains underexplored, with limited research addressing this specific application. This synthesis method expands creative expression and finds utility in education, marketing, and entertainment. The lack of focused work highlights both the novelty of the problem and the opportunity for impactful contributions. This study focuses on optimizing T2V GIF generation in terms of computational and parameter efficiency. The resulting model maintains a parameter count below 1 billion, enabling faster training, reduced inference time, lower memory usage, and compatibility with low-end hardware. Inspired by previous work using 2×2 grid diffusion and frame interpolation models, this research proposes a simplified approach using a single Stable Diffusion model. It generates all 16 frames of an animated GIF within a 4×4 grid, eliminating the post-processing steps required by prior approaches. Given that the GIF format emphasizes animation over fine detail, this parameter-efficient method is well suited to the task.
dc.identifier.conference: Moratuwa Engineering Research Conference 2025
dc.identifier.department: Engineering Research Unit, University of Moratuwa
dc.identifier.email: dilankasanjula@gmail.com
dc.identifier.email: sampathk@uom.lk
dc.identifier.faculty: Engineering
dc.identifier.isbn: 979-8-3315-6724-8
dc.identifier.pgnos: pp. 304-309
dc.identifier.proceeding: Proceedings of Moratuwa Engineering Research Conference 2025
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/24726
dc.language.iso: en
dc.publisher: IEEE
dc.subject: text-to-image
dc.subject: text-to-video
dc.subject: text-to-video GIF
dc.title: Enhancing video generation based on text-to-image diffusion models using a multimodal approach
dc.type: Conference-Full-text
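
The abstract describes generating all 16 frames of a GIF inside a single 4×4 grid image. As an illustration only, the sketch below shows how such a grid could be cut back into an ordered frame sequence. The function name `split_grid`, the row-major frame order, and the toy nested-list image are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence

Frame = List[List[int]]


def split_grid(grid: Sequence[Sequence[int]], rows: int = 4, cols: int = 4) -> List[Frame]:
    """Cut a grid image into rows*cols equally sized frames.

    Assumption: frames are laid out in row-major order (left to right,
    then top to bottom). The source does not state the ordering.
    """
    height = len(grid)
    width = len(grid[0])
    if height % rows != 0 or width % cols != 0:
        raise ValueError("grid size must be divisible by the number of cells")

    frame_h = height // rows
    frame_w = width // cols

    frames: List[Frame] = []
    for r in range(rows):
        for c in range(cols):
            # Take the pixel rows of cell (r, c), then the pixel columns.
            cell_rows = grid[r * frame_h:(r + 1) * frame_h]
            frame = [list(row[c * frame_w:(c + 1) * frame_w]) for row in cell_rows]
            frames.append(frame)
    return frames


# Toy 8x8 "image": every pixel stores the index of the cell it belongs to,
# so each extracted 2x2 frame should be filled with its own frame index.
toy_grid = [[(y // 2) * 4 + (x // 2) for x in range(8)] for y in range(8)]
frames = split_grid(toy_grid)
```

With a real image, the same slicing would be done as per-cell crops in an imaging library, and the resulting frame list would then be written out as an animated GIF.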

Files

Original bundle

Name: 1571153356.pdf
Size: 2.88 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission