Enhancing video generation based on text-to-image diffusion models using a multimodal approach
| dc.contributor.author | Sanjula Appuhamy, KD | |
| dc.contributor.author | Miriya Thanthrige, USKP | |
| dc.date.accessioned | 2026-01-14T06:56:37Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | The rapid evolution of Artificial Intelligence (AI) has revolutionized multimedia creation, particularly through Text-to-Image (T2I) diffusion models, which synthesize images from textual descriptions with impressive capabilities. Building on this, enhancing video GIF generation has become a promising frontier. However, the field of Text-to-Video (T2V) GIFs remains underexplored, with limited research addressing this specific application. This synthesis method expands creative expression and finds utility in education, marketing, and entertainment. The lack of focused work highlights both novelty and opportunity for impactful contributions. This study focuses on optimizing T2V GIF generation in terms of computational and parameter efficiency. The resulting model maintains a parameter count below 1 billion, enabling faster training, reduced inference time, lower memory usage, and compatibility with low-end hardware. Inspired by previous work using 2×2 grid diffusion and frame interpolation models, this research proposes a simplified approach using a single Stable Diffusion model. It generates all 16 frames of an animated GIF within a 4×4 grid, eliminating prior post-processing steps. Given that the GIF format emphasizes animation over fine detail, this parameter-efficient method is well suited. | |
| dc.identifier.conference | Moratuwa Engineering Research Conference 2025 | |
| dc.identifier.department | Engineering Research Unit, University of Moratuwa | |
| dc.identifier.email | dilankasanjula@gmail.com | |
| dc.identifier.email | sampathk@uom.lk | |
| dc.identifier.faculty | Engineering | |
| dc.identifier.isbn | 979-8-3315-6724-8 | |
| dc.identifier.pgnos | pp. 304-309 | |
| dc.identifier.proceeding | Proceedings of Moratuwa Engineering Research Conference 2025 | |
| dc.identifier.uri | https://dl.lib.uom.lk/handle/123/24726 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.subject | text-to-image | |
| dc.subject | text-to-video | |
| dc.subject | text-to-video GIF | |
| dc.title | Enhancing video generation based on text-to-image diffusion models using a multimodal approach | |
| dc.type | Conference-Full-text |
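
The abstract describes generating all 16 frames of a GIF as tiles of a single 4×4 grid image. As a rough illustration of what that layout implies downstream, the sketch below cuts a grid into 16 equally sized frames. It is not the authors' code: the function name `split_grid`, the row-major frame order, and the toy 8×8 input are all assumptions made for this example, and the record does not state how frames are ordered within the grid.

```python
from typing import List, Sequence

Frame = List[List[int]]


def split_grid(grid: Sequence[Sequence[int]], rows: int = 4, cols: int = 4) -> List[Frame]:
    """Cut a 2-D pixel grid into rows*cols equal tiles.

    Tiles are returned in row-major order (left to right, then top to
    bottom). That ordering is an assumption, not taken from the paper.
    """
    height = len(grid)
    width = len(grid[0])
    if height % rows or width % cols:
        raise ValueError("grid size must be divisible by the tile layout")
    tile_h, tile_w = height // rows, width // cols

    frames: List[Frame] = []
    for r in range(rows):
        for c in range(cols):
            # Take the rows belonging to tile row r, then slice out tile column c.
            tile = [
                list(row[c * tile_w:(c + 1) * tile_w])
                for row in grid[r * tile_h:(r + 1) * tile_h]
            ]
            frames.append(tile)
    return frames


# Toy 8x8 "image": each pixel stores the index of the tile it belongs to,
# so a correct split yields frame i filled entirely with the value i.
toy = [[(y // 2) * 4 + (x // 2) for x in range(8)] for y in range(8)]
frames = split_grid(toy)
```

With a real generated image, the same slicing would apply to the pixel array, and the resulting frame sequence could then be written out as an animated GIF with an imaging library.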
