Term of Award
Spring 2025
Degree Name
Master of Science, Information Technology
Document Type and Release Option
Thesis (open access)
Copyright Statement / License for Reuse
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Department of Information Technology
Committee Chair
Hayden Wimmer
Committee Member 1
Jongyeop Kim
Committee Member 2
Atef Mohamed
Abstract
In an era of rapid technological advancement, efficient content generation, application development, and data management are crucial for meeting the demands of dynamic digital environments. This thesis uses state-of-the-art models to explore three core areas: AI-driven video content creation, text-to-image-to-text consistency, and automatic text summarization. The first study investigates the potential of AI-powered text-to-video generation to democratize video production and enhance storytelling. It compares three models: ModelScope, Text2Video (Zero), and Motion Consistency. The quality of the generated videos was assessed using CLIP scores, and statistical significance was evaluated through t-tests and homogeneity tests. ModelScope achieved the highest scores, though the differences were not statistically significant. These findings underscore AI's transformative role in content creation, making high-quality video production more accessible. The second study evaluates the semantic consistency of a text-to-image-to-text pipeline using four models: DALL·E, Imagen, Grok, and Stable Diffusion. Text prompts were used to generate images, which were then converted back to text using image captioning models. BERTScore, METEOR, ROUGE, and BLEU were employed to assess the similarity between the original prompts and the reconstructed text. Pearson correlation analysis and paired t-tests indicated no statistically significant differences among the models (p > 0.05), although Stable Diffusion exhibited slightly higher scores. The results highlight the strengths and limitations of current multimodal models in maintaining semantic fidelity across complex prompts. The third study focuses on automatic text summarization (ATS), evaluating four leading transformer models (Pegasus, BART, T5, and FLAN-T5) on Amazon review datasets. The models were fine-tuned and assessed using ROUGE metrics to measure contextual fluency and coherence. Statistical analyses, including paired t-tests, identified Pegasus as the top-performing model, excelling in fluency and structural coherence. These findings provide valuable insights into the effectiveness of transformer models for summarization tasks. Together, the three studies offer a comprehensive understanding of AI applications across diverse domains. They provide developers, researchers, and organizations with the knowledge to make informed decisions about integrating AI technologies to enhance content generation, optimize data processing, and improve overall system performance.
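As an illustration of the paired t-tests mentioned in the abstract, the following minimal sketch compares two models scored on the same items. It is not taken from the thesis: the function name paired_t_statistic and the score lists are hypothetical, and the numbers are invented for illustration only. The sketch computes the t statistic from the per-item score differences using only the Python standard library; in practice a library routine such as scipy.stats.ttest_rel would also report the p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Each item is scored by both models, so the test works on the differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-item metric scores for two models (illustrative values only).
model_a = [0.42, 0.38, 0.51, 0.47, 0.40]
model_b = [0.40, 0.39, 0.48, 0.45, 0.41]

t, df = paired_t_statistic(model_a, model_b)
print(f"t = {t:.3f}, df = {df}")
```

A small absolute value of t relative to the critical value for the given degrees of freedom corresponds to the "no statistically significant difference" outcome the abstract reports for several comparisons.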
Recommended Citation
Akinola, Azeezat O., "Evaluating Multimodal AI Systems: A Comparative Analysis of Large Language Model-Based Models for Text, Image, and Video Generation" (2025). Electronic Theses and Dissertations. 2944.
https://digitalcommons.georgiasouthern.edu/etd/2944
Research Data and Supplementary Material
Yes