Term of Award

Spring 2025

Degree Name

Master of Science, Information Technology

Document Type and Release Option

Thesis (open access)

Copyright Statement / License for Reuse

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Department of Information Technology

Committee Chair

Hayden Wimmer

Committee Member 1

Jongyeop Kim

Committee Member 2

Atef Mohamed

Abstract

Contextual understanding is a significant challenge for Large Language Models (LLMs), which are typically trained on general-purpose datasets. As a result, LLMs often fail to capture nuanced or domain-specific information and may struggle to interpret user queries accurately. Prompt engineering consequently becomes difficult to automate, and LLMs are prone to “hallucinating” (generating random or irrelevant text) when they lack sufficient context, which undermines their ability to provide focused, accurate responses. Accordingly, this thesis seeks to enhance the contextual understanding capabilities of Artificial Intelligence systems to facilitate more precise and relevant answer generation.

Study A investigates a new approach to combating misinformation on social media using Retrieval-Augmented Generation (RAG) with LangChain. The proposed model compares users’ posts against corresponding news articles by feeding both into an LLM to assess their alignment. This process enables the LLM to detect potential misinformation in the user’s post. To empirically validate the model’s effectiveness, we conduct a t-test to evaluate our hypothesis; the results confirm that this RAG-based method significantly improves misinformation detection.

Study B explores the importance of sentiment in text-to-image generation, focusing on a new text-to-sticker prediction model. The model generates stickers from textual inputs based on sentiment and contextual cues, making digital interactions more engaging. To evaluate its effectiveness, we compared sticker outputs that included sentiment values against those that did not. A t-test analysis revealed that incorporating sentiment significantly improves the stickers’ relevance to user conversations and emotions, ultimately making them more appealing than the sentiment-free outputs.

Study C focuses on improving LLM contextual understanding through external context sourcing with RAG. In this approach, documents containing text, images, and tables are transformed into high-dimensional vectors via LLM-generated embeddings and then stored in a vector database. When a user submits a query, the LLM retrieves the most relevant documents from this vector store, enabling a deeper understanding of the query and improving response quality. A comparative analysis, supported by a t-test, indicates that this RAG-based model outperforms a typical LLM, demonstrating greater efficiency and accuracy.

This thesis combines these studies to unlock the hidden potential of LLMs by providing a comprehensive understanding of how to improve contextualization in Artificial Intelligence for accurate answer generation and engagement. The approach counters LLM hallucinations by supplying the right context and bringing machine reasoning closer to human reasoning. The thesis also contributes a solution to long-standing challenges in AI and to the fight against the spread of misinformation through digital means.
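The retrieval step described for Study C can be illustrated with a minimal sketch. It is not the thesis implementation: a toy bag-of-words vector stands in for LLM-generated embeddings, an in-memory list stands in for the vector database, and the names `embed`, `build_store`, `retrieve`, and `build_prompt` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an LLM embedding (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_store(documents):
    """In-memory stand-in for a vector database: (document, vector) pairs."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(store, query, k=2):
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(store, query, k=2):
    """Prepend the retrieved context to the query before it goes to an LLM."""
    context = "\n".join(retrieve(store, query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "stock prices fell sharply today",
        "vector databases store embeddings",
    ]
    store = build_store(docs)
    print(build_prompt(store, "where do embeddings get stored", k=1))
```

In a real pipeline, the embedding function and the store would be replaced by an embedding model and a vector database, but the flow stays the same: embed the documents, embed the query, rank by similarity, and pass the top matches to the model as context.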

Research Data and Supplementary Material

No
