Evaluation and Acceptability of Generative AI Solutions in Online Product Information, Recommendations and Reviews

Authors

Oburu, George

Issue Date

2025-11

Type

Dissertation

Language

en

Keywords

Generative AI, NLP Metrics, GenAI Evaluation

Abstract

This quantitative comparative experimental study assessed performance differences among open-source Generative AI (GenAI) solutions in curating online product information. The research evaluated six Large Language Models (LLMs)—Llama-3, Mistral-3, FalconMamba-7B, Gemma-3, XGen-9B, and DeepSeek-R1—focusing on their ability to generate accurate and meaningful product descriptions. The evaluation used established Natural Language Processing (NLP) metrics: Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), BERTScore (based on Bidirectional Encoder Representations from Transformers), and Precision, Recall, and F1 scores. Secondary datasets from Amazon and Walmart provided the benchmark for assessing GenAI-generated content. Results indicate statistically significant performance variations among the models on all metrics except BLEU. The study highlighted the efficacy of NLP metrics in evaluating GenAI content generation, offering insights for businesses seeking to adopt GenAI solutions for marketing and e-commerce. Recommendations for future research include expanding to multimodal GenAI applications covering text, images, and videos.
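
The ROUGE and Precision/Recall/F1 metrics named in the abstract are, at their core, n-gram overlap measures between a model-generated text and a reference text. As a minimal, library-free sketch of the idea (the product descriptions below are invented illustrations, not drawn from the study's Amazon or Walmart datasets), a ROUGE-1-style unigram precision, recall, and F1 can be computed as:

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """ROUGE-1-style scores: unigram overlap precision, recall, and F1."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each shared token counts at most min(ref, cand) times
    overlap = sum((ref & cand).values())
    p = overlap / max(sum(cand.values()), 1)   # fraction of candidate tokens matched
    r = overlap / max(sum(ref.values()), 1)    # fraction of reference tokens matched
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"precision": p, "recall": r, "f1": f1}

# Hypothetical benchmark description vs. a model-generated one
reference = "wireless noise cancelling headphones with 30 hour battery life"
generated = "wireless headphones with noise cancelling and 30 hour battery"
scores = rouge1(reference, generated)  # each score ≈ 0.889 (8 of 9 unigrams overlap)
```

Production evaluations of the kind the study describes would typically use maintained implementations (e.g. the `rouge-score` package, NLTK's BLEU, or the `bert-score` package) rather than a hand-rolled function, but the overlap-counting logic is the same.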
