Evaluation and Acceptability of Generative AI Solutions in Online Product Information, Recommendations and Reviews

Authors

Oburu, George

Issue Date

2025-11

Type

Dissertation

Language

en

Keywords

Generative AI, NLP Metrics, GenAI Evaluation

Abstract

This quantitative comparative experimental study assessed performance differences among open-source Generative AI (GenAI) solutions in curating online product information. The research evaluated six Large Language Models (LLMs)—Llama-3, Mistral-3, FalconMamba-7B, Gemma-3, XGen-9B, and DeepSeek-R1—focusing on their ability to generate accurate and meaningful product descriptions. The evaluation used established Natural Language Processing (NLP) metrics: Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), BERTScore (based on Bidirectional Encoder Representations from Transformers), and Precision, Recall, and F1 scores. Secondary datasets from Amazon and Walmart provided the benchmark for assessing GenAI-generated content. Results indicate statistically significant performance variations among the models on all metrics except BLEU. The study highlighted the efficacy of NLP metrics in evaluating GenAI content generation, offering insights for businesses seeking to adopt GenAI solutions for marketing and e-commerce. Recommendations for future research include expanding to multimodal GenAI applications covering text, images, and videos.
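
The ROUGE and Precision/Recall/F1 metrics named in the abstract are, at their core, n-gram overlap measures between a model-generated text and a reference text. As a minimal, library-free sketch of the idea (the product descriptions below are invented illustrations, not drawn from the study's Amazon or Walmart datasets), a ROUGE-1-style unigram precision, recall, and F1 can be computed as:

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """ROUGE-1-style scores: unigram overlap precision, recall, and F1."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each shared token counts at most min(ref, cand) times
    overlap = sum((ref & cand).values())
    p = overlap / max(sum(cand.values()), 1)   # fraction of candidate tokens matched
    r = overlap / max(sum(ref.values()), 1)    # fraction of reference tokens matched
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"precision": p, "recall": r, "f1": f1}

# Hypothetical benchmark description vs. a model-generated one
reference = "wireless noise cancelling headphones with 30 hour battery life"
generated = "wireless headphones with noise cancelling and 30 hour battery"
scores = rouge1(reference, generated)  # each score ≈ 0.889 (8 of 9 unigrams overlap)
```

Production evaluations of the kind the study describes would typically use maintained implementations (e.g. the `rouge-score` package, NLTK's BLEU, or the `bert-score` package) rather than a hand-rolled function, but the overlap-counting logic is the same.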
