Evaluation and Acceptability of Generative AI Solutions in Online Product Information, Recommendations and Reviews
Issue Date
2025-11
Authors
Oburu, George
Abstract
This quantitative comparative experimental study assessed performance differences between open-source Generative AI (GenAI) solutions in curating online product information. The research evaluated six Large Language Models (LLMs)—Llama-3, Mistral-3, FalconMamba-7B, Gemma-3, XGen-9B, and DeepSeek-R1—focusing on their ability to generate accurate and meaningful product descriptions. The evaluation used established Natural Language Processing (NLP) metrics: Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), Bidirectional Encoder Representations from Transformers (BERTScore), and Precision, Recall, and F1 scores. Secondary datasets from Amazon and Walmart provided the benchmark for assessing GenAI-generated content. Results indicate statistically significant performance variations among the models for most metrics, except BLEU. The study highlights the efficacy of NLP metrics in evaluating GenAI content generation, offering insights for businesses seeking to adopt GenAI solutions for marketing and e-commerce. Recommendations for future research include expanding to multimodal GenAI applications covering text, images, and videos.
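To illustrate the kind of overlap-based metrics the abstract names, the sketch below computes ROUGE-1-style unigram precision, recall, and F1 between a generated product description and a reference description. This is a minimal, self-contained illustration of the metric family, not the study's actual evaluation pipeline (which would typically use libraries such as `rouge-score` or `nltk`); the example strings are invented.

```python
from collections import Counter

def rouge1_scores(candidate: str, reference: str):
    """ROUGE-1-style unigram-overlap precision, recall, and F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram matches: each candidate token counts at most
    # as many times as it appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical GenAI output scored against a retailer's reference text
generated = "wireless noise cancelling headphones with long battery life"
reference = "wireless headphones with noise cancelling and long battery life"
p, r, f1 = rouge1_scores(generated, reference)
```

BLEU follows the same n-gram-overlap idea but is precision-oriented with a brevity penalty, while BERTScore replaces exact token matches with contextual-embedding similarity, which is why the metrics can rank the same outputs differently.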
Keywords
Generative AI, NLP Metrics, GenAI Evaluation
