GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing
- URL: http://arxiv.org/abs/2509.14221v2
- Date: Tue, 07 Oct 2025 03:29:20 GMT
- Title: GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing
- Authors: Silan Hu, Shiqi Zhang, Yimin Shi, Xiaokui Xiao
- Abstract summary: We propose GEM-Bench, the first comprehensive benchmark for ad-injected response generation in Generative Engine Marketing (GEM). Our preliminary results indicate that, while simple prompt-based methods achieve reasonable engagement such as click-through rate, they often reduce user satisfaction. These findings highlight the need for future research on designing more effective and efficient solutions for generating ad-injected responses in GEM.
- Score: 19.604674396120405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Engine Marketing (GEM) is an emerging ecosystem for monetizing generative engines, such as LLM-based chatbots, by seamlessly integrating relevant advertisements into their responses. At the core of GEM lies the generation and evaluation of ad-injected responses. However, existing benchmarks are not specifically designed for this purpose, which limits future research. To address this gap, we propose GEM-Bench, the first comprehensive benchmark for ad-injected response generation in GEM. GEM-Bench includes three curated datasets covering both chatbot and search scenarios, a metric ontology that captures multiple dimensions of user satisfaction and engagement, and several baseline solutions implemented within an extensible multi-agent framework. Our preliminary results indicate that, while simple prompt-based methods achieve reasonable engagement such as click-through rate, they often reduce user satisfaction. In contrast, approaches that insert ads based on pre-generated ad-free responses help mitigate this issue but introduce additional overhead. These findings highlight the need for future research on designing more effective and efficient solutions for generating ad-injected responses in GEM. The benchmark and all related resources are publicly available at https://gem-bench.org/.
Related papers
- InnoGym: Benchmarking the Innovation Potential of AI Agents [74.64144272881414]
InnoGym is the first benchmark designed to evaluate the innovation potential of AI agents. InnoGym introduces two complementary metrics: performance gain, which measures improvement over the best-known solutions, and novelty, which captures methodological differences from prior approaches.
arXiv Detail & Related papers (2025-12-01T16:03:04Z) - Caption Injection for Optimization in Generative Search Engine [15.472540238931202]
Generative Search Engines (GSEs) leverage Retrieval-Augmented Generation (RAG) techniques and Large Language Models (LLMs). We propose Caption Injection, the first multimodal G-SEO approach, which extracts captions from images and injects them into textual content. Experimental results show that Caption Injection significantly outperforms text-only G-SEO baselines under the G-Eval metric.
arXiv Detail & Related papers (2025-11-06T05:37:27Z) - RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering [50.42577862494645]
We present RAG-IGBench, a benchmark designed to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering. RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content.
arXiv Detail & Related papers (2025-10-11T03:06:39Z) - Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations [70.94563079082751]
E-commerce has exposed the limitations of traditional product retrieval systems in managing complex, multi-turn user interactions. We propose a novel framework that introduces test-time scaling into conversational multimodal product retrieval. Our approach builds on a generative retriever, further augmented with a test-time reranking mechanism that improves retrieval accuracy and better aligns results with evolving user intent throughout the dialogue.
arXiv Detail & Related papers (2025-08-25T15:38:56Z) - Role-Augmented Intent-Driven Generative Search Engine Optimization [9.876307656819039]
We propose a Role-Augmented Intent-Driven Generative Search Engine Optimization (G-SEO) method. Our method models search intent through reflective refinement across diverse informational roles, enabling targeted content enhancement. Experimental results demonstrate that search intent serves as an effective signal for guiding content optimization.
arXiv Detail & Related papers (2025-08-15T02:08:55Z) - Careful Queries, Credible Results: Teaching RAG Models Advanced Web Search Tools with Reinforcement Learning [48.46951981642895]
We propose WebFilter, a novel RAG framework that generates source-restricted queries and filters out unreliable content. We show that WebFilter improves answer quality and retrieval precision, outperforming existing RAG methods on both in-domain and out-of-domain benchmarks.
arXiv Detail & Related papers (2025-08-11T13:08:37Z) - TeamCMU at Touché: Adversarial Co-Evolution for Advertisement Integration and Detection in Conversational Search [1.187456026346823]
The integration of advertisements into generated responses presents both commercial opportunities and challenges for user experience. We propose a modular pipeline for advertisement management in RAG-based conversational systems, consisting of an ad-rewriter for seamless ad integration and a robust ad-classifier for detection.
arXiv Detail & Related papers (2025-07-01T07:24:29Z) - Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising [16.8420671443003]
Retrieval systems primarily address the challenge of matching user queries with the most relevant advertisements. We propose a Multi-objective aligned Bidword Generation Model (MoBGM), which is composed of a discriminator, generator, and preference alignment module. Our proposed algorithm significantly outperforms the state of the art in offline and online experiments.
arXiv Detail & Related papers (2025-06-04T10:57:18Z) - InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation [63.55258191625131]
InfoDeepSeek is a new benchmark for assessing agentic information seeking in real-world, dynamic web environments. We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity. We develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information seeking outcomes.
arXiv Detail & Related papers (2025-05-21T14:44:40Z) - The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature. We conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks.
arXiv Detail & Related papers (2024-12-06T23:43:59Z) - Better RAG using Relevant Information Gain [1.5604249682593647]
A common way to extend the memory of large language models (LLMs) is by retrieval augmented generation (RAG). We propose a novel simple optimization metric based on relevant information gain, a probabilistic measure of the total information relevant to a query for a set of retrieved results. When used as a drop-in replacement for the retrieval component of a RAG system, this method yields state-of-the-art performance on question answering tasks.
arXiv Detail & Related papers (2024-07-16T18:09:21Z) - Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions [89.35345649303451]
Generative search engines have the potential to transform how people seek information online.
However, responses generated by existing large language model (LLM)-backed generative search engines may not always be accurate.
Retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system.
arXiv Detail & Related papers (2024-02-25T11:22:19Z) - GEO: Generative Engine Optimization [50.45232692363787]
We formalize the unified framework of generative engines (GEs).
GEs use large language models (LLMs) to gather and summarize information to answer user queries.
Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them.
We introduce Generative Engine Optimization (GEO), the first novel paradigm to aid content creators in improving their content visibility in generative engine responses.
arXiv Detail & Related papers (2023-11-16T10:06:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.