What Generative Search Engines Like and How to Optimize Web Content Cooperatively
- URL: http://arxiv.org/abs/2510.11438v1
- Date: Mon, 13 Oct 2025 14:10:26 GMT
- Title: What Generative Search Engines Like and How to Optimize Web Content Cooperatively
- Authors: Yujiang Wu, Shanshan Zhong, Yubin Kim, Chenyan Xiong,
- Abstract summary: Generative Engines, such as Google AI overview and ChatGPT, provide significantly enhanced user experiences.<n>Their rapid adoption also drives the needs of Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them.<n>We introduce AutoGEO, a framework to automatically learn generative engine preferences when using retrieved contents for response generation.
- Score: 29.736392688493808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By employing large language models (LLMs) to retrieve documents and generate natural language responses, Generative Engines, such as Google AI overview and ChatGPT, provide significantly enhanced user experiences and have rapidly become the new form of search. Their rapid adoption also drives the needs of Generative Engine Optimization (GEO), as content providers are eager to gain more traction from them. In this paper, we introduce AutoGEO, a framework to automatically learn generative engine preferences when using retrieved contents for response generation, and rewrite web contents for more such traction. AutoGEO first prompts frontier LLMs to explain generative engine preferences and extract meaningful preference rules from these explanations. Then it uses preference rules as context engineering for AutoGEO$_\text{API}$, a prompt-based GEO system, and as rule-based rewards to train AutoGEO$_\text{Mini}$, a cost-effective GEO model. Experiments on the standard GEO-Bench and two newly constructed benchmarks using real user queries demonstrate the effectiveness of AutoGEO in enhancing content traction while preserving search utility. Analyses confirm the learned rules' robustness and abilities to capture unique preferences in variant domains, and AutoGEO systems' ability to embed them in content optimization. The code is released at https://github.com/cxcscmu/AutoGEO.
Related papers
- E-GEO: A Testbed for Generative Engine Optimization in E-Commerce [9.66101096555058]
generative engine optimization improves content visibility and relevance for generative engines.<n>Current GEO practices are ad hoc, and their impacts remain poorly understood.<n>E-GEO is the first benchmark built specifically for e-commerce GEO.
arXiv Detail & Related papers (2025-11-25T21:28:40Z) - Caption Injection for Optimization in Generative Search Engine [15.472540238931202]
Generative Search Engines (GSEs) leverage Retrieval-Augmented Generation (RAG) techniques and Large Language Models (LLMs)<n>We propose Caption Injection, the first multimodal G-SEO approach, which extracts captions from images and injects them into textual content.<n> Experimental results show that Caption Injection significantly outperforms text-only G-SEO baselines under the G-Eval metric.
arXiv Detail & Related papers (2025-11-06T05:37:27Z) - AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding [73.60257070465377]
AdaVideoRAG is a novel framework that adapts retrieval based on query complexity using a lightweight intent classifier.<n>Our framework employs an Omni-Knowledge Indexing module to build hierarchical databases from text (captions, ASR, OCR), visual features, and semantic graphs.<n> Experiments demonstrate improved efficiency and accuracy for long-video understanding, with seamless integration into existing MLLMs.
arXiv Detail & Related papers (2025-06-16T15:18:15Z) - ImpRAG: Retrieval-Augmented Generation with Implicit Queries [34.72864597562907]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model.<n>We show that ImpRAG achieves 3.6-11.5 improvements in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z) - AutoGEEval: A Multimodal and Automated Framework for Geospatial Code Generation on GEE with Large Language Models [2.115331311872418]
AutoGEEval is an evaluation framework for code generation tasks on the Google Earth Engine (GEE) platform powered by large language models (LLMs)<n>Built upon the GEE Python API, AutoGEEval establishes a benchmark suite (AutoGEEval-Bench) comprising 1325 test cases that span 26 GEE data types.<n>We evaluate 18 state-of-the-art LLMs-including general-purpose, reasoning-augmented, code-centric, and geoscience-specialized models-revealing their performance characteristics and potential optimization pathways in GEE code generation.
arXiv Detail & Related papers (2025-05-19T09:35:58Z) - VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents [66.42579289213941]
Retrieval-augmented generation (RAG) is an effective technique that enables large language models to utilize external knowledge sources for generation.<n>We introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline.<n>In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
arXiv Detail & Related papers (2024-10-14T15:04:18Z) - GEM-RAG: Graphical Eigen Memories For Retrieval Augmented Generation [3.2027710059627545]
We introduce Graphical Eigen Memories For Retrieval Augmented Generation (GEM-RAG)
GEM-RAG works by tagging each chunk of text in a given text corpus with LLM generated utility'' questions.
We evaluate GEM-RAG, using both UnifiedQA and GPT-3.5 Turbo as the LLMs, with SBERT, and OpenAI's text encoders on two standard QA tasks.
arXiv Detail & Related papers (2024-09-23T21:42:47Z) - GEO: Generative Engine Optimization [50.45232692363787]
We formalize the unified framework of generative engines (GEs)
GEs use large language models (LLMs) to gather and summarize information to answer user queries.
Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them.
We introduce Generative Engine Optimization (GEO), the first novel paradigm to aid content creators in improving their content visibility in generative engine responses.
arXiv Detail & Related papers (2023-11-16T10:06:09Z) - Active Retrieval Augmented Generation [123.68874416084499]
Augmenting large language models (LMs) by retrieving information from external knowledge resources is one promising solution.
Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input.
We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content.
arXiv Detail & Related papers (2023-05-11T17:13:40Z) - LasUIE: Unifying Information Extraction with Latent Adaptive
Structure-aware Generative Language Model [96.889634747943]
Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has revealed great potential.
We propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE.
Over 12 IE benchmarks across 7 tasks our system shows significant improvements over the baseline UIE system.
arXiv Detail & Related papers (2023-04-13T04:01:14Z) - VEGA: Towards an End-to-End Configurable AutoML Pipeline [101.07003005736719]
VEGA is an efficient and comprehensive AutoML framework that is compatible and optimized for multiple hardware platforms.
VEGA can improve the existing AutoML algorithms and discover new high-performance models against SOTA methods.
arXiv Detail & Related papers (2020-11-03T06:53:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.