Related papers: Striking Gold in Advertising: Standardization and Exploration of Ad Text Generation

Related papers

AI-Generated Game Commentary: A Survey and a Datasheet Repository [4.396546075994102]
We introduce a general framework for AIGGC and present a comprehensive survey of 45 existing game commentary dataset and methods.<n>To support future research benchmarking, we also provide a structured appendix, which is meanwhile publicly available in an open repository.
arXiv Detail & Related papers (2025-06-17T07:04:51Z)
A Position Paper on the Automatic Generation of Machine Learning Leaderboards [12.736094044510224]
An important task in machine learning (ML) research is comparing prior work, which is often performed via ML leaderboards.<n>To ease this burden, researchers have developed methods to extract leaderboard entries from research papers.<n>Yet, prior work varies in problem framing, complicating comparisons and limiting real-world applicability.<n>We propose an ALG unified conceptual framework to standardise how the ALG task is defined.
arXiv Detail & Related papers (2025-05-23T04:46:10Z)
Towards Visual Text Grounding of Multimodal Large Language Model [88.0588924255417]
We introduce TRIG, a novel task with a newly designed instruction dataset for benchmarking text-rich image grounding. Specifically, we propose an OCR-LLM-human interaction pipeline to create 800 manually annotated question-answer pairs as a benchmark. A comprehensive evaluation of various MLLMs on our proposed benchmark exposes substantial limitations in their grounding capability on text-rich images.
arXiv Detail & Related papers (2025-04-07T12:01:59Z)
Movie2Story: A framework for understanding videos and telling stories in the form of novel text [0.0]
We propose a novel benchmark to evaluate text generation capabilities in scenarios enriched with auxiliary information. Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information. Our experiments reveal that current Multi-modal Large Language Models (MLLMs) perform suboptimally under the proposed evaluation metrics.
arXiv Detail & Related papers (2024-12-19T15:44:04Z)
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework [0.1979158763744267]
Open-ended text generation has become a prominent task in natural language processing. Decoding methods often excel in some metrics while underperforming in others. We present novel ranking strategies within this multicriteria framework.
arXiv Detail & Related papers (2024-10-24T11:32:01Z)
Optimizing and Evaluating Enterprise Retrieval-Augmented Generation (RAG): A Content Design Perspective [0.0]
Retrieval-augmented generation (RAG) is a popular technique for using large language models (LLMs) to build customer-support, question-answering solutions. This paper focuses on solution strategies that are modular and model-agnostic.
arXiv Detail & Related papers (2024-10-01T03:54:45Z)
AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising [19.642481233488667]
We propose AdTEC (Ad Text Evaluation Benchmark by CyberAgent), the first public benchmark to evaluate ad texts from multiple perspectives. Our contributions are as follows: (i) Defining five tasks for evaluating the quality of ad texts, as well as building a Japanese dataset based on the practical operational experiences of advertising agencies, which are typically kept in-house.
arXiv Detail & Related papers (2024-08-12T03:32:53Z)
Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks. We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement. We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z)
Recent advances in text embedding: A Comprehensive Review of Top-Performing Methods on the MTEB Benchmark [0.0]
We provide an overview of the advances in universal text embedding models with a focus on the top performing text embeddings on Massive Text Embedding Benchmark (MTEB) Through detailed comparison and analysis, we highlight the key contributions and limitations in this area, and propose potentially inspiring future research directions.
arXiv Detail & Related papers (2024-05-27T09:52:54Z)
Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS) We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction [67.54420015049732]
Aspect Sentiment Triplet Extraction (ASTE) is a challenging task in sentiment analysis, aiming to provide fine-grained insights into human sentiments. Existing benchmarks are limited to two domains and do not evaluate model performance on unseen domains. We introduce a domain-expanded benchmark by annotating samples from diverse domains, enabling evaluation of models in both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2023-05-23T18:01:49Z)
Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles [17.24298646089662]
This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge. We aim to improve tagging consistency and entity coverage using various methods. In the official evaluation of the challenge, our system was ranked 1st in NER by significantly outperforming the baseline model.
arXiv Detail & Related papers (2021-11-20T13:13:58Z)
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets [17.13484629172643]
We develop a framework to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. We propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models.
arXiv Detail & Related papers (2021-06-16T18:20:58Z)
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics [66.96150429230035]
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models.
arXiv Detail & Related papers (2021-02-02T18:42:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.