ShaRE your Data! Characterizing Datasets for LLM-based Requirements Engineering
- URL: http://arxiv.org/abs/2510.18787v1
- Date: Tue, 21 Oct 2025 16:40:26 GMT
- Title: ShaRE your Data! Characterizing Datasets for LLM-based Requirements Engineering
- Authors: Quim Motger, Carlota Catot, Xavier Franch,
- Abstract summary: Large Language Models (LLMs) rely on domain-specific datasets to achieve robust performance across training and inference stages.<n>In Requirements Engineering (RE), data scarcity remains a persistent limitation reported in surveys and mapping studies.<n>This research addresses the limited visibility and characterization of datasets used in LLM4RE.
- Score: 2.1774928300623615
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: [Context] Large Language Models (LLMs) rely on domain-specific datasets to achieve robust performance across training and inference stages. However, in Requirements Engineering (RE), data scarcity remains a persistent limitation reported in surveys and mapping studies. [Question/Problem] Although there are multiple datasets supporting LLM-based RE tasks (LLM4RE), they are fragmented and poorly characterized, limiting reuse and comparability. This research addresses the limited visibility and characterization of datasets used in LLM4RE. We investigate which public datasets are employed, how they can be systematically characterized, and which RE tasks and dataset descriptors remain under-represented. [Ideas/Results] To address this, we conduct a systematic mapping study to identify and analyse datasets used in LLM4RE research. A total of 62 publicly available datasets are referenced across 43 primary studies. Each dataset is characterized along descriptors such as artifact type, granularity, RE stage, task, domain, and language. Preliminary findings show multiple research gaps, including limited coverage for elicitation tasks, scarce datasets for management activities beyond traceability, and limited multilingual availability. [Contribution] This research preview offers a public catalogue and structured characterization scheme to support dataset selection, comparison, and reuse in LLM4RE research. Future work will extend the scope to grey literature, as well as integration with open dataset and benchmark repositories.
Related papers
- Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them.<n>This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks.<n>We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning, standardization, error processing, imputation, data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z) - More Parameters Than Populations: A Systematic Literature Review of Large Language Models within Survey Research [0.7699714865575188]
Large Language Models (LLMs) bring new technological challenges and prerequisites in order to fully harness their potential.<n>We report work-in-progress on a systematic literature review based on keyword searches from multiple large-scale databases.<n>We discuss selected examples of potential use cases for LLMs as well as its pitfalls based on examples from existing literature.
arXiv Detail & Related papers (2025-09-03T15:15:31Z) - Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [80.88654868264645]
Arranged and Organized Extraction Benchmark designed to evaluate ability of large language models to comprehend fragmented documents.<n>AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schema tailored to varied input queries.<n>Results show that even the most advanced models struggled significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z) - SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs [3.8439345751986913]
Large language models (LLMs) can be aligned with diverse human perspectives.<n> evaluating this alignment on downstream tasks (e.g., hate speech detection) remains challenging due to the use of inconsistent datasets across studies.<n>We introduce SubData, an open-source Python library designed for standardizing heterogeneous datasets to evaluate LLMs perspective alignment.
arXiv Detail & Related papers (2024-12-21T21:40:31Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction [67.86012624533461]
ImplicitAVE is the first, publicly available multimodal dataset for implicit attribute value extraction.
dataset includes 68k training and 1.6k testing data across five domains.
We also explore the application of multimodal large language models (MLLMs) to implicit AVE.
arXiv Detail & Related papers (2024-04-24T01:54:40Z) - Elephants Never Forget: Testing Language Models for Memorization of
Tabular Data [21.912611415307644]
Large Language Models (LLMs) can be applied to a diverse set of tasks, but the critical issues of data contamination and memorization are often glossed over.
We introduce a variety of different techniques to assess the degrees of contamination, including statistical tests for conditional distribution modeling and four tests that identify memorization.
arXiv Detail & Related papers (2024-03-11T12:07:13Z) - Datasets for Large Language Models: A Comprehensive Survey [37.153302283062004]
The survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives.
The survey sheds light on the prevailing challenges and points out potential avenues for future investigation.
The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets.
arXiv Detail & Related papers (2024-02-28T04:35:51Z) - Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis.<n>It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z) - Large Language Models for Software Engineering: A Systematic Literature Review [34.12458948051519]
Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE)
We select and analyze 395 research papers from January 2017 to January 2024 to answer four key research questions (RQs)
From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study.
arXiv Detail & Related papers (2023-08-21T10:37:49Z) - Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.