Automated Requirements Relation Extraction
- URL: http://arxiv.org/abs/2401.12075v2
- Date: Thu, 20 Mar 2025 15:30:25 GMT
- Title: Automated Requirements Relation Extraction
- Authors: Quim Motger, Xavier Franch
- Abstract summary: This chapter aims at providing a clear perspective on the theoretical and practical fundamentals in the field of natural language-based relation extraction. We first describe the fundamentals of requirements relations based on the most relevant literature in the field, including the most common requirements relation types. The core of the chapter is composed of two main sections: (i) natural language techniques for the identification and categorization of requirements relations (i.e., syntactic vs. semantic techniques) and (ii) information extraction methods for the task of relation extraction.
- Score: 4.110571395660999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of requirements engineering, relation extraction involves identifying and documenting the associations between different requirements artefacts. When dealing with textual requirements (i.e., requirements expressed using natural language), relation extraction becomes a cognitively challenging task, especially in terms of ambiguity and required effort from domain experts. Hence, in highly adaptive, large-scale environments, effective and efficient automated relation extraction using natural language processing techniques becomes essential. In this chapter, we present a comprehensive overview of natural language-based relation extraction from text-based requirements. We initially describe the fundamentals of requirements relations based on the most relevant literature in the field, including the most common requirements relation types. The core of the chapter is composed of two main sections: (i) natural language techniques for the identification and categorization of requirements relations (i.e., syntactic vs. semantic techniques), and (ii) information extraction methods for the task of relation extraction (i.e., retrieval-based vs. machine learning-based methods). We complement this analysis with the state-of-the-art challenges and the envisioned future research directions. Overall, this chapter aims at providing a clear perspective on the theoretical and practical fundamentals in the field of natural language-based relation extraction.
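As a minimal illustration of the retrieval-based methods the abstract mentions, the sketch below flags candidate relations between textual requirements using bag-of-words cosine similarity. The requirement texts and the similarity threshold are illustrative assumptions, not taken from the chapter; real pipelines would typically use TF-IDF weighting or semantic embeddings instead of raw term counts.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split a requirement into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical requirements document (illustrative data only).
requirements = {
    "R1": "The system shall encrypt all stored user passwords.",
    "R2": "Stored passwords must be encrypted by the system.",
    "R3": "The UI shall display the current date in the footer.",
}

# Retrieval-based candidate identification: pairs whose lexical
# similarity exceeds an (assumed) threshold are proposed as related.
THRESHOLD = 0.4
pairs = [(i, j, cosine(requirements[i], requirements[j]))
         for i in requirements for j in requirements if i < j]
candidates = [(i, j) for i, j, s in pairs if s >= THRESHOLD]
```

Here R1 and R2 paraphrase the same constraint and surface as a candidate pair, while R3 falls below the threshold; a domain expert (or a downstream classifier) would then assign the relation type.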
Related papers
- Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z) - Synthetic Data Generation with Large Language Models for Personalized Community Question Answering [47.300506002171275]
We build Sy-SE-PQA based on an existing dataset, SE-PQA, which consists of questions and answers posted on the popular StackExchange communities.
Our findings suggest that LLMs have high potential in generating data tailored to users' needs.
The synthetic data can replace human-written training data, even if the generated data may contain incorrect information.
arXiv Detail & Related papers (2024-10-29T16:19:08Z) - GraphAide: Advanced Graph-Assisted Query and Reasoning System [0.04999814847776096]
We introduce an advanced query and reasoning system, GraphAide, which constructs a knowledge graph (KG) from diverse sources and supports querying and reasoning over the resulting KG.
GraphAide harnesses Large Language Models (LLMs) to rapidly develop domain-specific digital assistants.
arXiv Detail & Related papers (2024-10-29T07:25:30Z) - On the Creation of Representative Samples of Software Repositories [1.8599311233727087]
With the emergence of social coding platforms such as GitHub, researchers now have access to millions of software repositories to use as source data for their studies.
Current sampling methods are often based on random selection or rely on variables which may not be related to the research study.
We present a methodology for creating representative samples of software repositories, where such representativeness is properly aligned with both the characteristics of the population of repositories and the requirements of the empirical study.
arXiv Detail & Related papers (2024-10-01T12:41:15Z) - On-Device Language Models: A Comprehensive Review [26.759861320845467]
This review examines the challenges of deploying computationally expensive large language models on resource-constrained devices.
It investigates on-device language models, their efficient architectures, and state-of-the-art compression techniques.
Case studies of on-device language models from major mobile manufacturers demonstrate real-world applications and potential benefits.
arXiv Detail & Related papers (2024-08-26T03:33:36Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - PromptRE: Weakly-Supervised Document-Level Relation Extraction via
Prompting-Based Data Programming [30.597623178206874]
We propose PromptRE, a novel weakly-supervised document-level relation extraction method.
PromptRE incorporates the label distribution and entity types as prior knowledge to improve the performance.
Experimental results on ReDocRED, a benchmark dataset for document-level relation extraction, demonstrate the superiority of PromptRE over baseline approaches.
arXiv Detail & Related papers (2023-10-13T17:23:17Z) - A Comprehensive Survey of Document-level Relation Extraction (2016-2023) [3.0204640945657326]
Document-level relation extraction (DocRE) is an active area of research in natural language processing (NLP).
This paper aims to provide a comprehensive overview of recent advances in this field, highlighting its different applications in comparison to sentence-level relation extraction.
arXiv Detail & Related papers (2023-09-28T12:43:32Z) - A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
We present the DataDesc ecosystem, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z) - TAGPRIME: A Unified Framework for Relational Structure Extraction [71.88926365652034]
TAGPRIME is a sequence tagging model that appends priming words describing the given condition to the input text.
With the self-attention mechanism in pre-trained language models, the priming words make the output contextualized representations contain more information about the given condition.
Extensive experiments and analyses on three different tasks that cover ten datasets across five different languages demonstrate the generality and effectiveness of TAGPRIME.
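The priming idea in the blurb above can be sketched in a few lines: condition words are concatenated to the input so that a pre-trained tagger's self-attention can condition its representations on them. The separator token and the example condition words below are illustrative assumptions, not TAGPRIME's actual implementation.

```python
def prime_input(text, condition_words, sep="[SEP]"):
    """Append priming words about the queried condition (e.g., a
    relation type) to the input sequence, so that a tagger's
    contextualized representations carry condition information."""
    return f"{text} {sep} {' '.join(condition_words)}"

# Hypothetical usage: prime a requirement sentence with a relation type.
primed = prime_input("The system shall log all failed login attempts.",
                     ["relation:", "refines"])
```

The primed string, rather than the raw sentence, would then be fed to the sequence tagger, one pass per condition of interest.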
arXiv Detail & Related papers (2022-05-25T08:57:46Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z) - Modular Self-Supervision for Document-Level Relation Extraction [17.039775384229355]
We propose decomposing document-level relation extraction into relation detection and argument resolution.
We conduct a thorough evaluation in biomedical machine reading for precision oncology, where cross-paragraph relation mentions are prevalent.
Our method outperforms prior state of the art, such as multi-scale learning and graph neural networks, by over 20 absolute F1 points.
arXiv Detail & Related papers (2021-09-11T20:09:18Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z) - Lexically-constrained Text Generation through Commonsense Knowledge
Extraction and Injection [62.071938098215085]
We focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z) - Leveraging Semantic Parsing for Relation Linking over Knowledge Bases [80.99588366232075]
We present SLING, a relation linking framework which leverages semantic parsing using AMR and distant supervision.
SLING integrates multiple relation linking approaches that capture complementary signals such as linguistic cues, rich semantic representation, and information from the knowledge base.
Experiments on relation linking using three KBQA datasets (QALD-7, QALD-9, and LC-QuAD 1.0) demonstrate that the proposed approach achieves state-of-the-art performance on all benchmarks.
arXiv Detail & Related papers (2020-09-16T14:56:11Z) - Natural language technology and query expansion: issues,
state-of-the-art and perspectives [0.0]
Linguistic characteristics that cause ambiguity and misinterpretation of queries, as well as additional factors, affect the users' ability to accurately represent their information needs.
We lay down the anatomy of a generic linguistic-based query expansion framework and propose its module-based decomposition.
For each of the modules, we review the state-of-the-art solutions in the literature, categorized according to the techniques used.
arXiv Detail & Related papers (2020-04-23T11:39:07Z) - Coupled intrinsic and extrinsic human language resource-based query
expansion [0.0]
We present here a query expansion framework which capitalizes on both linguistic characteristics for query constituent encoding, expansion concept extraction and concept weighting.
A thorough empirical evaluation on real-world datasets validates our approach against unigram language model, relevance model and a sequential dependence based technique.
arXiv Detail & Related papers (2020-04-23T11:22:38Z) - A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn).
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.