NLP-based Relation Extraction Methods in RE
- URL: http://arxiv.org/abs/2401.12075v1
- Date: Mon, 22 Jan 2024 16:14:27 GMT
- Title: NLP-based Relation Extraction Methods in RE
- Authors: Quim Motger, Xavier Franch
- Abstract summary: Mobile app repositories have been largely used in scientific research as large-scale, highly adaptive crowdsourced information systems.
We present MApp-KG, a combination of software resources and data artefacts to support extended knowledge generation tasks.
Our contribution aims to provide a framework for automatically constructing a knowledge graph modelling a domain-specific catalogue of mobile apps.
- Score: 4.856095570023289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile app repositories have been largely used in scientific research as
large-scale, highly adaptive crowdsourced information systems. These software
platforms can potentially nourish multiple software and requirements
engineering tasks based on user reviews and other natural language documents,
including feedback analysis, recommender systems and topic modelling.
Consequently, researchers often endeavour to overcome domain-specific
challenges, including integration of heterogeneous data sources, large-scale
data collection and adaptation of a publicly available data set for a given
research scenario. In this paper, we present MApp-KG, a combination of software
resources and data artefacts in the field of mobile app repositories to support
extended knowledge generation tasks. Our contribution aims to provide a
framework for automatically constructing a knowledge graph modelling a
domain-specific catalogue of mobile apps. Complementarily, we distribute
MApp-KG in a public triplestore and as a static data snapshot, which may be
promptly employed for future research and reproduction of our findings.
Related papers
- Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z)
- Synthetic Data Generation with Large Language Models for Personalized Community Question Answering [47.300506002171275]
We build Sy-SE-PQA based on an existing dataset, SE-PQA, which consists of questions and answers posted on the popular StackExchange communities.
Our findings suggest that LLMs have high potential in generating data tailored to users' needs.
The synthetic data can replace human-written training data, even if the generated data may contain incorrect information.
arXiv Detail & Related papers (2024-10-29T16:19:08Z)
- GraphAide: Advanced Graph-Assisted Query and Reasoning System [0.04999814847776096]
We introduce an advanced query and reasoning system, GraphAide, which constructs a knowledge graph (KG) from diverse sources and supports querying and reasoning over the resulting KG.
GraphAide harnesses Large Language Models (LLMs) to rapidly develop domain-specific digital assistants.
arXiv Detail & Related papers (2024-10-29T07:25:30Z)
- On the Creation of Representative Samples of Software Repositories [1.8599311233727087]
With the emergence of social coding platforms such as GitHub, researchers now have access to millions of software repositories to use as source data for their studies.
Current sampling methods are often based on random selection or rely on variables which may not be related to the research study.
We present a methodology for creating representative samples of software repositories, where such representativeness is properly aligned with both the characteristics of the population of repositories and the requirements of the empirical study.
arXiv Detail & Related papers (2024-10-01T12:41:15Z)
- On-Device Language Models: A Comprehensive Review [26.759861320845467]
This review examines the challenges of deploying computationally expensive large language models on resource-constrained devices.
The paper investigates on-device language models, their efficient architectures, and state-of-the-art compression techniques.
Case studies of on-device language models from major mobile manufacturers demonstrate real-world applications and potential benefits.
arXiv Detail & Related papers (2024-08-26T03:33:36Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning (REML) framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)