Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility
- URL: http://arxiv.org/abs/2501.05606v1
- Date: Thu, 09 Jan 2025 22:48:43 GMT
- Title: Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility
- Authors: Zixuan Liang
- Abstract summary: This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs).
Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal.
The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization.
- Score: 0.0
- Abstract: This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs). Leveraging linked data and RDF techniques, we integrate data from multiple sources into a unified model based on DCAT and the META-SHARE OWL ontology. Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal. Real user queries from the Corpora Mailing List (CML) were evaluated to assess Linghub's capability to satisfy actual user needs. Results indicate that while some limitations persist, many user requests can be successfully addressed. The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization. This initial research underscores the importance of API-based access to LRs, promoting machine usability and data subset extraction for specific purposes, paving the way for more efficient and standardized LR utilization.
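The harmonization step described in the abstract can be illustrated with a minimal sketch. This is not Linghub's implementation: plain Python dictionaries stand in for RDF, and the repository names (`repo_a`, `repo_b`), their field names, and the helpers `harmonize` and `filter_by` are hypothetical. Only the target property names (`dct:title`, `dct:language`, `dcat:accessURL`) come from the DCAT and Dublin Core vocabularies.

```python
# Hypothetical per-repository mappings from source field names to
# DCAT / Dublin Core property names. The source keys are invented for
# illustration; real repositories use their own schemas.
FIELD_MAPS = {
    "repo_a": {"name": "dct:title", "lang": "dct:language", "url": "dcat:accessURL"},
    "repo_b": {"resourceName": "dct:title", "languageId": "dct:language", "link": "dcat:accessURL"},
}


def harmonize(record, source):
    """Map one source record onto the shared property names.

    Fields without a mapping are dropped; string values are stripped,
    and language codes are lower-cased so facets compare consistently.
    """
    mapping = FIELD_MAPS[source]
    unified = {"_source": source}  # bookkeeping key, not a vocabulary term
    for key, value in record.items():
        target = mapping.get(key)
        if target is None:
            continue
        if isinstance(value, str):
            value = value.strip()
            if target == "dct:language":
                value = value.lower()
        unified[target] = value
    return unified


def filter_by(records, prop, value):
    """Return the harmonized records whose property `prop` equals `value`."""
    return [r for r in records if r.get(prop) == value]


records = [
    harmonize({"name": " Corpus X ", "lang": "EN", "url": "http://example.org/x"}, "repo_a"),
    harmonize({"resourceName": "Lexicon Y", "languageId": "de", "extra": 1}, "repo_b"),
]
english = filter_by(records, "dct:language", "en")
```

Once both records share the same property names, a single facet filter such as `filter_by(records, "dct:language", "en")` works across sources, which is the property that faceted browsing and SPARQL queries over a unified model rely on.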
Related papers
- Augmented Knowledge Graph Querying leveraging LLMs [2.5311562666866494]
We introduce SparqLLM, a framework that enhances the querying of Knowledge Graphs (KGs).
SparqLLM executes the Extract, Transform, and Load (ETL) pipeline to construct KGs from raw data.
It also features a natural language interface powered by Large Language Models (LLMs) to enable automatic SPARQL query generation.
arXiv Detail & Related papers (2025-02-03T12:18:39Z)
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG).
We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
- Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models [1.3980986259786221]
This paper examines the integration of Large Language Models (LLMs) within existing systems.
By leveraging the advanced natural language understanding capabilities of LLMs, our method improves RDF entity extraction within web systems.
The evaluation of this methodology shows a marked enhancement in system expressivity and the accuracy of responses to user queries.
arXiv Detail & Related papers (2024-09-24T16:31:33Z)
- Chatbot-Based Ontology Interaction Using Large Language Models and Domain-Specific Standards [41.19948826527649]
Large Language Models (LLMs) are employed to enhance SPARQL query generation.
The system converts user inquiries into accurate SPARQL queries.
Additional information from established domain-specific standards is integrated into the interface.
arXiv Detail & Related papers (2024-07-22T11:58:36Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z)
- Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis.
It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
- Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue [18.325675189960833]
In support of open and reproducible research, there has been a rapidly increasing number of datasets made available for research.
As the availability of datasets increases, it becomes more important to have quality metadata for discovering and reusing them.
This paper proposes leveraging large language models (LLMs) for cost-effective annotation of subject metadata through LLM-based in-context learning.
arXiv Detail & Related papers (2023-10-17T14:52:33Z)
- Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
- Metadata Representations for Queryable ML Model Zoos [73.24799582702326]
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models.
The metadata is currently not standardised, its expressivity is limited, and there is no way to store and query it.
In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
arXiv Detail & Related papers (2022-07-19T15:04:14Z)