TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification
- URL: http://arxiv.org/abs/2403.16804v1
- Date: Mon, 25 Mar 2024 14:23:03 GMT
- Title: TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification
- Authors: Hugo Sousa, Ricardo Campos, Alípio Jorge
- Abstract summary: We introduce the TEI2GO models, matching HeidelTime's effectiveness but with significantly improved runtime.
To train the TEI2GO models, we used a combination of manually annotated reference corpora and developed "Professor HeidelTime", a comprehensive weakly labeled corpus of news texts annotated with HeidelTime.
Code, annotations, and models are openly available for community exploration and use.
- Score: 2.868883216530741
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Temporal expression identification is crucial for understanding texts written in natural language. Although highly effective systems such as HeidelTime exist, their limited runtime performance hampers adoption in large-scale applications and production environments. In this paper, we introduce the TEI2GO models, matching HeidelTime's effectiveness but with significantly improved runtime, supporting six languages, and achieving state-of-the-art results in four of them. To train the TEI2GO models, we used a combination of manually annotated reference corpora and developed "Professor HeidelTime", a comprehensive weakly labeled corpus of news texts annotated with HeidelTime. This corpus comprises a total of 138,069 documents (over six languages) with 1,050,921 temporal expressions, the largest open-source annotated dataset for temporal expression identification to date. By describing how the models were produced, we aim to encourage the research community to further explore, refine, and extend the set of models to additional languages and domains. Code, annotations, and models are openly available for community exploration and use. The models are conveniently available on HuggingFace for seamless integration and application.
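As a usage illustration, here is a minimal sketch of running one of the TEI2GO models, assuming the models are distributed as spaCy pipelines hosted on HuggingFace; the pipeline name "en_tei2go" and the printed label are illustrative assumptions, not confirmed release details.

```python
# Minimal sketch: temporal expression identification with a TEI2GO model.
# Assumes the English model is a spaCy pipeline installable from its
# HuggingFace repository; "en_tei2go" is an assumed name, to be replaced
# by the actual released package name.

import spacy

nlp = spacy.load("en_tei2go")  # assumed pipeline name
doc = nlp("The championship was postponed from March 2020 to last Friday.")

for ent in doc.ents:
    # Temporal expressions are surfaced as entity spans.
    print(ent.text, ent.label_)
```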
Related papers
- Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models [24.784375155633427]
BiTimeBERT 2.0 is a novel language model pre-trained on a temporal news article collection.
Each of its pre-training objectives targets a unique aspect of temporal information.
Results consistently demonstrate that BiTimeBERT 2.0 outperforms models like BERT and other existing pre-trained models.
arXiv Detail & Related papers (2024-06-04T00:30:37Z)
- Time Machine GPT [15.661920010658626]
Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora.
This approach is not aligned with the evolving nature of language.
This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT).
arXiv Detail & Related papers (2024-04-29T09:34:25Z)
- Paragraph-to-Image Generation with Information-Enriched Diffusion Model [67.9265336953134]
ParaDiffusion is an information-enriched diffusion model for the paragraph-to-image generation task.
It transfers the extensive semantic comprehension capabilities of large language models to the task of image generation.
The code and dataset will be released to foster community research on long-text alignment.
arXiv Detail & Related papers (2023-11-24T05:17:01Z)
- Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio [0.5097809301149341]
We find that most language models generate compelling text even under significant constraints.
We present a technique for modifying the output of a language model by compositionally applying filter functions to the language model's vocabulary, as sketched after this entry.
We also present Gadsby, a Hugging Face Space web app demonstrating this technique.
arXiv Detail & Related papers (2023-06-28T05:10:51Z)
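As a rough illustration of the vocabulary-filtering idea described above (a generic sketch, not the paper's actual implementation), the following composes simple predicate filters into a boolean mask over a GPT-2 vocabulary and removes disallowed tokens before greedy decoding; the filters, model choice, and decoding strategy are all assumptions.

```python
# Rough sketch of compositional vocabulary filtering for constrained
# generation. Illustration only: filters, model, and greedy decoding
# are assumptions, not the paper's implementation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Each filter maps a token string to True (keep) or False (exclude).
no_letter_e = lambda t: "e" not in t.lower()   # lipogram-style constraint
short_words = lambda t: len(t) <= 6            # only short tokens
filters = [no_letter_e, short_words]

def keep(i):
    t = tok.convert_ids_to_tokens(i).lstrip("Ġ")  # strip GPT-2 BPE space marker
    return all(f(t) for f in filters)             # compose all filters

vocab_mask = torch.tensor([keep(i) for i in range(len(tok))])

ids = tok("A short story without that symbol:", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[0, -1]
        logits[~vocab_mask] = float("-inf")    # drop filtered-out tokens
        next_id = torch.argmax(logits).view(1, 1)
        ids = torch.cat([ids, next_id], dim=1)

print(tok.decode(ids[0]))
```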
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Prompting Large Language Models to Reformulate Queries for Moment Localization [79.57593838400618]
The task of moment localization is to localize a temporal moment in an untrimmed video for a given natural language query.
We make early attempts at using large language models to reformulate moment queries into sets of instructions that are friendlier to the localization models.
arXiv Detail & Related papers (2023-06-06T05:48:09Z)
- HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training [49.52679453475878]
We propose a Temporal-Aware video-language pre-training framework, HiTeA, for modeling cross-modal alignment between moments and texts.
We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks.
arXiv Detail & Related papers (2022-12-30T04:27:01Z)
- Efficient and Interpretable Neural Models for Entity Tracking [3.1985066117432934]
This thesis focuses on two key problems in relation to facilitating the use of entity tracking models.
We argue that computationally efficient entity tracking models can be developed by representing entities with rich, fixed-dimensional vector representations.
We also argue for the integration of entity tracking into language models, as it would allow for wider application given the current ubiquitous use of pretrained language models in NLP.
arXiv Detail & Related papers (2022-08-30T13:25:27Z)
- I still have Time(s): Extending HeidelTime for German Texts [63.22865852794608]
HeidelTime is a tool for detecting temporal expressions in texts.
HeidelTime-EXT can be used to observe false negatives in texts.
arXiv Detail & Related papers (2022-04-19T12:25:47Z)
- Language modeling via stochastic processes [30.796382023812022]
Modern language models can generate high-quality short texts, but often meander or are incoherent when generating longer texts.
Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning.
We propose one approach for leveraging contrastive representations, which we call Time Control; a minimal sketch of such a contrastive objective follows this entry.
arXiv Detail & Related papers (2022-03-21T22:13:53Z)
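To make the contrastive-learning idea concrete, here is a minimal InfoNCE-style sketch in which each anchor is trained to score highest against its own positive; the embeddings are placeholders and nothing here reflects Time Control's actual encoder, data, or training setup.

```python
# Minimal InfoNCE-style contrastive objective, illustrating the kind of
# representation learning the Time Control summary refers to. Inputs are
# placeholders, not the paper's actual setup.

import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """Each anchor should score highest against its own positive."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature        # [batch, batch] cosine similarities
    targets = torch.arange(a.size(0))     # matching pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: random vectors standing in for sentence embeddings.
anchors = torch.randn(8, 128)
positives = anchors + 0.1 * torch.randn(8, 128)  # perturbed copies as positives
print(info_nce(anchors, positives).item())
```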
- Language Guided Networks for Cross-modal Moment Retrieval [66.49445903955777]
Cross-modal moment retrieval aims to localize a temporal segment from an untrimmed video described by a natural language query.
Existing methods independently extract the features of videos and sentences.
We present Language Guided Networks (LGN), a new framework that leverages the sentence embedding to guide the whole process of moment retrieval.
arXiv Detail & Related papers (2020-06-18T12:08:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.