I still have Time(s): Extending HeidelTime for German Texts
- URL: http://arxiv.org/abs/2204.08848v1
- Date: Tue, 19 Apr 2022 12:25:47 GMT
- Title: I still have Time(s): Extending HeidelTime for German Texts
- Authors: Andy Lücking, Manuel Stoeckel, Giuseppe Abrami, Alexander Mehler
- Abstract summary: HeidelTime is a tool for detecting temporal expressions in texts.
HeidelTime-EXT extends HeidelTime's German resources, developed by observing false negatives in texts.
- Score: 63.22865852794608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: HeidelTime is one of the most widespread and successful tools for detecting
temporal expressions in texts. Since HeidelTime's pattern matching system is
based on regular expressions, it can be extended in a convenient way. We present
such an extension for the German resources of HeidelTime: HeidelTime-EXT. The
extension has been brought about by means of observing false negatives within
real-world texts and various time banks. The gain in coverage is 2.7% or 8.5%,
depending on the admitted degree of potential overgeneralization. We describe
the development of HeidelTime-EXT, its evaluation on text samples from various
genres, and share some linguistic observations. HeidelTime-EXT can be obtained
from https://github.com/texttechnologylab/heideltime.
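The abstract's point that a regex-based pattern matcher is easy to extend can be sketched as follows. This is an illustrative toy only, not HeidelTime's actual resource format (HeidelTime uses its own rule and normalization files): each entry pairs a label with a regular expression for a class of German temporal expressions, and extending coverage amounts to appending another pattern.

```python
import re

# Toy pattern set mimicking a regex-based temporal tagger for German.
# Patterns are ordered longest-first so a full date suppresses the
# shorter month-year match inside it.
MONTHS = r"(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)"
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}\.\s?" + MONTHS + r"\s\d{4}\b")),  # e.g. "19. April 2022"
    ("DATE", re.compile(r"\b" + MONTHS + r"\s\d{4}\b")),              # e.g. "April 2022"
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\s?Uhr\b")),                 # e.g. "12:25 Uhr"
]

def find_timex(text):
    """Return (label, matched text) pairs, skipping spans already covered."""
    hits, taken = [], []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if any(m.start() < e and m.end() > s for s, e in taken):
                continue  # an earlier (longer) pattern already covered this span
            taken.append((m.start(), m.end()))
            hits.append((label, m.group(0)))
    return hits

print(find_timex("Das Papier erschien am 19. April 2022 um 12:25 Uhr."))
```

Adding a new rule, say for expressions like "Anfang Mai", is a one-line addition to `PATTERNS` — which is the convenience the abstract refers to, and also the source of the overgeneralization risk it quantifies.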
Related papers
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models [89.28591263741973]
We introduce the Hierarchical Long Text Generation Benchmark (HelloBench) to evaluate Large Language Models' performance in generating long text.
Based on Bloom's taxonomy, HelloBench categorizes long text generation tasks into five subtasks: open-ended QA, summarization, chat, text completion, and text generation.
Besides, we propose Hierarchical Long Text Evaluation (HelloEval), a human evaluation method that significantly reduces the time and effort required for human evaluation.
arXiv Detail & Related papers (2024-09-24T15:38:11Z) - TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification [2.868883216530741]
We introduce the TEI2GO models, matching HeidelTime's effectiveness but with significantly improved runtime.
To train the TEI2GO models, we used a combination of a manually annotated reference corpus and developed "Professor HeidelTime", a comprehensive weakly labeled corpus of news texts annotated with HeidelTime.
Code, annotations, and models are openly available for community exploration and use.
arXiv Detail & Related papers (2024-03-25T14:23:03Z) - MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
arXiv Detail & Related papers (2024-03-15T13:33:10Z) - REST: Retrieval-Based Speculative Decoding [69.06115086237207]
We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.
Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens.
When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation.
arXiv Detail & Related papers (2023-11-14T15:43:47Z) - Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - Time-aware Prompting for Text Generation [17.58231642569116]
We study the effects of incorporating timestamps, such as document creation dates, into generation systems.
Two types of time-aware prompts are investigated: (1) textual prompts that encode document timestamps in natural language sentences; and (2) linear prompts that convert timestamps into continuous vectors.
arXiv Detail & Related papers (2022-11-03T22:10:25Z) - XLTime: A Cross-Lingual Knowledge Transfer Framework for Temporal Expression Extraction [63.39190486298887]
Temporal Expression Extraction (TEE) is essential for understanding time in natural language.
To date, work in this area has mostly focused on English as there is a scarcity of labeled data for other languages.
We propose XLTime, a novel framework for multilingual TEE.
arXiv Detail & Related papers (2022-05-03T20:00:42Z) - Language modeling via stochastic processes [30.796382023812022]
Modern language models can generate high-quality short texts, but often meander or are incoherent when generating longer texts.
Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning.
We propose one approach for leveraging contrastive representations, which we call Time Control.
arXiv Detail & Related papers (2022-03-21T22:13:53Z) - Time Masking for Temporal Language Models [23.08079115356717]
We propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts.
Our technique is based on modifying texts with temporal information and performing time masking - specific masking for the supplementary time information.
arXiv Detail & Related papers (2021-10-12T21:15:23Z) - BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation [42.34923623457615]
The Bias in Open-Ended Language Generation (BOLD) dataset consists of 23,679 English text generation prompts.
An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text.
arXiv Detail & Related papers (2021-01-27T22:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.