A German Corpus for Fine-Grained Named Entity Recognition and Relation
Extraction of Traffic and Industry Events
- URL: http://arxiv.org/abs/2004.03283v1
- Date: Tue, 7 Apr 2020 11:39:50 GMT
- Title: A German Corpus for Fine-Grained Named Entity Recognition and Relation
Extraction of Traffic and Industry Events
- Authors: Martin Schiersch, Veselina Mironova, Maximilian Schmitt, Philippe
Thomas, Aleksandra Gabryszak, Leonhard Hennig
- Abstract summary: This work describes a corpus of German-language documents which has been annotated with fine-grained geo-entities.
It has also been annotated with a set of 15 traffic- and industry-related n-ary relations and events.
The corpus consists of newswire texts, Twitter messages, and traffic reports from radio stations, police and railway companies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monitoring mobility- and industry-relevant events is important in areas such
as personal travel planning and supply chain management, but extracting events
pertaining to specific companies, transit routes and locations from
heterogeneous, high-volume text streams remains a significant challenge. This
work describes a corpus of German-language documents which has been annotated
with fine-grained geo-entities, such as streets, stops and routes, as well as
standard named entity types. It has also been annotated with a set of 15
traffic- and industry-related n-ary relations and events, such as accidents,
traffic jams, acquisitions, and strikes. The corpus consists of newswire texts,
Twitter messages, and traffic reports from radio stations, police and railway
companies. It allows for training and evaluating both named entity recognition
algorithms that aim for fine-grained typing of geo-entities and n-ary
relation extraction systems.
Related papers
- XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More [38.092415845567345]
Research has been conducted on two highly correlated tracks: traffic and incidents.
XTraffic dataset includes traffic, i.e., time-series indexes on traffic flow, lane occupancy, and average vehicle speed.
Each node includes detailed physical policy-level meta-attributes of lanes.
arXiv Detail & Related papers (2024-07-16T08:16:01Z)
- BjTT: A Large-scale Multimodal Dataset for Traffic Prediction [49.93028461584377]
Traditional traffic prediction methods rely on historical traffic data to predict traffic trends.
In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation.
We propose ChatTraffic, the first diffusion model for text-to-traffic generation.
arXiv Detail & Related papers (2024-03-08T04:19:56Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- Studying the role of named entities for content preservation in text style transfer [65.40394342240558]
We focus on the role of named entities in content preservation for formality text style transfer.
We collect a new dataset for the evaluation of content similarity measures in text style transfer.
We perform an error analysis of a pre-trained formality transfer model and introduce a simple technique to use information about named entities to enhance the performance of baseline content similarity measures used in text style transfer.
arXiv Detail & Related papers (2022-06-20T09:31:47Z)
- Unsupervised Key Event Detection from Massive Text Corpora [42.31889135421941]
We propose a new task, key event detection at the intermediate level, aiming to detect key events from a news corpus.
This task can bridge event understanding and structuring and is inherently challenging because of the thematic and temporal closeness of key events.
We develop an unsupervised key event detection framework, EvMine, that extracts temporally frequent peak phrases using a novel ttf-itf score.
arXiv Detail & Related papers (2022-06-08T20:31:02Z)
- Causal BERT : Language models for causality detection between events expressed in text [1.0756038762528868]
Causality understanding between events is helpful in many areas, including health care, business risk management and finance.
"Cause-Effect" relationships between natural language events remain a challenge because they are often expressed implicitly.
Our proposed methods achieve state-of-the-art performance on three different data distributions and can be leveraged for extraction of a causal diagram.
arXiv Detail & Related papers (2020-12-10T04:59:12Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
- Complex networks for event detection in heterogeneous high volume news streams [0.0]
The volume and rate of online news increase the need for automated event detection methods that can operate in real time.
We develop a network-based approach that makes the working assumption that important news events always involve named entities that are linked in news articles.
arXiv Detail & Related papers (2020-05-28T02:45:43Z)
- Detecting Unsigned Physical Road Incidents from Driver-View Images [3.840106920708639]
A critical need is to detect and communicate disruptive incidents early and effectively.
We propose a system based on an off-the-shelf deep neural network architecture.
We develop a taxonomy for unsigned physical incidents to provide a means of organizing and grouping related incidents.
arXiv Detail & Related papers (2020-04-24T16:02:17Z)
- A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [68.26059718611914]
We present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions.
We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity.
We present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
arXiv Detail & Related papers (2020-04-07T11:45:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.