Cross-Lingual Query-Based Summarization of Crisis-Related Social Media:
An Abstractive Approach Using Transformers
- URL: http://arxiv.org/abs/2204.10230v1
- Date: Thu, 21 Apr 2022 16:07:52 GMT
- Title: Cross-Lingual Query-Based Summarization of Crisis-Related Social Media:
An Abstractive Approach Using Transformers
- Authors: Fedor Vitiugin and Carlos Castillo
- Abstract summary: This work proposes a cross-lingual method for retrieving and summarizing crisis-relevant information from social media postings.
We describe a uniform way of expressing various information needs through structured queries and a way of creating summaries that answer those information needs.
- Score: 3.042890194004583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relevant and timely information collected from social media during crises can
be an invaluable resource for emergency management. However, extracting this
information remains a challenging task, particularly when dealing with social
media postings in multiple languages. This work proposes a cross-lingual method
for retrieving and summarizing crisis-relevant information from social media
postings. We describe a uniform way of expressing various information needs
through structured queries and a way of creating summaries answering those
information needs. The method is based on multilingual transformer embeddings.
Queries are written in one of the languages supported by the embeddings, and
the extracted sentences can be in any of the other languages supported.
Abstractive summaries are created by transformers. The evaluation, done by
crowdsourced evaluators and emergency management experts, and carried out on
collections extracted from Twitter during five large-scale disasters spanning
ten languages, shows the flexibility of our approach. The generated summaries
are regarded as more focused, structured, and coherent than those produced by
existing state-of-the-art methods, and experts compare them favorably against
the summaries those methods create.
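To make the pipeline described in the abstract concrete, the minimal sketch below retrieves crisis-relevant postings with multilingual sentence embeddings and then produces an abstractive summary with a multilingual transformer. It is an illustration under assumptions, not the authors' implementation: the model names (paraphrase-multilingual-MiniLM-L12-v2 and csebuetnlp/mT5_multilingual_XLSum), the top-k cutoff, and the example query are choices made for this sketch.

```python
# Minimal sketch of cross-lingual query-based retrieval + abstractive summarization.
# Model names, top_k, and the example query are illustrative assumptions, not the paper's setup.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Multilingual sentence embeddings: the query and the posts may be in different languages.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Multilingual abstractive summarizer (mT5 fine-tuned on XL-Sum), assumed here for the sketch.
summarizer = pipeline("summarization", model="csebuetnlp/mT5_multilingual_XLSum")

def summarize_for_query(query: str, posts: list[str], top_k: int = 20) -> str:
    """Retrieve the posts most similar to the query, then summarize them abstractively."""
    query_emb = encoder.encode(query, convert_to_tensor=True)
    post_embs = encoder.encode(posts, convert_to_tensor=True)
    # Cosine similarity between the query and every post, regardless of language.
    hits = util.semantic_search(query_emb, post_embs, top_k=top_k)[0]
    selected = [posts[hit["corpus_id"]] for hit in hits]
    # Concatenate the retrieved postings and produce a short abstractive summary.
    document = " ".join(selected)
    return summarizer(document, max_length=96, min_length=24, truncation=True)[0]["summary_text"]

# Example information need expressed as an English query, run over posts in several languages.
posts = [
    "El puente de la carretera 5 está cerrado por inundaciones.",   # Spanish
    "Volunteers needed at the shelter on Main Street tonight.",     # English
    "Route 7 bloquée par un glissement de terrain près du pont.",   # French
]
print(summarize_for_query("Which roads and bridges are closed or damaged?", posts))
```

In this sketch the query is English while the posts are Spanish, English, and French; the shared multilingual embedding space is what allows retrieval across languages before the abstractive step.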
Related papers
- Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion [0.0]
The paper proposes a novel approach to summarization that tackles such challenges by utilizing the strengths of multiple sources.
The research progresses beyond conventional, unimodal sources such as text documents and integrates a more diverse range of data, including YouTube playlists, pre-prints, and Wikipedia pages.
arXiv Detail & Related papers (2024-06-19T17:15:47Z) - CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media [5.384787836425144]
In times of crisis, social media platforms play a crucial role in facilitating communication and coordinating resources.
We propose CReMa (Crisis Response Matcher), a systematic approach that integrates textual, temporal, and spatial features.
We introduce a novel multi-lingual dataset simulating help-seeking and offers of assistance on social media in 16 languages.
arXiv Detail & Related papers (2024-05-20T09:30:03Z) - Multi-Query Focused Disaster Summarization via Instruction-Based
Prompting [3.6199702611839792]
CrisisFACTS aims to advance disaster summarization based on multi-stream fact-finding.
Here, participants are asked to develop systems that can extract key facts from several disaster-related events.
This paper describes our method to tackle this challenging task.
arXiv Detail & Related papers (2024-02-14T08:22:58Z) - $\mu$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge [72.64847925450368]
Cross-lingual summarization consists of generating a summary in one language given an input document in a different language.
This work presents $\mu$PLAN, an approach to cross-lingual summarization that uses an intermediate planning step as a cross-lingual bridge.
arXiv Detail & Related papers (2023-05-23T16:25:21Z) - Automated Audio Captioning: an Overview of Recent Progress and New
Challenges [56.98522404673527]
Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips.
We present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.
arXiv Detail & Related papers (2022-05-12T08:36:35Z) - BERTuit: Understanding Spanish language in Twitter through a native
transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter, for use in applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z) - Evaluation of Abstractive Summarisation Models with Machine Translation
in Deliberative Processes [23.249742737907905]
The dataset reflects the difficulties of combining multiple narratives, mostly of poor grammatical quality, into a single text.
We report an extensive evaluation of a wide range of abstractive summarisation models in combination with an off-the-shelf machine translation model.
We obtain promising results regarding the fluency, consistency and relevance of the summaries produced.
arXiv Detail & Related papers (2021-10-12T09:23:57Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug in queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages; a generic sketch of such a module appears after this list.
It effectively avoids the degeneration of predicting masked words conditioned only on the context of their own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive
Summarization [41.578594261746055]
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of crosslingual abstractive summarization systems.
We extract article and summary pairs in 18 languages from WikiHow, a high quality, collaborative resource of how-to guides on a diverse set of topics written by human authors.
We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article.
arXiv Detail & Related papers (2020-10-07T00:28:05Z) - Language Guided Networks for Cross-modal Moment Retrieval [66.49445903955777]
Cross-modal moment retrieval aims to localize, within an untrimmed video, the temporal segment described by a natural language query.
Existing methods independently extract the features of videos and sentences.
We present Language Guided Networks (LGN), a new framework that leverages the sentence embedding to guide the whole process of moment retrieval.
arXiv Detail & Related papers (2020-06-18T12:08:40Z)
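The VECO entry above mentions plugging a cross-attention module into the Transformer encoder so that representations in one language can attend to another. The block below is a generic PyTorch sketch of such a module with assumed dimensions and names; it is not VECO's actual architecture.

```python
# Generic cross-attention block: tokens in one language attend to tokens in another.
# Illustrative sketch only, not the VECO implementation.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x_src: torch.Tensor, x_tgt: torch.Tensor) -> torch.Tensor:
        # Queries come from the source-language states; keys/values from the other language,
        # so predictions are no longer conditioned only on the source language's own context.
        attended, _ = self.attn(query=x_src, key=x_tgt, value=x_tgt)
        return self.norm(x_src + attended)  # residual connection + layer norm

# Toy usage: batch of 2 sequences, 16 source tokens and 20 target tokens, hidden size 768.
block = CrossAttentionBlock()
src_states = torch.randn(2, 16, 768)
tgt_states = torch.randn(2, 20, 768)
print(block(src_states, tgt_states).shape)  # torch.Size([2, 16, 768])
```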
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and accepts no responsibility for any consequences of its use.