Automatic Section Recognition in Obituaries
- URL: http://arxiv.org/abs/2002.12699v1
- Date: Fri, 28 Feb 2020 13:20:09 GMT
- Title: Automatic Section Recognition in Obituaries
- Authors: Valentino Sabbatino and Laura Bostan and Roman Klinger
- Abstract summary: We propose a statistical model which recognizes sections of obituaries.
We collect a corpus of 20058 English obituaries from The Daily Item, Remembering.CA and The London Free Press.
Formulated as an automatic segmentation task, a convolutional neural network outperforms bag-of-words.
- Score: 10.536415845097661
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Obituaries contain information about people's values across times and
cultures, which makes them a useful resource for exploring cultural history.
They are typically structured similarly, with sections corresponding to
Personal Information, Biographical Sketch, Characteristics, Family, Gratitude,
Tribute, Funeral Information and Other aspects of the person. To make this
information available for further studies, we propose a statistical model which
recognizes these sections. To achieve that, we collect a corpus of 20058
English obituaries from The Daily Item, Remembering.CA and The London Free
Press. The evaluation of our annotation guidelines with three annotators on
1008 obituaries shows a substantial agreement of Fleiss' κ = 0.87. Formulated as
an automatic segmentation task, a convolutional neural network outperforms
bag-of-words and embedding-based BiLSTMs and BiLSTM-CRFs with a micro F1 =
0.81.
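Fleiss' κ, the agreement measure reported above, compares the mean observed agreement among raters per item (P̄) with the agreement expected by chance from the overall label distribution (P_e): κ = (P̄ − P_e) / (1 − P_e). A minimal Python sketch follows. The function name, the toy ratings, and treating single sentences as the rated items are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence

# Section labels listed in the abstract, used here as the rating categories.
SECTIONS = [
    "Personal Information", "Biographical Sketch", "Characteristics", "Family",
    "Gratitude", "Tribute", "Funeral Information", "Other",
]


def fleiss_kappa(ratings: Sequence[Sequence[str]], categories: Sequence[str]) -> float:
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings[i] holds the labels every rater assigned to item i (e.g. one sentence).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of raters")
    unknown = {label for r in ratings for label in r} - set(categories)
    if unknown:
        raise ValueError(f"labels not in categories: {sorted(unknown)}")

    # counts[i][j]: number of raters who put item i into category j
    counts = [[r.count(c) for c in categories] for r in ratings]

    # Observed agreement per item, P_i, and its mean over all items
    p_items = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Share of all ratings falling into each category, p_j, and chance agreement P_e
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(len(categories))]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating uses one category")

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: three annotators labeling four sentences.
toy = [
    ("Family", "Family", "Family"),
    ("Tribute", "Tribute", "Gratitude"),
    ("Funeral Information", "Funeral Information", "Funeral Information"),
    ("Biographical Sketch", "Biographical Sketch", "Biographical Sketch"),
]
print(round(fleiss_kappa(toy, SECTIONS), 4))  # 0.7857 (= 11/14)
```

With three raters, an item on which all three agree contributes P_i = 1, while a 2-to-1 split contributes 1/3. Because P_e is positive whenever labels are used at all, κ stays below the mean observed agreement P̄ unless agreement is perfect.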
Related papers
- Leveraging deep active learning to identify low-resource mobility functioning information in public clinical notes [0.157286095422595]
We present the first public annotated dataset focused specifically on the Mobility domain of the International Classification of Functioning, Disability and Health (ICF).
We utilize the National NLP Clinical Challenges (n2c2) research dataset to construct a pool of candidate sentences using keyword expansion.
Our final dataset consists of 4,265 sentences with a total of 11,784 entities, including 5,511 Action entities, 5,328 Mobility entities, 306 Assistance entities, and 639 Quantification entities.
arXiv Detail & Related papers (2023-11-27T15:53:11Z)
- HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model [62.995175485416]
We propose a new approach to enrich the semantic representation of HuBERT.
An auxiliary topic classification task is added to HuBERT by using topic labels as teachers.
Experimental results demonstrate that our method achieves comparable or better performance than the baseline in most tasks.
arXiv Detail & Related papers (2023-10-06T02:19:09Z)
- Zero-shot Learning with Minimum Instruction to Extract Social Determinants and Family History from Clinical Notes using GPT Model [4.72294159722118]
This research investigates zero-shot learning for jointly extracting demographics, social determinants, and family history information from clinical notes.
We utilize de-identified real-world clinical notes annotated for demographics, various social determinants, and family history information.
Our results show that the GPT-3.5 method achieved an average of 0.975 F1 on demographics extraction, 0.615 F1 on social determinants extraction, and 0.722 F1 on family history extraction.
arXiv Detail & Related papers (2023-09-11T14:16:27Z)
- Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines [113.08940153125616]
We generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes providing comprehensive anatomical coverage.
Our proposed procedure does not rely on manual annotation during the label aggregation stage.
We release our trained unified anatomical segmentation model capable of predicting 142 anatomical structures on CT data.
arXiv Detail & Related papers (2023-07-25T09:48:13Z)
- Wikibio: a Semantic Resource for the Intersectional Analysis of Biographical Events [3.8455936323976694]
We present a new corpus annotated for biographical event detection.
The model was able to detect all mentions of the target entity in a biography with an F-score of 0.808.
It was also used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.
arXiv Detail & Related papers (2023-06-15T20:59:37Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- SegViz: A Federated Learning Framework for Medical Image Segmentation from Distributed Datasets with Different and Incomplete Annotations [3.6704226968275258]
We developed SegViz, a learning framework for aggregating knowledge from distributed medical image segmentation datasets.
SegViz was trained to build a model capable of segmenting both the liver and the spleen by aggregating knowledge from both nodes.
Our results demonstrate that SegViz is an essential first step towards training clinically translatable multi-task segmentation models.
arXiv Detail & Related papers (2023-01-17T18:36:57Z)
- Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization [80.94424037751243]
In zero-shot multilingual extractive text summarization, a model is typically trained on an English dataset and then applied to summarization datasets in other languages.
We propose NLS (Neural Label Search for Summarization), which jointly learns hierarchical weights for different sets of labels together with our summarization model.
We conduct multilingual zero-shot summarization experiments on MLSUM and WikiLingua datasets, and we achieve state-of-the-art results using both human and automatic evaluations.
arXiv Detail & Related papers (2022-04-28T14:02:16Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain [76.21775236904185]
The dataset consists of 3,232 social media texts and traffic reports with 91K tokens and contains 20.5K annotated entities.
A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types.
To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE.
arXiv Detail & Related papers (2021-08-16T08:21:50Z)