Related papers: Paths of A Million People: Extracting Life Trajectories from Wikipedia

URL: http://arxiv.org/abs/2406.00032v2
Date: Sun, 21 Jul 2024 06:52:40 GMT
Title: Paths of A Million People: Extracting Life Trajectories from Wikipedia
Authors: Ying Zhang, Xiaofeng Li, Zhaoyang Liu, Haipeng Zhang
Abstract summary: We tackle the generalization problem stemming from the variety and heterogeneity of the trajectory descriptions. Our ensemble model COSMOS, which combines the idea of semi-supervised learning and contrastive learning, achieves an F1 score of 85.95%. We also create a hand-curated dataset, WikiLifeTrajectory, consisting of 8,852 (person, time, location) triplets as ground truth.
Score: 20.02210503453678
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The life trajectories of notable people have been studied to pinpoint the times and places of significant events such as birth, death, education, marriage, competition, work, speeches, scientific discoveries, artistic achievements, and battles. Understanding how these individuals interact with others provides valuable insights for broader research into human dynamics. However, the scarcity of trajectory data in terms of volume, density, and inter-person interactions limits relevant studies from being comprehensive and interactive. We mine millions of biography pages from Wikipedia and tackle the generalization problem stemming from the variety and heterogeneity of the trajectory descriptions. Our ensemble model COSMOS, which combines the idea of semi-supervised learning and contrastive learning, achieves an F1 score of 85.95%. For this task, we also create a hand-curated dataset, WikiLifeTrajectory, consisting of 8,852 (person, time, location) triplets as ground truth. Besides, we perform an empirical analysis on the trajectories of 8,272 historians to demonstrate the validity of the extracted results. To facilitate the research on trajectory extractions and help the analytical studies to construct grand narratives, we make our code, the million-level extracted trajectories, and the WikiLifeTrajectory dataset publicly available.

Related papers

Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text. These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events [13.894639630989563]
We argue that such knowledge can be elicited through a participant achievement lens. We analyze a complex event in a narrative according to the intended achievements of the participants. We show that smaller models fine-tuned on our dataset can achieve performance surpassing larger models.
arXiv Detail & Related papers (2024-08-11T14:52:40Z)
MobilityDL: A Review of Deep Learning From Trajectory Data [0.8999666725996975]
Trajectory data combines the complexities of time series, spatial data, and (sometimes irrational) movement behavior. This review paper provides the first comprehensive overview of deep learning approaches for trajectory data.
arXiv Detail & Related papers (2024-02-01T16:30:00Z)
Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space. We show that MTL can be successful with classification tasks with few or non-overlapping annotations. We propose a novel approach in which knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
Telling Stories for Common Sense Zero-Shot Action Recognition [11.166901260737786]
We introduce a novel dataset, Stories, which contains rich textual descriptions for diverse action classes extracted from WikiHow articles. For each class, we extract multi-sentence narratives detailing the necessary steps, scenes, objects, and verbs that characterize the action. This contextual data enables modeling of nuanced relationships between actions, paving the way for zero-shot transfer.
arXiv Detail & Related papers (2023-09-29T15:34:39Z)
Wikibio: a Semantic Resource for the Intersectional Analysis of Biographical Events [3.8455936323976694]
We present a new corpus annotated for biographical event detection. The model was able to detect all mentions of the target-entity in a biography with an F-score of 0.808. It was also used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.
arXiv Detail & Related papers (2023-06-15T20:59:37Z)
A dataset of mentorship in science with semantic and demographic estimations [4.317131795436002]
We describe a crowdsourced dataset of 743,176 mentorship relationships among 738,989 scientists across 112 fields. We enrich the scientists' profiles with publication data from the Microsoft Academic Graph and "semantic" representations of research using deep learning content analysis. We perform extensive validations of the profile-publication matching, semantic content, and demographic inferences.
arXiv Detail & Related papers (2021-06-11T16:12:15Z)
Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm. We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems. We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states. BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions. We propose two knowledge-based data-driven methods to effectively capture these social interactions. We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
Human in Events: A Large-Scale Benchmark for Human-centric Video Analysis in Complex Events [106.19047816743988]
We present a new large-scale dataset with comprehensive annotations, named Human-in-Events or HiEve. It contains a record number of poses (>1M), the largest number of action instances (>56k) under complex events, as well as one of the largest numbers of trajectories lasting for a longer time. Based on its diverse annotation, we present two simple baselines for action recognition and pose estimation.
arXiv Detail & Related papers (2020-05-09T18:24:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.