DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections
- URL: http://arxiv.org/abs/2102.13247v1
- Date: Fri, 26 Feb 2021 01:00:12 GMT
- Title: DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections
- Authors: Yury Zemlyanskiy, Sudeep Gandhe, Ruining He, Bhargav Kanagal, Anirudh Ravula, Juraj Gottweis, Fei Sha and Ilya Eckstein
- Abstract summary: This paper explores learning rich self-supervised entity representations from large amounts of associated text.
Once pre-trained, these models become applicable to multiple entity-centric tasks such as ranked retrieval, knowledge base completion, question answering, and more.
We present several training strategies that, unlike prior approaches, learn to jointly predict words and entities.
- Score: 18.62873757515885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores learning rich self-supervised entity representations from
large amounts of associated text. Once pre-trained, these models become
applicable to multiple entity-centric tasks such as ranked retrieval, knowledge
base completion, question answering, and more. Unlike other methods that
harvest self-supervision signals based merely on a local context within a
sentence, we radically expand the notion of context to include any available
text related to an entity. This enables a new class of powerful, high-capacity
representations that can ultimately distill much of the useful information
about an entity from multiple text sources, without any human supervision.
We present several training strategies that, unlike prior approaches, learn
to jointly predict words and entities -- strategies we compare experimentally
on downstream tasks in the TV-Movies domain, such as MovieLens tag prediction
from user reviews and natural language movie search. The results show that our
models match or outperform competitive baselines, sometimes with little or no
fine-tuning, and that they scale to very large corpora.
Finally, we make our datasets and pre-trained models publicly available. This
includes Reviews2Movielens (see https://goo.gle/research-docent), which maps a
corpus of up to 1B words of Amazon movie reviews (He and McAuley, 2016) to
MovieLens tags (Harper and Konstan, 2016), as well as Reddit Movie Suggestions
(see https://urikz.github.io/docent) with natural language queries and
corresponding community recommendations.
Related papers
- Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data [19.210471935816273]
We propose a novel evaluation task for video-text understanding, retrieval from counterfactually augmented data (RCAD), along with a new Feint6K dataset.
To succeed on our new evaluation task, models must derive a comprehensive understanding of the video from cross-frame reasoning.
Our approach learns more discriminative action embeddings and improves results on Feint6K when applied to multiple video-text models.
arXiv Detail & Related papers (2024-07-18T01:55:48Z) - Learning a Grammar Inducer from Massive Uncurated Instructional Videos [118.7279072358029]
Video-aided grammar induction aims to leverage video information for finding more accurate syntactic grammars for accompanying text.
We build a new model that can better learn video-span correlation without manually designed features.
Our model yields higher F1 scores than the previous state-of-the-art systems trained on in-domain data.
arXiv Detail & Related papers (2022-10-22T00:22:55Z) - The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, and named entities are often ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z) - Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval [70.30052749168013]
Multi-channel video-language retrieval requires models to understand information from different channels.
Contrastive multimodal models are shown to be highly effective at aligning entities in images/videos and text.
There is no clear way to quickly adapt these two lines of work to multi-channel video-language retrieval with limited data and resources.
arXiv Detail & Related papers (2022-06-05T01:43:52Z) - Align and Prompt: Video-and-Language Pre-training with Entity Prompts [111.23364631136339]
Video-and-language pre-training has shown promising improvements on various downstream tasks.
We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment.
Our code and pre-trained models will be released.
arXiv Detail & Related papers (2021-12-17T15:55:53Z) - Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset 'ApartmenTour' that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.