Knowledge Graph Extraction from Videos
- URL: http://arxiv.org/abs/2007.10040v1
- Date: Mon, 20 Jul 2020 12:23:39 GMT
- Title: Knowledge Graph Extraction from Videos
- Authors: Louis Mahon, Eleonora Giunchiglia, Bowen Li, Thomas Lukasiewicz
- Abstract summary: We propose a new task of knowledge graph extraction from videos, producing a description in the form of a knowledge graph of the contents of a given video.
Since no datasets exist for this task, we also include a method to automatically generate them, starting from datasets where videos are annotated with natural language.
We report results on MSVD* and MSR-VTT*, two datasets obtained from MSVD and MSR-VTT using our method.
- Score: 46.31652453979874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nearly all existing techniques for automated video annotation (or captioning)
describe videos using natural language sentences. However, this has several
shortcomings: (i) it is very hard to further use the generated natural
language annotations in automated data processing, (ii) generating natural
language annotations requires solving the hard subtask of generating
semantically precise and syntactically correct natural language sentences,
which is actually unrelated to the task of video annotation, (iii) it is
difficult to quantitatively measure performance, as standard metrics (e.g.,
accuracy and F1-score) are inapplicable, and (iv) annotations are
language-specific. In this paper, we propose the new task of knowledge graph
extraction from videos, i.e., producing a description in the form of a
knowledge graph of the contents of a given video. Since no datasets exist for
this task, we also include a method to automatically generate them, starting
from datasets where videos are annotated with natural language. We then
describe an initial deep-learning model for knowledge graph extraction from
videos, and report results on MSVD* and MSR-VTT*, two datasets obtained from
MSVD and MSR-VTT using our method.
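As a rough illustration of the two ideas in the abstract, the sketch below turns a natural-language caption into subject-relation-object triples via dependency parsing and scores a predicted triple set against a reference one with ordinary F1. This is a minimal sketch, not the authors' pipeline: the use of spaCy, the dependency heuristic, and all function names are assumptions made here for illustration only.
```python
# Hedged sketch (not the paper's actual method): caption -> triples, plus a
# triple-level F1, the kind of standard metric that becomes applicable once
# annotations are graphs rather than free-form sentences.
# Assumes spaCy and the "en_core_web_sm" model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def caption_to_triples(caption: str) -> set[tuple[str, str, str]]:
    """Extract (subject, relation, object) triples from one caption."""
    doc = nlp(caption)
    triples = set()
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
        for subj in subjects:
            for obj in objects:
                triples.add((subj.lemma_, token.lemma_, obj.lemma_))
    return triples

def triple_f1(predicted: set, reference: set) -> float:
    """F1 over triples of a predicted graph vs. a reference graph."""
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    gold = caption_to_triples("A man is playing a guitar on stage.")
    pred = caption_to_triples("A man plays a guitar.")
    print(gold, pred, triple_f1(pred, gold))  # both yield ('man', 'play', 'guitar')
```
Presumably the paper's generation procedure also merges the triples from all captions of a clip into one graph per video before training and evaluation; the exact construction of MSVD* and MSR-VTT* is defined in the paper itself, not in this sketch.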
Related papers
- Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension [4.164728134421114]
Referring Expression Comprehension (REC) aims to identify a particular object in a scene from a natural language expression, and is an important topic in visual language understanding.
State-of-the-art methods for this task are based on deep learning, which generally requires expensive and manually labeled annotations.
We propose a novel framework that generates artificial data for the REC task, taking into account both textual and visual modalities.
arXiv Detail & Related papers (2024-11-22T09:08:36Z)
- Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset [4.452729255042396]
A more robust and holistic language-video representation is the key to pushing video understanding forward.
The current plain and simple text descriptions and the visual-only focus of language-video tasks result in limited capacity for real-world natural language video retrieval.
This paper introduces a method to automatically enhance video-language datasets, making them more modality and context-aware.
arXiv Detail & Related papers (2024-06-19T20:16:17Z)
- Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model [10.742625681420279]
Based on a large-scale pre-trained Vision Language Model (VLM), our model learns and improves visual, textual, and numerical representations of patient gait videos.
Results demonstrate that our model not only significantly outperforms state-of-the-art methods in video-based classification tasks but also adeptly decodes the learned class-specific text features into natural language descriptions.
arXiv Detail & Related papers (2024-03-20T17:03:38Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training [70.83385449872495]
The correlation between vision and text is essential for video moment retrieval (VMR).
Existing methods rely on separate pre-training feature extractors for visual and textual understanding.
We propose a generic method, referred to as Visual-Dynamic Injection (VDI), to empower the model's understanding of video moments.
arXiv Detail & Related papers (2023-02-28T19:29:05Z)
- Language-free Training for Zero-shot Video Grounding [50.701372436100684]
Video grounding aims to localize the time interval by understanding the text and video simultaneously.
One of the most challenging issues is the extremely time- and cost-consuming collection of annotations.
We present a simple yet novel training framework for video grounding in the zero-shot setting.
arXiv Detail & Related papers (2022-10-24T06:55:29Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, 'ApartmenTour', that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.