Incorporating Domain Knowledge To Improve Topic Segmentation Of Long
MOOC Lecture Videos
- URL: http://arxiv.org/abs/2012.07589v1
- Date: Tue, 8 Dec 2020 13:37:40 GMT
- Title: Incorporating Domain Knowledge To Improve Topic Segmentation Of Long
MOOC Lecture Videos
- Authors: Ananda Das, Partha Pratim Das
- Abstract summary: We propose an algorithm for automatically detecting the different coherent topics present in a long lecture video.
We apply a language model to the speech-to-text transcription to capture the implicit meaning of the whole video.
By leveraging domain knowledge, we also capture the way an instructor binds and connects different concepts while teaching.
- Score: 4.189643331553923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topical segmentation plays a key role in reducing the search space of the
topics taught in a lecture video, especially when the video metadata lacks
topic-wise segmentation information. This segmentation information eases the
user's effort of searching, locating and browsing a topic inside a lecture
video. In this work we propose an algorithm that combines a state-of-the-art
language model and a domain knowledge graph to automatically detect the
different coherent topics present in a long lecture video. We apply the
language model to the speech-to-text transcription to capture the implicit
meaning of the whole video, while the knowledge graph provides the
domain-specific dependencies between different concepts of that subject. By
leveraging the domain knowledge we also capture the way an instructor binds
and connects different concepts while teaching, which helps us achieve better
segmentation accuracy. We tested our approach on NPTEL lecture videos, and a
holistic evaluation shows that it outperforms the other methods described in
the literature.
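The abstract's core idea can be sketched as follows. This is a minimal illustrative toy, not the paper's actual method: it stands in for the language model with simple bag-of-words vectors, uses a hypothetical hand-written knowledge graph (`KG_EDGES`), and places a topic boundary wherever the similarity of adjacent transcript blocks, boosted when the domain graph links their concepts, falls below a threshold.

```python
from collections import Counter
from math import sqrt

# Toy domain knowledge graph: edges link concepts an instructor
# typically teaches together (edges here are made up for illustration).
KG_EDGES = {("stack", "push"), ("stack", "pop"), ("queue", "enqueue")}

def related(a, b):
    """True if the knowledge graph links concepts a and b."""
    return (a, b) in KG_EDGES or (b, a) in KG_EDGES

def cosine(c1, c2):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def segment(blocks, threshold=0.3, kg_bonus=0.2):
    """Return indices i where a topic boundary falls between
    blocks[i] and blocks[i+1]."""
    vecs = [Counter(b.lower().split()) for b in blocks]
    boundaries = []
    for i in range(len(vecs) - 1):
        score = cosine(vecs[i], vecs[i + 1])
        # Domain knowledge: if the graph links concepts mentioned in
        # adjacent blocks, treat them as more likely one topic.
        if any(related(a, b) for a in vecs[i] for b in vecs[i + 1]):
            score += kg_bonus
        if score < threshold:
            boundaries.append(i)
    return boundaries

transcript_blocks = [
    "a stack supports push and pop operations",
    "push adds an element to the stack top",
    "a queue instead supports enqueue at the rear",
]
print(segment(transcript_blocks))  # boundary between blocks 1 and 2: [1]
```

In the paper, the bag-of-words vectors would be replaced by contextual language-model embeddings of the transcript, and the knowledge graph would cover the subject's full concept dependencies; the boundary-at-similarity-minimum idea is the part this sketch illustrates.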
Related papers
- ViLLa: Video Reasoning Segmentation with Large Language Model [48.75470418596875]
We propose a new video segmentation task - video reasoning segmentation.
The task is designed to output tracklets of segmentation masks given a complex input text query.
We present ViLLa: Video reasoning segmentation with a Large Language Model.
arXiv Detail & Related papers (2024-07-18T17:59:17Z) - Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels [34.88705952395676]
Video moment retrieval (VMR) is to search for a visual temporal moment in an untrimmed raw video by a given text query description (sentence).
We introduce a new approach called hybrid-learning video moment retrieval to solve the problem by knowledge transfer.
Our aim is to explore shared universal knowledge between the two domains in order to improve model learning in the weakly-labelled target domain.
arXiv Detail & Related papers (2024-06-03T21:14:53Z) - Unsupervised Audio-Visual Lecture Segmentation [31.29084124332193]
We introduce AVLectures, a dataset consisting of 86 courses with over 2,350 lectures covering various STEM subjects.
Our second contribution is introducing video lecture segmentation that splits lectures into bite-sized topics that show promise in improving learner engagement.
We use these representations to generate segments using a temporally consistent 1-nearest neighbor algorithm, TW-FINCH.
arXiv Detail & Related papers (2022-10-29T16:26:34Z) - Self-Supervised Learning for Videos: A Survey [70.37277191524755]
Self-supervised learning has shown promise in both image and video domains.
In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.
arXiv Detail & Related papers (2022-06-18T00:26:52Z) - Contrastive Graph Multimodal Model for Text Classification in Videos [9.218562155255233]
We are the first to address this new task of video text classification by fusing multimodal information.
We tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting layout information.
We construct a new well-defined industrial dataset from the news domain, called TI-News, which is dedicated to building and evaluating video text recognition and classification applications.
arXiv Detail & Related papers (2022-06-06T04:06:21Z) - Video-Text Pre-training with Learned Regions [59.30893505895156]
Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs.
We propose a module for video-text learning, RegionLearner, which can take into account the structure of objects during pre-training on large-scale video-text pairs.
arXiv Detail & Related papers (2021-12-02T13:06:53Z) - Multi-Modal Interaction Graph Convolutional Network for Temporal
Language Localization in Videos [55.52369116870822]
This paper focuses on tackling the problem of temporal language localization in videos.
It aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video.
arXiv Detail & Related papers (2021-10-12T14:59:25Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z) - Watch and Learn: Mapping Language and Noisy Real-world Videos with
Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset 'ApartmenTour' that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.