A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
- URL: http://arxiv.org/abs/2005.13362v2
- Date: Thu, 28 May 2020 03:13:49 GMT
- Title: A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
- Authors: Edison Marrese-Taylor, Cristian Rodriguez-Opazo, Jorge A. Balazs,
Stephen Gould, Yutaka Matsuo
- Abstract summary: We propose a multi-modal approach for mining fine-grained opinions from video reviews.
Our approach works at the sentence level without the need for time annotations.
- Score: 47.726065950436585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent advances in opinion mining for written reviews, few works
have tackled the problem for other sources of reviews. In light of this issue,
we propose a multi-modal approach for mining fine-grained opinions from video
reviews that is able to determine the aspects of the item under review that are
being discussed and the sentiment orientation towards them. Our approach works
at the sentence level without the need for time annotations and uses features
derived from the audio, video and language transcriptions of its contents. We
evaluate our approach on two datasets and show that leveraging the video and
audio modalities consistently provides increased performance over text-only
baselines, providing evidence that these extra modalities are key to better
understanding video reviews.
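The paper's own code is not shown here; as a minimal illustrative sketch of the sentence-level multi-modal fusion the abstract describes, the snippet below concatenates pre-extracted per-sentence text, audio, and video features and feeds them to two prediction heads (aspect and sentiment polarity). All names, feature dimensions, and the late-fusion-by-concatenation choice are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of sentence-level multi-modal opinion mining.
# Dimensions, names, and the concatenation-based fusion are assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn


class MultiModalOpinionModel(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, video_dim=512,
                 hidden_dim=256, num_aspects=10, num_polarities=3):
        super().__init__()
        # Fuse per-sentence features from the three modalities by concatenation.
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + audio_dim + video_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        # Two heads: which aspect of the item is discussed, and the
        # sentiment orientation towards it.
        self.aspect_head = nn.Linear(hidden_dim, num_aspects)
        self.sentiment_head = nn.Linear(hidden_dim, num_polarities)

    def forward(self, text_feats, audio_feats, video_feats):
        fused = self.fusion(
            torch.cat([text_feats, audio_feats, video_feats], dim=-1))
        return self.aspect_head(fused), self.sentiment_head(fused)


# Example: a batch of 4 sentences with pre-extracted features per modality.
model = MultiModalOpinionModel()
aspect_logits, sentiment_logits = model(
    torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))
print(aspect_logits.shape, sentiment_logits.shape)  # (4, 10) (4, 3)
```

Concatenation is only one plausible fusion strategy; attention-based or gated fusion would slot in at the same point in this sketch.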
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- HOTVCOM: Generating Buzzworthy Comments for Videos [49.39846630199698]
This study introduces HotVCom, the largest Chinese video hot-comment dataset, comprising 94k diverse videos and 137 million comments.
We also present the ComHeat framework, which synergistically integrates visual, auditory, and textual data to generate influential hot-comments on the Chinese video dataset.
arXiv Detail & Related papers (2024-09-23T16:45:13Z)
- Reviewer2: Optimizing Review Generation Through Prompt Generation [27.379753994272875]
We propose an efficient two-stage review generation framework called Reviewer2.
Unlike prior work, this approach explicitly models the distribution of possible aspects that the review may address.
We generate a large-scale review dataset of 27k papers and 99k reviews that we annotate with aspect prompts.
arXiv Detail & Related papers (2024-02-16T18:43:10Z)
- End-to-End Evaluation for Low-Latency Simultaneous Speech Translation [55.525125193856084]
We propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions.
This includes the segmentation of the audio as well as the run-time of the different components.
We also compare different approaches to low-latency speech translation using this framework.
arXiv Detail & Related papers (2023-08-07T09:06:20Z)
- Video Summarization Overview [25.465707307283434]
Video summarization facilitates quickly grasping video content by creating a compact summary of videos.
This survey covers early studies as well as recent approaches which take advantage of deep learning techniques.
arXiv Detail & Related papers (2022-10-21T03:29:31Z)
- Video Moment Retrieval from Text Queries via Single Frame Annotation [65.92224946075693]
Video moment retrieval aims at finding the start and end timestamps of a moment described by a given natural language query.
Fully supervised methods need complete temporal boundary annotations to achieve promising results.
We propose a new paradigm called "glance annotation".
arXiv Detail & Related papers (2022-04-20T11:59:17Z)
- Fill-in-the-blank as a Challenging Video Understanding Evaluation Framework [19.031957183047048]
We introduce a novel dataset consisting of 28,000 videos and fill-in-the-blank tests.
We show that both a multimodal model and a strong language model fall well short of human performance.
arXiv Detail & Related papers (2021-04-09T04:00:10Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- How Useful are Reviews for Recommendation? A Critical Review and Potential Improvements [8.471274313213092]
We investigate a growing body of work that seeks to improve recommender systems through the use of review text.
Our initial findings reveal several discrepancies in reported results, partly due to copying results across papers despite changes in experimental settings or data pre-processing.
Our further investigation opens a much larger discussion about the "importance" of user reviews for recommendation.
arXiv Detail & Related papers (2020-05-25T16:30:05Z)
- What comprises a good talking-head video generation?: A Survey and Benchmark [40.26689818789428]
We present a benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.
We propose new metrics, or select the most appropriate existing ones, to evaluate results against the properties we consider desirable for a good talking-head video.
arXiv Detail & Related papers (2020-05-07T01:58:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.