SynthRef: Generation of Synthetic Referring Expressions for Object
Segmentation
- URL: http://arxiv.org/abs/2106.04403v2
- Date: Wed, 9 Jun 2021 05:39:51 GMT
- Title: SynthRef: Generation of Synthetic Referring Expressions for Object
Segmentation
- Authors: Ioannis Kazakos, Carles Ventura, Miriam Bellver, Carina Silberer and
Xavier Giro-i-Nieto
- Abstract summary: We present and disseminate the first large-scale dataset with synthetic referring expressions for video object segmentation.
Our experiments demonstrate that by training with our synthetic referring expressions one can improve the ability of a model to generalize across different datasets.
- Score: 7.690965189591581
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in deep learning have brought significant progress in visual
grounding tasks such as language-guided video object segmentation. However,
collecting large datasets for these tasks is expensive in terms of annotation
time, which represents a bottleneck. To this end, we propose a novel method,
namely SynthRef, for generating synthetic referring expressions for target
objects in an image (or video frame), and we also present and disseminate the
first large-scale dataset with synthetic referring expressions for video object
segmentation. Our experiments demonstrate that by training with our synthetic
referring expressions one can improve the ability of a model to generalize
across different datasets, without any additional annotation cost. Moreover,
our formulation allows its application to any object detection or segmentation
dataset.
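The abstract does not spell out the generation rules, so the following is only an illustrative sketch of how referring expressions could be derived from detection-style annotations with no extra labeling. The `Box` class, the `referring_expression` helper, and the left/right and size cues are all assumptions made for this example, not the SynthRef method itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical annotation: class label plus an (x, y, w, h) box in pixels."""
    label: str
    x: float
    y: float
    w: float
    h: float


def referring_expression(target: Box, others: List[Box]) -> str:
    """Build a short expression that tries to single out `target`.

    Assumed rule set (not taken from the paper): use the class name alone if
    it is unique in the frame, otherwise add a left/right or size cue.
    """
    same_class = [b for b in others if b.label == target.label]
    if not same_class:
        return f"the {target.label}"

    # Compare horizontal box centres against same-class objects.
    cx = target.x + target.w / 2
    other_cx = [b.x + b.w / 2 for b in same_class]
    if all(cx < c for c in other_cx):
        return f"the {target.label} on the left"
    if all(cx > c for c in other_cx):
        return f"the {target.label} on the right"

    # Fall back to relative size when position does not disambiguate.
    area = target.w * target.h
    other_areas = [b.w * b.h for b in same_class]
    if all(area > a for a in other_areas):
        return f"the largest {target.label}"
    if all(area < a for a in other_areas):
        return f"the smallest {target.label}"

    # No simple cue disambiguates the target; return the bare class name.
    return f"the {target.label}"


# Toy frame with two dogs and one person (made-up coordinates).
frame = [
    Box("dog", 10, 40, 50, 30),
    Box("dog", 200, 45, 60, 35),
    Box("person", 120, 10, 40, 100),
]
expressions = [
    referring_expression(b, [o for o in frame if o is not b]) for b in frame
]
```

Because the sketch reads only class labels and box geometry, the same function could in principle be run over any object detection or segmentation dataset that provides those fields, which is the property the abstract highlights.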
Related papers
- VideoOrion: Tokenizing Object Dynamics in Videos [33.26917406964148]
We present VideoOrion, a Video Large Language Model (Video-LLM) that explicitly captures the key semantic information in videos.
VideoOrion employs expert vision models to extract object dynamics through a detect-segment-track pipeline.
Our method addresses the persistent challenge in Video-LLMs of efficiently compressing high-dimensional video data into semantic tokens.
arXiv Detail & Related papers (2024-11-25T07:32:02Z)
- Submodular video object proposal selection for semantic object segmentation [1.223779595809275]
We learn a data-driven representation which captures the subset of multiple instances from continuous frames.
This selection process is formulated as a facility location problem solved by maximising a submodular function.
Our method retrieves the longer-term contextual dependencies that underpin a robust semantic video object segmentation algorithm.
arXiv Detail & Related papers (2024-07-08T13:18:49Z)
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) in the test set of Complex Video Object Challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions [93.35942025232943]
We propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms.
arXiv Detail & Related papers (2023-08-16T17:58:34Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Synthetic Convolutional Features for Improved Semantic Segmentation [139.5772851285601]
We suggest generating intermediate convolutional features and propose the first synthesis approach that is catered to such intermediate convolutional features.
This allows us to generate new features from label masks and include them successfully in the training procedure.
Experimental results and analysis on two challenging datasets, Cityscapes and ADE20K, show that our generated feature improves performance on segmentation tasks.
arXiv Detail & Related papers (2020-09-18T14:12:50Z)
- Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects [25.836334764387498]
We present a robotic system for picking a target from a pile of objects that is capable of finding and grasping the target object.
We extend an existing instance segmentation model with a novel 'relook' architecture, in which the model explicitly learns the inter-instance relationship.
Also, by using image synthesis, we make the system capable of handling new objects without human annotations.
arXiv Detail & Related papers (2020-01-21T12:28:37Z)