Team I2R-VI-FF Technical Report on EPIC-KITCHENS VISOR Hand Object
Segmentation Challenge 2023
- URL: http://arxiv.org/abs/2310.20120v1
- Date: Tue, 31 Oct 2023 01:43:14 GMT
- Title: Team I2R-VI-FF Technical Report on EPIC-KITCHENS VISOR Hand Object
Segmentation Challenge 2023
- Authors: Fen Fang, Yi Cheng, Ying Sun and Qianli Xu
- Abstract summary: We present our approach to the EPIC-KITCHENS VISOR Hand Object Segmentation (HOS) Challenge.
Our approach combines the baseline method, Point-based Rendering (PointRend), with the Segment Anything Model (SAM).
By effectively combining the strengths of existing methods and applying our refinements, our submission achieved 1st place in the VISOR HOS Challenge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present our approach to the EPIC-KITCHENS VISOR Hand
Object Segmentation Challenge, which focuses on the estimation of the relation
between the hands and the objects given a single frame as input. The
EPIC-KITCHENS VISOR dataset provides pixel-wise annotations and serves as a
benchmark for hand and active object segmentation in egocentric video. Our
approach combines the baseline method, i.e., Point-based Rendering (PointRend),
with the Segment Anything Model (SAM), aiming to enhance the accuracy of hand
and object segmentation outcomes, while also minimizing instances of missed
detection. We leverage accurate hand segmentation maps obtained from the
baseline method to extract more precise hand and in-contact object segments. We
utilize the class-agnostic segmentation provided by SAM and apply specific
hand-crafted constraints to enhance the results. In cases where the baseline
model misses the detection of hands or objects, we re-train an object detector
on the training set to enhance the detection accuracy. The detected hand and
in-contact object bounding boxes are then used as prompts to extract their
respective segments from the output of SAM. By effectively combining the
strengths of existing methods and applying our refinements, our submission
achieved 1st place on the evaluation criteria of the VISOR HOS Challenge.
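The report itself ships no code, but the hand-crafted constraint step lends itself to a short sketch. The Python below is a minimal, assumption-based illustration: it generates class-agnostic masks with SAM's SamAutomaticMaskGenerator and keeps candidates that overlap the baseline hand mask enough to be plausible in-contact objects, while rejecting masks that largely coincide with the hand itself. The checkpoint path and both thresholds are assumptions, not the authors' published settings.

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Checkpoint path and thresholds below are illustrative assumptions.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def in_contact_object_masks(image, hand_mask, min_contact=0.01, max_hand_iou=0.5):
    """Keep class-agnostic SAM masks that touch the baseline hand mask
    (candidate in-contact objects) while discarding masks that are
    essentially the hand itself."""
    candidates = mask_generator.generate(image)  # list of dicts; 'segmentation' is (H, W) bool
    kept = []
    for cand in candidates:
        seg = cand["segmentation"]
        inter = np.logical_and(seg, hand_mask).sum()
        union = np.logical_or(seg, hand_mask).sum()
        hand_iou = inter / max(union, 1)     # high -> this mask *is* the hand
        contact = inter / max(seg.sum(), 1)  # fraction of the mask touching the hand
        if contact > min_contact and hand_iou < max_hand_iou:
            kept.append(seg)
    return kept
```

The contact and IoU cutoffs stand in for the unspecified hand-crafted constraints; in practice they would be tuned on the VISOR training split.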
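For the missed-detection path, the abstract describes feeding detector boxes to SAM as prompts. A minimal sketch, assuming detections arrive as XYXY boxes from the re-trained detector (whose architecture the abstract does not name):

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # assumed checkpoint
predictor = SamPredictor(sam)

def segment_from_boxes(image, boxes):
    """image: RGB uint8 (H, W, 3); boxes: iterable of XYXY boxes from the
    re-trained hand/object detector."""
    predictor.set_image(image)
    segments = []
    for box in boxes:
        masks, scores, _ = predictor.predict(
            box=np.asarray(box, dtype=np.float32),  # length-4 XYXY box prompt
            multimask_output=True,
        )
        segments.append(masks[np.argmax(scores)])  # keep SAM's highest-scoring candidate
    return segments
```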
Related papers
- HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild
HOIST-Former is adept at spatially and temporally segmenting hands and objects by iteratively pooling features from each other.
We contribute an in-the-wild video dataset called HOIST, which comprises 4,125 videos complete with bounding boxes, segmentation masks, and tracking IDs for hand-held objects.
arXiv Detail & Related papers (2024-04-22T01:42:45Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints
We present a framework aimed at leveraging the raw output of SAM by exploiting two novel concepts called SAM-Generated Object (SGO) and SAM-Generated Boundary (SGB).
Taking into account the content characteristics of SGO, we introduce the concept of object consistency to leverage segmented regions lacking semantic information.
The boundary loss capitalizes on the distinctive features of SGB by directing the model's attention to the boundary information of the object. (A sketch of the object-consistency idea follows.)
arXiv Detail & Related papers (2023-12-05T03:33:47Z)
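As a rough, assumption-based reading of the object-consistency concept above (not the paper's actual loss), a PyTorch term that penalizes prediction variance inside each SAM-generated region might look like:

```python
import torch

def object_consistency_loss(logits, sam_regions):
    """logits: (C, H, W) per-pixel class scores; sam_regions: list of (H, W)
    boolean tensors, one per SAM-Generated Object (SGO)."""
    loss = logits.new_zeros(())
    n = 0
    for region in sam_regions:
        if not region.any():
            continue
        scores = logits[:, region]                   # (C, N) scores inside one SGO
        mean = scores.mean(dim=1, keepdim=True)      # per-class mean over the region
        loss = loss + ((scores - mean) ** 2).mean()  # penalize intra-region variance
        n += 1
    return loss / max(n, 1)
```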
- Lidar Panoptic Segmentation and Tracking without Bells and Whistles
We propose a detection-centric network for lidar segmentation and tracking.
One of the core components of our network is the object instance detection branch.
We evaluate our method on several 3D/4D LPS benchmarks and observe that our model establishes a new state-of-the-art among open-sourced models.
arXiv Detail & Related papers (2023-10-19T04:44:43Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions (sketched below).
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
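A minimal NumPy sketch of PCA-based localization as it is commonly practiced on self-supervised patch features: project onto the first principal component and threshold. The threshold and the sign heuristic are assumptions, not necessarily this paper's exact procedure.

```python
import numpy as np

def pca_localize(feats, thresh=0.0):
    """feats: (H, W, D) patch features from a self-supervised backbone.
    Projects onto the first principal component and thresholds."""
    H, W, D = feats.shape
    X = feats.reshape(-1, D).astype(np.float64)
    X -= X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt are principal directions
    pc1 = X @ Vt[0]                                   # projection onto the first PC
    mask = (pc1 > thresh).reshape(H, W)
    if mask.mean() > 0.5:  # sign is ambiguous; assume the object is the smaller side
        mask = ~mask
    return mask
```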
- Segment Anything Meets Point Tracking
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions (a rough sketch of the loop follows).
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
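A hedged sketch of the point-centric loop: track query points across frames and feed them to SAM as positive point prompts in each frame. Here track_points is a hypothetical stand-in for any long-term point tracker; the actual method differs in detail.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

predictor = SamPredictor(sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth"))

def segment_video(frames, query_points, track_points):
    """frames: list of RGB uint8 arrays; query_points: (K, 2) xy points in frame 0.
    track_points is a hypothetical stand-in for a long-term point tracker and
    should return (T, K, 2) trajectories."""
    trajectories = track_points(frames, query_points)
    masks = []
    for frame, pts in zip(frames, trajectories):
        predictor.set_image(frame)
        m, scores, _ = predictor.predict(
            point_coords=pts.astype(np.float32),
            point_labels=np.ones(len(pts), dtype=np.int32),  # all positive prompts
            multimask_output=True,
        )
        masks.append(m[np.argmax(scores)])  # keep the highest-scoring mask per frame
    return masks
```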
- SegmentMeIfYouCan: A Benchmark for Anomaly Segmentation
Deep neural networks (DNNs) are usually trained on a closed set of semantic classes and are therefore ill-equipped to handle previously unseen objects.
Detecting and localizing such objects is crucial for safety-critical applications such as perception for automated driving.
arXiv Detail & Related papers (2021-04-30T07:58:19Z)
- Interpretable and Accurate Fine-grained Recognition via Region Grouping
We present an interpretable deep model for fine-grained visual recognition.
At the core of our method lies the integration of region-based part discovery and attribution within a deep neural network.
Our results compare favorably to state-of-the-art methods on classification tasks.
arXiv Detail & Related papers (2020-05-21T01:18:26Z)