SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model
- URL: http://arxiv.org/abs/2307.16586v4
- Date: Thu, 21 Dec 2023 07:03:08 GMT
- Title: SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model
- Authors: Shili Zhou, Ruian He, Weimin Tan and Bo Yan
- Abstract summary: We propose a solution to embed the frozen SAM image encoder into FlowFormer to enhance object perception.
Our proposed SAMFlow model reaches 0.86/2.10 clean/final EPE and 3.55/12.32 EPE/F1-all on the Sintel and KITTI-15 training sets, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%.
- Score: 17.88914104216893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Flow Estimation aims to find the 2D dense motion field between two
frames. Due to the limitation of model structures and training datasets,
existing methods often rely too much on local clues and ignore the integrity of
objects, resulting in fragmented motion estimation. Through theoretical
analysis, we find that pre-trained large vision models are helpful in optical
flow estimation, and we notice that the recently popular Segment Anything Model
(SAM) demonstrates a strong ability to segment complete objects, which is
suitable for solving the fragmentation problem. We thus propose a solution to
embed the frozen SAM image encoder into FlowFormer to enhance object
perception. To address the challenge of exploiting SAM in depth for
non-segmentation tasks like optical flow estimation, we propose an Optical Flow
Task-Specific Adaption scheme, including a Context Fusion Module to fuse the
SAM encoder with the optical flow context encoder, and a Context Adaption
Module to adapt the SAM features to the optical flow task with a Learned
Task-Specific Embedding. Our proposed SAMFlow model reaches 0.86/2.10
clean/final EPE and 3.55/12.32 EPE/F1-all on the Sintel and KITTI-15 training
sets, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%. Furthermore, our model
achieves state-of-the-art performance on the Sintel and KITTI-15 benchmarks,
ranking #1 among all two-frame methods on the Sintel clean pass.
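The architecture described above can be pictured with a short PyTorch sketch. Everything below is an illustrative guess at the general design, assuming a frozen SAM image encoder and a FlowFormer-style context encoder are available as modules; the module names, feature dimensions, and fusion details are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFusionModule(nn.Module):
    """Fuses frozen-SAM features with flow context features.
    A plain concat-and-project stand-in for the paper's module."""
    def __init__(self, sam_dim: int, ctx_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(sam_dim + ctx_dim, ctx_dim, kernel_size=1)

    def forward(self, sam_feat, ctx_feat):
        # Match SAM's feature resolution to the context features.
        sam_feat = F.interpolate(sam_feat, size=ctx_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return self.proj(torch.cat([sam_feat, ctx_feat], dim=1))

class ContextAdaptionModule(nn.Module):
    """Adapts fused features to the flow task with a learned embedding."""
    def __init__(self, ctx_dim: int):
        super().__init__()
        # Learned Task-Specific Embedding: one vector broadcast over space.
        self.task_embedding = nn.Parameter(torch.zeros(1, ctx_dim, 1, 1))
        self.refine = nn.Conv2d(ctx_dim, ctx_dim, kernel_size=3, padding=1)

    def forward(self, feat):
        return self.refine(feat + self.task_embedding)

class SAMFlowContext(nn.Module):
    """Frozen SAM image encoder wrapped around a flow context encoder."""
    def __init__(self, sam_encoder, ctx_encoder, sam_dim=256, ctx_dim=128):
        super().__init__()
        self.sam_encoder = sam_encoder.eval()
        for p in self.sam_encoder.parameters():
            p.requires_grad_(False)          # SAM stays frozen
        self.ctx_encoder = ctx_encoder       # e.g. FlowFormer's context net
        self.fusion = ContextFusionModule(sam_dim, ctx_dim)
        self.adaption = ContextAdaptionModule(ctx_dim)

    def forward(self, frame1):
        with torch.no_grad():
            sam_feat = self.sam_encoder(frame1)
        return self.adaption(self.fusion(sam_feat, self.ctx_encoder(frame1)))
```

In such a design, only the small fusion and adaption modules would receive task-specific training, while the frozen SAM encoder keeps its object-level priors intact.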
Related papers
- UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model [12.706915226843401]
UnSAMFlow is an unsupervised flow network that also leverages object information from the latest foundation model, the Segment Anything Model (SAM).
We analyze the poor gradient landscapes of traditional smoothness losses and propose a new smoothness definition based on homography instead.
Our method produces clear optical flow estimates with sharp boundaries around objects, outperforming state-of-the-art methods on the KITTI and Sintel datasets.
arXiv Detail & Related papers (2024-05-04T08:27:12Z)
- Moving Object Segmentation: All You Need Is SAM (and Flow) [82.78026782967959]
We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects.
In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt (see the first sketch after this list).
These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks.
arXiv Detail & Related papers (2024-04-18T17:59:53Z)
- SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations [44.92134227376008]
This paper introduces two synergistic techniques, Self-Cleaning Iteration (SCI) and Regression Focal Loss (RFL).
SCI and RFL prove particularly effective in mitigating error propagation, a prevalent issue in optical flow models that employ iterative refinement.
The effectiveness of our proposed SCI and RFL techniques, collectively referred to as SciFlow for brevity, is demonstrated across two distinct lightweight optical flow model architectures in our experiments.
arXiv Detail & Related papers (2024-04-11T21:41:55Z)
- A SAM-guided Two-stream Lightweight Model for Anomaly Detection [50.28310943263051]
We propose a SAM-guided Two-stream Lightweight Model (STLM) for unsupervised anomaly detection.
Our experiments on the MVTec AD benchmark show that STLM, with about 16M parameters and an inference time of about 20 ms, competes effectively with state-of-the-art methods.
arXiv Detail & Related papers (2024-02-29T13:29:10Z)
- A Spatial-Temporal Dual-Mode Mixed Flow Network for Panoramic Video Salient Object Detection [5.207048071888257]
We propose a Spatial-Temporal Dual-Mode Mixed Flow Network (STDMMF-Net) that exploits the spatial flow of panoramic video and the corresponding optical flow for SOD.
Extensive subjective and objective experiments show that the proposed method achieves better detection accuracy than state-of-the-art (SOTA) methods.
It also performs better in terms of memory required for inference, testing time, complexity, and generalization.
arXiv Detail & Related papers (2023-10-13T11:25:41Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module, the SAM-guidEd refinEment Module (SEEM).
This lightweight plug-in module is specifically designed to leverage the attention mechanism for generating semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z)
- FAMINet: Learning Real-time Semi-supervised Video Object Segmentation with Steepest Optimized Optical Flow [21.45623125216448]
Semi-supervised video object segmentation (VOS) aims to segment a few moving objects in a video sequence, where the objects are specified by annotations in the first frame.
Optical flow has been used in many existing semi-supervised VOS methods to improve segmentation accuracy.
FAMINet, consisting of a feature extraction network (F), an appearance network (A), a motion network (M), and an integration network (I), is proposed in this study to address this problem.
arXiv Detail & Related papers (2021-11-20T07:24:33Z)
- ASFlow: Unsupervised Optical Flow Learning with Adaptive Pyramid Sampling [26.868635622137106]
We present an unsupervised optical flow estimation method based on adaptive pyramid sampling in a deep pyramid network.
Our method achieves the best performance for unsupervised optical flow estimation on multiple leading benchmarks, including MPI-Sintel, KITTI 2012 and KITTI 2015.
arXiv Detail & Related papers (2021-04-08T07:22:35Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image is of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- FPCR-Net: Feature Pyramidal Correlation and Residual Reconstruction for Optical Flow Estimation [72.41370576242116]
We propose a semi-supervised Feature Pyramidal Correlation and Residual Reconstruction Network (FPCR-Net) for optical flow estimation from frame pairs.
It consists of two main modules: pyramid correlation mapping and residual reconstruction (see the second sketch after this list).
Experimental results show that the proposed scheme achieves state-of-the-art performance, improving average end-point error (AEE) by 0.80, 1.15 and 0.10 over competing baseline methods.
arXiv Detail & Related papers (2020-01-17T07:13:51Z)
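First sketch: the two SAM-plus-flow configurations from "Moving Object Segmentation: All You Need Is SAM (and Flow)" can be illustrated with the official segment-anything interface. The flow-to-color rendering, the checkpoint path, and the choice of a maximum-motion point prompt are illustrative assumptions, not the paper's exact procedure.

```python
import cv2
import numpy as np
from segment_anything import (SamAutomaticMaskGenerator, SamPredictor,
                              sam_model_registry)

def flow_to_rgb(flow: np.ndarray) -> np.ndarray:
    """Render an HxWx2 flow field as RGB (standard HSV flow coloring:
    hue encodes direction, value encodes magnitude)."""
    flow = flow.astype(np.float32)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 90 / np.pi).astype(np.uint8)  # hue from angle
    hsv[..., 1] = 255                                  # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255,     # value from magnitude
                                cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

# Assumed checkpoint path; any SAM variant works the same way.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")

def segment_flow_as_input(flow: np.ndarray):
    """Mode 1: SAM never sees RGB; automatic mask generation runs
    directly on the flow field rendered as a color image."""
    return SamAutomaticMaskGenerator(sam).generate(flow_to_rgb(flow))

def segment_flow_as_prompt(rgb: np.ndarray, flow: np.ndarray):
    """Mode 2: SAM sees the RGB frame; flow supplies the prompt. A single
    point at the location of maximum motion is an illustrative choice,
    not the paper's prompting scheme."""
    predictor = SamPredictor(sam)
    predictor.set_image(rgb)  # rgb: HxWx3 uint8
    mag = np.linalg.norm(flow, axis=-1)
    y, x = np.unravel_index(np.argmax(mag), mag.shape)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),   # 1 marks a foreground point
        multimask_output=True)
    return masks[np.argmax(scores)]
```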
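Second sketch: FPCR-Net's "pyramid correlation mapping" builds matching costs at several feature scales. The function below is a generic local cost-volume computation of the kind common in optical flow networks, offered only to illustrate the idea; the displacement radius and normalization are assumptions, not FPCR-Net's exact module.

```python
import torch
import torch.nn.functional as F

def correlation_volume(f1: torch.Tensor, f2: torch.Tensor, radius: int = 4):
    """Local correlation between two feature maps.

    f1, f2: (B, C, H, W) features of frame 1 and frame 2.
    Returns (B, (2r+1)^2, H, W): dot-product similarity of each f1
    location with f2 locations within `radius` pixels of displacement.
    """
    b, c, h, w = f1.shape
    f2_pad = F.pad(f2, [radius] * 4)  # zero-pad H and W by the radius
    vols = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = f2_pad[:, :, dy:dy + h, dx:dx + w]
            # Channel-wise dot product, scaled for stable magnitudes.
            vols.append((f1 * shifted).sum(dim=1, keepdim=True) / c ** 0.5)
    return torch.cat(vols, dim=1)

def pyramid_correlation(feats1, feats2, radius: int = 4):
    """Correlation volumes over a feature pyramid, coarse to fine."""
    return [correlation_volume(a, b, radius) for a, b in zip(feats1, feats2)]
```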