Boosting Breast Ultrasound Video Classification by the Guidance of
Keyframe Feature Centers
- URL: http://arxiv.org/abs/2306.06877v1
- Date: Mon, 12 Jun 2023 05:30:09 GMT
- Title: Boosting Breast Ultrasound Video Classification by the Guidance of
Keyframe Feature Centers
- Authors: AnLan Sun, Zhao Zhang, Meng Lei, Yuting Dai, Dong Wang, Liwei Wang
- Abstract summary: We propose KGA-Net and coherence loss to enhance the performance of ultrasound video classification.
Our method boosts the performance on the public BUSV dataset by a large margin.
- Score: 16.527815681294534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Breast ultrasound videos contain richer information than ultrasound images,
so it is more meaningful to develop video models for this diagnosis task. However,
collecting ultrasound video datasets is much harder than collecting image datasets. In
this paper, we explore the feasibility of enhancing the performance of
ultrasound video classification using a static image dataset. To this end, we
propose KGA-Net and a coherence loss. KGA-Net adopts both video clips and
static images to train the network. The coherence loss uses the feature centers
generated from the static images to guide the frame attention in the video model.
Our KGA-Net boosts the performance on the public BUSV dataset by a large
margin. The visualization of frame attention demonstrates the explainability
of our method. The codes and model weights of our method will be made publicly
available.
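The abstract does not give the exact formulation of the coherence loss, so the following is only a minimal sketch of one plausible reading: class-wise feature centers are computed from the static keyframe images, and the attention-weighted video feature is pulled toward the center of the clip's class, so frames that resemble keyframes receive higher attention. The function names (compute_feature_centers, coherence_loss), the cosine-distance form, and the attention pooling are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def compute_feature_centers(image_feats, image_labels, num_classes):
    """Class-wise feature centers from the static keyframe image branch.

    image_feats:  (N, D) features of static images
    image_labels: (N,)   class index per image
    Returns a (num_classes, D) tensor of centers (zeros for empty classes).
    """
    centers = torch.zeros(num_classes, image_feats.size(1), device=image_feats.device)
    for c in range(num_classes):
        mask = image_labels == c
        if mask.any():
            centers[c] = image_feats[mask].mean(dim=0)
    return centers


def coherence_loss(frame_feats, frame_attn, centers, video_labels):
    """Hypothetical coherence loss (assumption, not the paper's definition):
    pull the attention-weighted clip feature toward the feature center of the
    clip's class, computed from the static image branch.

    frame_feats:  (B, T, D) per-frame features from the video branch
    frame_attn:   (B, T)    attention logits over frames
    centers:      (C, D)    output of compute_feature_centers
    video_labels: (B,)      class index per video clip
    """
    attn = torch.softmax(frame_attn, dim=1)                      # (B, T)
    clip_feat = (attn.unsqueeze(-1) * frame_feats).sum(dim=1)    # (B, D)
    target = centers[video_labels]                               # (B, D)
    # Cosine-distance form; an L2 form would be an equally plausible reading.
    return (1.0 - F.cosine_similarity(clip_feat, target, dim=-1)).mean()
```

In this sketch the coherence loss would be added to the ordinary classification loss of the video branch with a weighting coefficient; that coefficient and the choice of distance are likewise assumptions.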
Related papers
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [69.0740091741732]
We propose a high-fidelity image-to-video generation method, named DreamVideo, by devising a frame retention branch based on a pre-trained video diffusion model.
Our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 compared to other image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z) - A Complementary Global and Local Knowledge Network for Ultrasound
denoising with Fine-grained Refinement [0.7424725048947504]
Ultrasound imaging serves as an effective and non-invasive diagnostic tool commonly employed in clinical examinations.
Existing methods for speckle noise reduction induce excessive image smoothing or fail to preserve detailed information adequately.
We propose a complementary global and local knowledge network for ultrasound denoising with fine-grained refinement.
arXiv Detail & Related papers (2023-10-05T09:12:34Z) - A New Dataset and A Baseline Model for Breast Lesion Detection in
Ultrasound Videos [43.42513012531214]
We first collect and annotate an ultrasound video dataset (188 videos) for breast lesion detection.
We propose a clip-level and video-level feature aggregated network (CVA-Net) for addressing breast lesion detection in ultrasound videos.
arXiv Detail & Related papers (2022-07-01T01:37:50Z) - VRAG: Region Attention Graphs for Content-Based Video Retrieval [85.54923500208041]
Region Attention Graph Networks (VRAG) improve on state-of-the-art video-level methods.
VRAG represents videos at a finer granularity via region-level features and encodes video-temporal dynamics through region-level relations.
We show that the performance gap between video-level and frame-level methods can be reduced by segmenting videos into shots and using shot embeddings for video retrieval.
arXiv Detail & Related papers (2022-05-18T16:50:45Z) - Weakly-Supervised Action Detection Guided by Audio Narration [50.4318060593995]
We propose a model to learn from the narration supervision and utilize multimodal features, including RGB, motion flow, and ambient sound.
Our experiments show that noisy audio narration suffices to learn a good action detection model, thus reducing annotation expenses.
arXiv Detail & Related papers (2022-05-12T06:33:24Z) - Weakly Supervised Contrastive Learning for Better Severity Scoring of
Lung Ultrasound [0.044364554283083675]
Several AI-based patient severity scoring models have been proposed that rely on scoring the appearance of the ultrasound scans.
We address the challenge of labeling every ultrasound frame in the video clips.
Our contrastive learning method treats the video clip severity labels as noisy weak severity labels for individual frames.
We show that it performs better than conventional cross-entropy-loss-based training.
arXiv Detail & Related papers (2022-01-18T23:45:18Z) - Localizing Visual Sounds the Hard Way [149.84890978170174]
We train the network to explicitly discriminate challenging image fragments, even for images that do contain the object emitting the sound.
We show that our algorithm achieves state-of-the-art performance on the popular Flickr SoundNet dataset.
We introduce the VGG-Sound Source (VGG-SS) benchmark, a new set of annotations for the recently-introduced VGG-Sound dataset.
arXiv Detail & Related papers (2021-04-06T17:38:18Z) - Video Captioning in Compressed Video [1.953018353016675]
We propose a video captioning method which operates directly on the stored compressed videos.
To learn a discriminative visual representation for video captioning, we design a residuals-assisted encoder (RAE), which spots regions of interest in I-frames.
We evaluate our method on two benchmark datasets and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-01-02T03:06:03Z) - Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z) - Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)