Leveraging Foundation Models To learn the shape of semi-fluid deformable objects
- URL: http://arxiv.org/abs/2411.16802v1
- Date: Mon, 25 Nov 2024 13:41:35 GMT
- Title: Leveraging Foundation Models To learn the shape of semi-fluid deformable objects
- Authors: Omar El Assal, Carlos M. Mateo, Sebastien Ciron, David Fofi
- Abstract summary: Over the last decade, researchers have shown keen interest in characterizing and manipulating deformable objects of non-fluid nature.
In this paper, we address the subject of characterizing the weld pool to define stable features that serve as information for motion control objectives.
Knowledge distillation from foundation models into a smaller generative model shows promising results in the characterization of deformable objects.
- Score: 0.7895162173260983
- Abstract: One of the difficulties in the manipulation of deformable objects is their characterization and the detection of representative keypoints for the purpose of manipulation. Researchers have shown keen interest over the last decade in characterizing and manipulating deformable objects of non-fluid nature, such as clothes and ropes. Although several propositions have been made regarding object characterization, researchers have always been confronted with the need for pixel-level information about the object in images to extract relevant information, usually obtained by means of segmentation networks trained on manually labeled data. In this paper, we address the subject of characterizing the weld pool to define stable features that serve as information for further motion control objectives. We achieve this by employing two pipelines. The first characterizes fluid deformable objects through a generative model trained in a teacher-student framework. The second leverages foundation models, using them as teachers to characterize the object in the image without the need for any pre-training or any dataset. Knowledge distillation from foundation models into a smaller generative model shows promising results in the characterization of deformable objects. The student network was capable of learning to retrieve the keypoints of the object with an error of 13.4 pixels, and the teacher was evaluated on its ability to retrieve pixel-level information, represented by the object mask, achieving a mean Intersection over Union (mIoU) of 75.26%.
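To make the two-pipeline description concrete, below is a minimal PyTorch sketch of the distillation setup: a stand-in teacher produces pseudo-masks and a small student is trained against them, with an IoU helper of the kind behind the reported 75.26% teacher mIoU. `StudentNet`, `teacher_mask`, and all shapes and losses are illustrative assumptions, not the authors' implementation; in the paper the teacher is a foundation segmentation model, whereas a random mask here keeps the sketch runnable.

```python
# Minimal sketch of the teacher-student distillation described in the abstract.
# StudentNet, the random data, and the BCE objective are illustrative
# assumptions, not the authors' architecture.
import torch
import torch.nn as nn

class StudentNet(nn.Module):
    """Small student: image -> per-pixel object-mask logits."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        return self.head(self.encoder(x))

@torch.no_grad()
def teacher_mask(images):
    # Placeholder for the foundation-model teacher: in the paper's second
    # pipeline this would segment the weld pool zero-shot. A random binary
    # mask is returned here purely so the sketch runs end to end.
    return (torch.rand(images.shape[0], 1, *images.shape[2:]) > 0.5).float()

def iou(pred, target, eps=1e-6):
    # Intersection over Union for binary masks, the metric behind the
    # reported teacher mIoU.
    inter = (pred * target).sum()
    union = ((pred + target) > 0).float().sum()
    return float((inter + eps) / (union + eps))

student = StudentNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(10):                    # toy distillation loop
    images = torch.rand(4, 3, 64, 64)     # stand-in weld-pool frames
    targets = teacher_mask(images)        # teacher pseudo-labels
    logits = student(images)
    loss = bce(logits, targets)           # distill teacher masks into student
    opt.zero_grad()
    loss.backward()
    opt.step()

print(iou((torch.sigmoid(logits) > 0.5).float(), targets))
```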
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Visual Context-Aware Person Fall Detection [52.49277799455569]
We present a segmentation pipeline to semi-automatically separate individuals and objects in images.
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
- Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping [14.958823096408175]
Foundation models are a strong trend in deep learning and computer vision.
Here, we focus on training an object identification model.
The key to training such a model is the centroid triplet loss (CTL), which aggregates image features to their centroids; a minimal sketch follows this entry.
arXiv Detail & Related papers (2024-04-09T13:01:26Z)
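A minimal sketch of a centroid triplet loss, assuming L2-normalized embeddings, integer identity labels, and a fixed margin; the cited paper's exact CTL formulation (mining strategy, margin, normalization) may differ.

```python
# Centroid triplet loss sketch: each sample is pulled toward its own class
# centroid and pushed away from the nearest other centroid.
import torch
import torch.nn.functional as F

def centroid_triplet_loss(embeddings, labels, margin=0.2):
    """embeddings: (N, D) float tensor; labels: (N,) long tensor."""
    uniq = labels.unique()
    # Aggregate the features of each identity into a centroid.
    centroids = torch.stack([embeddings[labels == c].mean(0) for c in uniq])
    centroids = F.normalize(centroids, dim=1)
    emb = F.normalize(embeddings, dim=1)
    dists = torch.cdist(emb, centroids)            # (N, num_ids)
    own = (labels.unsqueeze(1) == uniq.unsqueeze(0)).float()
    pos = (dists * own).sum(1)                     # distance to own centroid
    neg = (dists + own * 1e9).min(1).values        # nearest other centroid
    return F.relu(pos - neg + margin).mean()

# Toy usage:
emb = torch.randn(8, 16)
lab = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(centroid_triplet_loss(emb, lab))
```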
- Explicitly Disentangled Representations in Object-Centric Learning [0.0]
We propose a novel architecture that biases object-centric models toward disentangling shape and texture components.
arXiv Detail & Related papers (2024-01-18T17:22:11Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising because it can find objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions; a minimal sketch follows this entry.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
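A minimal sketch of PCA-based localization over patch features, in the spirit of the summary above: patch features are projected onto the first principal component and split by sign into object and background. The random features and the 14x14 grid are stand-ins; the cited method's actual feature extractor and thresholding are assumptions here.

```python
# PCA localization sketch: the first principal component of self-supervised
# patch features often separates foreground from background.
import torch

def pca_localize(patch_feats):
    """patch_feats: (H*W, D) patch features from one image.
    Returns a boolean (H*W,) foreground mask from the 1st principal component."""
    x = patch_feats - patch_feats.mean(0, keepdim=True)
    # First principal direction via SVD of the centered feature matrix.
    _, _, vh = torch.linalg.svd(x, full_matrices=False)
    proj = x @ vh[0]                  # projection onto 1st component
    return proj > 0                   # sign split: object vs. background

feats = torch.randn(14 * 14, 384)     # toy stand-in for a 14x14 ViT patch grid
mask = pca_localize(feats).reshape(14, 14)
print(mask.float().mean())            # fraction of "object" patches
```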
- Understanding Self-Supervised Pretraining with Part-Aware Representation Learning [88.45460880824376]
We study whether self-supervised representation pretraining methods learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z)
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information-theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- Text-driven object affordance for guiding grasp-type recognition in multimodal robot teaching [18.529563816600607]
This study investigates how text-driven object affordance affects image-based grasp-type recognition in robot teaching.
The authors created labeled datasets of first-person hand images to examine the impact of object affordance on recognition performance.
arXiv Detail & Related papers (2021-02-27T17:03:32Z)
- Object-Centric Image Generation with Factored Depths, Locations, and Appearances [30.541425619507184]
We present a generative model of images that explicitly reasons over the set of objects they show.
Our model learns a structured latent representation that separates objects from each other and from the background.
It can be trained from images alone in a purely unsupervised fashion without the need for object masks or depth information.
arXiv Detail & Related papers (2020-04-01T18:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.