PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer
- URL: http://arxiv.org/abs/2407.16829v1
- Date: Tue, 23 Jul 2024 20:40:17 GMT
- Title: PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer
- Authors: Samhita Marri, Arun N. Sivakumar, Naveen K. Uppalapati, Girish Chowdhary
- Abstract summary: Tracking plant features is crucial for various agricultural tasks like phenotyping, pruning, or harvesting.
We propose PlantTrack, which utilizes high-dimensional features from DINOv2 to train a keypoint heatmap predictor network.
We show that with as few as 20 synthetic images for training the keypoint predictor, we achieve zero-shot Sim2Real transfer, enabling effective tracking of plant features in real environments.
- Score: 4.923031976899536
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tracking plant features is crucial for various agricultural tasks like phenotyping, pruning, or harvesting, but the unstructured, cluttered, and deformable nature of plant environments makes it a challenging task. In this context, recent advancements in foundational models show promise in addressing this challenge. In our work, we propose PlantTrack, in which we utilize high-dimensional features from DINOv2 and train a keypoint heatmap predictor network to identify the locations of semantic features such as fruits and leaves, which are then used as prompts for point tracking across video frames using TAPIR. We show that with as few as 20 synthetic images for training the keypoint predictor, we achieve zero-shot Sim2Real transfer, enabling effective tracking of plant features in real environments.
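The described pipeline lends itself to a compact implementation. Below is a minimal sketch, assuming the DINOv2 ViT-S/14 backbone from torch.hub; the heatmap head architecture, the peak-extraction threshold, and the `tapir_track` stub (standing in for a real TAPIR tracker) are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of the pipeline described above. Assumptions (not from
# the paper): the heatmap head architecture, the peak threshold, and the
# `tapir_track` stub, which stands in for a real TAPIR tracker.
import torch
import torch.nn as nn

# Frozen DINOv2 ViT-S/14 backbone provides patch-level features (dim 384).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()

class KeypointHeatmapHead(nn.Module):
    """Maps DINOv2 patch features to a per-patch keypoint heatmap."""
    def __init__(self, feat_dim=384):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_dim, 128, 1), nn.ReLU(),
            nn.Conv2d(128, 1, 1), nn.Sigmoid(),
        )

    def forward(self, feats):  # feats: (B, C, H/14, W/14)
        return self.head(feats)

def extract_keypoints(heatmap, thresh=0.5):
    """Heatmap peaks above a threshold become (x, y) query points
    (patch coordinates; multiply by 14 for pixel coordinates)."""
    ys, xs = torch.nonzero(heatmap[0, 0] > thresh, as_tuple=True)
    return torch.stack([xs, ys], dim=-1).float()

def tapir_track(frames, queries):
    """Placeholder for a TAPIR point tracker (not the real API)."""
    raise NotImplementedError("plug in a TAPIR implementation here")

@torch.no_grad()
def track_plant_features(frames, head):
    """frames: (T, 3, H, W) video; H and W must be multiples of 14."""
    tokens = backbone.forward_features(frames[:1])["x_norm_patchtokens"]
    b, n, c = tokens.shape
    side = int(n ** 0.5)
    feats = tokens.permute(0, 2, 1).reshape(b, c, side, side)
    queries = extract_keypoints(head(feats))
    return tapir_track(frames, queries)  # keypoints prompt the tracker
```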
Related papers
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on the Argoverse 2 Sensor and Waymo Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z) - Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in
Clutter [14.489086924126253]
This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred through natural language in cluttered scenes.
Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and they are evaluated on private datasets or in simulators that do not capture the complexity of natural indoor scenes.
We propose a novel end-to-end model (CROG) that leverages the visual grounding capabilities of CLIP to learn grasp synthesis directly from image-text pairs.
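As a hedged illustration of the idea (not the actual CROG architecture, which decodes dense CLIP features into segmentation and grasp maps), the sketch below fuses CLIP image and text embeddings and regresses a planar grasp; the fusion head and the 4-DoF grasp parameterization are assumptions.

```python
# Illustrative sketch only (not the CROG architecture): fuse CLIP image
# and text embeddings and regress a planar grasp (x, y, angle, width).
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn as nn

model, preprocess = clip.load("ViT-B/32", device="cpu")

class ReferringGraspHead(nn.Module):
    """Hypothetical fusion head; the real model decodes dense features."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 4))

    def forward(self, img_emb, txt_emb):
        return self.mlp(torch.cat([img_emb, txt_emb], dim=-1))

@torch.no_grad()
def predict_grasp(pil_image, text, head):
    img = preprocess(pil_image).unsqueeze(0)   # (1, 3, 224, 224)
    img_emb = model.encode_image(img).float()  # (1, 512)
    txt_emb = model.encode_text(clip.tokenize([text])).float()
    return head(img_emb, txt_emb)              # (1, 4) grasp parameters
```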
arXiv Detail & Related papers (2023-11-09T22:55:10Z) - Push Past Green: Learning to Look Behind Plant Foliage by Moving It [19.36396157137122]
Partial visibility, extreme clutter, thin structures, and the unknown geometry and dynamics of plants make such manipulation challenging.
We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant.
As SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage.
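Because the predictor outputs where space is revealed, action selection can be greedy over newly revealed area. A minimal sketch, assuming a hypothetical `predict_revealed` stand-in for the trained network and omitting re-observation of the plant between actions:

```python
# Greedy action selection enabled by predicting *where* space is
# revealed. `predict_revealed` is a hypothetical stand-in for the
# trained network; re-observing the plant after each action is omitted.
import numpy as np

def predict_revealed(image, action):
    """Placeholder: binary (H, W) map of space the action would reveal."""
    raise NotImplementedError("replace with a trained SRPNet-style model")

def _gain(image, action, revealed):
    """Newly revealed area: predicted map minus already revealed space."""
    m = predict_revealed(image, action)
    return m.sum() if revealed is None else (m & ~revealed).sum()

def greedy_reveal(image, candidate_actions, n_steps=5):
    revealed, plan = None, []
    for _ in range(n_steps):
        best = max(candidate_actions, key=lambda a: _gain(image, a, revealed))
        m = predict_revealed(image, best)
        plan.append(best)
        revealed = m if revealed is None else (revealed | m)
    return plan, revealed
```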
arXiv Detail & Related papers (2023-07-06T17:55:28Z) - Semantics-Aware Next-best-view Planning for Efficient Search and Detection of Task-relevant Plant Parts [3.9074818653555554]
To automate harvesting and de-leafing of tomato plants, it is important to search and detect the task-relevant plant parts.
Current active-vision algorithms cannot differentiate between relevant and irrelevant plant parts.
We propose a semantics-aware active-vision strategy that uses semantic information to identify the relevant plant parts.
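A hedged sketch of the selection rule this implies: score candidate viewpoints by how many still-unseen task-relevant parts they are expected to expose, rather than by raw volumetric information gain. `expected_visible_parts` is a hypothetical placeholder, not the paper's actual utility function.

```python
# Hypothetical selection-rule sketch: rank candidate views by expected
# coverage of still-unseen task-relevant parts (e.g., fruit, peduncles).
def expected_visible_parts(view, semantic_map, seen_parts):
    """Placeholder: estimate how many unseen relevant plant parts the
    view would expose, given a semantic occupancy map."""
    raise NotImplementedError

def next_best_view(candidate_views, semantic_map, seen_parts):
    # Relevance-aware scoring instead of raw volumetric information gain.
    return max(candidate_views,
               key=lambda v: expected_visible_parts(v, semantic_map, seen_parts))
```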
arXiv Detail & Related papers (2023-06-16T12:22:19Z) - Tracking through Containers and Occluders in the Wild [32.86030395660071]
We introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment.
We create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance.
We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.
arXiv Detail & Related papers (2023-05-04T17:59:58Z) - Inside Out: Transforming Images of Lab-Grown Plants for Machine Learning
Applications in Agriculture [0.0]
We employ a contrastive unpaired translation (CUT) generative adversarial network (GAN) to translate indoor plant images to appear as field images.
While we train our network to translate an image containing only a single plant, we show that our method is easily extendable to produce multiple-plant field images.
We also use our synthetic multi-plant images to train several YoloV5 nano object detection models to perform the task of plant detection.
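For context, a YOLOv5 nano detector trained this way can be loaded for inference through torch.hub; the weights file and image path below are hypothetical stand-ins.

```python
# Loading a YOLOv5 nano detector trained on the synthetic field images;
# the weights file and test image below are hypothetical stand-ins.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="plants_yolov5n.pt")  # hypothetical weights
results = model("field_image.jpg")                # hypothetical image
results.print()                                   # detection summary
boxes = results.xyxy[0]  # (N, 6): x1, y1, x2, y2, confidence, class
```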
arXiv Detail & Related papers (2022-11-05T20:51:45Z) - Semantic Image Segmentation with Deep Learning for Vine Leaf Phenotyping [59.0626764544669]
In this study, we use Deep Learning methods to semantically segment grapevine leaf images in order to develop an automated object detection system for leaf phenotyping.
Our work contributes to plant lifecycle monitoring through which dynamic traits such as growth and development can be captured and quantified.
arXiv Detail & Related papers (2022-10-24T14:37:09Z) - Optical flow-based branch segmentation for complex orchard environments [73.11023209243326]
We train a neural network system in simulation only, using simulated RGB data and optical flow.
The resulting neural network is able to perform foreground segmentation of branches in a busy orchard environment without additional real-world training or any special setup or equipment beyond a standard camera.
Our results show that our system is highly accurate and, when compared to a network using manually labeled RGBD data, achieves significantly more consistent and robust performance across environments that differ from the training set.
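A minimal sketch of this kind of input construction, assuming OpenCV's Farneback flow (the paper's exact flow estimator and network architecture are not specified here): consecutive frames yield a dense flow field that is stacked with RGB as a five-channel network input.

```python
# Stacking dense optical flow with RGB as the segmentation input; the
# Farneback estimator here is an assumption, not necessarily the
# paper's choice of flow method.
import cv2
import numpy as np

def rgb_flow_input(prev_bgr, curr_bgr):
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)  # (H, W, 2)
    rgb = curr_bgr[..., ::-1].astype(np.float32) / 255.0
    return np.concatenate([rgb, flow], axis=-1)  # (H, W, 5) network input
```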
arXiv Detail & Related papers (2022-02-26T03:38:20Z) - Stochastic Layers in Vision Transformers [85.38733795180497]
We introduce fully stochastic layers in vision transformers, without causing any severe drop in performance.
The additional stochasticity boosts the robustness of visual features and strengthens privacy.
We use our features for three different applications, namely, adversarial robustness, network calibration, and feature privacy.
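As an illustration of the general idea (an assumption, not the paper's exact formulation), a stochastic layer can be as simple as injecting Gaussian noise into token features at both train and test time:

```python
# Illustrative stochastic layer (an assumption, not the paper's exact
# formulation): Gaussian noise injected into token features at both
# train and test time, so extracted features are never deterministic.
import torch
import torch.nn as nn

class StochasticLayer(nn.Module):
    def __init__(self, dim, sigma=0.1):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.sigma = sigma

    def forward(self, x):  # x: (batch, tokens, dim)
        return self.proj(x + self.sigma * torch.randn_like(x))
```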
arXiv Detail & Related papers (2021-12-30T16:07:59Z) - Task2Sim : Towards Effective Pre-training and Transfer from Synthetic
Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
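A minimal sketch of the kind of mapping Task2Sim learns; the MLP architecture, task-representation dimension, and continuous [0, 1] parameterization below are assumptions for illustration.

```python
# Sketch of the learned mapping: task representation in, simulation
# parameters (lighting, pose variation, blur, ...) out. Architecture,
# dimensions, and continuous parameter range are assumptions.
import torch.nn as nn

class TaskToSimParams(nn.Module):
    def __init__(self, task_dim=128, n_sim_params=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(task_dim, 64), nn.ReLU(),
            nn.Linear(64, n_sim_params), nn.Sigmoid())

    def forward(self, task_repr):  # (B, task_dim) -> (B, n_sim_params)
        return self.mlp(task_repr)
```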
arXiv Detail & Related papers (2021-11-30T19:25:27Z) - Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial
Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) is a powerful tool for solving complex robotic tasks.
However, RL policies trained in simulation often do not work directly in the real world, which is known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed by point clouds and environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z)