Comparing a composite model versus chained models to locate a nearest
visual object
- URL: http://arxiv.org/abs/2306.01551v1
- Date: Fri, 2 Jun 2023 13:58:59 GMT
- Title: Comparing a composite model versus chained models to locate a nearest
visual object
- Authors: Antoine Le Borgne, Xavier Marjou, Fanny Parzysz, Tayeb Lemlouma
- Abstract summary: We investigate the selection of an appropriate artificial neural network model for extracting information from geographic images and text.
Our results showed that these two architectures achieved the same level of performance, with a root mean square error (RMSE) of 0.055 and 0.056 respectively.
When the task can be decomposed into sub-tasks, the chain architecture exhibits a twelve-fold increase in training speed compared to the composite model.
- Score: 0.6882042556551609
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extracting information from geographic images and text is crucial for
autonomous vehicles to determine in advance the best cell stations to connect
to along their future path. Multiple artificial neural network models can
address this challenge; however, there is no definitive guidance on the
selection of an appropriate model for such use cases. Therefore, we
experimented with two architectures to solve such a task: a first architecture
with chained models, where each model in the chain addresses a sub-task of the
task; and a second architecture with a single model that addresses the whole task.
Our results showed that these two architectures achieved the same level of
performance, with a root mean square error (RMSE) of 0.055 and 0.056
respectively. The findings further revealed that when the task can be
decomposed into sub-tasks, the chain architecture exhibits a twelve-fold
increase in training speed compared to the composite model. Nevertheless, the
composite model
significantly alleviates the burden of data labeling.
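The two architectures compared in the abstract can be sketched in a few lines. This is a toy illustration, not the paper's code: the sub-models, features, and targets below are invented placeholders, and the real versions would be trained neural networks operating on geographic images and text.

```python
import math

# Toy sketch of the two architectures (invented placeholder models and data).

def sub_model_extract(features):
    """Sub-task 1 (hypothetical): map raw features to candidate distances."""
    return [f * 0.5 for f in features]

def sub_model_locate(distances):
    """Sub-task 2 (hypothetical): pick the nearest candidate."""
    return min(distances)

def chained_predict(features):
    """Chain architecture: one model per sub-task, composed sequentially."""
    return sub_model_locate(sub_model_extract(features))

def composite_predict(features):
    """Composite architecture: a single model addressing the whole task."""
    return min(f * 0.5 for f in features)

def rmse(predictions, targets):
    """Root mean square error, the metric reported in the abstract."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets))
        / len(predictions)
    )

inputs = [[0.2, 0.8], [0.6, 0.4], [1.0, 0.3]]
targets = [0.1, 0.2, 0.15]
preds_chain = [chained_predict(x) for x in inputs]
preds_composite = [composite_predict(x) for x in inputs]
# On this toy data both architectures yield identical predictions, so their
# RMSE values coincide, echoing the near-identical 0.055 vs 0.056 above.
```

The practical trade-off reported in the abstract lies elsewhere: the chain can train each sub-model on its own sub-task (faster training), while the composite model needs labels only for the final output (lighter labeling burden).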
Related papers
- Pre-Trained Model Recommendation for Downstream Fine-tuning [22.343011779348682]
Model selection aims to rank off-the-shelf pre-trained models and select the most suitable one for the new target task.
Existing model selection techniques are often constrained in their scope and tend to overlook the nuanced relationships between models and tasks.
We present a pragmatic framework, Fennec, delving into a diverse, large-scale model repository.
arXiv Detail & Related papers (2024-03-11T02:24:32Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We also employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - Knowledge is a Region in Weight Space for Fine-tuned Language Models [48.589822853418404]
We study how the weight space and the underlying loss landscape of different models are interconnected.
We show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster.
arXiv Detail & Related papers (2023-02-09T18:59:18Z) - Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z) - Triple-level Model Inferred Collaborative Network Architecture for Video
Deraining [43.06607185181434]
We develop a model-guided triple-level optimization framework to deduce network architecture with cooperating optimization and auto-searching mechanism.
Our model shows significant improvements in fidelity and temporal consistency over the state-of-the-art works.
arXiv Detail & Related papers (2021-11-08T13:09:00Z) - Dynamic Spatiotemporal Graph Convolutional Neural Networks for Traffic
Data Imputation with Complex Missing Patterns [3.9318191265352196]
We propose a novel deep learning framework, Dynamic Spatiotemporal Graph Convolutional Neural Networks (DSTGCN), to impute missing traffic data.
We introduce a graph structure estimation technique to model the dynamic spatial dependencies from real-time traffic information and the road network structure.
Our proposed model outperforms existing deep learning models under all missing-data scenarios, and the graph structure estimation technique contributes to the model's performance.
arXiv Detail & Related papers (2021-09-17T05:47:17Z) - A Better Loss for Visual-Textual Grounding [74.81353762517979]
Given a textual phrase and an image, the visual grounding problem is defined as the task of locating the content of the image referenced by the sentence.
It is a challenging task that has several real-world applications in human-computer interaction, image-text reference resolution, and video-text reference resolution.
We propose a model that is able to achieve a higher accuracy than state-of-the-art models thanks to the adoption of a more effective loss function.
arXiv Detail & Related papers (2021-08-11T16:26:54Z) - GAN Cocktail: mixing GANs without dataset access [18.664733153082146]
We tackle the problem of model merging, given two constraints that often come up in the real world.
In the first stage, we transform the weights of all the models to the same parameter space by a technique we term model rooting.
In the second stage, we merge the rooted models by averaging their weights and fine-tuning them for each specific domain, using only data generated by the original trained models.
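The second-stage merge described above amounts to element-wise weight averaging once the models have been "rooted" into the same parameter space. A minimal sketch, assuming the rooted models are represented as plain dicts of equally shaped parameter lists (the layer names and values are invented):

```python
# Minimal sketch of weight averaging between two "rooted" models, i.e. models
# already transformed into a shared parameter space (values are invented).

def average_weights(model_a, model_b):
    """Element-wise mean of two parameter dicts with matching keys/shapes."""
    assert model_a.keys() == model_b.keys(), "rooted models must share layers"
    return {
        name: [(a + b) / 2 for a, b in zip(model_a[name], model_b[name])]
        for name in model_a
    }

rooted_a = {"conv1": [0.25, 0.5], "fc": [1.0, -1.0]}
rooted_b = {"conv1": [0.75, 0.0], "fc": [0.0, 1.0]}
merged = average_weights(rooted_a, rooted_b)
# merged == {"conv1": [0.5, 0.25], "fc": [0.5, 0.0]}
```

In the paper, the averaged model is then fine-tuned per domain using only data generated by the original trained GANs, since the original training sets are assumed inaccessible.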
arXiv Detail & Related papers (2021-06-07T17:59:04Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model with a previously proposed model based on an ensemble of simpler neural networks that detect firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - NASE: Learning Knowledge Graph Embedding for Link Prediction via Neural
Architecture Search [9.634626241415916]
Link prediction is the task of predicting missing connections between entities in a knowledge graph (KG).
Previous work has tried to use Automated Machine Learning (AutoML) to search for the best model for a given dataset.
We propose a novel Neural Architecture Search (NAS) framework for the link prediction task.
arXiv Detail & Related papers (2020-08-18T03:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.