End-to-End Multi-Object Detection with a Regularized Mixture Model
- URL: http://arxiv.org/abs/2205.08714v3
- Date: Fri, 28 Apr 2023 06:08:12 GMT
- Title: End-to-End Multi-Object Detection with a Regularized Mixture Model
- Authors: Jaeyoung Yoo, Hojun Lee, Seunghyeon Seo, Inseop Chung, Nojun Kwak
- Abstract summary: Recent end-to-end multi-object detectors simplify the inference pipeline by removing hand-crafted processes.
We propose a novel framework to train an end-to-end multi-object detector consisting of only two terms: negative log-likelihood (NLL) and a regularization term.
- Score: 26.19278003378703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent end-to-end multi-object detectors simplify the inference pipeline by
removing hand-crafted processes such as non-maximum suppression (NMS). However,
during training, they still heavily rely on heuristics and hand-crafted
processes which deteriorate the reliability of the predicted confidence score.
In this paper, we propose a novel framework to train an end-to-end multi-object
detector consisting of only two terms: negative log-likelihood (NLL) and a
regularization term. In doing so, the multi-object detection problem is treated
as density estimation of the ground truth bounding boxes utilizing a
regularized mixture density model. The proposed \textit{end-to-end multi-object
Detection with a Regularized Mixture Model} (D-RMM) is trained by minimizing
the NLL with the proposed regularization term, maximum component maximization
(MCM) loss, preventing duplicate predictions. Our method reduces the heuristics
of the training process and improves the reliability of the predicted
confidence score. Moreover, our D-RMM outperforms the previous end-to-end
detectors on the MS COCO dataset.
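The two-term objective described in the abstract, a mixture-density NLL over the ground-truth boxes plus an MCM regularizer that suppresses duplicate predictions, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the diagonal-Gaussian box density, the equal weighting of the two terms, and the function names are assumptions made here for clarity.

```python
import numpy as np

def gaussian_logpdf(y, mu, sigma):
    # Diagonal Gaussian log-density over box coordinates (cx, cy, w, h).
    return -0.5 * np.sum(((y - mu) / sigma) ** 2
                         + np.log(2 * np.pi * sigma ** 2), axis=-1)

def drmm_style_loss(gt_boxes, mu, sigma, log_pi):
    """NLL of ground-truth boxes under a mixture density, plus a
    maximum-component term in the spirit of the MCM loss.

    Shapes: gt_boxes (N, 4); mu, sigma (K, 4); log_pi (K,).
    """
    # Per-box, per-component joint log-likelihood:
    # log pi_k + log N(y_i | mu_k, sigma_k)
    comp = log_pi[None, :] + gaussian_logpdf(
        gt_boxes[:, None, :], mu[None, :, :], sigma[None, :, :])
    # Mixture NLL: -log sum_k exp(comp_ik), averaged over boxes.
    log_mix = np.logaddexp.reduce(comp, axis=1)
    nll = -np.mean(log_mix)
    # Maximum-component term: reward each box's single best component,
    # so one component (not several duplicates) explains each box.
    mcm = -np.mean(np.max(comp, axis=1))
    return nll + mcm
```

Treating each mixture component as one predicted box with its mixture weight as a confidence makes the connection to detection: minimizing the NLL fits the density to the ground truth, while the maximum-component term discourages several components from sharing credit for the same box.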
Related papers
- Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol [69.11739400975445]
We introduce the first theoretical framework for analyzing error accumulation in Model Context Protocol (MCP) agents. We show that cumulative distortion exhibits linear growth and high-probability deviations bounded by $O(\sqrt{T})$. Key findings include: semantic weighting reduces distortion by 80%, and periodic re-grounding approximately every 9 steps suffices for error control.
arXiv Detail & Related papers (2026-02-10T21:08:53Z) - Minimum Distance Summaries for Robust Neural Posterior Estimation [7.4716500353679685]
Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE.
arXiv Detail & Related papers (2026-02-09T20:06:15Z) - Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models [40.216463162163976]
We develop a batch minorization-maximization algorithm for softmax-gated multinomial-logistic MoE. We also prove finite-sample rates for conditional density estimation and parameter recovery. Experiments on biological protein-protein interaction prediction validate the full pipeline.
arXiv Detail & Related papers (2026-02-08T14:45:41Z) - Forward Consistency Learning with Gated Context Aggregation for Video Anomaly Detection [17.79982215633934]
Video anomaly detection (VAD) aims to measure deviations from normal patterns for various events in real-time surveillance systems. Most existing VAD methods rely on large-scale models to pursue extreme accuracy, limiting their feasibility on resource-limited edge devices. We introduce FoGA, a lightweight VAD model that performs Forward consistency learning with Gated context aggregation.
arXiv Detail & Related papers (2026-01-26T04:35:31Z) - Contamination Detection for VLMs using Multi-Modal Semantic Perturbation [73.76465227729818]
Open-source Vision-Language Models (VLMs) have achieved state-of-the-art performance on benchmark tasks. Their pretraining corpora raise a critical concern for both practitioners and users: inflated performance due to test-set leakage. We show that existing detection approaches either fail outright or exhibit inconsistent behavior. We propose a simple yet effective detection method based on multi-modal semantic perturbation.
arXiv Detail & Related papers (2025-11-05T18:59:52Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in the evaluation process of Large Language Models obscures true learning dynamics. We introduce MaP, a framework that integrates Merging and the Pass@k metric. Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Discretization-free Multicalibration through Loss Minimization over Tree Ensembles [22.276913140687725]
We propose a discretization-free multicalibration method over an ensemble of depth-two decision trees. Our algorithm provably achieves multicalibration, provided that the data distribution satisfies a technical condition we term loss saturation.
arXiv Detail & Related papers (2025-05-23T03:29:58Z) - R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge [78.26352952957909]
Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently.
The concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM.
In this paper, the problem of enabling edge users to collaboratively craft such MTLLMs via task vectors is studied under the assumption of worst-case adversarial attacks.
arXiv Detail & Related papers (2024-11-27T10:57:06Z) - Analytic Continual Test-Time Adaptation for Multi-Modality Corruption [23.545997349882857]
Test-Time Adaptation (TTA) aims to help pre-trained models bridge the gap between source and target datasets.
We propose a novel approach, Multi-modality Dynamic Analytic Adapter (MDAA) for MM-CTTA tasks.
MDAA achieves state-of-the-art performance on MM-CTTA while ensuring reliable model adaptation.
arXiv Detail & Related papers (2024-10-29T01:21:24Z) - MOLA: Enhancing Industrial Process Monitoring Using Multi-Block Orthogonal Long Short-Term Memory Autoencoder [3.7028696448588487]
We introduce MOLA: a Multi-block Orthogonal Long short-term memory Autoencoder paradigm, to conduct accurate, reliable fault detection of industrial processes.
We propose a multi-block monitoring structure, which categorizes the process variables into multiple blocks by leveraging expert process knowledge.
We demonstrate the efficiency and effectiveness of our MOLA framework by applying it to the Tennessee Eastman Process.
arXiv Detail & Related papers (2024-10-10T00:49:43Z) - Byzantine-tolerant distributed learning of finite mixture models [16.60734923697257]
This paper introduces Distance Filtered Mixture Reduction (DFMR), a Byzantine-tolerant adaptation of Mixture Reduction (MR) that is both computationally efficient and statistically sound.
We provide theoretical justification for DFMR, proving its optimal convergence rate and equivalence to the global maximum likelihood estimate.
arXiv Detail & Related papers (2024-07-19T02:11:26Z) - Variational Density Propagation Continual Learning [0.0]
Deep Neural Networks (DNNs) deployed to the real world are regularly subject to out-of-distribution (OoD) data.
This paper proposes a framework for adapting to data distribution drift modeled by benchmark Continual Learning datasets.
arXiv Detail & Related papers (2023-08-22T21:51:39Z) - Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Threshold-Consistent Margin Loss for Open-World Deep Metric Learning [42.03620337000911]
Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures.
Inconsistency often complicates the threshold selection process when deploying commercial image retrieval systems.
We propose a novel variance-based metric called Operating-Point-Inconsistency-Score (OPIS) that quantifies the variance in the operating characteristics across classes.
arXiv Detail & Related papers (2023-07-08T21:16:41Z) - Training Normalizing Flows with the Precision-Recall Divergence [73.92251251511199]
We show that achieving a specified precision-recall trade-off corresponds to minimising f-divergences from a family we call the PR-divergences.
We propose a novel generative model that is able to train a normalizing flow to minimise any f-divergence and, in particular, achieve a given precision-recall trade-off.
arXiv Detail & Related papers (2023-02-01T17:46:47Z) - Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting [61.02295959343446]
This work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules.
We build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation.
We apply the proposed framework to current SOTA multi-agent trajectory forecasting systems as a plugin module.
arXiv Detail & Related papers (2022-07-11T21:17:41Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Multivariate Density Estimation with Deep Neural Mixture Models [0.0]
Deep neural networks (DNNs) have seldom been applied to density estimation.
This paper extends our previous work on Neural Mixture Models (NMMs).
A maximum-likelihood (ML) algorithm for estimating Deep NMMs (DNMMs) is presented.
The class of probability density functions that can be modeled to any degree of precision via DNMMs is formally defined.
arXiv Detail & Related papers (2020-12-06T23:03:48Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training in these networks with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.