How Shift Equivariance Impacts Metric Learning for Instance Segmentation
- URL: http://arxiv.org/abs/2101.05846v1
- Date: Thu, 14 Jan 2021 19:48:24 GMT
- Title: How Shift Equivariance Impacts Metric Learning for Instance Segmentation
- Authors: Josef Lorenz Rumberger, Xiaoyan Yu, Peter Hirsch, Melanie Dohmen,
Vanessa Emanuela Guarino, Ashkan Mokarian, Lisa Mais, Jan Funke, Dagmar
Kainmueller
- Abstract summary: We show that a standard encoder-decoder network has the capacity to distinguish at most $f^{dl}$ same-looking objects.
We also show that to avoid discontinuities in a tile-and-stitch approach, it is necessary to employ valid convolutions in combination with a training output window size strictly greater than $f^l$.
- Score: 11.981698445848748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metric learning has received conflicting assessments concerning its
suitability for solving instance segmentation tasks. It has been dismissed as
theoretically flawed due to the shift equivariance of the employed CNNs and
their respective inability to distinguish same-looking objects. Yet it has been
shown to yield state of the art results for a variety of tasks, and practical
issues have mainly been reported in the context of tile-and-stitch approaches,
where discontinuities at tile boundaries have been observed. To date, neither
of the reported issues has undergone thorough formal analysis. In our work, we
contribute a comprehensive formal analysis of the shift equivariance properties
of encoder-decoder-style CNNs, which yields a clear picture of what can and
cannot be achieved with metric learning in the face of same-looking objects. In
particular, we prove that a standard encoder-decoder network that takes
$d$-dimensional images as input, with $l$ pooling layers and pooling factor
$f$, has the capacity to distinguish at most $f^{dl}$ same-looking objects, and
we show that this upper limit can be reached. Furthermore, we show that to
avoid discontinuities in a tile-and-stitch approach, assuming standard batch
size 1, it is necessary to employ valid convolutions in combination with a
training output window size strictly greater than $f^l$, while at test-time it
is necessary to crop tiles to size $n\cdot f^l$ before stitching, with $n\geq
1$. We complement these theoretical findings by discussing a number of
insightful special cases for which we show empirical results on synthetic data.
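
The three quantities named in the abstract can be computed directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that $f$, $d$ and $l$ are positive integers and that the window and crop conditions are applied per spatial axis.

```python
def max_distinguishable_objects(f: int, d: int, l: int) -> int:
    """Upper bound f**(d*l) on distinguishable same-looking objects.

    f: pooling factor, d: image dimensionality, l: number of pooling layers.
    """
    return f ** (d * l)


def training_window_is_large_enough(output_window: int, f: int, l: int) -> bool:
    """Check the stated training condition: output window strictly greater than f**l."""
    return output_window > f ** l


def stitchable_crop_size(tile_size: int, f: int, l: int) -> int:
    """Largest size n * f**l (with n >= 1) that fits into a tile of the given size.

    Assumption: the crop is computed independently for each axis.
    """
    period = f ** l
    n = tile_size // period
    if n < 1:
        raise ValueError(f"tile size {tile_size} is smaller than f**l = {period}")
    return n * period


if __name__ == "__main__":
    f, d, l = 2, 2, 3  # illustrative values: 2D images, three 2x pooling layers
    print(max_distinguishable_objects(f, d, l))      # 2**(2*3) = 64
    print(training_window_is_large_enough(8, f, l))  # False: 8 is not > 2**3
    print(stitchable_crop_size(100, f, l))           # 12 * 8 = 96
```

With these illustrative values, a network output tile 100 pixels wide would be cropped to 96 pixels before stitching, since 96 is the largest multiple of $f^l = 8$ that fits.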
Related papers
- Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning [27.01879432423409]
Recent large language models often misapply lemmas, importing conclusions without validating assumptions.
We present RULES, which encodes this specification via a two-section output and trains with reinforcement learning.
Training and evaluation draw on diverse natural language and formal proof corpora.
arXiv Detail & Related papers (2026-02-01T03:34:30Z) - Test time training enhances in-context learning of nonlinear functions [51.56484100374058]
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction.
We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time.
arXiv Detail & Related papers (2025-09-30T03:56:44Z) - Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation [45.395353088233556]
We introduce a theoretically grounded $\textbf{Di}$ffusion-based $\textbf{C}$ounterf$\textbf{a}$ctual $\textbf{p}$rompt learning framework.
Our method performs excellently across tasks such as image classification, image-text retrieval, and visual question answering, with particularly strong advantages in unseen categories.
arXiv Detail & Related papers (2025-07-26T09:27:52Z) - Approximate Size Targets Are Sufficient for Accurate Semantic Segmentation [52.239136918460616]
Extending binary class tags to approximate relative object-size distributions allows off-the-shelf architectures to solve the segmentation problem.
A straightforward zero-avoiding KL-divergence loss for average predictions produces segmentation accuracy comparable to the standard pixel-precise supervision.
Our ideas are validated on PASCAL VOC using our new human annotations of approximate object sizes.
arXiv Detail & Related papers (2025-03-10T06:02:13Z) - Mitigating covariate shift in non-colocated data with learned parameter priors [0.0]
We present Fragmentation-induced co-shift remediation ($FIcsR$), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline.
We run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths.
The results are promising under all these conditions, with accuracy improved over the batch and fold state of the art by more than $5\%$ and $10\%$, respectively.
arXiv Detail & Related papers (2024-11-10T15:48:29Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning.
We introduce a novel approach to disentangled representation learning based on quadratic optimal transport.
We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z) - QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
arXiv Detail & Related papers (2024-05-22T17:34:18Z) - Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation [1.124958340749622]
We introduce a multitask learning framework to leverage correlations among the counting, detection, and segmentation tasks.
We develop a cross-position cut-and-paste for label augmentation and an entropy-based pseudo-label selection.
The proposed model significantly outperforms UDA methods and performs comparably to its supervised counterpart.
arXiv Detail & Related papers (2024-03-31T12:22:23Z) - Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images [76.47980643420375]
This paper builds on the hypothesis that there is an inherent data-hungry matter in learning semantic correspondences.
We demonstrate that a simple machine annotator reliably enriches paired key points via machine supervision.
Our models surpass current state-of-the-art models on semantic correspondence learning benchmarks like SPair-71k, PF-PASCAL, and PF-WILLOW.
arXiv Detail & Related papers (2023-11-30T13:22:15Z) - Causal Transportability for Visual Recognition [70.13627281087325]
We show that standard classifiers fail because the association between images and labels is not transportable across settings.
We then show that the causal effect, which severs all sources of confounding, remains invariant across domains.
This motivates us to develop an algorithm to estimate the causal effect for image classification.
arXiv Detail & Related papers (2022-04-26T15:02:11Z) - Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z) - Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective [26.704446184314506]
We study the problem of measuring the fairness of a machine learning model under noisy information.
We present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible.
arXiv Detail & Related papers (2021-05-20T18:36:28Z) - Adversarial Robustness of Supervised Sparse Coding [34.94566482399662]
We consider a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate.
We focus on the hypothesis class obtained by combining a sparsity-promoting encoder with a linear classifier.
We provide a robustness certificate for end-to-end classification.
arXiv Detail & Related papers (2020-10-22T22:05:21Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)