Exploring the Hierarchical Reasoning Model for Small Natural-Image Classification Without Augmentation
- URL: http://arxiv.org/abs/2510.03598v1
- Date: Sat, 04 Oct 2025 01:22:41 GMT
- Title: Exploring the Hierarchical Reasoning Model for Small Natural-Image Classification Without Augmentation
- Authors: Alexander V. Mantzaris
- Abstract summary: It is evaluated on MNIST, CIFAR-10, and CIFAR-100 under a deliberately raw regime. It is concluded that, for small-resolution image classification without augmentation, HRM is not competitive with even simple convolutional architectures.
- Score: 51.56484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper asks whether the Hierarchical Reasoning Model (HRM) with its two Transformer-style modules $(f_L,f_H)$, one-step (DEQ-style) training, deep supervision, Rotary Position Embeddings, and RMSNorm can serve as a practical image classifier. It is evaluated on MNIST, CIFAR-10, and CIFAR-100 under a deliberately raw regime: no data augmentation, identical optimizer family with one-epoch warmup then cosine-floor decay, and label smoothing. HRM optimizes stably and performs well on MNIST ($\approx 98\%$ test accuracy), but on small natural images it overfits and generalizes poorly: on CIFAR-10, HRM reaches 65.0\% after 25 epochs, whereas a two-stage Conv--BN--ReLU baseline attains 77.2\% while training $\sim 30\times$ faster per epoch; on CIFAR-100, HRM achieves only 29.7\% test accuracy despite 91.5\% train accuracy, while the same CNN reaches 45.3\% test with 50.5\% train accuracy. Loss traces and error analyses indicate healthy optimization but insufficient image-specific inductive bias for HRM in this regime. It is concluded that, for small-resolution image classification without augmentation, HRM as it currently exists is not competitive with even simple convolutional architectures, though this does not exclude the possibility that modifications to the model could improve it substantially.
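The training regime described above pairs a one-epoch linear warmup with cosine decay that flattens at a non-zero floor. A minimal sketch of such a schedule follows; the base and floor learning rates are illustrative assumptions, not values taken from the paper.

```python
import math

def lr_schedule(epoch, total_epochs, base_lr=1e-3, floor_lr=1e-5, warmup_epochs=1):
    """One-epoch linear warmup, then cosine decay down to a floor learning rate."""
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr to floor_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return floor_lr + (base_lr - floor_lr) * cosine
```

The floor keeps the step size from collapsing to zero late in training, which matters over the short 25-epoch budgets used in the experiments.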
Related papers
- From Global to Granular: Revealing IQA Model Performance via Correlation Surface [83.65597122328133]
We present Granularity-Modulated Correlation (GMC), which provides a structured, fine-grained analysis of IQA performance. GMC includes a Distribution Regulator that regularizes correlations to mitigate biases from non-uniform quality distributions. Experiments on standard benchmarks show that GMC reveals performance characteristics invisible to scalar metrics, offering a more informative and reliable paradigm for analyzing, comparing, and deploying IQA models.
arXiv Detail & Related papers (2026-01-29T13:55:26Z) - DAFM: Dynamic Adaptive Fusion for Multi-Model Collaboration in Composed Image Retrieval [2.330678113289435]
We propose Dynamic Adaptive Fusion (DAFM) for multi-model collaboration in Composed Image Retrieval (CIR). Our method achieves a Recall@10 of 93.21 and an Rmean of 84.43 on CIRR, and an average Rmean of 67.48 on FashionIQ, surpassing recent strong baselines by up to 4.5%.
arXiv Detail & Related papers (2025-11-07T06:51:10Z) - End-to-End Implicit Neural Representations for Classification [57.55927378696826]
Implicit neural representations (INRs) encode a signal in neural network parameters and show excellent results for signal reconstruction. INR-based classification still significantly underperforms compared to pixel-based methods like CNNs. This work presents an end-to-end strategy for initializing SIRENs together with a learned learning-rate scheme.
arXiv Detail & Related papers (2025-03-23T16:02:23Z) - Improving the U-Net Configuration for Automated Delineation of Head and Neck Cancer on MRI [0.0]
Tumor volume segmentation on MRI is a challenging and time-consuming process. This work presents an approach to automated delineation of head and neck tumors on MRI scans. The focus of this research was to propose improvements to the configuration commonly used in medical segmentation tasks.
arXiv Detail & Related papers (2025-01-09T10:22:35Z) - RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation [46.659592045271125]
RTMO is a one-stage pose estimation framework that seamlessly integrates coordinate classification.
It achieves accuracy comparable to top-down methods while maintaining high speed.
Our largest model, RTMO-l, attains 74.8% AP on COCO val 2017 and 141 FPS on a single V100 GPU.
arXiv Detail & Related papers (2023-12-12T18:55:29Z) - QuickQual: Lightweight, convenient retinal image quality scoring with off-the-shelf pretrained models [2.9005223064604078]
Image quality remains a key problem for both traditional and deep learning (DL)-based approaches to retinal image analysis.
We present a simple approach to retinal image quality scoring (RIQS), consisting of a single off-the-shelf ImageNet-pretrained Densenet121 backbone plus a Support Vector Machine (SVM).
QuickQual performs very well, setting a new state-of-the-art for EyeQ.
We present a second model, QuickQual MEga Minified Estimator (QuickQual-MEME), that consists of only 10 parameters on top of an off-the-shelf Densenet121.
arXiv Detail & Related papers (2023-07-25T16:55:13Z) - CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z) - MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning [12.365801596593936]
We model our pre-training task as a binary classification problem to induce an implicit contrastive effect. Unlike existing methods, the proposed loss function optimizes the mutual information in positive and negative pairs. The proposed method outperforms SOTA self-supervised contrastive frameworks on benchmark datasets.
arXiv Detail & Related papers (2021-11-24T17:51:29Z) - Neural Architecture Search using Covariance Matrix Adaptation Evolution Strategy [6.8129169853808795]
We propose a framework of applying the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to the neural architecture search problem called CMANAS.
The architectures are modelled using a normal distribution, which is updated using CMA-ES based on the fitness of the sampled population.
CMANAS finished the architecture search on CIFAR-10 with a top-1 test accuracy of 97.44% in 0.45 GPU days, and on CIFAR-100 with a top-1 test accuracy of 83.24% in 0.6 GPU days on a single GPU.
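As a rough illustration of the search loop described above, the sketch below encodes candidate architectures as real vectors, samples them from a normal distribution, and moves the distribution's mean toward the fittest samples. Full CMA-ES as used by CMANAS also adapts the covariance matrix and step size; the toy fitness function and all hyperparameters here are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_es(fitness, dim=3, pop=20, iters=40, sigma=0.3):
    """Toy estimation-of-distribution search (simplified stand-in for CMA-ES):
    sample candidates from N(mean, sigma^2 I), then shift the mean toward
    the top-scoring quarter of the population."""
    mean = np.zeros(dim)
    for _ in range(iters):
        samples = mean + sigma * rng.standard_normal((pop, dim))
        scores = np.array([fitness(s) for s in samples])
        elite = samples[np.argsort(scores)[-pop // 4:]]  # keep the best quarter
        mean = elite.mean(axis=0)
    return mean

# Toy "fitness": higher is better, with its peak at (1, 2, 3).
best = simple_es(lambda v: -np.sum((v - np.array([1.0, 2.0, 3.0])) ** 2))
```

In CMANAS, the vector would parameterize an architecture and the fitness would be (estimated) validation accuracy rather than a closed-form function.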
arXiv Detail & Related papers (2021-07-15T11:41:23Z) - Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z) - Tent: Fully Test-time Adaptation by Entropy Minimization [77.85911673550851]
A model must adapt itself to generalize to new and different data during testing.
In this setting of fully test-time adaptation the model has only the test data and its own parameters.
We propose to adapt by test entropy minimization (tent): we optimize the model for confidence as measured by the entropy of its predictions.
arXiv Detail & Related papers (2020-06-18T17:55:28Z)
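The entropy-minimization idea behind tent can be sketched as follows: compute the Shannon entropy of a prediction's softmax and take gradient steps that make the prediction more confident. The actual method updates only the affine parameters of normalization layers on batches of test data; updating raw logits directly, as done here, is a simplification, and the example logits are invented for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def tent_step(logits, lr=0.5):
    """One gradient step minimizing prediction entropy.
    Analytic gradient of H(softmax(z)) w.r.t. z: dH/dz_k = -p_k (log p_k + H)."""
    p = softmax(logits)
    h = entropy(p)
    grad = -p * (np.log(p + 1e-12) + h)
    return logits - lr * grad, h

logits = np.array([1.0, 0.8, 0.2])  # hypothetical unnormalized class scores
h_start = entropy(softmax(logits))
for _ in range(20):
    logits, _ = tent_step(logits)
h_end = entropy(softmax(logits))
```

Each step sharpens the softmax toward the currently dominant class, which is exactly why tent restricts the update to a few well-behaved parameters rather than the full model.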
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.