DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models
- URL: http://arxiv.org/abs/2511.21982v1
- Date: Wed, 26 Nov 2025 23:44:58 GMT
- Title: DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models
- Authors: Futian Wang, Chaoliu Weng, Xiao Wang, Zhen Chen, Zhicheng Zhao, Jin Tang
- Abstract summary: This paper presents a new large-scale benchmark dataset for dial reading, termed RPM-10K. Built upon the dataset, we propose a novel vision-language model for pointer meter reading recognition, termed MRLM. Through cross-attentional fusion and adaptive expert selection, the model learns to interpret dial configurations and generate precise numeric readings.
- Score: 16.519805386469944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The precise reading recognition of pointer meters plays a key role in smart power systems, but existing approaches remain fragile due to challenges like reflections, occlusions, dynamic viewing angles, and overlap between thin pointers and scale markings. To date, this area still lacks large-scale datasets to support the development of robust algorithms. To address these challenges, this paper first presents a new large-scale benchmark dataset for dial reading, termed RPM-10K, which contains 10,730 meter images that fully reflect the aforementioned key challenges. Built upon this dataset, we propose a novel vision-language model for pointer meter reading recognition, termed MRLM, based on physical relation injection. Instead of exhaustively learning image-level correlations, MRLM explicitly encodes the geometric and causal relationships between the pointer and the scale, aligning perception with physical reasoning in the spirit of world-model perspectives. Through cross-attentional fusion and adaptive expert selection, the model learns to interpret dial configurations and generate precise numeric readings. Extensive experiments validate the effectiveness of the proposed framework on the newly introduced benchmark. Both the dataset and source code will be released at https://github.com/Event-AHU/DialBench
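The abstract describes MRLM only at a high level; as a rough illustration of what "cross-attentional fusion" between pointer and scale features combined with "adaptive expert selection" could look like, here is a minimal PyTorch sketch. The module names, feature dimensions, and the soft-gated expert head are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn

class PointerScaleFusion(nn.Module):
    """Illustrative fusion of pointer and scale features via cross-attention,
    followed by a soft mixture-of-experts head that regresses a reading."""

    def __init__(self, dim: int = 256, num_heads: int = 8, num_experts: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(dim, num_experts)  # adaptive expert selection (assumed soft gating)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))
            for _ in range(num_experts)
        )

    def forward(self, pointer_tokens: torch.Tensor, scale_tokens: torch.Tensor) -> torch.Tensor:
        # Pointer tokens query the scale tokens, so each pointer feature attends
        # to the scale marks it is geometrically related to.
        fused, _ = self.cross_attn(pointer_tokens, scale_tokens, scale_tokens)
        pooled = fused.mean(dim=1)                                      # (B, dim)
        weights = torch.softmax(self.gate(pooled), dim=-1)              # (B, E)
        preds = torch.stack([e(pooled) for e in self.experts], dim=1)   # (B, E, 1)
        return (weights.unsqueeze(-1) * preds).sum(dim=1).squeeze(-1)   # (B,)

if __name__ == "__main__":
    model = PointerScaleFusion()
    pointer = torch.randn(2, 16, 256)   # dummy pointer-region tokens
    scale = torch.randn(2, 64, 256)     # dummy scale-mark tokens
    print(model(pointer, scale).shape)  # torch.Size([2])
```

Since MRLM is described as a vision-language model, its real output head presumably generates a textual reading rather than a scalar; the sketch only captures the fusion-then-select pattern named in the abstract.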
Related papers
- EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition [54.55914886780534]
Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventional visible-light cameras under challenging conditions such as low illumination, overexposure, and high-speed motion. We introduce EPRBench, a high-quality benchmark specifically designed for event stream-based VPR. EPRBench comprises 10K event sequences and 65K event frames, collected using both handheld and vehicle-mounted setups to comprehensively capture real-world challenges across diverse viewpoints, weather conditions, and lighting scenarios.
arXiv Detail & Related papers (2026-02-13T13:25:05Z) - Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench [4.095423692230828]
MeasureBench is a benchmark on visual measurement reading covering both real-world and synthesized images of various types of measurements. Our pipeline procedurally generates a specified type of gauge with controllable visual appearance.
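The summary does not describe the procedural pipeline beyond "controllable visual appearance"; below is a minimal, self-contained sketch of how a synthetic analog gauge with a known ground-truth reading might be rendered. The dial layout, tick spacing, and linear angle-to-value mapping are assumptions for illustration, not MeasureBench's actual generator:

```python
import math
from PIL import Image, ImageDraw

def render_gauge(reading: float, vmin: float = 0.0, vmax: float = 100.0,
                 size: int = 256) -> Image.Image:
    """Toy procedural gauge: the pointer angle linearly encodes `reading`."""
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    cx = cy = size // 2
    radius = size * 0.42
    start_deg, sweep_deg = 135.0, 270.0  # dial arc spans 270 degrees

    # Dial face.
    draw.ellipse([cx - radius, cy - radius, cx + radius, cy + radius],
                 outline="black", width=2)

    # Tick marks at equal value intervals.
    for i in range(11):
        a = math.radians(start_deg + sweep_deg * i / 10)
        draw.line([cx + 0.85 * radius * math.cos(a), cy + 0.85 * radius * math.sin(a),
                   cx + radius * math.cos(a), cy + radius * math.sin(a)],
                  fill="black", width=2)

    # Pointer: angle is a linear function of the normalized reading.
    frac = (reading - vmin) / (vmax - vmin)
    a = math.radians(start_deg + sweep_deg * frac)
    draw.line([cx, cy,
               cx + 0.8 * radius * math.cos(a), cy + 0.8 * radius * math.sin(a)],
              fill="red", width=3)
    return img

if __name__ == "__main__":
    render_gauge(37.5).save("gauge_example.png")  # ground-truth reading is known by construction
```

Controllable appearance would then come from randomizing colors, fonts, lighting, and camera perspective on top of such a base renderer.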
arXiv Detail & Related papers (2025-10-30T17:20:51Z) - OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding [50.72259772580637]
We introduce OST-Bench, a benchmark designed to evaluate Online Spatio-Temporal understanding from the perspective of an agent. Built on an efficient data collection pipeline, OST-Bench consists of 1.4k scenes and 10k question-answer pairs collected from ScanNet, Matterport3D, and ARKitScenes. We find that both complex spatial reasoning demands and long-term memory retrieval requirements significantly reduce model performance along two separate axes.
arXiv Detail & Related papers (2025-07-10T17:56:07Z) - Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models [7.898092154590899]
Salient Object Detection aims to identify and segment prominent regions within a scene. Traditional models rely on manually annotated pseudo labels with precise pixel-level accuracy. We develop a low-cost, high-precision annotation method to address these challenges.
arXiv Detail & Related papers (2025-01-08T15:56:21Z) - Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning [51.170479006249195]
We introduce a new dataset, benchmark, and a dynamic coarse-to-fine learning scheme in this study. Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets. We present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches.
arXiv Detail & Related papers (2024-12-16T09:14:32Z) - Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness [6.229124658686219]
We develop a generic dual-channel detection paradigm that uses token cohesiveness as a plug-and-play module to improve existing zero-shot detectors.
To calculate token cohesiveness, we use a few rounds of random token deletion and semantic difference measurement.
Experiments with four state-of-the-art base detectors on various datasets, source models, and evaluation settings demonstrate the effectiveness and generality of the proposed approach.
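As a rough illustration of the "random token deletion and semantic difference measurement" step described above, here is a minimal sketch. The deletion rate, number of rounds, and the surface-similarity stand-in for a real semantic-difference model are all assumptions, not the paper's actual procedure or scoring function:

```python
import random
from difflib import SequenceMatcher

def token_cohesiveness(text: str, rounds: int = 5, delete_frac: float = 0.1,
                       seed: int = 0) -> float:
    """Toy cohesiveness score: average change caused by randomly deleting a
    small fraction of tokens. SequenceMatcher is only a placeholder for a
    proper semantic-difference measure."""
    rng = random.Random(seed)
    tokens = text.split()
    diffs = []
    for _ in range(rounds):
        kept = [t for t in tokens if rng.random() > delete_frac]
        perturbed = " ".join(kept)
        # Larger difference after deletion -> tokens are more tightly coupled.
        diffs.append(1.0 - SequenceMatcher(None, text, perturbed).ratio())
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    print(round(token_cohesiveness("The quick brown fox jumps over the lazy dog."), 4))
```

A real system would measure the difference with an embedding or language model rather than string overlap, and would feed the resulting cohesiveness value into an existing zero-shot detector as the second channel, per the dual-channel paradigm described above.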
arXiv Detail & Related papers (2024-09-25T13:18:57Z) - Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection [1.0334138809056097]
We propose a novel framework for real-time LiDAR odometry and mapping based on LOAM architecture for fast moving platforms.
Our framework utilizes semantic information produced by a deep learning model to improve point-to-line and point-to-plane matching.
We study the effect of improving the matching process on the robustness of LiDAR odometry against high speed motion.
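The summary only states that semantic predictions improve point-to-line and point-to-plane matching; one plausible way to use them, filtering correspondences by semantic class before residuals are built, is sketched below. The class IDs and the filtering rule are hypothetical and not taken from the paper:

```python
import numpy as np

# Hypothetical semantic label IDs; the real label set depends on the segmentation model.
STATIC_CLASSES = {0, 1, 2}   # e.g. road, building, pole

def filter_correspondences(src_pts: np.ndarray, tgt_pts: np.ndarray,
                           src_labels: np.ndarray, tgt_labels: np.ndarray):
    """Keep a point correspondence only if both endpoints share a static
    semantic class, before LOAM-style point-to-line / point-to-plane
    residuals are formed (illustrative, not the paper's exact rule)."""
    same_class = src_labels == tgt_labels
    both_static = (np.isin(src_labels, list(STATIC_CLASSES))
                   & np.isin(tgt_labels, list(STATIC_CLASSES)))
    mask = same_class & both_static
    return src_pts[mask], tgt_pts[mask]

if __name__ == "__main__":
    src, tgt = np.random.rand(6, 3), np.random.rand(6, 3)
    src_lab = np.array([0, 1, 8, 2, 9, 1])
    tgt_lab = np.array([0, 1, 8, 2, 1, 1])
    kept_src, _ = filter_correspondences(src, tgt, src_lab, tgt_lab)
    print(kept_src.shape)  # (4, 3): the dynamic and mismatched pairs are dropped
```

Rejecting matches on moving objects in this way is one common route to making odometry more robust under high-speed motion, which is the setting the paper targets.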
arXiv Detail & Related papers (2024-03-05T16:53:24Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z) - Attention improves concentration when learning node embeddings [1.2233362977312945]
Given nodes labelled with search query text, we want to predict links to related queries that share products.
Experiments with a range of deep neural architectures show that simple feedforward networks with an attention mechanism perform best for learning embeddings.
We propose an analytically tractable model of query generation, AttEST, that views both products and the query text as vectors embedded in a latent space.
arXiv Detail & Related papers (2020-06-11T21:21:12Z) - FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks.
We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
arXiv Detail & Related papers (2020-04-04T22:25:47Z)