SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery
- URL: http://arxiv.org/abs/2506.06176v1
- Date: Fri, 06 Jun 2025 15:39:54 GMT
- Title: SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery
- Authors: Zhenyu Yu, Mohd. Yamani Idna Idris, Pei Wang, Yuelong Xia, Fei Ma, Rizwan Qureshi,
- Abstract summary: We propose a novel symbolic regression framework that derives physically interpretable expressions directly from remote sensing imagery. SatelliteFormula combines a Vision Transformer-based encoder for spatial-spectral feature extraction with physics-guided constraints to ensure consistency and interpretability.
- Score: 8.965479246496878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SatelliteFormula, a novel symbolic regression framework that derives physically interpretable expressions directly from multi-spectral remote sensing imagery. Unlike traditional empirical indices or black-box learning models, SatelliteFormula combines a Vision Transformer-based encoder for spatial-spectral feature extraction with physics-guided constraints to ensure consistency and interpretability. Existing symbolic regression methods struggle with the high-dimensional complexity of multi-spectral data; our method addresses this by integrating transformer representations into a symbolic optimizer that balances accuracy and physical plausibility. Extensive experiments on benchmark datasets and remote sensing tasks demonstrate superior performance, stability, and generalization compared to state-of-the-art baselines. SatelliteFormula enables interpretable modeling of complex environmental variables, bridging the gap between data-driven learning and physical understanding.
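The abstract says the symbolic optimizer balances accuracy against physical plausibility but does not give its objective. As a rough, hypothetical illustration of that kind of trade-off (not SatelliteFormula's actual loss), the Python sketch below ranks hand-written candidate expressions over two bands with a penalized fitness: mean squared error, plus a complexity term, plus a penalty on outputs that leave an assumed physically valid range. The band names `nir`/`red`, the weights `alpha` and `beta`, the `[-1, 1]` bounds, the node-count complexities, and the synthetic data are all assumptions made for this example. A real symbolic regression system would generate candidates by search (for example, genetic programming) instead of enumerating three by hand, and the Transformer encoder is omitted entirely.

```python
import math
import random
from typing import Callable, List, Tuple

# A candidate expression: (readable form, callable over (nir, red), complexity).
# Complexity here is a hypothetical node count of the expression tree.
Candidate = Tuple[str, Callable[[float, float], float], int]


def penalized_fitness(
    cand: Candidate,
    samples: List[Tuple[float, float]],
    targets: List[float],
    alpha: float = 0.005,  # assumed weight on expression complexity
    beta: float = 1.0,     # assumed weight on the plausibility penalty
    bounds: Tuple[float, float] = (-1.0, 1.0),  # assumed valid range of a normalized index
) -> float:
    """Lower is better: MSE + alpha * complexity + beta * fraction of implausible outputs."""
    _, f, complexity = cand
    sq_err = 0.0
    violations = 0
    for (nir, red), y in zip(samples, targets):
        try:
            pred = f(nir, red)
        except ZeroDivisionError:
            violations += 1  # undefined output counts as implausible
            continue
        if not math.isfinite(pred):
            violations += 1
            continue
        if not bounds[0] <= pred <= bounds[1]:
            violations += 1  # outside the assumed physically meaningful range
        sq_err += (pred - y) ** 2
    n = len(samples)
    return sq_err / n + alpha * complexity + beta * violations / n


def select_best(
    candidates: List[Candidate],
    samples: List[Tuple[float, float]],
    targets: List[float],
) -> Candidate:
    """Return the candidate with the lowest penalized fitness."""
    return min(candidates, key=lambda c: penalized_fitness(c, samples, targets))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic reflectances; the target is an NDVI-style index by construction.
    samples = [(random.uniform(0.2, 0.6), random.uniform(0.02, 0.2)) for _ in range(200)]
    targets = [(nir - red) / (nir + red) for nir, red in samples]
    candidates: List[Candidate] = [
        ("nir - red", lambda nir, red: nir - red, 3),
        ("(nir - red) / (nir + red)", lambda nir, red: (nir - red) / (nir + red), 7),
        ("nir / red", lambda nir, red: nir / red, 3),
    ]
    print(select_best(candidates, samples, targets)[0])
```

On this synthetic data the ratio `nir / red` is penalized for leaving the assumed bounds, and `nir - red` loses on error despite being simpler, so the normalized difference (the classic NDVI form of an empirical index) wins. Raising `alpha` shifts the balance toward shorter expressions.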
Related papers
- D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping [66.22412592525369]
We introduce a differentiable real-to-sim-to-real engine built on Gaussian Splat representations. We show that our engine achieves accurate and robust mass identification across various object geometries and mass values. These optimized mass values facilitate force-aware policy learning, achieving high performance in object grasping.
Date: 2026-03-01T15:32:04Z
- Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces [34.75476728721597]
We introduce the first over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIMs). Experiments with heterogeneous vision transformer (ViT) encoders show that SIMs can accurately reproduce both supervised and zero-shot semantic equalizers, achieving up to 90% task accuracy in regimes with high signal-to-noise ratio (SNR).
Date: 2025-12-05T12:05:31Z
- Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE) [37.18249990338269]
We propose a Gaussian Mixture Variational Autoencoder (GM-VAE) framework designed to extract physically interpretable representations from high-dimensional scientific data. Unlike conventional VAEs that jointly optimize reconstruction and clustering, our method uses a block-coordinate descent strategy. To objectively evaluate the learned representations, we introduce a metric based on graph-Laplacian smoothness, which measures the coherence of physical instability across the latent manifold.
Date: 2025-11-26T20:04:38Z
- Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
Date: 2025-08-25T14:22:57Z
- Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors [25.67875816218477]
Full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range. Previous works either face the challenge of wearing additional sensors on the pelvis and lower body or rely on external visual sensors to obtain global positions of key joints. To improve the practicality of the technology for virtual reality applications, we estimate full-body poses using only inertial data obtained from three Inertial Measurement Unit (IMU) sensors worn on the head and wrists.
Date: 2025-05-08T15:28:09Z
- SatelliteCalculator: A Multi-Task Vision Foundation Model for Quantitative Remote Sensing Inversion [4.824120664293887]
We introduce SatelliteCalculator, the first vision foundation model for quantitative remote sensing inversion. By leveraging physically defined index adapters, we automatically construct a large-scale dataset of over one million paired samples. Experiments demonstrate that SatelliteCalculator achieves competitive accuracy across all tasks while significantly reducing inference cost.
Date: 2025-04-18T03:48:04Z
- UrbanSAM: Learning Invariance-Inspired Adapters for Segment Anything Models in Urban Construction [51.54946346023673]
Urban morphology is inherently complex, with irregular objects of diverse shapes and varying scales. The Segment Anything Model (SAM) has shown significant potential in segmenting complex scenes. We propose UrbanSAM, a customized version of SAM specifically designed to analyze complex urban environments.
Date: 2025-02-21T04:25:19Z
- Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations. We build HRVMamba, a novel model for efficient high-resolution representation learning.
Date: 2024-10-04T06:19:29Z
- Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks [93.38375271826202]
We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks.
We first build a simulator by integrating Gaussian splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks.
In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, programming of expert demonstration training data, and the task understanding capabilities of Liquid networks.
Date: 2024-06-21T13:48:37Z
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels, plus an unseen object set, to evaluate different object and layout properties.
Date: 2022-08-08T08:15:34Z
- Unsupervised Discovery of Semantic Concepts in Satellite Imagery with Style-based Wavelet-driven Generative Models [27.62417543307831]
We present the first pre-trained style- and wavelet-based GAN model that can synthesize a wide gamut of realistic satellite images.
We show that by analyzing the intermediate activations of our network, one can discover a multitude of interpretable semantic directions.
Date: 2022-08-03T14:19:24Z
- Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
Date: 2022-07-04T17:00:51Z
- A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration [36.525810477650026]
Hyperspectral imaging offers new perspectives for diverse applications.
The lack of accurate ground-truth "clean" hyperspectral signals on the spot makes restoration tasks challenging.
In this paper, we advocate for a hybrid approach based on sparse coding principles.
Date: 2021-11-18T14:16:04Z
- Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) provides powerful tools for solving complex robotic tasks.
However, RL policies trained in simulation do not work directly in the real world, a challenge known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed by point clouds and environment randomization.
Date: 2020-07-27T17:46:59Z
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
Date: 2020-06-12T09:37:24Z
- Learning the sense of touch in simulation: a sim-to-real strategy for vision-based tactile sensing [1.9981375888949469]
This paper focuses on a vision-based tactile sensor, which aims to reconstruct the distribution of the three-dimensional contact forces applied on its soft surface.
A strategy is proposed to train a tailored deep neural network entirely from the simulation data.
The resulting learning architecture is directly transferable across multiple tactile sensors without further training and yields accurate predictions on real data.
Date: 2020-03-05T14:17:45Z
This list is automatically generated from the titles and abstracts of the papers on this site.