Criteria-first, semantics-later: reproducible structure discovery in image-based sciences
- URL: http://arxiv.org/abs/2602.15712v1
- Date: Tue, 17 Feb 2026 16:45:49 GMT
- Title: Criteria-first, semantics-later: reproducible structure discovery in image-based sciences
- Authors: Jan Bumberger
- Abstract summary: Structure is recovered by predicting or enforcing domain-specific labels. This paradigm fails systematically under the conditions that make image-based science most valuable. A unified framework for criteria-first structure discovery is introduced.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Across the natural and life sciences, images have become a primary measurement modality, yet the dominant analytic paradigm remains semantics-first. Structure is recovered by predicting or enforcing domain-specific labels. This paradigm fails systematically under the conditions that make image-based science most valuable, including open-ended scientific discovery, cross-sensor and cross-site comparability, and long-term monitoring in which domain ontologies and associated label sets drift culturally, institutionally, and ecologically. A deductive inversion is proposed in the form of criteria-first and semantics-later. A unified framework for criteria-first structure discovery is introduced. It separates criterion-defined, semantics-free structure extraction from downstream semantic mapping into domain ontologies or vocabularies and provides a domain-general scaffold for reproducible analysis across image-based sciences. Reproducible science requires that the first analytic layer perform criterion-driven, semantics-free structure discovery, yielding stable partitions, structural fields, or hierarchies defined by explicit optimality criteria rather than local domain ontologies. Semantics is not discarded; it is relocated downstream as an explicit mapping from the discovered structural product to a domain ontology or vocabulary, enabling plural interpretations and explicit crosswalks without rewriting upstream extraction. Grounded in cybernetics, observation-as-distinction, and information theory's separation of information from meaning, the argument is supported by cross-domain evidence showing that criteria-first components recur whenever labels do not scale. Finally, consequences are outlined for validation beyond class accuracy and for treating structural products as FAIR, AI-ready digital objects for long-term monitoring and digital twins.
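The two-layer separation described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: Otsu's between-class-variance criterion stands in here as one example of an explicit optimality criterion, and the function names, the toy image, and both vocabularies are hypothetical.

```python
from typing import Dict, List, Sequence

def otsu_threshold(values: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with value > t
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:       # strict: keep the first maximizer
            best_t, best_var = t, var
    return best_t

def extract_structure(image: List[List[int]]) -> List[List[int]]:
    """Layer 1: criterion-defined, semantics-free partition (segment ids 0/1)."""
    t = otsu_threshold([v for row in image for v in row])
    return [[1 if v > t else 0 for v in row] for row in image]

def map_semantics(partition: List[List[int]], vocab: Dict[int, str]) -> List[List[str]]:
    """Layer 2: downstream mapping of segment ids to one domain vocabulary."""
    return [[vocab[s] for s in row] for row in partition]

# Hypothetical 3x3 grayscale image.
image = [[10, 12, 200], [11, 210, 205], [9, 13, 198]]
threshold = otsu_threshold([v for row in image for v in row])
partition = extract_structure(image)

# Two hypothetical vocabularies interpret the same structural product.
remote_sensing = map_semantics(partition, {0: "water", 1: "vegetation"})
microscopy = map_semantics(partition, {0: "background", 1: "cell"})
print(threshold, partition)
```

The partition is computed once from the criterion alone; swapping or revising a vocabulary changes only the mapping step, which is the kind of plural interpretation without re-running upstream extraction that the abstract argues for.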
Related papers
- Visual Language Hypothesis [14.062822951292402]
We study visual representation learning from a structural and topological perspective. We show that approximating the quotient also places structural demands on the model architecture.
arXiv Detail & Related papers (2025-12-29T09:43:10Z)
- What is Stochastic Supervenience? [0.0]
This paper develops a general framework for supervenience. I formulate conditions that secure law-like fixation, nondegeneracy, and directional asymmetry. I show that classical deterministic supervenience appears as a limiting Dirac case.
arXiv Detail & Related papers (2025-12-20T07:47:53Z)
- One Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces [17.173074024116477]
Embedding spaces are fundamental to modern AI, translating raw data into high-dimensional vectors that encode rich semantic relationships. We introduce the Semantic Field Subspace (SFS), a geometry-preserving, context-aware representation that captures local semantic neighborhoods within the embedding space. We also propose SAFARI, an unsupervised, modality-agnostic algorithm that uncovers hierarchical semantic structures using a novel metric called Semantic Shift.
arXiv Detail & Related papers (2025-11-30T11:48:00Z)
- Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders [11.190791003373322]
We ask whether sparse autoencoders can enable open-ended feature discovery from foundation model representations. Applying to ecological imagery, the same procedure surfaces fine-grained anatomical structure without access to segmentation or part labels. Our results indicate that sparse decomposition provides a practical instrument for exploring what scientific foundation models have learned.
arXiv Detail & Related papers (2025-11-21T19:38:07Z)
- Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a Creative Commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z)
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Latent Topology Induction for Understanding Contextualized Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z)
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are aplenty, but annotating real data is laborious.
The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving.
The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z)