Symbolically Regressing Fish Biomass Spectral Data: A Linear Genetic Programming Method with Tunable Primitives
- URL: http://arxiv.org/abs/2505.21901v1
- Date: Wed, 28 May 2025 02:27:49 GMT
- Title: Symbolically Regressing Fish Biomass Spectral Data: A Linear Genetic Programming Method with Tunable Primitives
- Authors: Zhixing Huang, Bing Xue, Mengjie Zhang, Jeremy S. Ronney, Keith C. Gordon, Daniel P. Killeen
- Abstract summary: This paper models fish biomass spectral data as a symbolic regression problem and solves it by a linear genetic programming method. In the symbolic regression problem, linear genetic programming automatically synthesizes regression models based on the given primitives and training data. Our empirical results over ten fish biomass targets show that the proposed method improves the overall performance of fish biomass composition prediction.
- Score: 5.163542749660303
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning techniques play an important role in analyzing spectral data. The spectral data of fish biomass is useful in fish production, as it carries many important chemical properties of fish meat. However, it is challenging for existing machine learning techniques to comprehensively discover hidden patterns in fish biomass spectral data, since the spectral data are often noisy while the training data are quite limited. To better analyze fish biomass spectral data, this paper models it as a symbolic regression problem and solves it by a linear genetic programming method with newly proposed tunable primitives. In the symbolic regression problem, linear genetic programming automatically synthesizes regression models based on the given primitives and training data. The tunable primitives further improve the approximation ability of the regression models by tuning their inherent coefficients. Our empirical results over ten fish biomass targets show that the proposed method improves the overall performance of fish biomass composition prediction. The synthesized regression models are compact and have good interpretability, which allows us to highlight useful features over the spectrum. Our further investigation also verifies the good generality of the proposed method across various spectral data treatments and other symbolic regression problems.
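The abstract describes two ingredients: a linear genetic program, that is, a sequence of register-based instructions built from given primitives, and primitives whose inherent coefficients can be tuned. The following is a minimal Python sketch of that general idea only, not the authors' implementation. The primitive set (including the `wadd` primitive), the register layout, the convention that register 0 holds the prediction, and the hill-climbing tuner are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random

# Hypothetical primitive set (an assumption, not taken from the paper).
# Every primitive receives a coefficient; only "wadd" actually uses it.
PRIMITIVES = {
    "add": lambda a, b, coef: a + b,
    "sub": lambda a, b, coef: a - b,
    "mul": lambda a, b, coef: a * b,
    "wadd": lambda a, b, coef: a + coef * b,  # tunable primitive
}


def run_program(program, inputs, n_registers=4):
    """Execute a linear program; each instruction overwrites one register."""
    registers = [0.0] * n_registers
    registers[:len(inputs)] = inputs  # load features into the first registers
    for dest, op, src_a, src_b, coef in program:
        registers[dest] = PRIMITIVES[op](registers[src_a], registers[src_b], coef)
    return registers[0]  # assumed convention: register 0 holds the prediction


def mse(program, X, y):
    """Mean squared error of the program over a small data set."""
    return sum((run_program(program, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def tune_coefficients(program, X, y, steps=200, sigma=0.3, seed=0):
    """Hill-climb the coefficients while keeping the program structure fixed."""
    rng = random.Random(seed)
    best, best_err = list(program), mse(program, X, y)
    for _ in range(steps):
        i = rng.randrange(len(best))
        dest, op, src_a, src_b, coef = best[i]
        candidate = list(best)
        candidate[i] = (dest, op, src_a, src_b, coef + rng.gauss(0.0, sigma))
        err = mse(candidate, X, y)
        if err < best_err:  # keep a perturbation only if it reduces the error
            best, best_err = candidate, err
    return best, best_err
```

For example, a one-instruction program `[(0, "wadd", 0, 1, 1.0)]` computes `x0 + 1.0 * x1`; calling `tune_coefficients` on data generated from `x0 + 2.5 * x1` adjusts only the coefficient and leaves the instruction sequence untouched. In a full genetic programming system the instruction sequence itself would also be evolved, which this sketch omits.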
Related papers
- Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis [7.075575292983362]
This paper proposes a new design of Convolutional Neural Networks (CNNs) for jointly predicting water, protein, and lipids yield.
We are the first to conduct a successful study employing CNNs to analyze the biochemical composition of fish based on a very small Raman spectroscopic dataset.
arXiv Detail & Related papers (2024-09-29T12:28:19Z)
- Discovering physical laws with parallel combinatorial tree search [57.05912962368898]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade. We introduce a parallel combinatorial tree search (PCTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
- Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding [6.332060647845203]
We introduce FishPhenoKey, a comprehensive dataset comprising 23,331 high-resolution images spanning six fish species.
FishPhenoKey includes 22 phenotype-oriented annotations, enabling the capture of intricate morphological phenotypes.
We also propose a new evaluation metric, Percentage of Measured Phenotype.
arXiv Detail & Related papers (2024-05-21T03:36:13Z)
- Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL [1.840390797252648]
Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations.
We propose eGRAL, a novel graph neural network architecture designed for predicting binding affinity changes from amino acid substitutions in protein complexes.
eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models.
arXiv Detail & Related papers (2024-05-03T10:33:19Z)
- WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy by 8-10% over existing architectures, corresponding to a classification accuracy of 97.61%.
arXiv Detail & Related papers (2024-02-20T11:36:23Z)
- Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [44.97217246897902]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Supervised Learning and Model Analysis with Compositional Data [4.082799056366927]
KernelBiome is a kernel-based non-parametric regression and classification framework for compositional data.
We demonstrate on par or improved performance compared with state-of-the-art machine learning methods.
arXiv Detail & Related papers (2022-05-15T12:33:43Z)
- Analytical Modelling of Exoplanet Transit Specroscopy with Dimensional Analysis and Symbolic Regression [68.8204255655161]
The deep learning revolution has opened the door for deriving such analytical results directly with a computer algorithm fitting to the data.
We successfully demonstrate the use of symbolic regression on synthetic data for the transit radii of generic hot Jupiter exoplanets.
As a preprocessing step, we use dimensional analysis to identify the relevant dimensionless combinations of variables.
arXiv Detail & Related papers (2021-12-22T00:52:56Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- Compact representation of temporal processes in echosounder time series via matrix decomposition [0.7614628596146599]
We develop a methodology that builds compact representation of long-term echosounder time series using intrinsic features in the data.
This work forms the basis for constructing robust time series analytics for large-scale, acoustics-based biological observation in the ocean.
arXiv Detail & Related papers (2020-07-06T17:33:42Z)