Linguistic laws in biology
- URL: http://arxiv.org/abs/2310.07387v1
- Date: Wed, 11 Oct 2023 11:08:20 GMT
- Title: Linguistic laws in biology
- Authors: Stuart Semple, Ramon Ferrer-i-Cancho and Morgan L. Gustison
- Abstract summary: Linguistic laws have been investigated by quantitative linguists for nearly a century.
Biologists from a range of disciplines have started to explore the prevalence of these laws beyond language.
We propose a new conceptual framework for the study of linguistic laws in biology.
- Score: 0.13812010983144798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linguistic laws, the common statistical patterns of human language, have been
investigated by quantitative linguists for nearly a century. Recently,
biologists from a range of disciplines have started to explore the prevalence
of these laws beyond language, finding patterns consistent with linguistic laws
across multiple levels of biological organisation, from molecular (genomes,
genes, and proteins) to organismal (animal behaviour) to ecological
(populations and ecosystems). We propose a new conceptual framework for the
study of linguistic laws in biology, comprising and integrating distinct levels
of analysis, from description to prediction to theory building. Adopting this
framework will provide critical new insights into the fundamental rules of
organisation underpinning natural systems, unifying linguistic laws and core
theory in biology.
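The abstract refers to linguistic laws only in general terms. One widely studied example is Zipf's rank-frequency law: the frequency of the r-th most common unit falls roughly as r^(-alpha), with alpha near 1. The sketch below illustrates the descriptive level of analysis under stated assumptions. It takes a sequence of discrete units (words, song notes, or k-mers, for instance) and estimates the exponent with an ordinary least-squares fit on log-log axes. The helper name `zipf_exponent` is hypothetical, and this is not the authors' method.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Hypothetical helper: fit log(frequency) against log(rank) by least squares.

    Returns the fitted slope; Zipf's rank-frequency law corresponds to a slope
    near -1. Requires at least two distinct unit types.
    """
    # Frequencies of each distinct unit, most common first (rank 1 = most frequent).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct unit types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope = cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic repertoire whose counts follow 60 / rank exactly.
repertoire = [u for u, c in zip("ABCDEF", [60, 30, 20, 15, 12, 10]) for _ in range(c)]
print(round(zipf_exponent(repertoire), 6))
```

A log-log least-squares fit is the simplest estimator but can be biased for heavy-tailed data. Maximum-likelihood fitting is generally preferred when the goal is to test the law rather than illustrate it.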
Related papers
- From Sentences to Sequences: Rethinking Languages in Biological System [6.304152224988003]
We revisit the notion of language in biological systems to better understand how NLP successes can be effectively translated to biological domains.
By treating the 3D structure of biomolecules as the semantic content of a sentence, we highlight the importance of structural evaluation.
arXiv Detail & Related papers (2025-07-01T16:57:39Z)
- Looking forward: Linguistic theory and methods [0.7673339435080445]
Major themes shaping contemporary linguistics are explicit testing of hypotheses about symbolic representation, the impact of artificial neural networks, and the importance of intersubjectivity in linguistic theory.
By connecting linguistics with computer science, psychology, neuroscience, and biology, we provide a forward-looking perspective on the changing landscape of linguistic research.
arXiv Detail & Related papers (2025-02-25T16:03:15Z)
- BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning [49.487327661584686]
We introduce BioMaze, a dataset with 5.1K complex pathway problems from real research.
Our evaluation of methods such as CoT and graph-augmented reasoning shows that LLMs struggle with pathway reasoning.
To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation.
arXiv Detail & Related papers (2025-02-23T17:38:10Z)
- Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models [6.364723262453785]
This paper harnesses the capabilities of large language models (LLMs) to mine key ecological entities from invasion biology literature.
Specifically, we focus on extracting species names, their locations, associated habitats, and ecosystems, information that is critical for understanding species spread.
This study lays the groundwork for more advanced, automated knowledge extraction tools that can aid researchers and practitioners in understanding and managing biological invasions.
arXiv Detail & Related papers (2025-01-30T11:55:44Z)
- Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset.
This dataset can bridge the gap between large language models (LLMs) and complex biological sequences-related tasks.
We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- Language Evolution with Deep Learning [49.879239655532324]
Computational modeling plays an essential role in the study of language emergence.
It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language.
This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models.
arXiv Detail & Related papers (2024-03-18T16:52:54Z)
- Universal Syntactic Structures: Modeling Syntax for Various Natural Languages [0.0]
We aim to provide an explanation for how the human brain might connect words for sentence formation.
A novel approach to modeling syntactic representation is introduced, potentially showing the existence of universal syntactic structures for all natural languages.
arXiv Detail & Related papers (2023-12-28T20:44:26Z)
- BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations [54.97423244799579]
BioT5 is a pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations.
BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information.
arXiv Detail & Related papers (2023-10-11T07:57:08Z)
- Morphological Computing as Logic Underlying Cognition in Human, Animal, and Intelligent Machine [1.14219428942199]
The work presents a scheme that connects logic, mathematics, physics, chemistry, biology, and cognition.
The inherent logic of agency exists in natural processes at various levels under information exchanges.
arXiv Detail & Related papers (2023-09-25T09:31:25Z)
- ImmunoLingo: Linguistics-based formalization of the antibody language [0.5412332666265471]
Apparent parallels between natural language and biological sequence have led to a surge in the application of deep language models (LMs).
A lack of a rigorous linguistic formalization of biological sequence languages has led to largely domain-unspecific applications of LMs.
A linguistic formalization establishes linguistically-informed and thus domain-adapted components for LM applications.
arXiv Detail & Related papers (2022-09-26T12:33:14Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- Mechanism of Evolution Shared by Gene and Language [8.882751635947027]
We propose a general mechanism for evolution to explain the diversity of gene and language.
We find that the classical correspondence, "domain plays the role of word in gene language", is not rigorous.
We devise a new evolution unit, syllgram, to include the characteristics of spoken and written language.
arXiv Detail & Related papers (2020-12-28T15:46:19Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
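The VQDNA entry in the list above describes tokenizing genomes with vector-quantized codebooks used as a learnable vocabulary. The listing does not describe VQDNA's implementation. The sketch below shows only the generic lookup step of vector quantization: each continuous embedding is mapped to the index of its nearest codebook vector, and those indices then serve as discrete tokens. The function name `quantize` and the plain-list inputs are assumptions for illustration. Learning the codebook itself is not shown.

```python
from typing import List, Sequence


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: map each vector to its nearest codebook index.

    Distance is squared Euclidean; ties go to the lower index.
    """
    if not codebook:
        raise ValueError("codebook must be non-empty")
    tokens = []
    for v in vectors:
        # Index of the codebook entry closest to v.
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])),
        )
        tokens.append(best)
    return tokens


# Two-entry codebook; three 2-D embeddings become three discrete tokens.
print(quantize([[0.1, 0.0], [0.9, 1.2], [0.6, 0.6]], [[0.0, 0.0], [1.0, 1.0]]))
```

In a trained model the codebook vectors are learned jointly with the encoder. The nearest-neighbour assignment shown here is the step that turns continuous representations into a finite vocabulary.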
This list is automatically generated from the titles and abstracts of the papers on this site.