Learning material synthesis-process-structure-property relationship by
data fusion: Bayesian Coregionalization N-Dimensional Piecewise Function
Learning
- URL: http://arxiv.org/abs/2311.06228v2
- Date: Mon, 20 Nov 2023 21:43:52 GMT
- Title: Learning material synthesis-process-structure-property relationship by
data fusion: Bayesian Coregionalization N-Dimensional Piecewise Function
Learning
- Authors: A. Gilad Kusne, Austin McDannald, Brian DeCost
- Abstract summary: We present the Synthesis-process-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources and learn synthesis-process-structure-property relationships.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous materials research labs require the ability to combine and learn
from diverse data streams. This is especially true for learning material
synthesis-process-structure-property relationships, key to accelerating
materials optimization and discovery as well as accelerating mechanistic
understanding. We present the Synthesis-process-structure-property relAtionship
coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses
multimodal coregionalization to merge knowledge across data sources and learn
synthesis-process-structure-property relationships. SAGE outputs a
probabilistic posterior over the relationships, including the most likely
relationships given the data.
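The coregionalization idea above can be illustrated with a minimal sketch. The intrinsic coregionalization model (ICM) is one standard way to couple multiple data streams in a Gaussian process: the joint covariance between (task, input) pairs factors into a cross-task matrix B = W W^T times a shared input kernel. This is a generic ICM construction, not SAGE's actual implementation; the task weights, lengthscale, and "modalities" below are hypothetical.

```python
import math

def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel over a 1-D input (e.g., a synthesis-parameter axis)."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def icm_covariance(xs, W, noise=1e-6):
    """Joint covariance over (task, input) pairs for an intrinsic
    coregionalization model: K[(t,i),(s,j)] = B[t][s] * k(x_i, x_j),
    where B = W W^T encodes cross-task correlation."""
    T = len(W)
    B = [[sum(W[t][r] * W[s][r] for r in range(len(W[0]))) for s in range(T)]
         for t in range(T)]
    n = len(xs)
    K = [[0.0] * (T * n) for _ in range(T * n)]
    for t in range(T):
        for s in range(T):
            for i in range(n):
                for j in range(n):
                    K[t * n + i][s * n + j] = B[t][s] * rbf(xs[i], xs[j])
    for d in range(T * n):
        K[d][d] += noise  # diagonal jitter for numerical stability
    return K

# Two hypothetical modalities (say, a structure signal and a property signal)
# observed on the same synthesis-parameter grid, with correlated task weights.
xs = [0.0, 0.5, 1.0]
W = [[1.0], [0.8]]          # hypothetical rank-1 task weights
K = icm_covariance(xs, W)   # 6x6 joint covariance over 2 tasks x 3 inputs
```

Because B ties the two tasks together, observations from one data stream shift the posterior for the other, which is the mechanism by which a coregionalized learner fuses multimodal measurements.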
Related papers
- Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models [8.299006259255572]
We propose Synthetic-on-Graph (SoG), a synthetic data generation framework that incorporates cross-document knowledge associations for efficient corpus expansion. SoG constructs a context graph by extracting entities and concepts from the original corpus, representing cross-document associations. To further improve synthetic data quality, we integrate Chain-of-Thought (CoT) and Contrastive Clarifying (CC) synthesis strategies, enhancing reasoning processes and discriminative power.
arXiv Detail & Related papers (2025-05-02T03:40:39Z) - Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge driven discovery.
Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs).
By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
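An adjacency matrix only represents a valid causal DAG if it contains no directed cycles, so any pipeline that extracts such matrices from an LLM needs an acyclicity check. A minimal sketch using Kahn's algorithm, with hypothetical variable names standing in for the degrees of freedom mentioned above:

```python
def is_dag(adj):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be
    removed in topological order. adj[i][j] == 1 means an edge i -> j."""
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in range(n):
            if adj[v][w]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return seen == n  # all nodes removed => no cycle remained

# Hypothetical variables: 0 = structure, 1 = chemistry, 2 = polarization.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
```

A matrix failing this check would be rejected or repaired before being interpreted as a causal graph.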
arXiv Detail & Related papers (2025-03-18T02:14:49Z) - A Materials Map Integrating Experimental and Computational Data via Graph-Based Machine Learning for Enhanced Materials Discovery [5.06756291053173]
Materials informatics (MI) is expected to significantly accelerate material development and discovery.
Data used in MI are derived from both computational and experimental studies.
In this study, we use these datasets to construct materials maps, which visualize the relationships between material properties and structural features.
arXiv Detail & Related papers (2025-03-10T14:31:34Z) - Equation discovery framework EPDE: Towards a better equation discovery [50.79602839359522]
We enhance the EPDE algorithm -- an evolutionary optimization-based discovery framework.
Our approach generates terms using fundamental building blocks such as elementary functions and individual differentials.
We validate our algorithm's noise resilience and overall performance by comparing its results with those from the state-of-the-art equation discovery framework SINDy.
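Both EPDE and SINDy rest on the same core operation: regress a measured derivative onto a library of candidate terms, then sparsify the coefficients. A self-contained sketch of that SINDy-style step (not the EPDE algorithm itself) on the toy dynamics dx/dt = -2x, with a hypothetical library [1, x, x^2]:

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * v for a, v in zip(m[r], m[c])]
    return [m[i][3] / m[i][i] for i in range(3)]

def sindy_1d(xs, dxs, threshold=0.1):
    """Least-squares fit of dx/dt onto the library [1, x, x^2], followed by
    hard thresholding of small coefficients (the SINDy sparsification step)."""
    theta = [[1.0, x, x * x] for x in xs]
    # normal equations: (Theta^T Theta) c = Theta^T dx
    A = [[sum(row[i] * row[j] for row in theta) for j in range(3)] for i in range(3)]
    b = [sum(row[i] * d for row, d in zip(theta, dxs)) for i in range(3)]
    c = solve3(A, b)
    return [ci if abs(ci) > threshold else 0.0 for ci in c]

ts = [0.1 * k for k in range(20)]
xs = [math.exp(-2.0 * t) for t in ts]   # x(t) = exp(-2t)
dxs = [-2.0 * x for x in xs]            # exact derivative dx/dt = -2x
coeffs = sindy_1d(xs, dxs)              # recovers ~[0, -2, 0], i.e. dx/dt = -2x
```

Evolutionary frameworks like EPDE differ mainly in how the candidate library is generated (evolving terms from elementary functions and differentials rather than fixing it in advance), not in this regression-and-sparsify core.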
arXiv Detail & Related papers (2024-12-28T15:58:44Z) - Understanding Synthetic Context Extension via Retrieval Heads [51.8869530817334]
We investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning.
We find that models trained on synthetic data fall short of those trained on real data, but surprisingly, the mismatch can be interpreted.
Our results shed light on how to interpret synthetic data fine-tuning performance and how to approach creating better data for learning real-world capabilities over long contexts.
arXiv Detail & Related papers (2024-10-29T17:55:00Z) - Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective [9.590540796223715]
We show that the generalization capability of the post-trained model is critically determined by the information gain derived from the generative model.
We also introduce the concept of Generalization Gain via Mutual Information (GGMI) and elucidate the relationship between generalization gain and information gain.
This analysis serves as a theoretical foundation for synthetic data generation and further highlights its connection with the generalization capability of post-trained models.
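GGMI's exact definition is given in the paper; the underlying quantity it builds on is the mutual information between random variables. As background, a minimal sketch of discrete mutual information computed from a joint distribution (the distributions below are illustrative, not from the paper):

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    for a discrete joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal p(x)
        py[y] = py.get(y, 0.0) + p   # marginal p(y)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly correlated bit: knowing Y determines X, so I(X;Y) = 1 bit.
correlated = {(0, 0): 0.5, (1, 1): 0.5}
# Independent uniform bits: Y tells us nothing about X, so I(X;Y) = 0 bits.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

In the reverse-bottleneck view, the more information the generative model's outputs carry about the target capability, the larger the achievable generalization gain of the post-trained model.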
arXiv Detail & Related papers (2024-10-02T16:32:05Z) - Towards Privacy-Preserving Relational Data Synthesis via Probabilistic Relational Models [3.877001015064152]
Probabilistic relational models provide a well-established formalism to combine first-order logic and probabilistic models.
The field of artificial intelligence requires increasingly large amounts of relational training data for various machine learning tasks.
Collecting real-world data is often challenging due to privacy concerns, data protection regulations, high costs, and so on.
arXiv Detail & Related papers (2024-09-06T11:24:25Z) - IRG: Generating Synthetic Relational Databases using Deep Learning with Insightful Relational Understanding [13.724085637262654]
We propose the incremental generator (IRG), which successfully handles ubiquitous real-life situations.
IRG ensures the preservation of relational schema integrity and offers a deep understanding of relationships beyond direct ancestors and descendants.
Experiments on three open-source real-life relational datasets in different fields at different scales demonstrate IRG's advantage in maintaining the synthetic data's relational schema validity and data fidelity and utility.
arXiv Detail & Related papers (2023-12-23T07:47:58Z) - Knowledge-Enhanced Hierarchical Information Correlation Learning for
Multi-Modal Rumor Detection [82.94413676131545]
We propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection.
KhiCL exploits cross-modal joint dictionary to transfer the heterogeneous unimodality features into the common feature space.
It extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy.
arXiv Detail & Related papers (2023-06-28T06:08:20Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - Physics in the Machine: Integrating Physical Knowledge in Autonomous
Phase-Mapping [10.629434761354938]
Science-informed AI or scientific AI has grown from a focus on data analysis to now controlling experiment design, simulation, execution and analysis in closed-loop autonomous systems.
The CAMEO algorithm employs scientific AI to address two tasks: learning a material system's composition-structure relationship and identifying materials compositions with optimal functional properties.
arXiv Detail & Related papers (2021-11-15T00:48:34Z) - Audacity of huge: overcoming challenges of data scarcity and data
quality for machine learning in computational materials discovery [1.0036312061637764]
Machine learning (ML)-accelerated discovery requires large amounts of high-fidelity data to reveal predictive structure-property relationships.
For many properties of interest in materials discovery, the challenging nature and high cost of data generation has resulted in a data landscape that is scarcely populated and of dubious quality.
In the absence of manual curation, increasingly sophisticated natural language processing and automated image analysis are making it possible to learn structure-property relationships from the literature.
arXiv Detail & Related papers (2021-11-02T21:43:58Z) - Materials Representation and Transfer Learning for Multi-Property
Prediction [22.068267502715404]
The adoption of machine learning in materials science has rapidly transformed materials property prediction.
Hurdles limiting full capitalization of recent advancements in machine learning include the limited development of methods to learn the underlying interactions of multiple elements.
We introduce the Hierarchical Correlation Learning for Multi-property Prediction framework that seamlessly integrates (i) prediction using only a material's composition, (ii) learning and exploitation of correlations among target properties in multi-target regression, and (iii) leveraging training data from tangential domains via generative transfer learning.
arXiv Detail & Related papers (2021-06-04T03:00:34Z) - Learning Relation Prototype from Unlabeled Texts for Long-tail Relation
Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z) - A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn).
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z) - Multilinear Compressive Learning with Prior Knowledge [106.12874293597754]
Multilinear Compressive Learning (MCL) framework combines Multilinear Compressive Sensing and Machine Learning into an end-to-end system.
Key idea behind MCL is the assumption of the existence of a tensor subspace which can capture the essential features from the signal for the downstream learning task.
In this paper, we propose a novel solution to address both of the aforementioned requirements, i.e., how to find tensor subspaces in which the signals of interest are highly separable.
arXiv Detail & Related papers (2020-02-17T19:06:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.