Related papers: FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response

URL: http://arxiv.org/abs/2502.18452v1
Date: Tue, 25 Feb 2025 18:51:06 GMT
Title: FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response
Authors: Mollie Shichman, Claire Bonial, Austin Blodgett, Taylor Hudson, Francis Ferraro, Rachel Rudinger
Abstract summary: We introduce a pipeline to create Field Ready Instruction Decoding Agent (FRIDA) models. We fine-tune several LLaMa and Mistral instruction-tuned models and find that FRIDA models outperform their base models at a variety of sizes. We conclude that the FRIDA pipeline is capable of instilling general common sense, but needs to be augmented with information retrieval for specific domain knowledge.
Score: 19.744969357182665
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have the potential for substantial common sense reasoning. However, these capabilities are often emergent in larger models. This means smaller models that can be run locally are less helpful and capable with respect to certain reasoning tasks. To meet our problem space requirements, we fine-tune smaller LLMs to disaster domains, as these domains involve complex and low-frequency physical common sense knowledge. We introduce a pipeline to create Field Ready Instruction Decoding Agent (FRIDA) models, where domain experts and linguists combine their knowledge to make high-quality seed data that is used to generate synthetic data for fine-tuning. We create a set of 130 seed instructions for synthetic generation, a synthetic dataset of 25000 instructions, and 119 evaluation instructions relating to both general and earthquake-specific object affordances. We fine-tune several LLaMa and Mistral instruction-tuned models and find that FRIDA models outperform their base models at a variety of sizes. We then run an ablation study to understand which kinds of synthetic data most affect performance and find that training physical state and object function common sense knowledge alone improves over FRIDA models trained on all data. We conclude that the FRIDA pipeline is capable of instilling general common sense, but needs to be augmented with information retrieval for specific domain knowledge.

Related papers

Mock Worlds, Real Skills: Building Small Agentic Language Models with Synthetic Tasks, Simulated Environments, and Rubric-Based Rewards [13.784988950752195]
Existing open-source agentic training data are narrow in task variety and easily solved. Real-world APIs lack diversity and are unstable for large-scale reinforcement learning rollout processes. We address these challenges with SYNTHAGENT, a framework that jointly synthesizes diverse tool-use training data and simulates complete environments.
arXiv Detail & Related papers (2026-01-30T03:43:42Z)
Understanding the Influence of Synthetic Data for Text Embedders [52.04771455432998]
We first reproduce and publicly release the synthetic data proposed by Wang et al. We critically examine where exactly synthetic data improves model generalization. Our findings highlight the limitations of current synthetic data approaches for building general-purpose embedders.
arXiv Detail & Related papers (2025-09-07T19:28:52Z)
DSR-Bench: Evaluating the Structural Reasoning Abilities of LLMs via Data Structures [20.596558700597644]
Large language models (LLMs) are increasingly deployed for real-world tasks that fundamentally involve data manipulation. A core requirement is the ability to perform structural reasoning, that is, to understand and reason about data relationships. We introduce DSR-Bench, a novel benchmark evaluating LLMs' structural reasoning capabilities through data structures.
arXiv Detail & Related papers (2025-05-29T23:24:53Z)
The Comparability of Model Fusion to Measured Data in Confuser Rejection [0.24578723416255746]
No dataset can account for every slight deviation we might see in live usage. Simulators have been developed utilizing the shooting and bouncing ray method to allow for the generation of synthetic SAR data on 3D models. We aim to use computational power as a substitution for this lack of quality measured data, by ensembling many models trained on synthetic data.
arXiv Detail & Related papers (2025-05-01T19:51:30Z)
Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution [66.11004226578771]
Existing robust benchmark datasets have two key limitations. They generate only a limited range of perturbations for a single Information Extraction (IE) task. Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench. We show that training with only 15% of the data leads to an average 7.5% relative performance improvement across three IE tasks.
arXiv Detail & Related papers (2025-03-05T05:39:29Z)
Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
We introduce AutoElicit to extract knowledge from large language models and construct priors for predictive models. We show these priors are informative and can be refined using natural language. We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning.
arXiv Detail & Related papers (2024-11-26T10:13:39Z)
Understanding Synthetic Context Extension via Retrieval Heads [51.8869530817334]
We investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning. We find that models trained on synthetic data fall short of the real data, but surprisingly, the mismatch can be interpreted. Our results shed light on how to interpret synthetic data fine-tuning performance and how to approach creating better data for learning real-world capabilities over long contexts.
arXiv Detail & Related papers (2024-10-29T17:55:00Z)
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data. We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
CoDi: Conversational Distillation for Grounded Question Answering [10.265241619616676]
We introduce a novel data distillation framework named CoDi. CoDi allows us to synthesize large-scale, assistant-style datasets in a steerable and diverse manner. We show that SLMs trained with CoDi-synthesized data achieve performance comparable to models trained on human-annotated data in standard metrics.
arXiv Detail & Related papers (2024-08-20T22:35:47Z)
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs). Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws. Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models [42.48862540545121]
We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models. EWOK targets specific concepts from multiple knowledge domains known to be vital for world modeling in humans. We then introduce EWOK-CORE-1.0, a dataset of 4,374 items covering 11 world knowledge domains.
arXiv Detail & Related papers (2024-05-15T17:19:42Z)
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features. This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks. We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
Discovery of the Hidden World with Large Language Models [95.58823685009727]
This paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap. LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data. COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors.
arXiv Detail & Related papers (2024-02-06T12:18:54Z)
Enhancing Object Detection Performance for Small Objects through Synthetic Data Generation and Proportional Class-Balancing Technique: A Comparative Study in Industrial Scenarios [0.0]
Object Detection (OD) has proven to be a significant computer vision method in extracting localized class information. Many of the state-of-the-art (SOTA) OD models perform well on medium- and large-sized objects, but underperform on small objects. This study presents a novel approach that injects additional data points to improve the performance of the OD models.
arXiv Detail & Related papers (2024-01-23T13:02:11Z)
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other. We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task. We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
Maximizing Model Generalization for Machine Condition Monitoring with Self-Supervised Learning and Federated Learning [4.214064911004321]
Deep Learning can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features. Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to unseen target domains. This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain.
arXiv Detail & Related papers (2023-04-27T17:57:54Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
Evaluation of Categorical Generative Models -- Bridging the Gap Between Real and Synthetic Data [18.142397311464343]
We introduce an appropriately scalable evaluation method for generative models. We consider increasingly large probability spaces, which correspond to increasingly difficult modeling tasks. We validate our evaluation procedure with synthetic experiments on both synthetic generative models and current state-of-the-art categorical generative models.
arXiv Detail & Related papers (2022-10-28T21:05:25Z)
Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets [4.276883061502341]
We provide a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series. This architecture combines benefits from CNN and Transformer architectures to enable better prediction performance.
arXiv Detail & Related papers (2021-07-09T22:26:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.