Analytical Survey of Learning with Low-Resource Data: From Analysis to Investigation
- URL: http://arxiv.org/abs/2510.08962v1
- Date: Fri, 10 Oct 2025 03:15:42 GMT
- Title: Analytical Survey of Learning with Low-Resource Data: From Analysis to Investigation
- Authors: Xiaofeng Cao, Mingwei Xu, Xin Yu, Jiangchao Yao, Wei Ye, Shengjun Huang, Minling Zhang, Ivor W. Tsang, Yew Soon Ong, James T. Kwok, Heng Tao Shen,
- Abstract summary: Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI)<n>However, the costs associated with data annotation and model training remain significant.<n>This survey employs active sampling theory to analyze the generalization error and label complexity associated with learning from low-resource data.
- Score: 192.53529928861818
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI); however, the costs associated with data annotation and model training remain significant. A fundamental objective of AI research is to achieve robust generalization with limited-resource data. This survey employs agnostic active sampling theory within the Probably Approximately Correct (PAC) framework to analyze the generalization error and label complexity associated with learning from low-resource data in both model-agnostic supervised and unsupervised settings. Based on this analysis, we investigate a suite of optimization strategies tailored for low-resource data learning, including gradient-informed optimization, meta-iteration optimization, geometry-aware optimization, and LLMs-powered optimization. Furthermore, we provide a comprehensive overview of multiple learning paradigms that can benefit from low-resource data, including domain transfer, reinforcement feedback, and hierarchical structure modeling. Finally, we conclude our analysis and investigation by summarizing the key findings and highlighting their implications for learning with low-resource data.
Related papers
- Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity [59.27594125465172]
We introduce Data Reasoning Intensity (DRI), a novel metric that quantifies the latent logical reasoning complexity of samples.<n>We then introduce a re-cognizing optimization strategy that systematically enhances the logical reasoning intensity of training data.
arXiv Detail & Related papers (2025-09-29T14:20:04Z) - Teaching LLMs to Think Mathematically: A Critical Study of Decision-Making via Optimization [1.246870021158888]
This paper investigates the capabilities of large language models (LLMs) in formulating and solving decision-making problems using mathematical programming.<n>We first conduct a systematic review and meta-analysis of recent literature to assess how well LLMs understand, structure, and solve optimization problems across domains.<n>Our systematic evidence is complemented by targeted experiments designed to evaluate the performance of state-of-the-art LLMs in automatically generating optimization models for problems in computer networks.
arXiv Detail & Related papers (2025-08-25T14:52:56Z) - Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study [55.09905978813599]
Large Language Models (LLMs) hold promise in automating data analysis tasks.<n>Yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios.<n>In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs.
arXiv Detail & Related papers (2025-06-24T17:04:23Z) - Aggregating empirical evidence from data strategy studies: a case on model quantization [5.467675229660525]
This study assesses the effects of model quantization on correctness and resource efficiency in deep learning (DL) systems.<n>We applied the Structured Synthesis Method (SSM) to aggregate the findings.
arXiv Detail & Related papers (2025-05-01T19:18:35Z) - Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers [0.0]
This paper presents a machine learning framework that automates dataset mention detection across research domains.<n>We employ zero-shot extraction from research papers, an LLM-as-a-Judge for quality assessment, and a reasoning agent for refinement to generate a weakly supervised synthetic dataset.<n>At inference, a ModernBERT-based classifier efficiently filters dataset mentions, reducing computational overhead while maintaining high recall.
arXiv Detail & Related papers (2025-02-14T16:16:02Z) - Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.<n>We introduce novel algorithms for dynamic, instance-level data reweighting.<n>Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - Graph Neural Network-Driven Hierarchical Mining for Complex Imbalanced Data [0.8246494848934447]
This study presents a hierarchical mining framework for high-dimensional imbalanced data.<n>By constructing a structured graph representation of the dataset and integrating graph neural network embeddings, the proposed method effectively captures global interdependencies among samples.<n> Empirical evaluations across multiple experimental scenarios validate the efficacy of the proposed approach.
arXiv Detail & Related papers (2025-02-06T06:26:41Z) - Empowering Meta-Analysis: Leveraging Large Language Models for Scientific Synthesis [7.059964549363294]
This study investigates the automation of meta-analysis in scientific documents using large language models (LLMs)
Our research introduces a novel approach that fine-tunes the LLM on extensive scientific datasets to address challenges in big data handling and structured data extraction.
arXiv Detail & Related papers (2024-11-16T20:18:57Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than SFT model in 57.72% cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.