Related papers: RD2Bench: Toward Data-Centric Automatic R&D

RD2Bench: Toward Data-Centric Automatic R&D

URL: http://arxiv.org/abs/2404.11276v1
Date: Wed, 17 Apr 2024 11:33:21 GMT
Title: RD2Bench: Toward Data-Centric Automatic R&D
Authors: Haotian Chen, Xinjie Shen, Zeqi Ye, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian,
Abstract summary: Researchers often seek the potential research directions by reading and then verifying them through experiments. The data-driven black-box deep learning method demonstrates its effectiveness in a wide range of real-world scenarios. We propose a Real-world Data-centric automatic R&D Benchmark, namely RD2Bench.
Score: 18.570307541212053
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The progress of humanity is driven by those successful discoveries accompanied by countless failed experiments. Researchers often seek the potential research directions by reading and then verifying them through experiments. The process imposes a significant burden on researchers. In the past decade, the data-driven black-box deep learning method demonstrates its effectiveness in a wide range of real-world scenarios, which exacerbates the experimental burden of researchers and thus renders the potential successful discoveries veiled. Therefore, automating such a research and development (R&D) process is an urgent need. In this paper, we serve as the first effort to formalize the goal by proposing a Real-world Data-centric automatic R&D Benchmark, namely RD2Bench. RD2Bench benchmarks all the operations in data-centric automatic R&D (D-CARD) as a whole to navigate future work toward our goal directly. We focuses on evaluating the interaction and synergistic effects of various model capabilities and aiding to select the well-performed trustworthy models. Although RD2Bench is very challenging to the state-of-the-art (SOTA) large language model (LLM) named GPT-4, indicating ample research opportunities and more research efforts, LLMs possess promising potential to bring more significant development to D-CARD: They are able to implement some simple methods without adopting any additional techniques. We appeal to future work to take developing techniques for tackling automatic R&D into consideration, thus bringing the opportunities of the potential revolutionary upgrade to human productivity.

Related papers

Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science [25.857554476782827]
This paper explores how augmenting large language models with relevant data during the idea generation process can enhance the quality of generated ideas.<n>We conduct experiments in the social science domain, specifically with climate negotiation topics, and find that metadata improves the feasibility of generated ideas by 20%.<n>A human study shows that LLM-generated ideas, along with their related data and validation processes, inspire researchers to propose research ideas with higher quality.
arXiv Detail & Related papers (2025-05-27T16:23:42Z)
R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution [60.80016554091364]
R&D-Agent is a dual-agent framework for iterative exploration.<n>The Researcher agent uses performance feedback to generate ideas, while the Developer agent refines code based on error feedback.<n>R&D-Agent is evaluated on MLE-Bench and emerges as the top-performing machine learning engineering agent.
arXiv Detail & Related papers (2025-05-20T06:07:00Z)
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models [58.98176123850354]
The recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. The implementation details of the released models have not been fully open-sourced by DeepSeek, including DeepSeek-R1-Zero, DeepSeek-R1, and the distilled small models. Many replication studies have emerged aiming to reproduce the strong performance achieved by DeepSeek-R1, reaching comparable performance through similar training procedures and fully open-source data resources.
arXiv Detail & Related papers (2025-05-01T14:28:35Z)
MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. We evaluate a number of frontier large language models (LLMs) on our benchmarks such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro.
arXiv Detail & Related papers (2025-02-20T12:28:23Z)
Deploying Large Language Models With Retrieval Augmented Generation [0.21485350418225244]
Retrieval Augmented Generation has emerged as a key approach for integrating knowledge from data sources outside of the large language model's training set. We present insights from the development and field-testing of a pilot project that integrates LLMs with RAG for information retrieval.
arXiv Detail & Related papers (2024-11-07T22:11:51Z)
CycleResearcher: Improving Automated Research via Automated Review [37.03497673861402]
This paper explores the possibility of using open-source post-trained large language models (LLMs) as autonomous agents capable of performing the full cycle of automated research and review. To train these models, we develop two new datasets, reflecting real-world machine learning research and peer review dynamics. In research, the papers generated by the CycleResearcher model achieved a score of 5.36 in simulated peer reviews, surpassing the preprint level of 5.24 from human experts and approaching the accepted paper level of 5.69.
arXiv Detail & Related papers (2024-10-28T08:10:21Z)
O1 Replication Journey: A Strategic Progress Report -- Part 1 [52.062216849476776]
This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey. Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects. We propose the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process.
arXiv Detail & Related papers (2024-10-08T15:13:01Z)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning. Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques. Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
Autonomous LLM-driven research from data to human-verifiable research papers [0.0]
We build an automation platform that guides interacting through complete stepwise process. In mode provided annotated data alone, datapaper raised hypotheses, designed plans, wrote and interpreted analysis codes, generated and interpreted results. We demonstrate potential for AI-driven acceleration of scientific discovery while enhancing traceability, transparency and verifiability.
arXiv Detail & Related papers (2024-04-24T23:15:49Z)
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment. We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context. We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training [19.987304448524043]
We propose a multi-task self-supervised learning-based researcher data pre-training model named RPT. We divide the researchers' data into semantic document sets and community graph. We propose three self-supervised learning objectives to train the whole model.
arXiv Detail & Related papers (2021-10-08T03:42:09Z)
A Survey of Knowledge Tracing: Models, Variants, and Applications [70.69281873057619]
Knowledge Tracing is one of the fundamental tasks for student behavioral data analysis. We present three types of fundamental KT models with distinct technical routes. We discuss potential directions for future research in this rapidly growing field.
arXiv Detail & Related papers (2021-05-06T13:05:55Z)
ACDER: Augmented Curiosity-Driven Experience Replay [16.755555854030412]
We propose a novel method called Augmented Curiosity-Driven Experience Replay (ACDER) ACDER uses a new goal-oriented curiosity-driven exploration to encourage the agent to pursue novel and task-relevant states more purposefully. Experiments conducted on four challenging robotic manipulation tasks with binary rewards, including Reach, Push, Pick&Place and Multi-step Push.
arXiv Detail & Related papers (2020-11-16T15:27:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.