Aviary: training language agents on challenging scientific tasks
- URL: http://arxiv.org/abs/2412.21154v1
- Date: Mon, 30 Dec 2024 18:33:28 GMT
- Title: Aviary: training language agents on challenging scientific tasks
- Authors: Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manu Ponnapati, Albert Bou, Jon Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G. Rodriques, Andrew D. White
- Abstract summary: We introduce Aviary, an extensible gymnasium for language agents.
We formalize agents as policies solving language-grounded partially observable Markov decision processes.
We show that language agents backed by open-source, non-frontier LLMs can match and exceed both frontier LLM agents and human experts on multiple tasks at up to 100x lower inference cost.
- Score: 3.166958237958637
- License:
- Abstract: Solving complex real-world tasks requires cycles of actions and observations. This is particularly true in science, where tasks require many cycles of analysis, tool use, and experimentation. Language agents are promising for automating intellectual tasks in science because they can interact with tools via natural language or code. Yet their flexibility creates conceptual and practical challenges for software implementations, since agents may comprise non-standard components such as internal reasoning, planning, tool usage, as well as the inherent stochasticity of temperature-sampled language models. Here, we introduce Aviary, an extensible gymnasium for language agents. We formalize agents as policies solving language-grounded partially observable Markov decision processes, which we term language decision processes. We then implement five environments, including three challenging scientific environments: (1) manipulating DNA constructs for molecular cloning, (2) answering research questions by accessing scientific literature, and (3) engineering protein stability. These environments were selected for their focus on multi-step reasoning and their relevance to contemporary biology research. Finally, with online training and scaling inference-time compute, we show that language agents backed by open-source, non-frontier LLMs can match and exceed both frontier LLM agents and human experts on multiple tasks at up to 100x lower inference cost.
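To make the formalization concrete, below is a minimal sketch of a language decision process loop: a gym-style text environment exposes reset/step over natural-language observations and tool-call actions, and an agent policy maps the observation history to the next action. This is an illustrative sketch only, not Aviary's actual API; the names (ToyLiteratureQAEnv, llm_propose_action, the Step dataclass) are hypothetical, and the policy is a stub standing in for a temperature-sampled LLM.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One transition in a language decision process: text in, text out."""
    observation: str  # natural-language observation returned by the environment
    reward: float     # scalar feedback for the last action
    done: bool        # whether the episode has terminated


class ToyLiteratureQAEnv:
    """Hypothetical gym-style environment with text observations and tool-call actions."""

    def __init__(self, question: str, answer: str):
        self.question = question
        self.answer = answer
        self.turns = 0

    def reset(self) -> str:
        self.turns = 0
        return f"Question: {self.question} Available tools: search(query), submit(answer)."

    def step(self, action: str) -> Step:
        self.turns += 1
        if action.startswith("submit("):
            correct = self.answer.lower() in action.lower()
            return Step("Episode finished.", 1.0 if correct else 0.0, True)
        if action.startswith("search("):
            return Step(f"Search results hinting at: {self.answer}", 0.0, self.turns >= 5)
        return Step("Unrecognized action; use search(...) or submit(...).", 0.0, self.turns >= 5)


def llm_propose_action(history: list[str]) -> str:
    """Stub policy standing in for a temperature-sampled LLM call."""
    last = history[-1]
    if "Search results" in last:
        return f"submit({last.split(': ', 1)[1]})"
    return "search(relevant keywords)"


if __name__ == "__main__":
    env = ToyLiteratureQAEnv("Which polymerase is used in PCR?", "Taq polymerase")
    history = [env.reset()]
    done, total_reward = False, 0.0
    while not done:
        action = llm_propose_action(history)  # policy: observation history -> next action
        step = env.step(action)               # environment executes the tool call
        history.append(step.observation)
        total_reward += step.reward
        done = step.done
    print(f"Episode reward: {total_reward}")
```

In this framing, training amounts to improving the policy from episode rewards, while the environment encapsulates the task-specific tools (e.g., literature search or DNA-construct manipulation).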
Related papers
- Symbolic Learning Enables Self-Evolving Agents [55.625275970720374]
We introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own.
Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning.
We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks.
arXiv Detail & Related papers (2024-06-26T17:59:18Z) - Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning [67.26776442697184]
We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space.
Husky iterates between two stages: 1) generating the next action to take towards solving a given task and 2) executing the action using expert models.
Our experiments show that Husky outperforms prior language agents across 14 evaluation datasets.
arXiv Detail & Related papers (2024-06-10T17:07:25Z) - Language Evolution with Deep Learning [49.879239655532324]
Computational modeling plays an essential role in the study of language emergence.
It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language.
This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models.
arXiv Detail & Related papers (2024-03-18T16:52:54Z) - Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z) - Cognitive Architectures for Language Agents [44.89258267600489]
We propose Cognitive Architectures for Language Agents (CoALA)
CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions.
We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents.
arXiv Detail & Related papers (2023-09-05T17:56:20Z) - Collecting Interactive Multi-modal Datasets for Grounded Language Understanding [66.30648042100123]
We formalize the collaborative embodied agent task grounded in natural language.
We developed a tool for extensive and scalable data collection.
We collected the first dataset for interactive grounded language understanding.
arXiv Detail & Related papers (2022-11-12T02:36:32Z) - Inner Monologue: Embodied Reasoning through Planning with Language Models [81.07216635735571]
Large Language Models (LLMs) can be applied to domains beyond natural language processing.
LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them.
We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios.
arXiv Detail & Related papers (2022-07-12T15:20:48Z) - CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks [30.936692970187416]
General-purpose robots must learn to relate human language to their perceptions and actions.
We present CALVIN, an open-source simulated benchmark to learn long-horizon language-conditioned tasks.
arXiv Detail & Related papers (2021-12-06T18:37:33Z) - A Practical Guide to Studying Emergent Communication through Grounded Language Games [0.0]
This paper introduces a high-level robot interface that extends the Babel software system.
It presents, for the first time, a toolkit of flexible modules for each subtask involved in running advanced grounded language game experiments.
arXiv Detail & Related papers (2020-04-20T11:48:24Z)