ScienceWorld: Is your Agent Smarter than a 5th Grader?
- URL: http://arxiv.org/abs/2203.07540v1
- Date: Mon, 14 Mar 2022 22:52:34 GMT
- Title: ScienceWorld: Is your Agent Smarter than a 5th Grader?
- Authors: Ruoyao Wang, Peter Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu
- Abstract summary: This paper presents a new benchmark, ScienceWorld, to test agents' scientific reasoning abilities.
Current state-of-the-art models are unable to reason about or explain learned science concepts in novel contexts.
- Score: 12.066880938687154
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a new benchmark, ScienceWorld, to test agents' scientific
reasoning abilities in a new interactive text environment at the level of a
standard elementary school science curriculum. Despite the recent
transformer-based progress seen in adjacent fields such as question-answering,
scientific text processing, and the wider area of natural language processing,
we find that current state-of-the-art models are unable to reason about or
explain learned science concepts in novel contexts. For instance, models can
easily answer what the conductivity of a previously seen material is but
struggle when asked how they would conduct an experiment in a grounded,
interactive environment to find the conductivity of an unknown material. This
raises the question of whether current models are simply retrieving answers by
way of seeing a large number of similar input examples or if they have learned
to reason about concepts in a reusable manner. We hypothesize that agents need
to be grounded in interactive environments to achieve such reasoning
capabilities. Our experiments provide empirical evidence supporting this
hypothesis -- showing that a 1.5 million parameter agent trained interactively
for 100k steps outperforms an 11 billion parameter model statically trained for
scientific question-answering and reasoning via millions of expert
demonstrations.
Related papers
- Behavioral Exploration: Learning to Explore via In-Context Adaptation [53.92981562916783]
We train a long-context generative model to predict expert actions conditioned on a context of past observations and a measure of how "exploratory" the expert's behaviors are relative to this context. This enables the model not only to mimic the behavior of an expert, but also, by feeding its past history of interactions into its context, to select different expert behaviors than those previously selected. We demonstrate the effectiveness of our method in both simulated locomotion and manipulation settings, as well as on real-world robotic manipulation tasks.
arXiv Detail & Related papers (2025-07-11T21:36:19Z)
- Causal Representation Learning in Temporal Data via Single-Parent Decoding [66.34294989334728]
Scientific research often seeks to understand the causal structure underlying high-level variables in a system.
Scientists typically collect low-level measurements, such as geographically distributed temperature readings.
We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
arXiv Detail & Related papers (2024-10-09T15:57:50Z)
- Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z)
- Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation [7.647395374489533]
We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions.
We show that our approach generates explanations as helpful as those produced by a human domain expert.
arXiv Detail & Related papers (2023-11-29T20:16:23Z)
- Towards Understanding How Machines Can Learn Causal Overhypotheses [4.540122114051773]
Children are adept at many kinds of causal inference and learning.
One of the key challenges for current machine learning algorithms is modeling and understanding causal overhypotheses.
We present a new benchmark -- a flexible environment which allows for the evaluation of existing techniques.
arXiv Detail & Related papers (2022-06-16T17:54:16Z)
- Observing Interventions: A logic for thinking about experiments [62.997667081978825]
This paper makes a first step towards a logic of learning from experiments.
Crucial for our approach is the idea that the notion of an intervention can be used as a formal expression of a (real or hypothetical) experiment.
For all the proposed logical systems, we provide a sound and complete axiomatization.
arXiv Detail & Related papers (2021-11-25T09:26:45Z)
- OPEn: An Open-ended Physics Environment for Learning Without a Task [132.6062618135179]
We study whether models of the world learned in an open-ended physics environment, without any specific tasks, can be reused for downstream physics reasoning tasks.
We build a benchmark Open-ended Physics ENvironment (OPEn) and also design several tasks to test learning representations in this environment explicitly.
We find that an agent using unsupervised contrastive learning for representation learning, and impact-driven learning for exploration, achieved the best results.
arXiv Detail & Related papers (2021-10-13T17:48:23Z)
- CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning [68.74447489372037]
We present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning.
A core component of our work is to introduce "agency", such that it is simple to define and create complex scenarios.
We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment.
arXiv Detail & Related papers (2021-06-25T00:21:41Z)
- PROST: Physical Reasoning of Objects through Space and Time [68.69796589964076]
This dataset contains 18,736 multiple-choice questions made from 14 manually curated templates.
We conduct an analysis which demonstrates that state-of-the-art pretrained models are inadequate at physical reasoning.
arXiv Detail & Related papers (2021-06-07T14:06:20Z)
- Learning Transferable Push Manipulation Skills in Novel Contexts [3.1981440103815717]
We learn a parametric internal model for push interactions that enables a robot to predict the outcome of a physical interaction even in novel contexts.
We train on 2 objects for a total of 24,000 pushes in various conditions, and test on 6 objects for a total of 14,400 predicted push outcomes.
Our results show that both biased and unbiased predictors can reliably produce predictions in line with the outcomes of a carefully tuned physics simulator.
arXiv Detail & Related papers (2020-07-29T11:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.