AmadeusGPT: a natural language interface for interactive animal
behavioral analysis
- URL: http://arxiv.org/abs/2307.04858v1
- Date: Mon, 10 Jul 2023 19:15:17 GMT
- Authors: Shaokai Ye, Jessy Lauer, Mu Zhou, Alexander Mathis, Mackenzie W.
Mathis
- Abstract summary: We introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code.
We show we can produce state-of-the-art performance on the MABE 2022 behavior challenge tasks.
AmadeusGPT presents a novel way to merge deep biological knowledge, large-language models, and core computer vision modules into a more naturally intelligent system.
- Score: 65.55906175884748
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The process of quantifying and analyzing animal behavior involves translating
the naturally occurring descriptive language of their actions into
machine-readable code. Yet, codifying behavior analysis is often challenging
without deep understanding of animal behavior and technical machine learning
knowledge. To bridge this gap, we introduce AmadeusGPT: a natural language
interface that turns natural language descriptions of behaviors into
machine-executable code. Large-language models (LLMs) such as GPT-3.5 and GPT-4
allow for interactive language-based queries that are potentially well suited
to interactive behavior analysis. However, the comprehension capability
of these LLMs is limited by the context window size, which prevents them from
remembering distant conversations. To overcome the context window limitation,
we implement a novel dual-memory mechanism to allow communication between
short-term and long-term memory using symbols as context pointers for retrieval
and saving. Concretely, users directly use language-based definitions of
behavior and our augmented GPT develops code based on the core AmadeusGPT API,
which contains machine learning, computer vision, spatio-temporal reasoning,
and visualization modules. Users can then interactively refine results and
seamlessly add new behavioral modules as needed. We benchmark AmadeusGPT and
show we can produce state-of-the-art performance on the MABE 2022 behavior
challenge tasks. Notably, an end user would not need to write any code to achieve
this. Collectively, AmadeusGPT presents a novel way to merge deep
biological knowledge, large-language models, and core computer vision modules
into a more naturally intelligent system. Code and demos can be found at:
https://github.com/AdaptiveMotorControlLab/AmadeusGPT.
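
The dual-memory idea described in the abstract (a bounded short-term context plus a long-term store, linked by symbols that act as pointers) can be illustrated with a minimal sketch. This is not the authors' implementation: the `DualMemory` class, its method names, the word-count budget, and the `<|name|>` symbol syntax are all assumptions made for illustration.

```python
import re
from collections import deque


class DualMemory:
    """Toy short-term/long-term memory linked by symbols (hypothetical design)."""

    def __init__(self, max_short_term_words: int = 50):
        self.max_words = max_short_term_words  # stand-in for a context window
        self.short_term = deque()              # recent messages, oldest first
        self.long_term = {}                    # symbol name -> saved definition

    def _word_count(self) -> int:
        return sum(len(message.split()) for message in self.short_term)

    def save(self, symbol: str, definition: str) -> None:
        """Store a definition in long-term memory under a symbol name."""
        self.long_term[symbol] = definition

    def add_message(self, message: str) -> None:
        """Append a message, evicting the oldest ones once over budget."""
        self.short_term.append(message)
        while self._word_count() > self.max_words and len(self.short_term) > 1:
            self.short_term.popleft()

    def build_context(self, query: str) -> str:
        """Recall definitions for symbols in the query, then add recent turns."""
        symbols = re.findall(r"<\|(.+?)\|>", query)
        recalled = [
            f"{name}: {self.long_term[name]}"
            for name in symbols
            if name in self.long_term
        ]
        return "\n".join(recalled + list(self.short_term) + [query])


# Usage: the first message is evicted from short-term memory, but the saved
# behavior definition is still recalled through its symbol.
memory = DualMemory(max_short_term_words=8)
memory.save("head dip", "nose below the edge of the maze arm")
memory.add_message("Plot the trajectory of the mouse.")
memory.add_message("Now color it by speed.")
context = memory.build_context("When does <|head dip|> occur?")
print(context)
```

The point of the sketch is that a definition saved under a symbol survives even after the conversation that introduced it has left the short-term window, because the symbol in a later query triggers retrieval.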
Related papers
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
- Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs [14.997971970162743]
Humans spontaneously use increasingly efficient language as interactions progress, by adapting and forming ad-hoc conventions.
It remains unexplored whether multimodal large language models (MLLMs) similarly increase communication efficiency during interactions.
We introduce ICCA, an automated framework to evaluate such conversational adaptation as an in-context behavior in MLLMs.
arXiv Detail & Related papers (2024-08-02T17:51:57Z)
- ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding [67.63933036920012]
Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location.
This study presents ClawMachine, offering a new methodology that notates an entity directly using the visual tokens.
ClawMachine unifies visual referring and grounding into an auto-regressive format and learns with a decoder-only architecture.
arXiv Detail & Related papers (2024-06-17T08:39:16Z)
- Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines [0.0]
We introduce a dynamic pipeline that transforms natural language task descriptions into code through high-level data-shaping instructions.
This paper details the fine-tuning process, and sheds light on how natural language descriptions can be translated into functional code.
We propose an algorithm capable of transforming a natural description of an ML task into code with minimal human interaction.
arXiv Detail & Related papers (2024-03-18T08:58:47Z)
- Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs [5.06113628525842]
We present a framework that can serve as an intermediary between a user and their user interface (UI).
We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations.
Our engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions.
arXiv Detail & Related papers (2024-02-07T21:08:49Z)
- Dialogue-based generation of self-driving simulation scenarios using Large Language Models [14.86435467709869]
Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars.
Current simulation frameworks are driven by highly-specialist domain specific languages.
There is often a gap between a concise English utterance and the executable code that captures the user's intent.
arXiv Detail & Related papers (2023-10-26T13:07:01Z)
- Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models [19.594361652336996]
We introduce HELPER, an embodied agent equipped with an external memory of language-program pairs.
Relevant memories are retrieved based on the current dialogue, instruction, correction, or VLM description.
HELPER sets a new state-of-the-art in the TEACh benchmark in both Execution from Dialog History (EDH) and Trajectory from Dialogue (TfD).
arXiv Detail & Related papers (2023-10-23T17:31:55Z)
- Eliciting Human Preferences with Language Models [56.68637202313052]
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts.
We propose to use *LMs themselves* to guide the task specification process.
We study GATE in three domains: email validation, content recommendation, and moral reasoning.
arXiv Detail & Related papers (2023-10-17T21:11:21Z)
- ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT [72.83383437501577]
Large language models (LLMs) have recently demonstrated significant potential in mathematical abilities.
LLMs currently have difficulty in bridging perception, language understanding and reasoning capabilities.
This paper presents a novel method for integrating LLMs into the abductive learning framework.
arXiv Detail & Related papers (2023-04-21T16:23:47Z)
- Learning Adaptive Language Interfaces through Decomposition [89.21937539950966]
We introduce a neural semantic parsing system that learns new high-level abstractions through decomposition.
Users interactively teach the system by breaking down high-level utterances describing novel behavior into low-level steps.
arXiv Detail & Related papers (2020-10-11T08:27:07Z)