Related papers: GIFT: Games as Informal Training for Generalizable LLMs

GIFT: Games as Informal Training for Generalizable LLMs

URL: http://arxiv.org/abs/2601.05633v1
Date: Fri, 09 Jan 2026 08:42:44 GMT
Title: GIFT: Games as Informal Training for Generalizable LLMs
Authors: Nuoyan Lyu, Bingbing Xu, Weihao Meng, Yige Yuan, Yang Zhang, Zhiyong Huang, Tat-Seng Chua, Huawei Shen,
Abstract summary: Large Language Models (LLMs) struggle with "practical wisdom" and generalizable intelligence.<n>This gap arises from a lack of informal learning, which thrives on interactive feedback rather than goal-oriented instruction.<n>We propose treating Games as a primary environment for LLM informal learning, leveraging their intrinsic reward signals and abstracted complexity.
Score: 64.47890325824763
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Large Language Models (LLMs) have achieved remarkable success in formal learning tasks such as mathematics and code generation, they still struggle with the "practical wisdom" and generalizable intelligence, such as strategic creativity and social reasoning, that characterize human cognition. This gap arises from a lack of informal learning, which thrives on interactive feedback rather than goal-oriented instruction. In this paper, we propose treating Games as a primary environment for LLM informal learning, leveraging their intrinsic reward signals and abstracted complexity to cultivate diverse competencies. To address the performance degradation observed in multi-task learning, we introduce a Nested Training Framework. Unlike naive task mixing optimizing an implicit "OR" objective, our framework employs sequential task composition to enforce an explicit "AND" objective, compelling the model to master multiple abilities simultaneously to achieve maximal rewards. Using GRPO-based reinforcement learning across Matrix Games, TicTacToe, and Who's the Spy games, we demonstrate that integrating game-based informal learning not only prevents task interference but also significantly bolsters the model's generalization across broad ability-oriented benchmarks. The framework and implementation are publicly available.

Related papers

Unified Reinforcement and Imitation Learning for Vision-Language Models [84.84277196012907]
Vision-Language Models (VLMs) have achieved remarkable progress, yet their large scale often renders them impractical for resource-constrained environments.<n>This paper introduces Unified Reinforcement and Imitation Learning (RIL), a novel and efficient training algorithm designed to create powerful, lightweight VLMs.
arXiv Detail & Related papers (2025-10-22T07:12:14Z)
MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization [52.149337961205624]
We propose a framework that empowers both inter- and intra-task optimization for surgical triplet recognition.<n>For inter-task optimization, we introduce the Shared-Specific-Disentangled (S$2$D) learning scheme that decomposes representations into task-shared and task-specific components.<n>For intra-task optimization conflicts, we develop a Coordinated Gradient Learning (CGL) strategy, which dissects and rebalances the positive-negative ambiguities.
arXiv Detail & Related papers (2025-09-16T09:48:52Z)
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models [21.600469921661233]
Think in Games (TiG) is a novel framework that empowers large language models to develop procedural understanding through direct interaction with game environments.<n>We show TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands.
arXiv Detail & Related papers (2025-08-29T07:13:39Z)
Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling [66.0871543682453]
We present Omni-Thinker, a unified reinforcement learning framework that scales large language models across diverse tasks.<n>Our scheduler orders tasks according to accuracy backward transfer (BWT), reducing forgetting and improving multi-task performance.
arXiv Detail & Related papers (2025-07-20T01:50:16Z)
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill [25.686589649523587]
Learning open-vocabulary physical skills for simulated agents presents a significant challenge in artificial intelligence.<n>We introduce GROVE, a generalized reward framework that enables open-vocabulary physical skill learning without manual engineering or task-specific demonstrations.<n>To bridge the domain gap between simulation and natural images, we develop Pose2CLIP, a lightweight mapper that efficiently projects agent poses directly into semantic feature space.
arXiv Detail & Related papers (2025-04-05T14:44:47Z)
LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System [54.71619734800526]
GenMentor is a multi-agent framework designed to deliver goal-oriented, personalized learning within ITS.<n>It maps learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset.<n>GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs.
arXiv Detail & Related papers (2025-01-27T03:29:44Z)
Benchmarking General-Purpose In-Context Learning [19.40952728849431]
In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly. In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential. We introduce two benchmarks specifically crafted to train and evaluate GPICL functionalities.
arXiv Detail & Related papers (2024-05-27T14:50:42Z)
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL) Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning. We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning [15.698612710580447]
We propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization. In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation. We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv Detail & Related papers (2022-02-28T09:05:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.