FuguReport

Browse the latest weekly themes first, then scan the most recent daily reports and archives.

Anchor Date: 2026-06-12
Weekly

2026-06-05 - 2026-06-11

Daily

Recent Daily Reports

50 reports
2026-06-12 Application / 3D Object Articulation / Animation and robotic simulation

Instruct-Particulate: Scaling Feed-Forward 3D Object Articulation with Kinematic Control

This paper introduces Instruct-Particulate, a feed-forward model for reconstructing articulated 3D objects from a static 3D mesh while conditioning on a target kinematic specification.

2026-06-12 Method / Skill Learning / Ground-truth-free skill evolution framework

SkillAudit: Ground-Truth-Free Skill Evolution via Paired Trajectory Auditing

SkillAudit is a framework for improving agent skills without using hidden tests, reference solutions, rewards, or other external ground-truth signals during optimization.

2026-06-12 Application / Autonomous Agent / Persistent digital colleagues

From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI

This survey explores the trajectory of Large Language Models (LLMs) from conversational chatbots to persistent autonomous digital colleagues.

2026-06-12 Method / 3D Reconstruction / Unified 3D reconstruction with segmentation

Pano3D: Unified 3D Reconstruction and Panoptic Segmentation

Pano3D is a unified framework that performs 3D reconstruction and 3D panoptic segmentation directly from unposed RGB image collections.

2026-06-12 Method / Mixture-of-Experts / Task routing modeling

A theoretical model for task routing in mixture-of-expert transformers

This paper develops a theoretical framework for task routing in mixture-of-experts (MoE) transformers using a discrete language model built from syntactic templates and finite key-value knowledge dictionaries.

2026-06-11 Evaluation / Benchmarking / Environment change modeling

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

This paper introduces EvoArena, a benchmark suite for evaluating LLM agents in persistently evolving environments rather than static snapshots.

2026-06-11 Method / Representation Learning / Learning visual-action tokenizers

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

RepWAM is a representation-centric world action model designed for robot manipulation.

2026-06-11 Method / Action Interface / Code-driven interface for spatial reasoning

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

This paper studies how the action interface of a tool-augmented agent affects open-ended spatial reasoning.

2026-06-11 Method / Model Fingerprinting / Robust fingerprinting for T2I models

Efficient, Robust, and Anti-Collusion Fingerprinting of Image Diffusion Models

This paper studies model fingerprinting for text-to-image diffusion models under a threat that prior work largely ignores: collusion attacks in which multiple users combine their fingerprinted model copies to weaken attribution.

2026-06-11 Evaluation / Security Assessment / Prompt injection benchmarking for web agents

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

This paper introduces StakeBench, a benchmark for evaluating prompt-injection attacks against real-world LLM web agents from a stakeholder-centric perspective.

2026-06-10 Method / Autonomous Agents / Coordinator-executor framework

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

This paper studies autonomous research as a long-horizon optimization problem and formalizes it as Autonomous Optimization (AO), where an agent must iteratively improve an artifact using development feedback while reserving held-out evaluation for admission decisions.

2026-06-10 Method / Generative Models / Unified audio generation framework

AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation

AudioX-Turbo is a unified framework for generating audio or music from flexible combinations of text, video, and audio conditions.

2026-06-10 Method / Multimodal Reasoning / Closed-loop contextual reasoning

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

InternVideo3 is a framework for long-horizon video understanding that reformulates multimodal understanding as Multimodal Contextual Reasoning (MCR), a closed-loop process over an evolving shared context.

2026-06-10 Method / 4D Perception / Motion and interaction tracking

4DP-QA: Scalable QA for 4D Perception in Vision Language Models

This paper presents a scalable question-answer generation pipeline for training and evaluating vision-language models on 4D scene understanding, with an emphasis on motion and dynamic spatial reasoning.

2026-06-10 Method / World Models / Object-centric composable transitions learning

Slots, Transitions, Loops: Learning Composable World Models for ARC

This paper studies ARC as demonstration-conditioned state transition learning rather than direct grid-to-grid prediction.

2026-06-09 Evaluation / Frontier Model / Control intervention awareness benchmarking

CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs

This paper introduces CIAware-Bench, a benchmark for measuring control intervention awareness in frontier language models: whether a model can tell when part of its trajectory has been replaced or edited by a control protocol.

2026-06-09 Method / Causal Discovery / Hierarchical statistical causal insights

Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting

This paper proposes Causal Ensemble Agent (CEA), a hierarchical causal discovery framework that combines multiple statistical causal discovery experts across three graph levels: skeletons, v-structures, and edge orientations.

2026-06-09 Method / Multimodal Models / Architecture combining video and agent intelligence

Kwai Keye-VL-2.0 Technical Report

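This technical report presents Kwai Keye-VL-2.0, a multimodal model whose architecture combines video and agent intelligence.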

2026-06-09 Method / Neuro-Symbolic Reasoning / ASP-based reasoning with neural networks

Accelerating NeurASP with vectorization and caching

This paper studies the computational bottlenecks of NeurASP, a neuro-symbolic framework that trains neural networks through ASP-based reasoning when only downstream labels are available.

2026-06-08 Method / Context Compression / End-to-end KV cache compression

End-to-End Context Compression at Scale

This paper revisits encoder-decoder context compression for long-context language model inference, targeting the memory bottleneck created by growing KV caches.

2026-06-08 Evaluation / Multimodal Benchmarking / Evaluating spatial reasoning in multimodal agents

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

SpatialWorld is a benchmark for evaluating interactive spatial reasoning in multimodal agents on complex real-world tasks.

2026-06-08 Method / 3D Segmentation / End-to-end feed-forward panoptic segmentation

EPS3D: End-to-End Feed-Forward 3D Panoptic Segmentation

EPS3D is an end-to-end feed-forward framework for open-vocabulary 3D panoptic segmentation from unposed multi-view images.

2026-06-08 Method / Image Generation / Autoregressive framework for multimodal input

OmniGen-AR: AutoRegressive Any-to-Image Generation

OmniGen-AR is presented as a unified autoregressive framework for any-to-image generation that encodes text and diverse visual conditions into discrete tokens within a single model.

2026-06-08 Method / World Modeling / Text world model framework

Bridging the Agent-World Gap: Text World Models for LLM-based Agents

This paper surveys text world models (TWMs) for LLM-based agents, starting from the observation that many current agents act reactively without an explicit model of how textual environments change over time.

2026-06-07 Method / Attention / Block-wise attention skipping technique

Look Less, Reason More: Block-wise Attention Skipping for Efficient Multimodal LLMs

This paper studies inference inefficiency in multimodal large language models and argues that deep-layer visual self-attention becomes redundant after visual tokens have already formed stable spatial structure.

2026-06-07 Theory / Model Theory / Parameter space analysis of transformers

Understanding the Parameter Space Geometry of Transformers Encoding Boolean Functions

This paper studies why transformers often fail to learn certain Boolean functions even when those functions are expressible by some parameter settings.

2026-06-07 Evaluation / Model Alignment Evaluation / Comprehensive emergent misalignment assessment

Activation Steering Induces Emergent Misalignment: A More Comprehensive Evaluation

This paper studies whether activation steering can induce emergent misalignment, meaning broadly unsafe behavior that generalizes beyond the narrow task used to derive the steering signal.

2026-06-07 Method / Distribution Modeling / Expanding generative model support

Active Flow Expansion for Out-of-Distribution Discovery: from Theory to Molecules

This paper argues that standard flow and diffusion pre-training is limited for scientific discovery because it matches the observed data distribution, which may cover only a small portion of the full valid design space.

2026-06-07 Method / Representation Learning / Token-subset alignment technique

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

This paper studies representation alignment for diffusion transformers from a token-level perspective and argues that aligning all diffusion tokens to clean-image encoder features creates a mismatch because diffusion inputs are noisy and informative content varies by timestep.

2026-06-06 Method / Model Decoding / Parallel decoding framework PoE-Bridge

Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge

This paper studies how to improve diffusion language model (DLM) decoding so it retains parallel generation speed while better matching the quality of a stronger autoregressive (AR) model.

2026-06-06 Method / World Models / Controllable video world model design

DisCo: World Models with Discrete Camera Motion Control

DisCo is a controllable video world model that replaces continuous camera trajectories with a compact discrete action space for camera motion control.

2026-06-06 Method / Learning Framework / General foothold tracking training

Mind Your Steps: A General Learning Framework for Accurate Humanoid Foothold Tracking

This paper presents a lightweight reinforcement-learning framework for training general-purpose 3D foothold-tracking policies for humanoid locomotion.

2026-06-06 Evaluation / Model Benchmarking / Personalized facial reaction generation challenge

REACT 2026: The Fourth Multiple Appropriate Facial Reaction Generation Challenge: Personalised MAFRG and Appropriate EEG Reaction Prediction

This paper presents the REACT 2026 challenge on multiple appropriate facial reaction generation (MAFRG) in dyadic interactions, extending prior editions with a stronger focus on personalisation.

2026-06-06 Method / Object Navigation / Spatial-visual navigation policy learning

IntentNav: Learning Spatial-Visual Object Navigation from Human Demonstrations

IntentNav is a framework for ObjectNav that learns human-like search policies from human demonstrations rather than relying on low-level action imitation alone.

2026-06-05 Task / Video Understanding / Human-perspective video analysis

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

This paper is a survey on multimodal large language model (MLLM) based video understanding, motivated by the shift from short clips to long, multimodal, and knowledge-intensive video scenarios.

2026-06-05 Method / Model Scaling / Test-time scaling framework

ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning

ThinkBooster is presented as a unified framework for test-time compute scaling in LLM reasoning, intended to support both research and practical deployment.

2026-06-05 Method / Representation Compression / Planning-aligned token compression framework

Planning-aligned Token Compression for Long-Context Autonomous Driving

This paper introduces COMPACT-VA, a planning-aligned token compression framework for long-context autonomous driving built on a conditional VQ-VAE and a hierarchical Q-former memory buffer.

2026-06-05 Method / Concept Extraction / Forensic concept localization

ForensicConcept: Transferable Forensic Concepts for AIGI Detection

This paper studies why AI-generated image detectors often generalize poorly to unseen generators and argues that one obstacle is the lack of explicit, inspectable evidence in current black-box detectors.

2026-06-05 Method / Embodied Simulation / Egocentric interaction framework

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

AnchorWorld is a framework for embodied egocentric world simulation that combines human-motion-driven control with localized world customization.

2026-06-04 Method / Biomedical Modeling / World models with intervention-conditioned dynamics

Towards World Models in Biomedical Research

This paper is a perspective article that proposes biomedical world models as a new AI paradigm for biomedical research.

2026-06-04 Method / Policy Regularization / KL-regularized reinforcement learning

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

This paper studies KL-regularized contextual bandits and episodic reinforcement learning with general function approximation when the model class is misspecified.

2026-06-04 Method / Formal Methods / Framework for theorem proving with blueprint refinement

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Goedel-Architect is an agentic Lean 4 theorem-proving pipeline built around a global blueprint: a dependency graph of formally stated definitions and lemmas leading to a target theorem.

2026-06-04 Method / Visual Reasoning / Interleaved visual-linguistic inference

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

This paper studies video event prediction, where a model must infer unobserved future events from a partial video.

2026-06-04 Evaluation / Usability Evaluation / Measuring speech translation usability

Ouvia: A User-centered Framework for Measuring Usability of Speech Translation in Real-World Communication Scenarios

This paper introduces Ouvia, a user-centered evaluation framework for assessing the usability of speech translation in realistic one-to-one communication settings rather than decontextualized benchmark tests.

2026-06-03 Method / Audio Interaction Model / Real-time perceptual-cognitive loop

Audio Interaction Model

This paper formalizes the Audio Interaction Model, a streaming audio-language setting in which a model continuously listens to audio and decides when to remain silent or respond.

2026-06-03 Evaluation / Agent Evaluation / Real-world long-term tasks

Agents' Last Exam

Agents’ Last Exam (ALE) is a benchmark for evaluating AI agents on long-horizon, economically valuable real-world tasks with verifiable outcomes.

2026-06-03 Method / 3D Representation / Token-based Gaussian splatting model

ZipSplat: Fewer Gaussians, Better Splats

ZipSplat is a feed-forward 3D Gaussian Splatting model that predicts a compact scene representation from multi-view images without tying one Gaussian to each input pixel.

2026-06-03 Evaluation / Benchmarking / Long-horizon task performance

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

AutoLab is a benchmark for ultra long-horizon closed-loop optimization, designed to evaluate whether frontier models can improve working but suboptimal research and engineering artifacts through repeated experimentation and refinement over multi-hour budgets.

2026-06-03 Method / Vision-Language Model / Stateful visual representation modeling

Stateful Visual Encoders for Vision-Language Models

This paper studies a limitation of open-weight vision-language models in multi-image and multi-turn settings: their visual encoders usually process each image independently, leaving cross-image comparison to the language model.

2026-06-03 Method / Model Selection / Validation procedure for deep UDA

Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation

This paper addresses model selection for deep unsupervised domain adaptation (UDA), where target labels are unavailable and commonly used validation strategies are biased, unstable, or rely on labeled target data.

Archive

Weekly Archive

15 themes

Evaluating LLM Research Agents

This week's theme centers on evaluating and improving LLM-based research and problem-solving agents beyond one-shot task success.

2026-06-05 - 2026-06-11

Structured World Models

This week's papers advance world modeling away from monolithic black-box predictors toward structured, modular architectures designed to better capture the dynamics of diverse environments.

2026-06-05 - 2026-06-11

Controllable and Scalable Model Merging

This week's theme centers on making model merging more controllable, scalable, and robust as the number of fine-tuned expert models grows.

2026-06-05 - 2026-06-11

Embodied World Models and Evaluation

This week's work marks a shift from evaluating multimodal models on static perception toward testing whether they can form actionable, physically grounded world models.

2026-05-29 - 2026-06-04

AI Governance and Safety

This week's AI safety research emphasizes the shift from broad concern about AI harms toward structured governance and quantitative risk-modeling frameworks.

2026-05-29 - 2026-06-04

Evaluating Agentic Reasoning in LLMs

This theme centers on evaluating and structuring LLM reasoning in settings where static prompting or generic inference heuristics break down—especially when retrieval, domain knowledge, and multi-step decision rules must interact.

2026-05-29 - 2026-06-04

Applying Reinforcement Learning to Recommender Systems

This week's theme centers on applying reinforcement learning to move recommendation beyond greedy next-item prediction toward long-term user engagement.

2026-05-22 - 2026-05-28

Aligned Visual Representations

This week's papers treat representation quality and cross-scale alignment as a central bottleneck in both generative modeling and general visual pretraining.

2026-05-22 - 2026-05-28

Spatial Reasoning and Uncertainty in Vision-Language Navigation

This week's theme centers on how vision-language and embodied models are being tested and redesigned for navigation when spatial reasoning, long-horizon decision-making, and safety become bottlenecks.

2026-05-22 - 2026-05-28

Evaluating LLM Co-Researchers

This week's theme centers on how LLM-based research agents should be assessed and scaffolded as they move beyond writing support into research planning, experimentation, review, and publication workflows.

2026-05-15 - 2026-05-21

Structured Representations for Embodied VLMs

This week's theme centers on equipping vision-language models with explicit geometric and navigational structure for embodied tasks, moving beyond brittle prompting or task-specific heads.

2026-05-15 - 2026-05-21

Structured and Efficient Diffusion Model Editing

This theme centers on diffusion models that move beyond generic text-to-image generation toward more structured, grounded, and computationally practical image editing and perception.

2026-05-15 - 2026-05-21

Unified Autoregressive Image Generation and Editing

This week saw continued progress toward unified models that combine image generation, editing, and understanding within single autoregressive or hybrid autoregressive-diffusion architectures.

2026-05-08 - 2026-05-14

LLM Multi-Agent Coordination

This theme centers on coordinating multiple LLM-based agents to handle tasks beyond what a single model instance can easily support.

2026-05-08 - 2026-05-14

Generative 3D Reconstruction and Video Understanding

This week's theme centers on methods that recover richer scene structure and semantics from limited video observations.

2026-05-08 - 2026-05-14
This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.