FuguReport

Browse the latest weekly themes first, then scan the most recent daily reports and archives.

Anchor Date: 2026-04-09
Weekly

2026-03-27 - 2026-04-02

Daily

Recent Daily Reports

15 reports
2026-04-09 Method / Reinforcement Learning / Nonlinear distribution matching training

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

OpenVLThinkerV2 is a multimodal reasoning model built on Qwen3-VL-Instruct-8B and trained with a novel reinforcement learning objective called Gaussian GRPO (G²RPO).
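
For context, standard GRPO (the objective that G²RPO is named after) samples a group of responses per prompt and scores each response relative to the rest of its group by normalizing rewards with the group mean and standard deviation. The minimal sketch below shows only that standard group-relative advantage. It does not implement the Gaussian variant, whose details are not described here. The function name `group_relative_advantages`, the population-std choice, and the toy rewards are illustrative assumptions.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantages for one prompt's group of sampled responses.

    Each reward is centered on the group mean and scaled by the group std.
    Population std is used here; implementations differ on this detail.
    eps keeps the division finite when every reward in the group is equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt, scored by a verifier (toy rewards).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advs])  # [1.0, -1.0, 1.0, -1.0]
```

Because advantages are relative within the group, a group where every response gets the same reward contributes zero advantage, so no separate value network is needed to provide a baseline.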

2026-04-09 Evaluation / User Simulation Benchmark / Real-world behavior simulation evaluation

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

This paper introduces OmniBehavior, a user simulation benchmark constructed from real-world Kuaishou platform logs rather than synthetic or isolated-scenario data.

2026-04-09 Evaluation / Mobile Agent Evaluation / Online benchmark for mobile agents

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

KnowU-Bench is an online benchmark for evaluating mobile agents on personalization, interaction, and proactive assistance beyond explicit instruction following.

2026-04-09 Method / Video Compression / Efficient compression for long videos

Small Vision-Language Models are Smart Compressors for Long Video Understanding

This paper introduces Tempo, a 6B-parameter query-aware framework that compresses long videos for downstream reasoning by multimodal large language models (MLLMs).

2026-04-09 Method / Agent Design / IntentFlow streaming model

PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

This paper proposes DD-MM-PAS (Demand Detection, Memory Modeling, Proactive Agent System), a paradigm for streaming proactive AI agents that infer latent user needs from ongoing context rather than waiting for explicit queries.

2026-04-08 Task / Sentiment Analysis / Dimensional aspect-based sentiment task

SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA)

This paper presents the SemEval-2026 shared task on Dimensional Aspect-Based Sentiment Analysis (DimABSA), which replaces categorical polarity labels in aspect-based sentiment analysis with continuous valence-arousal (VA) scores.

2026-04-08 Method / Fine-Tuning / Instruction-based tuning for AR models

MARS: Enabling Autoregressive Models Multi-Token Generation

This paper introduces MARS (Mask AutoRegression), a lightweight fine-tuning method that enables instruction-tuned autoregressive language models to predict multiple tokens per forward pass while preserving standard left-to-right autoregressive behavior.

2026-04-08 Method / Test-Time Training / Elastic weight regularization technique

Fast Spatial Memory with Elastic Test-Time Training

This paper shows that Large Chunk Test-Time Training (LaCT) for long-context 3D/4D reconstruction suffers from catastrophic forgetting and overfitting because its fast-weight updates are fully plastic, and that it is typically limited to a single large chunk spanning the full input sequence.

2026-04-08 Application / Robotic Grasping / Coordinated bimanual grasps

BiDexGrasp: Coordinated Bimanual Dexterous Grasps across Object Geometries and Sizes

BiDexGrasp presents a large-scale bimanual dexterous grasp dataset and a learning-based generation framework for coordinated two-hand grasping of objects with diverse geometries and sizes.

2026-04-08 Method / Adaptive Perception / Query-aware resolution adjustment

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

Q-Zoom is a query-aware adaptive perception framework for multimodal large language models (MLLMs) that reduces the cost of high-resolution visual processing by routing queries through a lightweight dynamic gating network.

2026-04-07 Method / 3D Reconstruction / Reconstructing functional indoor scenes

FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos

FunRec is a training-free, optimization-based method that reconstructs functional 3D digital twins of indoor scenes from a single egocentric RGB-D interaction video.

2026-04-07 Method / Research Discovery / Discovery and analysis framework

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

Paper Circle is an open-source multi-agent framework for scientific literature discovery and analysis, built on two complementary pipelines.

2026-04-07 Evaluation / Benchmarking / LLM economic and trade performance benchmarking

Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition

Market-Bench is a benchmark for evaluating large language models in a competitive supply-chain economy where agents must handle both quantitative decisions and marketing language.

2026-04-07 Method / Policy Learning / Integrated world action model

Action Images: End-to-End Policy Learning via Multiview Video Generation

This paper introduces Action Images, a unified world-action model that formulates robot policy learning as multiview video generation.

2026-04-07 Method / Visuomotor Control / Policy learning with referential input

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

This paper introduces ReV (Referring-Aware Visuomotor Policy), a closed-loop imitation learning framework for robotic manipulation that incorporates sparse 3D referring points provided by a human or a high-level planner during execution.

Archive

Weekly Archive

9 themes

Medical AI Evaluation and Temporal Multimodality

This week's representative papers highlight that medical AI progress hinges on clearer evaluation frameworks and richer clinical context, not only on stronger models.

2026-03-27 - 2026-04-02

LLM Multi-Agent Frameworks

This week's papers center on how to organize LLM-based multi-agent systems for complex, real-world tasks.

2026-03-27 - 2026-04-02

LLM Attribution and Citation Evaluation

This theme centers on how LLM outputs can be attributed to supporting documents so that generated answers are more transparent, verifiable, and trustworthy.

2026-03-27 - 2026-04-02

Efficient Multimodal Foundation Models

This week's papers focus on making multimodal foundation models more efficient without sacrificing broad utility.

2026-03-19 - 2026-03-26

Adapting Speech Models to Atypical and Domain-Shifted Speech

This week's theme concerns adapting and evaluating speech models when labeled in-domain data are scarce, domains shift, or speech departs from typical patterns.

2026-03-19 - 2026-03-26

AI Sustainability and Trustworthiness

This week's papers frame AI deployment as an environmental and governance challenge.

2026-03-19 - 2026-03-26

Comprehensive LLM Agent Evaluation

This week's evaluation work pushes beyond narrow benchmark settings toward broader tests for LLM- and VLM-based agents.

2026-03-16 - 2026-03-22

Privacy Inference in Federated Learning

This week's theme centers on privacy evaluation in federated learning, where shared gradients, parameters, or predictions can leak sensitive information even when raw data stays on-device.

2026-03-16 - 2026-03-22

AI Sustainability and Trustworthiness

This week's papers treat the environmental impact of AI infrastructure as a direct evaluation concern.

2026-03-16 - 2026-03-22

Daily Archive

38 reports

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

OpenWorldLib is a standardized inference framework and codebase for advanced world models, motivated by the absence of a widely accepted definition of what constitutes a world model.

2026-04-06

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

This paper presents the first real-world safety evaluation of OpenClaw, a widely deployed personal AI agent with full local system access and integrations with services such as Gmail, Stripe, and the filesystem.

2026-04-06

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

FileGram is a unified framework for personalizing file-system agents using behavioral traces (action sequences and content deltas) rather than dialogue history alone.

2026-04-06

Structured Causal Video Reasoning via Multi-Objective Alignment

This paper proposes a structure-first framework for video reasoning in which a model first produces Structured Event Facts—compact, time-ordered descriptions of salient events and their causal relations—and then reasons under those constraints.

2026-04-06

Paper Espresso: From Paper Overload to Research Insight

Paper Espresso is an open-source platform that continuously discovers, summarizes, and analyzes community-trending arXiv papers sourced from the Hugging Face Daily Papers feed (approximately 2–3% of arXiv).

2026-04-06

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

This paper extends the Decision Pre-Trained Transformer (DPT) framework to cross-domain in-context reinforcement learning in continuous-control settings by integrating a flow-based action head trained via rectified flow matching.

2026-04-06
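
As a generic illustration of the rectified flow matching objective mentioned above (not the paper's code), rectified flow draws a noise sample x0, interpolates linearly toward a target action x1 as x_t = (1 - t) * x0 + t * x1, and regresses a velocity model onto the constant straight-line velocity x1 - x0. The sketch below computes a single-sample version of that loss in plain Python. The names `rectified_flow_loss` and `velocity_fn`, the standard-normal noise, and the toy action vector are illustrative assumptions.

```python
import random
from typing import Callable

VelocityFn = Callable[[list[float], float], list[float]]

def rectified_flow_loss(velocity_fn: VelocityFn, x1: list[float], rng: random.Random) -> float:
    """Single-sample rectified flow matching loss (mean squared velocity error)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]               # noise sample
    t = rng.random()                                      # time in [0, 1)
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]    # point on the straight path
    target = [b - a for a, b in zip(x0, x1)]              # straight-line velocity
    pred = velocity_fn(xt, t)
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(x1)

rng = random.Random(0)
action = [0.5, -0.2, 0.1]                                 # hypothetical target action
zero_model = lambda xt, t: [0.0] * len(xt)                # untrained baseline
print(rectified_flow_loss(zero_model, action, rng))       # positive loss
```

For a single fixed target, the exact velocity at x_t is (x1 - x_t) / (1 - t), since x1 - x_t = (1 - t)(x1 - x0); plugging that function in as `velocity_fn` drives the loss to roughly zero, which is a quick sanity check on the interpolation and target.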

NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

This paper presents the results of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, which evaluates robust 3D reconstruction pipelines under real-world adverse conditions using the RealX3D benchmark.

2026-04-05

Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization

This paper provides a systematic stability and generalization analysis for first-order stochastic bilevel optimization (SBO) methods.

2026-04-05

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Combee is a framework for scaling prompt learning in self-improving language model agents under high parallelism.

2026-04-05

Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics

This paper proposes EGInterpolator, a two-stage framework for molecular dynamics (MD) trajectory generation.

2026-04-05

Relay-Assisted Activation-Integrated SIM for Wireless Physical Neural Networks

This paper proposes a relay-assisted wireless physical neural network (WPNN) architecture based on activation-integrated stacked intelligent metasurfaces (AI-SIMs).

2026-04-05

Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning

This paper studies prompt retrieval for visual in-context learning (VICL) and argues that existing methods overemphasize visual similarity while neglecting prompt labels.

2026-04-04

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval

This paper addresses Partially Relevant Video Retrieval (PRVR), where a text query describes only a segment of an untrimmed video, making retrieval susceptible to spurious local matches.

2026-04-04

Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation

This paper analyzes expert routing patterns in multilingual Mixture-of-Experts (MoE) models and identifies a phenomenon termed Language Routing Isolation, where high-resource and low-resource languages activate largely disjoint expert sets.

2026-04-04
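
One simple way to quantify the kind of isolation described above is to compare which experts each language routes to most often, for example with the Jaccard overlap between per-language top-k expert sets. This is a hypothetical sketch, not the paper's measurement; the helper names `top_k_experts` and `jaccard`, the choice of k, and the toy routing decisions are all illustrative assumptions.

```python
from collections import Counter

def top_k_experts(routed_ids: list[int], k: int) -> set[int]:
    """Return the k experts that received the most tokens for one language."""
    return {expert for expert, _ in Counter(routed_ids).most_common(k)}

def jaccard(a: set[int], b: set[int]) -> float:
    """Overlap between two expert sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy router decisions (expert ID chosen per token) in one MoE layer.
high_resource = [0, 1, 1, 2, 2, 2, 3, 0, 1, 2]
low_resource = [5, 6, 6, 7, 5, 6, 7, 7, 4, 5]
overlap = jaccard(top_k_experts(high_resource, 3), top_k_experts(low_resource, 3))
print(overlap)  # 0.0 -> disjoint top-3 expert sets in this toy example
```

A score near 0 across layers would indicate largely disjoint expert sets, while a score near 1 would indicate shared routing; in practice such a comparison would be run per layer over many tokens.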

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

This paper introduces ActivityForensics, a benchmark for temporal localization of manipulated human activities in videos, targeting semantic changes in human actions rather than appearance-level edits such as face swaps or object removal.

2026-04-04

SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization

This paper presents SecPI, a fine-tuning pipeline for reasoning language models (RLMs) that aims to make secure code generation the default behavior without requiring explicit security prompts at inference time.

2026-04-04

CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

CoME-VL is a modular multi-encoder vision-language framework that integrates a contrastively trained SigLIP2 encoder with a self-supervised DINOv3 encoder to improve both semantic understanding and spatial grounding.

2026-04-03

PolyReal: A Benchmark for Real-World Polymer Science Workflows

PolyReal is a multimodal benchmark designed to evaluate multimodal large language models (MLLMs) on real-world polymer science workflows rather than isolated scientific subtasks.

2026-04-03

Do Audio-Visual Large Language Models Really See and Hear?

This paper presents the first mechanistic interpretability study of Audio-Visual Large Language Models (AVLLMs), analyzing how audio and visual representations evolve and fuse across transformer layers during caption generation.

2026-04-03

EvaNet: Towards More Efficient and Consistent Infrared and Visible Image Fusion Assessment

This paper addresses the efficiency and consistency shortcomings of existing evaluation metrics for infrared-visible image fusion (IVIF), which are largely borrowed from other vision tasks without adaptation.

2026-04-03

Verbalizing LLMs' assumptions to explain and control sycophancy

This paper introduces Verbalized Assumptions, a framework for eliciting LLMs' inferred assumptions about users through both open-ended and structured prompting, and connects these assumptions to social sycophancy.

2026-04-03

NearID: Identity Representation Learning via Near-identity Distractors

This paper identifies a systematic failure mode in vision encoders used for identity-focused tasks: embeddings entangle object identity with background context, allowing visually similar but distinct objects placed on the same background to outscore true cross-view matches.

2026-04-02

A3R: Agentic Affordance Reasoning via Cross-Dimensional Evidence in 3D Gaussian Scenes

This paper addresses affordance reasoning in 3D Gaussian Splatting (3DGS) scenes, where the goal is to localize the region supporting a text-specified action.

2026-04-02

Are VLMs Lost Between Sky and Space? LinkS²Bench for UAV-Satellite Dynamic Cross-View Spatial Intelligence

This paper introduces LinkS²Bench, a benchmark for evaluating vision-language models (VLMs) on dynamic UAV-satellite cross-view spatial intelligence.

2026-04-02

Steerable Visual Representations

This paper introduces SteerViT, a method that makes pretrained vision transformer (ViT) representations steerable via natural language by inserting lightweight gated cross-attention layers into frozen ViT blocks, enabling text to influence intermediate visual features through early fusion.

2026-04-02

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

CORAL is a framework for autonomous multi-agent evolution on open-ended discovery tasks, replacing fixed evolutionary heuristics with long-running agents that decide what to retrieve, test, evaluate, and store.

2026-04-02

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

VideoZeroBench is a hierarchical benchmark for long-video question answering that evaluates not only answer correctness but also whether models identify the correct temporal and spatial evidence.

2026-04-02

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

This paper introduces the Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive assistants through interaction with active user simulators in digital environments.

2026-04-01

Do Phone-Use Agents Respect Your Privacy?

This paper investigates whether phone-use agents handle user data appropriately while completing benign mobile tasks.

2026-04-01

Diff3R: Feed-forward 3D Gaussian Splatting with Uncertainty-aware Differentiable Optimization

Diff3R is a framework for feed-forward 3D Gaussian Splatting (3DGS) that trains models to produce initializations explicitly optimized for subsequent test-time refinement, rather than solely for zero-shot prediction.

2026-04-01

OrgAgent: Organize Your Multi-Agent System like a Company

This paper introduces OrgAgent, a company-style hierarchical multi-agent system that separates collaboration into governance, execution, and compliance layers.

2026-04-01

Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap

This paper addresses causal treatment effect estimation under weak overlap between treated and control covariate distributions, a setting where standard estimators become unstable, particularly in high dimensions.

2026-04-01

PET-DINO: Unifying Visual Cues into Grounding DINO with Prompt-Enriched Training

PET-DINO extends the text-prompted Grounding DINO detector to support both text and visual prompts for open-set object detection.

2026-04-01

Square Superpixel Generation and Representation Learning via Granular Ball Computing

This paper proposes a square superpixel generation method inspired by granular-ball computing, designed to produce grid-aligned, multi-scale square regions that are more compatible with modern deep learning pipelines than irregular superpixels.

2026-03-31

DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA

DIAL (Decoupling Intent and Action via Latent World Modeling) is an end-to-end vision-language-action framework that separates high-level intent formation from low-level motor execution through a differentiable latent intent bottleneck.

2026-03-31

Cold-Starts in Generative Recommendation: A Reproducibility Study

This paper presents a systematic reproducibility study of generative recommendation under unified cold-start protocols, covering both new-user and new-item settings.

2026-03-31

Curvature-Guided LoRA: Steering in the pretrained NTK subspace

This paper introduces the prediction alignment problem for parameter-efficient fine-tuning (PEFT), which aims to match the outputs of a LoRA-adapted model to those of full fine-tuning at the function level rather than aligning parameter updates.

2026-03-31

Dummy-Aware Weighted Attack (DAWA): Breaking the Safe Sink in Dummy Class Defenses

This paper identifies a systematic robustness overestimation problem in dummy-class-based adversarial defenses (e.

2026-03-31

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their more advanced successor versions. Its contents are not guaranteed to be accurate.