FuguReport

Page reference date: 2026-06-22


Weekly

2026-06-12 - 2026-06-18

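Theme 1

Efficient Reasoning LLMs

Rather than treating stronger reasoning as a pure scaling problem, this theme focuses on making reasoning-oriented LLMs more efficient in both training and inference.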

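Theme 2

Temporal Reasoning in Egocentric and Action Videos

This week's theme focuses on how video models are being evaluated and redesigned for stronger temporal reasoning, especially in action recognition and egocentric (first-person) settings.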

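Theme 3

Image Editing Benchmarks

This theme covers new benchmarks and evaluation frameworks for instruction-based image editing, motivated by the gap between rapid advances in visual generation and the ability to evaluate edits reliably.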

Daily

Latest daily reports

50 reports

2026-06-22 Method / Mesh Generation / Direct triangle-based mesh synthesis

MeshFlow: Mesh Generation with Equivariant Flow Matching

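MeshFlow is a generative model that produces polygon meshes directly as a continuous triangle soup, without relying on intermediate shape representations or autoregressive serialization.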
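A triangle soup has no canonical ordering, so one kind of equivariance that matters for a model over it is permutation equivariance: reordering the input triangles should reorder the outputs in the same way. The sketch below is a generic DeepSets-style layer, not MeshFlow's architecture; the nine-coordinates-per-triangle layout, the random weights, and the name `equivariant_layer` are assumptions for illustration. It checks the defining property f(Px) = Pf(x).

```python
import numpy as np

def equivariant_layer(tris: np.ndarray, w_self: np.ndarray, w_pool: np.ndarray) -> np.ndarray:
    """DeepSets-style permutation-equivariant layer over a triangle soup.

    tris: (T, 9) array; each row holds the xyz coordinates of one triangle's
    three vertices (a hypothetical layout). Each output row combines that
    triangle's own features with an order-invariant mean over all triangles.
    """
    pooled = tris.mean(axis=0, keepdims=True)            # (1, 9), same for any ordering
    return np.tanh(tris @ w_self + pooled @ w_pool)      # (T, H)

rng = np.random.default_rng(0)
soup = rng.normal(size=(5, 9))                           # toy soup of 5 triangles
w_self = rng.normal(size=(9, 16))
w_pool = rng.normal(size=(9, 16))

perm = rng.permutation(5)
out = equivariant_layer(soup, w_self, w_pool)
out_perm = equivariant_layer(soup[perm], w_self, w_pool)
print(np.allclose(out[perm], out_perm))                  # True: f(P x) == P f(x)
```

This layer only handles reordering of whole triangles; rotation or translation equivariance and the cyclic order of vertices within a triangle would need additional design.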

2026-06-22 Method / Reinforcement Learning / Iterative framework for mathematical reasoning

VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct

This paper proposes VeriEvol, an iterative framework for multimodal mathematical reasoning.

2026-06-22 Method / Preference Modeling / Linking collaborative preference and control

TailorMind: Towards Preference-Aligned Multimodal Content Generation

This paper studies personalized multimodal content generation, aiming to create image-text posts and videos tailored to a user when suitable user-generated content (UGC) is not available.

2026-06-22 Method / Decoding / Scheduling order learning in diffusion LM

Scheduling Thoughts: Learning the Order of Thought in Diffusion Language Models

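This paper studies how the order in which tokens are unmasked in a masked diffusion language model affects generation quality, and treats that order as an "order of thought" rather than a fixed heuristic.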
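For contrast, the kind of fixed heuristic the paper moves away from can be sketched as "most-confident-first" unmasking, a common choice in masked diffusion LM samplers. This is an illustrative baseline under stated assumptions (a hypothetical `probs_fn` returning per-position token probabilities, a fixed number of positions revealed per step, and a toy model with fixed random logits), not the paper's learned ordering.

```python
import numpy as np

MASK = -1  # placeholder id for a still-masked position (hypothetical)

def confidence_first_decode(probs_fn, length: int, steps: int):
    """Fixed heuristic: at each step, unmask the still-masked positions whose
    top predicted token is most probable. Returns the final tokens and the
    order in which positions were revealed."""
    seq = np.full(length, MASK)
    order = []
    per_step = int(np.ceil(length / steps))
    while (seq == MASK).any():
        probs = probs_fn(seq)                         # (length, vocab)
        conf = probs.max(axis=1)
        conf[seq != MASK] = -np.inf                   # ignore already-revealed slots
        k = min(per_step, int((seq == MASK).sum()))
        pick = np.argsort(-conf)[:k]                  # k most confident masked slots
        seq[pick] = probs[pick].argmax(axis=1)
        order.extend(pick.tolist())
    return seq, order

# Toy stand-in for a masked diffusion LM: fixed random logits, ignores its input.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 50))

def toy_probs(seq: np.ndarray) -> np.ndarray:
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

tokens, order = confidence_first_decode(toy_probs, length=8, steps=4)
print(order)  # reveal order chosen by the heuristic, two positions per step
```

A learned ordering would replace the `argsort(-conf)` line with a policy that decides which positions to reveal next.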

2026-06-22 Method / Model Scaling / Test time scaling framework

TEXEDO : Test Time Scaling for Controller-aware Language-conditioned Humanoid Motion Generation

Texedo is a test-time scaling framework for language-conditioned humanoid motion generation that improves the quality of deployable motions without modifying the underlying generator or whole-body tracker.
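The summary does not say how candidates are selected. One generic form of test-time scaling that keeps both the generator and the tracker frozen is best-of-N sampling: draw several candidates and keep the one a fixed scorer rates highest. In the sketch below, `best_of_n`, `toy_generate`, and `toy_score` are hypothetical stand-ins, not Texedo's components, and a "motion" is reduced to a list of floats.

```python
import random
from typing import Callable, List, Optional, Tuple

Motion = List[float]  # hypothetical stand-in for a motion clip (e.g. one joint angle per frame)

def best_of_n(prompt: str,
              generate: Callable[[str, int], Motion],
              score: Callable[[Motion], float],
              n: int = 8) -> Tuple[Optional[Motion], float]:
    """Sample n candidates from a frozen generator and keep the one that a
    frozen scorer (e.g. a controller/tracker feasibility check) rates highest."""
    best, best_score = None, float("-inf")
    for seed in range(n):
        cand = generate(prompt, seed)      # generator weights are never touched
        s = score(cand)                    # scorer is only queried, never trained
        if s > best_score:
            best, best_score = cand, s
    return best, best_score

# Toy stand-ins (hypothetical): random clips, scored so that smoother clips win.
def toy_generate(prompt: str, seed: int) -> Motion:
    rng = random.Random(seed)              # the toy ignores the prompt
    return [rng.uniform(-1.0, 1.0) for _ in range(20)]

def toy_score(motion: Motion) -> float:
    # Negative total frame-to-frame change: less abrupt motion scores higher.
    return -sum(abs(b - a) for a, b in zip(motion, motion[1:]))

clip, s = best_of_n("walk forward", toy_generate, toy_score, n=8)
print(round(s, 3))
```

Raising `n` trades more inference compute for a better expected score, which is the basic lever any test-time scaling scheme pulls.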

2026-06-21 Method / Formal Verification / Guided coding with formal methods

Formal-Method-Guided Vibe Coding: Closing the Verification Loop on AI-Generated Safety-Critical Software Through Model-Driven Engineering

This paper proposes Forge, a closed-loop pipeline that makes AI-generated Java code for safety-critical systems formally verifiable within existing model-driven engineering workflows.

2026-06-21 Method / Representation Learning / Consistent data manifold representation

Encoder-Decoder Manifold Alignment for Idempotent Generation

This paper studies idempotency in encoder-decoder generative models as a property of the representation rather than as a generative property in data space.
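The link between the two views can be shown in a linear toy: if the encoder E inverts the decoder D on latents, so that E(D(z)) = z, then the reconstruction map D∘E is idempotent, because (D∘E)∘(D∘E) = D∘(E∘D)∘E = D∘E. The sketch below assumes linear maps and a pseudoinverse encoder purely for illustration; it is not the paper's model or alignment objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3
D = rng.normal(size=(d, k))      # toy linear decoder: latent (k) -> data (d)
E = np.linalg.pinv(D)            # toy linear encoder, chosen so that E @ D = I_k

# Representation-level condition: encode(decode(z)) == z for every latent z.
print(np.allclose(E @ D, np.eye(k)))    # True

# One reconstruction pass x -> D(E(x)); applying it twice changes nothing.
G = D @ E
x = rng.normal(size=d)
print(np.allclose(G @ (G @ x), G @ x))  # True
```

The condition is checked on latents (E @ D), yet the conclusion holds in data space (G @ G = G), which is the sense in which idempotency can be treated as a representation property.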

2026-06-21 Method / Reinforcement Learning / Post-training policy optimization

PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models

PolicyTrim studies the deployment efficiency of vision-language-action (VLA) models from the policy side, not only from the perspective of inference latency.

2026-06-21 Method / Causal Reasoning / Causal-aware framework for reasoning

ARIA: A Causal-Aware Framework for Rescuing LLM Reasoning in Trustworthy Materials Discovery

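This paper studies LLM reasoning for materials discovery through the process-structure-property (PSP) framework and identifies a failure mode, "contextual tunneling," in which naive knowledge-graph augmentation makes the model fixate on causally incomplete evidence.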

2026-06-21 Method / Mixture-of-Experts / Heterogeneous MoE architectures

Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search

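This paper presents an automated pipeline, within the LEMUR ecosystem, for systematically generating and evaluating mixture-of-experts vision models built from four heterogeneous experts (MoE4).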

2026-06-20 Method / 3D Shape Reconstruction / End-to-end articulated object reconstruction

Artic-O: End-to-End Articulated Object Reconstruction via Latent Geometry Learning

Artic-O is an end-to-end feed-forward method for reconstructing articulated objects from sparse multi-state images.

2026-06-20 Method / Data Generation / Scalable egocentric hand manipulation data

Wh0: Generative World Models as Scalable Sources of Egocentric Human Hand Manipulation Data

Wh0 is a framework that uses generative video world models to synthesize scalable, controllable egocentric human hand-manipulation data for adapting robot policies.

2026-06-20 Method / Reinforcement Learning / Algorithmic stages of RL with LLMs

Modularized Reinforcement Learning on LLMs: From MDP Creation to Exploration and Learning

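Rather than focusing on specific application domains or mainstream algorithms, this survey examines reinforcement learning (RL) for training large language models (LLMs) from a modularized RL perspective.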

2026-06-20 Method / ASR Enhancement / Extending models for code-switching

Adding Robust Code-Switching Capabilities to High Performance Multilingual ASR

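This paper studies how to add code-switching ASR capability to an already strong multilingual speech recognition model without degrading its existing monolingual performance.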

2026-06-20 Method / Dialogue Personalization / Emotional support response generation

MindTailor: Personalized Emotional Support via Post History-Grounded Case Formulation and Collaborative Refinement

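MindTailor is a framework that generates personalized emotional-support responses by building a structured case formulation from the help-seeker's past social media posts and refining draft responses through multi-agent critique.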

2026-06-20 Evaluation / Representation Learning / Measuring memorization in multi-modal models

MultiMem: Measuring and Mitigating Memorization in Multi-Modal Contrastive Learning

This paper studies memorization in multimodal contrastive learning and proposes MultiMem, a metric designed to quantify memorization across any combination of modalities using a leave-one-out setup.

2026-06-19 Method / 3D Reconstruction / Scale-consistent one-pass estimation

SCOPE: Scale-Consistent One-Pass Estimation of 3D Geometry

SCOPE estimates 3D geometry from long monocular video sequences while maintaining both geometric accuracy and long-range temporal consistency.

2026-06-19 Evaluation / User Perception Studies / Impact of warning labels on AI perception

Warning labels shift perceptions of sycophantic AI, but not its influence

In a preregistered experiment with 2,610 participants, this paper evaluates whether warning labels can reduce the influence of sycophantic AI.

2026-06-19 Method / Policy Improvement / Autonomous policy enhancement via dynamics

Robot Self-Improvement via Human-Video Dynamics Models

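This paper investigates whether priors learned from passive human videos can support not only robot policy initialization but also autonomous self-improvement after deployment.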

2026-06-19 Method / Critic Optimization / Fine-tuning critics via rollouts

Robot Critics that Sweat the Small Stuff

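This paper studies how to make vision-language model critics useful for closed-loop robot manipulation, where success and failure hinge on fine visual details such as slight misalignments or missed contacts.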

2026-06-19 Method / Quantum Measurement / Verification framework using local states

Efficient Verification of Entangled Measurements with Local States

This paper builds a framework for quantum measurement verification (QMV) that tests entangled von Neumann measurements using only the preparation of local product states.

2026-06-18 Method / Spatial Reasoning / Scene-centered perception beyond frame-centered

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

S-Agent is a spatial tool-use framework designed to reason over sequences of multi-view images and videos rather than single frames.

2026-06-18 Evaluation / Model Evaluation / FID variance measurement

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

This paper treats the Fréchet Inception Distance (FID) as a random variable rather than a fixed score and studies it with a two-axis panel of training seeds and sampling seeds.
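FID fits a Gaussian to each feature set and computes ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). The sketch below implements that formula and shows the sampling-seed axis of the variance: the reference set is fixed and only the seed of the "generated" set changes. Real FID uses 2048-dimensional Inception-v3 pool features; the 16-dimensional synthetic Gaussians, sample sizes, and 0.05 mean shift here are assumptions, and the training-seed axis (which needs retraining) is not simulated.

```python
import numpy as np

def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets:
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    s_a = np.cov(feat_a, rowvar=False)
    s_b = np.cov(feat_b, rowvar=False)
    r = _sqrtm_psd(s_a)
    # Tr((S_a S_b)^{1/2}) = sum of sqrt-eigenvalues of the PSD matrix r S_b r
    cross = np.sqrt(np.clip(np.linalg.eigvalsh(r @ s_b @ r), 0.0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(s_a) + np.trace(s_b) - 2.0 * cross)

# Synthetic stand-ins for Inception features: one fixed reference set and
# several "generated" sets from the same (slightly shifted) distribution,
# differing only in the sampling seed.
ref = np.random.default_rng(0).normal(size=(2000, 16))
scores = [fid(ref, np.random.default_rng(s).normal(loc=0.05, size=(2000, 16)))
          for s in range(1, 6)]
print(f"FID over sampling seeds: mean={np.mean(scores):.4f}, std={np.std(scores):.4f}")
```

Nothing about the "model" changes between the five runs, yet the scores differ; reporting a single FID hides that spread.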

2026-06-18 Method / Token Reduction / Spatial unit reduction reconstruction

Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models

This paper investigates why existing token-reduction methods work poorly on visual Mamba models, particularly structure-enhanced variants such as VMamba.

2026-06-18 Method / Vision-Language Fusion / Event-driven evidence memory framework

EventVLA: Event-Driven Visual Evidence Memory for Long-Horizon Vision-Language-Action Policies

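EventVLA is an end-to-end framework for long-horizon vision-language-action (VLA) policies, targeting non-Markovian robot manipulation in which task-relevant visual evidence appears briefly and is then hidden.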

2026-06-18 Method / Agentic Robotics / Self-improving robot policy framework

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

This paper proposes ENPIRE, a framework in which a programming agent directly improves a robot's policy through a closed feedback loop in the real world.

2026-06-17 Method / 3D Representation / Disentangled neural mesh implicit field

NeuMesh++: Towards Versatile and Efficient Volumetric Editing with Disentangled Neural Mesh-based Implicit Field

NeuMesh++ introduces a mesh-based neural radiance field in which each mesh vertex stores disentangled geometry, texture, and semantic codes, together with a modification color used for editing.

2026-06-17 Evaluation / Benchmarking / Systematic benchmark audit

Physics-IQ Verified

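This paper presents a systematic audit of the Physics-IQ benchmark, which evaluates the physical understanding of video generation models by comparing generated videos with recordings of real physics experiments.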

2026-06-17 Method / Training Enhancement / Two-stage iterative framework

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

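This paper studies test-time scaling through step-by-step revision and argues that the standard single-attempt training objective is misaligned with multi-step inference dynamics.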

2026-06-17 Method / Dexterous Manipulation / Estimating hand-object interactions

Do as I Do: Dexterous Manipulation Data from Everyday Human Videos

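This paper proposes "Do as I Do," a two-stage pipeline that converts everyday monocular RGB videos of human hand-object interaction into dexterous manipulation trajectories for multi-fingered robot hands.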

2026-06-17 Method / Model Merging / Merging fine-tuned task models into multitask

PACT: Preserving Anchored Cores in Task-vectors for Model Merging

This paper examines a limitation of task-vector-based model merging, which usually assumes that task-specific knowledge is fully captured by the fine-tuning update.
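The assumption in question is easiest to see in the standard task-arithmetic baseline: each task is represented entirely by its fine-tuning update τ = θ_ft − θ_pre, and tasks are merged as θ = θ_pre + λ Σ τ_i. The sketch below shows only that baseline, not PACT's method; the toy parameter dictionaries and the λ values are assumptions.

```python
import numpy as np

Params = dict  # parameter name -> np.ndarray

def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """tau = theta_ft - theta_pre: the fine-tuning update for one task."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def merge(pretrained: Params, taus: list, lam: float = 0.3) -> Params:
    """Task arithmetic: theta = theta_pre + lam * sum_i tau_i."""
    return {name: pretrained[name] + lam * sum(t[name] for t in taus)
            for name in pretrained}

rng = np.random.default_rng(0)
pre = {"w": rng.normal(size=(4, 4)), "b": np.zeros(4)}
ft_a = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in pre.items()}  # toy "task A" model
ft_b = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in pre.items()}  # toy "task B" model

merged = merge(pre, [task_vector(ft_a, pre), task_vector(ft_b, pre)], lam=0.5)
```

With two tasks and λ = 0.5 this reduces to plain weight averaging of the two fine-tuned models; anything a task relies on that is not in its τ (for example, parts of θ_pre the update implicitly depends on) is invisible to this scheme.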

2026-06-16 Task / Game Generation / Generating playable games in engine

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

GameCraft-Bench frames end-to-end game generation as producing a complete, playable game artifact inside a real game engine, rather than generating code or assets in isolation.

2026-06-16 Method / Module Learning / Learning atomic modules from traces

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

This paper studies why post-training with supervised fine-tuning (SFT) followed by reinforcement learning (RL) improves reasoning generalization in language models.

2026-06-16 Method / Multi-Agent Learning / Divide, Deliberate, Decide framework

Divide, Deliberate, Decide: A Multi-Agent Framework for Fine-Grained Egocentric Action Recognition

This paper addresses fine-grained action recognition in egocentric (first-person) video.

2026-06-16 Method / World Modeling / Iterative latent environment refinement

Looped World Models

This paper introduces Looped World Models (LoopWM), presented as the first application of the looped Transformer architecture to world models.

2026-06-16 Method / Agentic Reinforcement Learning / Learning with environment dynamics

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

This paper proposes EnvRL, a reinforcement learning (RL) framework for LLM-based agents.

2026-06-15 Method / Dexterous Manipulation / Tactile-reactive manipulation tasks

T-Rex: Tactile-Reactive Dexterous Manipulation

T-Rex is a tactile-reactive dexterous manipulation framework that combines large-scale egocentric human pretraining with tactile-based robot mid-training.

2026-06-15 Method / Policy Learning / Language-conditioned robot manipulation policy

Geometric Action Model for Robot Policy Learning

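This paper proposes the Geometric Action Model (GAM), a language-conditioned robot manipulation policy that repurposes a pretrained geometric foundation model as a shared backbone for perception, temporal prediction, and action decoding.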

2026-06-15 Method / Multimodal Agents / Real-time personalized agent design

VisualClaw: A Real-Time, Personalized Agent for the Physical World

VisualClaw is a multimodal agent designed for real-time, personalized use in streaming physical-world environments.

2026-06-15 Method / Neuro-Symbolic Reasoning / Traceable evidence graph construction

VeriGraph: Towards Verifiable Data-Analytic Agents

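This paper studies how to make LLM-based data-analysis agents more verifiable, arguing that standard linear thought-action traces do not preserve auditable links between raw data, computations, and final claims.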

2026-06-15 Method / Visuomotor Control / Transformer architecture for visuomotor policies

Scaling Short-Term Memory of Visuomotor Policies for Long-Horizon Tasks

This paper proposes PRISM, a Transformer-based visuomotor policy architecture designed to exploit short-term memory for long-horizon robot tasks under partial observability.

2026-06-14 Method / Active Learning / Data selection via low-rank approximation

Active Learning with Low-Rank Structure for Data Selection

This paper studies batch data selection that characterizes dataset structure through low-rank structure rather than through clustering.

2026-06-14 Evaluation / Benchmarking / Simulator-based emotion management scenarios

EIBench: A Simulator-Based Benchmark and Turn-Credit RL for Emotion Management

This paper introduces EIBench, a simulator-based benchmark for interactive emotion management in multi-turn dialogue.

2026-06-14 Method / Self-Supervised Learning / Joint visual and motion encoder training

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

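This paper proposes Temporal Difference in Vision (TDV), a self-supervised visual representation learning method for video built on the assumption that the near future is predictable from the past.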

2026-06-14 Evaluation / Benchmarking / Nationwide wildfire IA failure prediction

A Nationwide Benchmark for Wildfire Initial Attack Failure Prediction with Public Environmental Data

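This paper introduces WildfireIA, a US nationwide benchmark for predicting whether the initial attack on a wildfire will fail, using only public data available at or before the time the fire is discovered.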

2026-06-14 Task / Robot Control / Sequential manipulation tasks

Learning New Tasks via Reusable Skills: Skill-Compositional Experts for Embodied Continual Learning

This paper studies embodied continual learning, in which a robot must keep acquiring new manipulation tasks under closed-loop control without losing previously learned behaviors.

2026-06-13 Method / Video Generation / Autoregressive streaming framework

GeoStream: Toward Precise Camera Controlled Streaming Video Generation

GeoStream is a camera-controlled streaming video generation framework aimed at precise, metric-scale viewpoint control in an autoregressive setting.

2026-06-13 Method / Data Selection / Optimizing diverse data subsets

Spokes: Optimizing for Diverse Pretraining Data Selection

This paper proposes Spokes, a scalable method that selects diverse pretraining subsets by directly optimizing the G-Vendi diversity score in gradient space.
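Assuming G-Vendi is the Vendi score computed over (normalized) per-example gradient features, the score itself is the exponential of the Shannon entropy of the eigenvalues of K/n, where K is a similarity matrix; it behaves like an "effective number of distinct examples." The sketch below computes it on toy feature vectors standing in for gradients and adds a naive greedy selector. The greedy loop is a hypothetical illustration that needs many eigendecompositions; it is not Spokes, which the paper describes as scalable.

```python
import numpy as np

def vendi_score(feats: np.ndarray) -> float:
    """Vendi score of a set of feature vectors: exp of the Shannon entropy of
    the eigenvalues of K/n, where K is the cosine-similarity matrix."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    k = x @ x.T / len(x)                         # PSD, trace 1
    w = np.clip(np.linalg.eigvalsh(k), 0.0, None)
    w = w[w > 1e-12]                             # 0 * log 0 is taken as 0
    return float(np.exp(-(w * np.log(w)).sum()))

def greedy_diverse_subset(feats: np.ndarray, m: int) -> list:
    """Naive greedy selection: repeatedly add the example that most increases
    the Vendi score of the chosen set. Illustrative only, not scalable."""
    chosen = [0]
    while len(chosen) < m:
        rest = [i for i in range(len(feats)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: vendi_score(feats[chosen + [i]])))
    return chosen

# Toy "gradient features": three duplicates of one direction plus two others.
feats = np.array([[1.0, 0, 0], [1.0, 0, 0], [1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
print(greedy_diverse_subset(feats, 3))          # [0, 3, 4]
```

Duplicates add nothing to the score, so the selector skips them in favor of examples pointing in new directions.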

2026-06-13 Method / Diffusion Models / Non-uniform timestep scheduling

Timestep Rescheduling in Diffusion Inversion

This paper studies how the choice of diffusion timesteps affects inversion fidelity in deterministic diffusion inversion.

2026-06-13 Method / Reward Learning / Handling diversity collapse with reward optimization

Understanding Diversity Collapse in RLVR via the Lens of Overtraining

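This paper analyzes diversity collapse in reinforcement learning with verifiable rewards (RLVR): the phenomenon in which Pass@1 improves during training while Pass@k for large k often gets worse.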
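Pass@k is usually computed with the standard unbiased estimator over n samples per problem, of which c are correct: 1 − C(n−c, k)/C(n, k). The per-problem counts below are hypothetical, not from the paper; they show how a policy that concentrates its probability mass can raise Pass@1 while lowering Pass@k for larger k.

```python
from math import comb
from statistics import mean

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:          # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

n = 16  # samples per problem
# Hypothetical per-problem correct counts for two policies on four problems.
diverse = [4, 4, 4, 4]       # sometimes right on every problem
collapsed = [16, 16, 0, 0]   # always right on two, never on the other two

for name, counts in [("diverse", diverse), ("collapsed", collapsed)]:
    p1 = mean(pass_at_k(n, c, 1) for c in counts)
    p8 = mean(pass_at_k(n, c, 8) for c in counts)
    print(name, round(p1, 3), round(p8, 3))
# diverse 0.25 0.962
# collapsed 0.5 0.5
```

The collapsed policy doubles Pass@1 but can never solve the two problems it has abandoned, so extra samples stop helping.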


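Archive

Weekly archive

Efficient Reasoning LLMs

Temporal Reasoning in Egocentric and Action Videos

Image Editing Benchmarks

Evaluating LLM Research Agents

Structured World Models

Controllable and Scalable Model Merging

Embodied World Models and Evaluation

AI Governance and Safety

Evaluating Agentic Reasoning in LLMs

Reinforcement Learning for Recommender Systems

Consistent Visual Representations

Spatial Reasoning and Uncertainty in Vision-Language Navigation

Evaluating LLM Co-Researchers

Structured Representations for Embodied VLMs

Structured and Efficient Diffusion Model Editing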