Fugu-MT paper translation (abstract): FOVI: A biologically-inspired foveated interface for deep vision models

Paper abstract: FOVI: A biologically-inspired foveated interface for deep vision models

arxiv url: http://arxiv.org/abs/2602.03766v1
Date: Tue, 03 Feb 2026 17:26:54 GMT
Status: translation complete
Last updated in system: 2026-02-04 18:37:15.602928
Title: FOVI: A biologically-inspired foveated interface for deep vision models
Title (reference translation): FOVI: A biologically inspired foveated interface for deep vision models
Authors: Nicholas M. Blauch, George A. Alvarez, Talia Konkle
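Abstract summary: This work proposes a vision interface based on the human retina and primary visual cortex. Receptive fields are defined as k-nearest-neighborhoods (kNNs) on the sensor manifold. The paper demonstrates two use cases: (1) an end-to-end kNN-convolutional architecture, and (2) a foveated adaptation of the foundational DINOv3 ViT model.
Reference score (attention level computed independently by this site): 5.6075902312642745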
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Human vision is foveated, with variable resolution peaking at the center of a large field of view; this reflects an efficient trade-off for active sensing, allowing eye-movements to bring different parts of the world into focus with other parts of the world in context. In contrast, most computer vision systems encode the visual world at a uniform resolution, raising challenges for processing full-field high-resolution images efficiently. We propose a foveated vision interface (FOVI) based on the human retina and primary visual cortex, that reformats a variable-resolution retina-like sensor array into a uniformly dense, V1-like sensor manifold. Receptive fields are defined as k-nearest-neighborhoods (kNNs) on the sensor manifold, enabling kNN-convolution via a novel kernel mapping technique. We demonstrate two use cases: (1) an end-to-end kNN-convolutional architecture, and (2) a foveated adaptation of the foundational DINOv3 ViT model, leveraging low-rank adaptation (LoRA). These models provide competitive performance at a fraction of the computational cost of non-foveated baselines, opening pathways for efficient and scalable active sensing for high-resolution egocentric vision. Code and pre-trained models are available at https://github.com/nblauch/fovi and https://huggingface.co/fovi-pytorch.
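The idea of receptive fields as k-nearest-neighborhoods over a foveated sensor array can be illustrated with a minimal sketch. This is not the authors' implementation: the ring layout, the function names, and the rank-indexed shared weights are assumptions made for illustration. The sketch also finds neighbors in plain 2D sensor coordinates rather than on the V1-like manifold, and it does not reproduce the paper's kernel mapping technique, which the abstract does not describe.

```python
import math

def foveated_sensor_positions(n_rings=6, n_per_ring=8, r_min=0.05, r_max=1.0):
    """Place sensors on rings with log-spaced radii: dense near the center, sparse outside.

    The layout is a hypothetical stand-in for a retina-like sensor array.
    """
    positions = [(0.0, 0.0)]  # one sensor at the center of gaze
    for i in range(n_rings):
        r = r_min * (r_max / r_min) ** (i / (n_rings - 1))
        for j in range(n_per_ring):
            theta = 2 * math.pi * j / n_per_ring
            positions.append((r * math.cos(theta), r * math.sin(theta)))
    return positions

def knn_indices(positions, k):
    """For each sensor, return the indices of its k nearest sensors, nearest first.

    Each sensor is its own nearest neighbor, so it always appears at rank 0.
    """
    neighborhoods = []
    for x0, y0 in positions:
        dists = [(math.hypot(x - x0, y - y0), idx)
                 for idx, (x, y) in enumerate(positions)]
        dists.sort()
        neighborhoods.append([idx for _, idx in dists[:k]])
    return neighborhoods

def knn_conv(values, neighborhoods, weights):
    """Weighted sum over each neighborhood.

    Weights are shared across sensors and indexed by neighbor rank, a
    simplification assumed here in place of the paper's kernel mapping.
    """
    return [sum(w * values[idx] for w, idx in zip(weights, nbrs))
            for nbrs in neighborhoods]

positions = foveated_sensor_positions()
neighborhoods = knn_indices(positions, k=5)
values = [1.0] * len(positions)           # a constant input signal
outputs = knn_conv(values, neighborhoods, weights=[0.2] * 5)
print(len(positions), len(outputs))
```

Because every sensor gets exactly k neighbors regardless of local sampling density, the neighborhood tensor has a fixed shape even though the sensors themselves are unevenly spaced, which is what makes a convolution-like shared-weight operation possible.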
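Abstract (reference translation): Human vision is foveated: resolution varies across a large field of view and peaks at its center. This reflects an efficient trade-off for active sensing, in which eye movements bring different parts of the world into focus while keeping the rest of the world in context. In contrast, most computer vision systems encode the visual world at a uniform resolution, which makes it challenging to process full-field, high-resolution images efficiently. The paper proposes a foveated vision interface (FOVI), based on the human retina and primary visual cortex, that reformats a variable-resolution, retina-like sensor array into a uniformly dense, V1-like sensor manifold. Receptive fields are defined as k-nearest-neighborhoods (kNNs) on the sensor manifold, which enables kNN-convolution through a novel kernel mapping technique. Two use cases are demonstrated: (1) an end-to-end kNN-convolutional architecture, and (2) a foveated adaptation of the foundational DINOv3 ViT model using low-rank adaptation (LoRA). These models provide competitive performance at a fraction of the computational cost of non-foveated baselines, opening pathways for efficient and scalable active sensing for high-resolution egocentric vision. Code and pre-trained models are available at https://github.com/nblauch/fovi and https://huggingface.co/fovi-pytorch.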
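For readers unfamiliar with low-rank adaptation, the following sketch shows the standard LoRA weight update, W + (alpha / r) * B A, on a small dense layer. It illustrates the general technique only: the abstract does not say which DINOv3 layers are adapted or which rank and scaling are used, so the dimensions, the `alpha` value, and the helper names below are assumptions.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    r = len(a)
    delta = matmul(b, a)  # shape (d_out, d_in), rank at most r
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

d_out, d_in, rank = 4, 6, 2   # illustrative sizes, not taken from the paper
rng = random.Random(0)

# Frozen pretrained weight (random here, standing in for a real layer).
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A gets a small random init, B starts at zero,
# so the adapted layer initially matches the pretrained one.
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

w_eff = lora_effective_weight(w, a, b, alpha=4.0)

trainable = rank * (d_in + d_out)   # parameters in A and B
full = d_in * d_out                 # parameters in W
print(trainable, full)
```

Only A and B are trained while W stays frozen. The parameter saving is small at these toy sizes, but it grows quickly with layer width because the trainable count scales with r (d_in + d_out) rather than d_in * d_out.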
