Fugu-MT paper translation (abstract): Characterizing the visual representation of objects from the child's view

Paper overview: Characterizing the visual representation of objects from the child's view

arxiv url: http://arxiv.org/abs/2605.14990v1
Date: Thu, 14 May 2026 15:52:20 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.926212
Title: Characterizing the visual representation of objects from the child's view
Title (reference translation): Characterizing the visual representation of objects from the child's view
Authors: Jane Yang, Tarun Sepuri, Alvin Wei Ming Tan, Khai Loong Aw, Michael C. Frank, Bria Long
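Abstract summary: The authors analyzed young children's visual experience at home using the BabyView dataset. Children's exposure to object categories was highly skewed. Category exemplars were highly variable: children encountered objects from unusual angles, in cluttered scenes, and in partially occluded views.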
Reference score (independently computed attention score): 2.586100784625842
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Children acquire object category representations from their everyday experiences in the first few years of life. What do the inputs to this learning process look like? We analyzed first-person videos of young children's visual experience at home from the BabyView dataset ($N$ = 31 participants, 868 hours, ages 5--36 months), using a supervised object detection model to extract common object categories from more than 3 million frames. We found that children's object category exposure was highly skewed: a few categories (e.g., cups, chairs) dominated children's visual experiences while most categories appeared rarely, replicating previous findings from a more restricted set of contexts. Category exemplars were highly variable: children encountered objects from unusual angles, in highly cluttered scenes, and partially occluded views; many categories (especially animals) were most frequently viewed as depictions. Surprisingly, despite this variability, detected categories (e.g., giraffes, apples) showed stronger groupings within superordinate categories (e.g., animals, food) relative to groupings derived from canonical photographs of these categories. We found this same pattern when using high-dimensional embeddings from both self-supervised visual and multimodal models; this effect was also recapitulated in densely sampled data from individual children. Understanding the robustness and efficiency of visual category learning will require the development of models that can exploit strong superordinate structure and learn from non-canonical, sparse, and variable exemplars.
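Abstract (reference translation): Children acquire representations of object categories from their everyday experiences during the first few years of life. What do the inputs to this learning process look like? We analyzed first-person videos of young children's visual experience at home from the BabyView dataset ($N$ = 31 participants, 868 hours, ages 5 to 36 months), using a supervised object detection model to extract common object categories from more than 3 million frames. Children's exposure to object categories was highly skewed: a few categories (e.g., cups, chairs) dominated their visual experience while most categories appeared only rarely, replicating previous findings from a more restricted set of contexts. Category exemplars were highly variable: children encountered objects from unusual angles, in highly cluttered scenes, and in partially occluded views, and many categories (especially animals) were most often seen as depictions. Surprisingly, despite this variability, detected categories (e.g., giraffes, apples) grouped more strongly within their superordinate categories (e.g., animals, food) than groupings derived from canonical photographs of the same categories. The same pattern held when using high-dimensional embeddings from both self-supervised visual models and multimodal models, and the effect was also recapitulated in densely sampled data from individual children. Understanding the robustness and efficiency of visual category learning will require developing models that can exploit strong superordinate structure and learn from non-canonical, sparse, and variable exemplars.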
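The two quantitative ideas in the abstract, skewed category exposure and grouping within superordinate categories, can be illustrated with a small sketch. This is not the authors' analysis: the abstract does not specify their metrics, so the top-k share and the within-minus-between cosine similarity used here are stand-in measures chosen for illustration. The function names (`top_k_share`, `superordinate_grouping`) and all data are hypothetical toy values.

```python
import math
from collections import Counter
from itertools import combinations


def top_k_share(labels, k=2):
    """Fraction of all detections that fall in the k most frequent categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(k)) / total


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def superordinate_grouping(embeddings, superordinate):
    """Mean within-group minus mean between-group cosine similarity.

    A stand-in measure (an assumption, not the paper's metric): larger
    values mean categories sit closer to members of their own
    superordinate category than to members of other ones.
    """
    within, between = [], []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[a], embeddings[b])
        if superordinate[a] == superordinate[b]:
            within.append(sim)
        else:
            between.append(sim)
    return sum(within) / len(within) - sum(between) / len(between)


# Toy detections: made-up counts, not values from the paper.
detections = ["cup"] * 6 + ["chair"] * 3 + ["giraffe"]

# Toy 2-D category embeddings (real model embeddings are high-dimensional).
embeddings = {
    "giraffe": [1.0, 0.0],
    "zebra": [0.8, 0.6],
    "apple": [0.0, 1.0],
    "banana": [-0.6, 0.8],
}
superordinate = {
    "giraffe": "animals",
    "zebra": "animals",
    "apple": "food",
    "banana": "food",
}

skew = top_k_share(detections, k=2)
grouping = superordinate_grouping(embeddings, superordinate)
print(f"top-2 share: {skew:.2f}, grouping score: {grouping:.2f}")
```

Under these assumptions, the comparison described in the abstract would amount to computing a grouping score once for embeddings of child-view detections and once for embeddings of canonical photographs of the same categories, then comparing the two values.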

Related papers

Assessing the alignment between infants' visual and linguistic experience using multimodal language models [2.275358921334511]
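This work examines the temporal alignment between children's visual and linguistic experience in everyday learning. Idealized aligned moments for learning are relatively rare in children's everyday experience compared with modern machine learning datasets. These results suggest that infrequent alignment is a constraint for models describing word learning.
Paper reference translation (metadata) (2025-11-24T06:58:16Z)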
A solution to generalized learning from small training sets found in everyday infant experiences [6.323444741009534]
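An analysis of egocentric images from 14 infants (7 to 11 months old). Everyday visual input shows a coarse similarity structure in which clusters of relatively similar images are interspersed with rarer, more variable images. Experiments show that mimicking this structure in machines improves generalization from small datasets in machine learning.
Paper reference translation (metadata) (2025-10-16T18:21:55Z)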
Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
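We leverage experimental designs from the cognitive sciences that require zero-shot visual inferences about object shape. We collected 35K trials of behavioral data from over 500 participants. We then evaluate the performance of common vision models.
Paper reference translation (metadata) (2024-09-09T17:59:13Z)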
Learning high-level visual representations from a child's perspective without strong inductive biases [21.466000613898988]
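We train state-of-the-art neural networks, without explicit supervision, on a realistic proxy of a child's visual experience. We train both embedding models and generative models on 200 hours of headcam video from a single child. Generative models trained on the same data successfully extrapolate simple properties of partially masked objects.
Paper reference translation (metadata) (2023-05-24T17:26:59Z)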
Embodied vision for learning object representations [4.211128681972148]
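The visual statistics of toddlers improve object recognition accuracy in both familiar and novel environments. We argue that this effect is due to a reduction in features extracted from the background, a neural network bias toward large features in the image, and a greater similarity between novel and familiar background regions.
Paper reference translation (metadata) (2022-05-12T16:36:27Z)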
Unsupervised Object Learning via Common Fate [61.14802390241075]
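Learning generative object models from video is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks and provide candidate solutions for each. We show that the proposed approach can learn generative models that generalize beyond the occlusions present in the input videos.
Paper reference translation (metadata) (2021-10-13T08:22:04Z)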
Closing the Generalization Gap in One-Shot Object Detection [92.82028853413516]
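The key to strong few-shot detection models lies not in sophisticated metric learning approaches but in scaling the number of categories. Future data annotation efforts should focus on broader datasets and annotate a larger number of categories.
Paper reference translation (metadata) (2020-11-09T09:31:17Z)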
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
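We use human interaction and attention cues to investigate whether we can learn better representations than visual-only ones. Experiments show that our "muscly-supervised" representation outperforms MoCo, a state-of-the-art visual-only method.
Paper reference translation (metadata) (2020-10-16T17:46:53Z)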

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

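This page provides information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).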