Fugu-MT Paper Translation (Abstract): Szloca: towards a framework for full 3D tracking through a single camera in context of interactive arts

Paper overview: Szloca: towards a framework for full 3D tracking through a single camera in context of interactive arts

arxiv url: http://arxiv.org/abs/2206.12958v1
Date: Sun, 26 Jun 2022 20:09:47 GMT
Status: Translation complete
Last updated in system: 2022-06-28 17:22:20.382638
Title: Szloca: towards a framework for full 3D tracking through a single camera in context of interactive arts
Title (reference translation): Szloca: towards a full 3D tracking framework using a single camera in the context of interactive arts
Authors: Sahaj Garg
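Abstract summary: This research proposes a novel method and framework for obtaining data and virtual representations of objects/people. The model does not rely on complex training of computer vision systems; it combines prior computer vision research with the capacity to represent z depth.
Reference score (independently computed attention score): 1.0878040851638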
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Realtime virtual data of objects and human presence in a large area holds a valuable key in enabling many experiences and applications in various industries. With the exponential rise in the technological development of artificial intelligence, computer vision has expanded the possibilities of tracking and classifying things through just video inputs, and is also surpassing the limitations of the most popular and common hardware setups traditionally used to detect human pose and position, such as low field of view and limited tracking capacity. The benefits of using computer vision in application development are large, as it augments traditional input sources (like video streams) and can be integrated in many environments and platforms. In the context of new media interactive arts, based on physical movements and expanding over large areas or galleries, this research presents a novel way and a framework towards obtaining data and virtual representation of objects/people, such as three-dimensional positions, skeletons/pose and masks, from a single RGB camera. Looking at the state of the art through some recent developments and building on prior research in the field of computer vision, the paper also proposes an original method to obtain three-dimensional position data from monocular images. The model does not rely on complex training of computer vision systems but combines prior computer vision research and adds a capacity to represent z depth, i.e., to represent a world position in 3 axes from a 2D input source.
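Abstract (reference translation): Realtime virtual data of objects and human presence in a large area holds a valuable key in enabling many experiences and applications in various industries. With the exponential rise in the technological development of artificial intelligence, computer vision has expanded the possibilities of tracking and classifying things through just video inputs, and is also surpassing the limitations of the most popular and common hardware setups traditionally used to detect human pose and position, such as low field of view and limited tracking capacity. The benefits of using computer vision in application development are large, because it extends traditional input sources (such as video streams) and can be integrated into many environments and platforms. In the context of new media interactive arts, which are based on physical movement and extend over large areas or galleries, this research proposes a novel method and framework for obtaining data and virtual representations of objects/people, such as three-dimensional positions, skeletons/poses, and masks, from a single RGB camera. After surveying recent developments and prior research in the field of computer vision, the paper proposes an original method for obtaining three-dimensional position data from monocular images. The model does not rely on complex training of computer vision systems; by combining prior computer vision research with the capacity to represent z depth, it can represent a world position in three axes from a 2D input source.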
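
Illustrative sketch (not the paper's method): the abstract does not say how z depth is obtained, so the Python below only shows one textbook way to turn a 2D detection into a three-axis position, under a pinhole camera model. It assumes the focal lengths and principal point are known and that the tracked person has a known real-world height. The names `Intrinsics`, `depth_from_known_height`, and `back_project` are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Assumed pinhole camera parameters, all in pixels."""
    fx: float  # focal length along the image x axis
    fy: float  # focal length along the image y axis
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def depth_from_known_height(pixel_height: float, real_height_m: float, fy: float) -> float:
    """Estimate z depth by similar triangles: z = fy * H / h.

    Assumes the object stands upright, roughly parallel to the image plane,
    and that its real height H (in metres) is known in advance.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return fy * real_height_m / pixel_height


def back_project(u: float, v: float, z: float, k: Intrinsics) -> tuple[float, float, float]:
    """Map pixel (u, v) at depth z to camera-space coordinates (x, y, z)."""
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical 1280x720 camera and a 1.7 m person spanning 340 pixels.
    cam = Intrinsics(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
    z = depth_from_known_height(pixel_height=340.0, real_height_m=1.7, fy=cam.fy)
    print(back_project(840.0, 360.0, z, cam))
```

The result is expressed in the camera's coordinate frame; mapping it into a gallery or world frame would additionally need the camera's pose, which is outside the scope of this sketch.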

Related papers

Ascribe New Dimensions to Scientific Data Visualization with VR [1.9084093324993718]
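This article introduces ASCRIBE-VR, the VR platform of Autonomous Solutions for Computational Research with Immersive Browsing & Exploration. ASCRIBE-VR enables multimodal analysis, structural assessment, and immersive visualization, and supports scientific visualization of advanced datasets such as X-ray CT, magnetic resonance, and synthetic 3D imaging.
Paper reference translation (metadata) (2025-04-18T03:59:39Z)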
Generative AI Framework for 3D Object Generation in Augmented Reality [0.0]
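This paper integrates state-of-the-art generative AI models to create 3D objects in real time in augmented reality (AR) environments. The framework demonstrates applications across industries such as gaming, education, retail, and interior design. A key contribution is the democratization of 3D model creation, making advanced AI tools available to a broad audience.
Paper reference translation (metadata) (2025-02-21T17:01:48Z)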
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions [0.562479170374811]
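This paper proposes a method that fuses existing generative systems to create stereoscopic virtual reality video from text. Our work highlights the exciting possibilities of using natural-language-driven graphics in fields such as virtual reality simulation.
Paper reference translation (metadata) (2025-01-02T09:21:03Z)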
Diffusion Models in 3D Vision: A Survey [11.116658321394755]
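This paper reviews state-of-the-art approaches that use diffusion models for 3D vision tasks. These approaches include 3D object generation, shape completion, point cloud reconstruction, and scene understanding. The paper also discusses possible directions such as improving computational efficiency, enhancing multimodal fusion, and leveraging large-scale pretraining.
Paper reference translation (metadata) (2024-10-07T04:12:23Z)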
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image [70.02187124865627]
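Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and unseen objects within new 3D scenes. The method leverages vision foundation models to provide image-wise guidance for discovering novel classes in 3D scenes. The work reveals the potential of foundation models in open-vocabulary 3D object detection and shows significant improvements in accuracy and generalization.
Paper reference translation (metadata) (2024-07-07T04:50:04Z)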
Deep Models for Multi-View 3D Object Recognition: A Review [16.500711021549947]
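To date, multi-view 3D representations for object recognition have shown the most promising results for achieving state-of-the-art performance. This paper gives a comprehensive account of recent advances in multi-view object recognition methods for 3D classification and retrieval tasks.
Paper reference translation (metadata) (2024-04-23T16:54:31Z)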
Recent Trends in 3D Reconstruction of General Non-Rigid Scenes [104.07781871008186]
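In computer graphics and computer vision, reconstructing models of the real world, including 3D geometry, appearance, and the motion of real scenes, is essential. It enables the synthesis of photorealistic novel views, which is useful for the film industry and AR/VR applications. This state-of-the-art report (STAR) gives the reader an overview of the latest techniques with monocular and multi-view inputs.
Paper reference translation (metadata) (2024-03-22T09:46:11Z)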
AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
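This paper proposes a novel approach to generating static and articulated 3D assets with a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space. We then identify an appropriate intermediate volumetric latent space and introduce robust normalization and de-normalization operations.
Paper reference translation (metadata) (2023-07-07T17:59:14Z)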
Multiview Compressive Coding for 3D Reconstruction [77.95706553743626]
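We introduce a simple framework that operates on 3D points of single objects or whole scenes. Our model, Multiview Compressive Coding, compresses the appearance and geometry of the input to predict 3D structure.
Paper reference translation (metadata) (2023-01-19T18:59:52Z)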
State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
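3D reconstruction of deformable (or non-rigid) scenes from monocular 2D images has been a long-standing, actively researched area of computer vision and graphics. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
Paper reference translation (metadata) (2022-10-27T17:59:53Z)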
3D shape sensing and deep learning-based segmentation of strawberries [5.634825161148484]
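We evaluate modern sensing technologies, including stereo and time-of-flight cameras, for 3D perception of shape in agriculture. The paper proposes a novel 3D deep neural network that exploits the organized nature of the information obtained from camera-based 3D sensors.
Paper reference translation (metadata) (2021-11-26T18:43:10Z)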
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D [67.50776195828242]
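KITTI-360 is a suburban driving dataset comprising richer input modalities, comprehensive semantic instance annotations, and accurate localization. It yields over 150k images with semantic and instance annotations and 1B annotated 3D points. We establish benchmarks and baselines for several tasks relevant to mobile perception, covering problems in computer vision, graphics, and robotics on the same dataset.
Paper reference translation (metadata) (2021-09-28T00:41:29Z)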
SAILenv: Learning in Virtual Visual Environments Made Simple [16.979621213790015]
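We propose a new platform for experimenting with visual recognition in virtual 3D scenes. Only a few lines of code are needed to interface any algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself. Our framework provides pixel-level semantic and instance labeling, depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine.
Paper reference translation (metadata) (2020-07-16T09:50:23Z)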

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
