BlendScape: Enabling Unified and Personalized Video-Conferencing Environments through Generative AI
- URL: http://arxiv.org/abs/2403.13947v1
- Date: Wed, 20 Mar 2024 19:41:05 GMT
- Title: BlendScape: Enabling Unified and Personalized Video-Conferencing Environments through Generative AI
- Authors: Shwetha Rajaram, Nels Numan, Balasaravanan Thoravi Kumaravel, Nicolai Marquardt, Andrew D. Wilson
- Abstract summary: BlendScape is a system for meeting participants to compose video-conferencing environments tailored to their collaboration context.
BlendScape supports flexible representations of task spaces by blending users' physical or virtual backgrounds into unified environments.
- Score: 19.06858242647237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's video-conferencing tools support a rich range of professional and social activities, but their generic, grid-based environments cannot be easily adapted to meet the varying needs of distributed collaborators. To enable end-user customization, we developed BlendScape, a system for meeting participants to compose video-conferencing environments tailored to their collaboration context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or virtual backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an evaluation with 15 end-users, we investigated their customization preferences for work and social scenarios. Participants could rapidly express their design intentions with BlendScape and envisioned using the system to structure collaboration in future meetings, but experienced challenges with preventing distracting elements. We implement scenarios to demonstrate BlendScape's expressiveness in supporting distributed collaboration techniques from prior work and propose composition techniques to improve the quality of environments.
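As a rough illustration of the approach described in the abstract, the sketch below composes a single "blended" environment prompt from participants' background descriptions and renders it with an off-the-shelf text-to-image model (Stable Diffusion via the Hugging Face diffusers library). This is not BlendScape's actual implementation; the prompt template, helper names, and model id are illustrative assumptions.

```python
# Hypothetical sketch: merge per-participant background descriptions into one
# scene prompt, then render it with a text-to-image model. Illustration only,
# not BlendScape's implementation. Requires a CUDA GPU and the diffusers package.
import torch
from diffusers import StableDiffusionPipeline

def blend_prompt(background_descriptions, style="cozy shared workspace"):
    """Merge participants' background descriptions into a single scene prompt."""
    fragments = "; ".join(background_descriptions)
    return (f"A single coherent video-conferencing environment, {style}, "
            f"blending these spaces: {fragments}. Wide, uncluttered layout.")

if __name__ == "__main__":
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1",  # example model id
        torch_dtype=torch.float16,
    ).to("cuda")
    prompt = blend_prompt([
        "a home office with a bookshelf and warm lamp light",
        "a sunny kitchen table with plants",
    ])
    image = pipe(prompt, num_inference_steps=30).images[0]  # PIL.Image
    image.save("blended_environment.png")
```

In a full system, the generation would additionally be steered by the multimodal interaction techniques the paper describes, rather than by a single static prompt.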
Related papers
- Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments.
We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context.
Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z)
- AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation [62.682428307810525]
We introduce AVS-Mamba, a selective state space model to address the audio-visual segmentation task.
Our framework incorporates two key components for video understanding and cross-modal learning.
Our approach achieves new state-of-the-art results on the AVSBench-object and AVS-semantic datasets.
arXiv Detail & Related papers (2025-01-14T03:20:20Z)
- CFSynthesis: Controllable and Free-view 3D Human Video Synthesis [57.561237409603066]
CFSynthesis is a novel framework for generating high-quality human videos with customizable attributes.
Our method leverages a texture-SMPL-based representation to ensure consistent and stable character appearances across free viewpoints.
Results on multiple datasets show that CFSynthesis achieves state-of-the-art performance in complex human animations.
arXiv Detail & Related papers (2024-12-15T05:57:36Z)
- Generative AI in Multimodal User Interfaces: Trends, Challenges, and Cross-Platform Adaptability [0.0]
Generative AI emerges as a key driver in reshaping user interfaces.
This paper explores the integration of generative AI in modern user interfaces.
It focuses on multimodal interaction, cross-platform adaptability and dynamic personalization.
arXiv Detail & Related papers (2024-11-15T14:49:58Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending [19.06858242647237]
We introduce SpaceBlender, a pipeline that transforms user-provided 2D images into context-rich 3D environments.
Participants appreciated the enhanced familiarity and context provided by SpaceBlender but noted complexities in the generative environments.
We propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.
arXiv Detail & Related papers (2024-09-20T22:27:31Z)
- Enhancing Graph Representation of the Environment through Local and Cloud Computation [2.9465623430708905]
We propose a graph-based representation that provides a semantic representation of robot environments from multiple sources.
To acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services.
The proposed approach also allows us to handle small objects and integrate them into the semantic representation of the environment.
arXiv Detail & Related papers (2023-09-22T08:05:32Z)
- Mutual Scene Synthesis for Mixed Reality Telepresence [4.504833177846264]
Mixed reality telepresence allows participants to engage in a wide spectrum of activities, previously not possible in 2D screen-based communication methods.
We propose a novel mutual scene synthesis method that takes the participants' spaces as input, and generates a virtual synthetic scene that corresponds to the functional features of all participants' local spaces.
Our method combines a mutual function optimization module with a deep-learning conditional scene augmentation process to generate a scene mutually and physically accessible to all participants of a mixed reality telepresence scenario.
arXiv Detail & Related papers (2022-04-01T02:08:11Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Composing Complex and Hybrid AI Solutions [52.00820391621739]
We describe an extension of the Acumos system towards enabling the above features for general AI applications.
Our extensions include support for more generic components with gRPC/Protobuf interfaces.
We provide examples of deployable solutions and their interfaces.
arXiv Detail & Related papers (2022-02-25T08:57:06Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
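As a loose, self-contained sketch of the graph-based idea summarized in the RICE entry above, the snippet below turns a set of candidate instance masks into a graph (nodes are masks, edges connect overlapping pairs) and scores it with a small graph neural network. The node features, edge rule, and network architecture are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch of scoring a segmentation hypothesis with a GNN over
# instance masks, loosely following the RICE summary above. Features, edges,
# and the tiny GCN are assumptions, not the paper's model.
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool

def masks_to_graph(masks):
    """masks: (N, H, W) boolean tensor; nodes = masks, edges = overlapping pairs."""
    rel_area = masks.flatten(1).float().mean(dim=1, keepdim=True)  # area / (H*W)
    edges = [(i, j) for i in range(len(masks)) for j in range(len(masks))
             if i != j and (masks[i] & masks[j]).any()]
    edge_index = (torch.tensor(edges, dtype=torch.long).t().contiguous()
                  if edges else torch.empty((2, 0), dtype=torch.long))
    return Data(x=rel_area, edge_index=edge_index)

class SegmentationScorer(torch.nn.Module):
    """Scores a whole segmentation hypothesis: two GCN layers + mean pooling."""
    def __init__(self, in_dim=1, hidden=16):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, data):
        h = self.conv1(data.x, data.edge_index).relu()
        h = self.conv2(h, data.edge_index).relu()
        batch = torch.zeros(data.num_nodes, dtype=torch.long)  # single graph
        return torch.sigmoid(self.head(global_mean_pool(h, batch)))

if __name__ == "__main__":
    masks = torch.zeros(3, 32, 32, dtype=torch.bool)
    masks[0, :10, :10] = True      # object A
    masks[1, 8:20, 8:20] = True    # object B, overlapping A
    masks[2, 25:, 25:] = True      # object C, isolated
    print(SegmentationScorer()(masks_to_graph(masks)))  # untrained quality score
```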
- Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z)
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
We show that iGibson's features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
- SceneGen: Generative Contextual Scene Augmentation using Scene Graph Priors [3.1969855247377827]
We introduce SceneGen, a generative contextual augmentation framework that predicts virtual object positions and orientations within existing scenes.
SceneGen takes a semantically segmented scene as input, and outputs positional and orientational probability maps for placing virtual content.
We formulate a novel spatial Scene Graph representation, which encapsulates explicit topological properties between objects, object groups, and the room.
To demonstrate our system in action, we develop an Augmented Reality application, in which objects can be contextually augmented in real-time.
arXiv Detail & Related papers (2020-09-25T18:36:27Z)
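As a rough, self-contained sketch of the scene-graph idea described in the SceneGen entry above, the snippet below builds a toy spatial scene graph and samples a virtual-object placement from a positional probability map. The node/edge schema, the 1.5 m adjacency rule, and the sampling are simplifying assumptions, not SceneGen's published formulation.

```python
# Illustrative sketch of a spatial scene graph and probability-map placement,
# loosely following the SceneGen summary above. The schema and the sampling
# are assumptions for illustration, not the paper's actual model.
import numpy as np
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    label: str                                      # e.g. "sofa", "table", or "room"
    position: tuple                                 # (x, y) floor-plan coordinates, in meters
    neighbors: list = field(default_factory=list)   # toy topological relations

def build_scene_graph(objects):
    """Connect objects that lie within 1.5 m of each other (toy adjacency rule)."""
    nodes = [SceneNode(lbl, pos) for lbl, pos in objects]
    for a in nodes:
        for b in nodes:
            if a is not b and np.linalg.norm(np.subtract(a.position, b.position)) < 1.5:
                a.neighbors.append(b.label)
    return nodes

def sample_placement(prob_map, rng=np.random.default_rng()):
    """Sample a grid cell for a virtual object from a positional probability map."""
    flat = prob_map.ravel() / prob_map.sum()
    idx = rng.choice(flat.size, p=flat)
    return np.unravel_index(idx, prob_map.shape)    # (row, col) cell

if __name__ == "__main__":
    graph = build_scene_graph([("sofa", (0.0, 0.0)), ("table", (1.0, 0.5)), ("lamp", (4.0, 3.0))])
    heat = np.ones((10, 10))
    heat[2:5, 2:5] = 5.0                            # higher likelihood near sofa and table
    print([(n.label, n.neighbors) for n in graph], sample_placement(heat))
```

Here simple distance-based adjacency stands in for the paper's richer topological relations between objects, object groups, and the room.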
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.