BlendScape: Enabling Unified and Personalized Video-Conferencing Environments through Generative AI
- URL: http://arxiv.org/abs/2403.13947v1
- Date: Wed, 20 Mar 2024 19:41:05 GMT
- Title: BlendScape: Enabling Unified and Personalized Video-Conferencing Environments through Generative AI
- Authors: Shwetha Rajaram, Nels Numan, Balasaravanan Thoravi Kumaravel, Nicolai Marquardt, Andrew D. Wilson
- Abstract summary: BlendScape is a system for meeting participants to compose video-conferencing environments tailored to their collaboration context.
BlendScape supports flexible representations of task spaces by blending users' physical or virtual backgrounds into unified environments.
- Score: 19.06858242647237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's video-conferencing tools support a rich range of professional and social activities, but their generic, grid-based environments cannot be easily adapted to meet the varying needs of distributed collaborators. To enable end-user customization, we developed BlendScape, a system for meeting participants to compose video-conferencing environments tailored to their collaboration context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or virtual backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an evaluation with 15 end-users, we investigated their customization preferences for work and social scenarios. Participants could rapidly express their design intentions with BlendScape and envisioned using the system to structure collaboration in future meetings, but experienced challenges with preventing distracting elements. We implement scenarios to demonstrate BlendScape's expressiveness in supporting distributed collaboration techniques from prior work and propose composition techniques to improve the quality of environments.
Related papers
- LEGENT: Open Platform for Embodied Agents [60.71847900126832]
We introduce LEGENT, an open, scalable platform for developing embodied agents using Large Language Models (LLMs) and Large Multimodal Models (LMMs)
LEGENT offers a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface.
In experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks.
arXiv Detail & Related papers (2024-04-28T16:50:12Z) - Modular Customizable ROS-Based Framework for Rapid Development of Social Robots [3.6622737533847936]
We present the Socially-interactive Robot Software platform (SROS), an open-source framework addressing this need through a modular layered architecture.
Specialized perceptual and interactive skills are implemented as ROS services for reusable deployment on any robot.
We experimentally validated core SROS technologies including computer vision, speech processing, and GPT2 autocomplete speech implemented as plug-and-play ROS services.
arXiv Detail & Related papers (2023-11-27T12:54:20Z) - ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images: zooming in and out on camouflaged objects.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - Enhancing Graph Representation of the Environment through Local and Cloud Computation [2.9465623430708905]
We propose a graph-based representation that provides a semantic representation of robot environments from multiple sources.
To acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services.
The proposed approach also allows us to handle small objects and integrate them into the semantic representation of the environment.
arXiv Detail & Related papers (2023-09-22T08:05:32Z) - Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions [9.400505355134728]
We propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints.
We introduce a novel contrastive affordance learning framework capable of training on scenes containing a single occluder and generalizing to scenes with complex occluder combinations.
arXiv Detail & Related papers (2023-09-14T08:24:32Z) - InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [52.981128371910266]
We present InstructDiffusion, a framework for aligning computer vision tasks with human instructions.
InstructDiffusion could handle a variety of vision tasks, including understanding tasks and generative tasks.
It even exhibits the ability to handle unseen tasks and outperforms prior methods on novel datasets.
arXiv Detail & Related papers (2023-09-07T17:56:57Z) - Mutual Scene Synthesis for Mixed Reality Telepresence [4.504833177846264]
Mixed reality telepresence allows participants to engage in a wide spectrum of activities, previously not possible in 2D screen-based communication methods.
We propose a novel mutual scene synthesis method that takes the participants' spaces as input, and generates a virtual synthetic scene that corresponds to the functional features of all participants' local spaces.
Our method combines a mutual function optimization module with a deep-learning conditional scene augmentation process to generate a scene mutually and physically accessible to all participants of a mixed reality telepresence scenario.
arXiv Detail & Related papers (2022-04-01T02:08:11Z) - Composing Complex and Hybrid AI Solutions [52.00820391621739]
We describe an extension of the Acumos system towards enabling the above features for general AI applications.
Our extensions include support for more generic components with gRPC/Protobuf interfaces.
We provide examples of deployable solutions and their interfaces.
arXiv Detail & Related papers (2022-02-25T08:57:06Z) - A Survey on Synchronous Augmented, Virtual and Mixed Reality Remote Collaboration Systems [81.0723729946659]
The focus of this work is on synchronous remote collaboration from a distance.
A total of 82 unique systems for remote collaboration are discussed, including more than 100 publications and 25 commercial systems.
arXiv Detail & Related papers (2021-02-11T13:33:51Z) - AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z) - Designing Interaction for Multi-agent Cooperative System in an Office Environment [2.2430284460908605]
Future intelligent systems will involve a wide variety of artificial agents, such as mobile robots, smart home infrastructure, and personal devices.
This paper presents the design and implementation of the human-machine interface of an Intelligent Cyber-Physical System (ICPS).
ICPS is a multi-entity coordination system of robots and other smart devices in a working environment.
arXiv Detail & Related papers (2020-02-15T17:36:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.