AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People
- URL: http://arxiv.org/abs/2408.10240v1
- Date: Mon, 5 Aug 2024 01:47:36 GMT
- Title: AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People
- Authors: Seonghee Lee, Maho Kohga, Steve Landau, Sile O'Modhrain, Hari Subramonyam
- Abstract summary: People with visual impairments often struggle to create content that relies heavily on visual elements.
Existing accessible drawing tools, which construct images line by line, are suitable for simple tasks like math but not for more expressive artwork.
Our work integrates generative AI with a constructive approach that provides users with enhanced control and editing capabilities.
- Score: 4.41462357579624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People with visual impairments often struggle to create content that relies heavily on visual elements, particularly when conveying spatial and structural information. Existing accessible drawing tools, which construct images line by line, are suitable for simple tasks like math but not for more expressive artwork. On the other hand, emerging generative AI-based text-to-image tools can produce expressive illustrations from descriptions in natural language, but they lack precise control over image composition and properties. To address this gap, our work integrates generative AI with a constructive approach that provides users with enhanced control and editing capabilities. Our system, AltCanvas, features a tile-based interface enabling users to construct visual scenes incrementally, with each tile representing an object within the scene. Users can add, edit, move, and arrange objects while receiving speech and audio feedback. Once completed, the scene can be rendered as a color illustration or as a vector for tactile graphic generation. Involving 14 blind or low-vision users in design and evaluation, we found that participants effectively used the AltCanvas workflow to create illustrations.
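The workflow described above centers on a grid of tiles, each holding one object of the scene. The following Python sketch illustrates how such a tile-based scene model might be organized; the class names, grid addressing, and spoken-feedback strings are illustrative assumptions, not AltCanvas's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    """One object in the scene, addressed by its grid cell."""
    row: int
    col: int
    description: str  # natural-language prompt sent to the generative model

@dataclass
class Scene:
    rows: int
    cols: int
    tiles: dict = field(default_factory=dict)  # (row, col) -> Tile

    def add(self, row: int, col: int, description: str) -> str:
        if (row, col) in self.tiles:
            raise ValueError(f"cell ({row}, {col}) is already occupied")
        self.tiles[(row, col)] = Tile(row, col, description)
        return f"Added {description} at row {row}, column {col}"  # spoken back to the user

    def move(self, src: tuple, dst: tuple) -> str:
        tile = self.tiles.pop(src)
        tile.row, tile.col = dst
        self.tiles[dst] = tile
        return f"Moved {tile.description} to row {dst[0]}, column {dst[1]}"

scene = Scene(rows=3, cols=3)
print(scene.add(0, 2, "a bright sun"))
print(scene.add(2, 0, "a small house"))
print(scene.move((2, 0), (2, 1)))
```

Rendering would then traverse the tiles and hand each object's description to the text-to-image model, which is what gives the user the per-object control that a single whole-scene prompt lacks.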
Related papers
- PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions [66.92809850624118]
PixWizard is an image-to-image visual assistant designed for image generation, manipulation, and translation based on free-form language instructions.
We cast a variety of vision tasks into a unified image-text-to-image generation framework and curate an Omni Pixel-to-Pixel Instruction-Tuning dataset.
Our experiments demonstrate that PixWizard not only shows impressive generative and understanding abilities for images of diverse resolutions but also exhibits promising generalization to unseen tasks and human instructions.
arXiv Detail & Related papers (2024-09-23T17:59:46Z)
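To make the "unified image-text-to-image" idea above concrete, here is a minimal sketch of how heterogeneous vision tasks can be flattened into one instruction-conditioned record format; the field names and example instructions are hypothetical, not PixWizard's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ImageTextToImageExample:
    """Uniform record: (source image, instruction) -> target image.
    Hypothetical schema, not PixWizard's actual one."""
    instruction: str
    source_image: str  # path placeholders stand in for real image tensors
    target_image: str

# Heterogeneous vision tasks collapse into one instruction-conditioned format:
dataset = [
    ImageTextToImageExample("Remove the background", "in1.png", "out1.png"),         # matting
    ImageTextToImageExample("Increase the resolution 4x", "in2.png", "out2.png"),    # super-resolution
    ImageTextToImageExample("Turn this sketch into a photo", "in3.png", "out3.png"), # translation
]
for ex in dataset:
    print(f"{ex.instruction!r}: {ex.source_image} -> {ex.target_image}")
```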
- Alfie: Democratising RGBA Image Generation With No $$$ [33.334956022229846]
We propose a fully-automated approach for obtaining RGBA illustrations by modifying the inference-time behavior of a pre-trained Diffusion Transformer model.
We force the generation of entire subjects without sharp cropping, whose backgrounds can be easily removed for seamless integration into design projects or artistic scenes.
arXiv Detail & Related papers (2024-08-27T07:13:44Z)
- Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce the Creativity-Vision Language Assistant (Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z)
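The IER task signature described above can be sketched as follows: an image plus an under-specified purpose goes in, and several concrete editing instructions come out. `suggest_edits` is a hypothetical stand-in for a model like Creativity-VLA, and the returned suggestions are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class EditingRecommendation:
    instruction: str   # concrete, actionable edit
    rationale: str     # why it serves the user's vague goal

def suggest_edits(image_path: str, purpose: str) -> list:
    """Hypothetical recommender; a real system would query a vision-language model."""
    return [
        EditingRecommendation("Warm the color temperature", f"supports '{purpose}'"),
        EditingRecommendation("Add a soft vignette around the subject", f"supports '{purpose}'"),
    ]

for rec in suggest_edits("photo.jpg", "make it feel nostalgic"):
    print(rec.instruction, "-", rec.rationale)
```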
- Block and Detail: Scaffolding Sketch-to-Image Generation [65.56590359051634]
We introduce a novel sketch-to-image tool that aligns with the iterative refinement process of artists.
Our tool lets users sketch blocking strokes to coarsely represent the placement and form of objects and detail strokes to refine their shape and silhouettes.
We develop a two-pass algorithm for generating high-fidelity images from such sketches at any point in the iterative process.
arXiv Detail & Related papers (2024-02-28T07:09:31Z)
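The two-pass idea above can be sketched schematically: a first pass turns blocking strokes into a coarse image, and a second pass refines it around detail strokes. `generate` below is a placeholder for a sketch-conditioned diffusion model, and the strength values are assumptions; this is a schematic, not the paper's algorithm.

```python
def generate(strokes, init_image=None, strength=1.0):
    """Placeholder for a sketch-conditioned image generator."""
    base = init_image or "blank canvas"
    return f"image({base} + {len(strokes)} strokes @ strength {strength})"

def two_pass(blocking_strokes, detail_strokes):
    # Pass 1: placement and form from coarse blocking strokes.
    coarse = generate(blocking_strokes, strength=1.0)
    if not detail_strokes:
        return coarse  # a usable output exists at any point in the iteration
    # Pass 2: refine shape and silhouette, staying close to the coarse result.
    return generate(detail_strokes, init_image=coarse, strength=0.5)

print(two_pass(["torso blob", "head circle"], ["ear outline"]))
```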
- Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model [7.587729429265939]
Pictorial visualization seamlessly integrates data and semantic context into visual representation.
We propose ChartSpark, a novel system that embeds semantic context into charts based on a text-to-image generative model.
We develop an interactive visual interface that integrates a text analyzer, editing module, and evaluation module to enable users to generate, modify, and assess pictorial visualizations.
arXiv Detail & Related papers (2023-04-28T05:18:30Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that combines semantic discriminators and object-level discriminators to improve the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
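A minimal PyTorch sketch of the dual-discriminator paradigm described above: one discriminator scores the whole image, another scores object crops, and the generator loss sums both. The tiny architectures, the BCE losses, and the use of raw pixels rather than pretrained visual features are simplifications, not the paper's exact design.

```python
import torch
import torch.nn as nn

def make_disc(in_ch=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
    )

semantic_disc = make_disc()  # judges the whole completed image
object_disc = make_disc()    # judges crops around individual objects

def generator_loss(fake_image, object_boxes):
    bce = nn.BCEWithLogitsLoss()
    want_real = lambda logits: bce(logits, torch.ones_like(logits))
    loss = want_real(semantic_disc(fake_image))        # image-level realism
    for (x0, y0, x1, y1) in object_boxes:              # object-level realism
        crop = fake_image[:, :, y0:y1, x0:x1]
        loss = loss + want_real(object_disc(crop))
    return loss

fake = torch.randn(1, 3, 64, 64)
print(generator_loss(fake, [(8, 8, 40, 40)]).item())
```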
- SGDraw: Scene Graph Drawing Interface Using Object-Oriented Representation [18.109884282338356]
We propose SGDraw, a scene graph drawing interface using an object-oriented scene graph representation.
We show that SGDraw can help generate scene graphs with richer details and describe the images more accurately than traditional bounding box annotations.
arXiv Detail & Related papers (2022-11-30T02:35:09Z)
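A minimal object-oriented scene graph in the spirit of the representation above (illustrative classes, not the tool's actual code): objects carry their own attributes, and relationships are edges between object nodes.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Relation:
    subject: ObjectNode
    predicate: str
    obj: ObjectNode

@dataclass
class SceneGraph:
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_object(self, name, attributes=()):
        node = ObjectNode(name, list(attributes))
        self.objects.append(node)
        return node

    def relate(self, subject, predicate, obj):
        self.relations.append(Relation(subject, predicate, obj))

g = SceneGraph()
boy = g.add_object("boy", ["young"])
horse = g.add_object("horse", ["brown"])
g.relate(boy, "feeding", horse)
for r in g.relations:
    print(f"{r.subject.name} --{r.predicate}--> {r.obj.name}")
```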
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
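The plug-and-play mechanism can be illustrated with PyTorch forward hooks: spatial features recorded during the source image's pass are injected into the text-guided pass, preserving layout while the prompt changes appearance. The two-layer network below is a toy stand-in for a diffusion U-Net; the real method operates on specific U-Net blocks across denoising steps.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 3, 3, padding=1))
layer = net[0]  # the layer whose spatial features carry structure

captured = {}
def capture(module, inputs, output):
    captured["feat"] = output.detach()

def inject(module, inputs, output):
    return captured["feat"]  # override with source features -> layout is preserved

source_image = torch.randn(1, 3, 16, 16)
handle = layer.register_forward_hook(capture)
_ = net(source_image)        # "source" pass: record intermediate features
handle.remove()

handle = layer.register_forward_hook(inject)
target_input = torch.randn(1, 3, 16, 16)  # stand-in for the text-guided pass
out = net(target_input)      # generation now reuses the source's spatial features
handle.remove()
print(out.shape)
```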
- SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z)
- Semantic Image Manipulation Using Scene Graphs [105.03614132953285]
We introduce a spatio-semantic scene graph network that does not require direct supervision for constellation changes or image edits.
This makes it possible to train the system from existing real-world datasets with no additional annotation effort.
arXiv Detail & Related papers (2020-04-07T20:02:49Z)
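Schematically, manipulation then works as below: the user edits the scene graph rather than pixels, and the network re-renders the image. `render` is a placeholder for the paper's generation network, and the dictionary format is an illustrative assumption.

```python
def render(graph: dict) -> str:
    """Placeholder: a real system decodes the scene graph into an image."""
    edges = ", ".join(f"{s} {p} {o}" for s, p, o in graph["relations"])
    return f"<image of: {edges}>"

graph = {"relations": [("man", "sitting on", "bench"), ("dog", "next to", "man")]}
print(render(graph))                                     # original image

graph["relations"][0] = ("man", "standing on", "bench")  # semantic edit on the graph
print(render(graph))                                     # re-rendered with the constellation change
```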
This list is automatically generated from the titles and abstracts of the papers on this site.