Describing image focused in cognitive and visual details for visually
impaired people: An approach to generating inclusive paragraphs
- URL: http://arxiv.org/abs/2202.05331v1
- Date: Thu, 10 Feb 2022 21:20:53 GMT
- Title: Describing image focused in cognitive and visual details for visually
impaired people: An approach to generating inclusive paragraphs
- Authors: Daniel Louzada Fernandes, Marcos Henrique Fonseca Ribeiro, Fabio
Ribeiro Cerqueira, Michel Melo Silva
- Abstract summary: There is a lack of services that support specific tasks, such as understanding the image context presented in online content, e.g., webinars.
We propose an approach for generating descriptions of webinar images that combines a dense captioning technique with a set of filters, to fit the captions to our domain, and a language model for the abstractive summary task.
- Score: 2.362412515574206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several services for people with visual disabilities have emerged
recently due to advances in Assistive Technologies and Artificial Intelligence.
Despite the growing availability of assistive systems, there is a lack of
services that support specific tasks, such as understanding the image context
presented in online content, e.g., webinars. Image captioning techniques and
their variants are limited as Assistive Technologies because they do not match
the needs of visually impaired people when generating specific descriptions. We
propose an approach for generating descriptions of webinar images that combines
a dense captioning technique with a set of filters, to fit the captions to our
domain, and a language model for abstractive summarization. The results show
that, by combining image analysis methods and neural language models, we can
produce descriptions that are more interpretable and focused on the information
relevant to that group of people.
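The abstract outlines a three-stage pipeline: dense captioning of image regions, domain filters over the resulting captions, and abstractive summarization with a language model. A minimal sketch of the filtering stage, assuming captions have already been produced by a dense captioner; the function name, keyword set, and confidence threshold are illustrative and not taken from the paper:

```python
def filter_captions(captions, domain_keywords, min_score=0.5):
    """Keep dense captions that fit the webinar domain.

    Each caption is a (text, confidence) pair as a dense-captioning
    model might emit. Filtering here is a simple confidence cutoff
    plus keyword match; the paper's actual filters are not specified
    in the abstract.
    """
    kept = []
    for text, score in captions:
        if score < min_score:
            continue  # drop low-confidence region captions
        if any(kw in text.lower() for kw in domain_keywords):
            kept.append(text)
    return kept


# Example input: region captions for a webinar frame
captions = [
    ("a person speaking at a podium", 0.9),
    ("a slide showing a bar chart", 0.8),
    ("a blurry corner of the room", 0.3),
]
keywords = {"person", "slide", "chart", "podium"}
print(filter_captions(captions, keywords))
```

The surviving captions would then be concatenated and passed to an abstractive summarization model to produce a single inclusive paragraph.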
Related papers
- Pixels to Prose: Understanding the art of Image Captioning [1.9635669040319872]
Image captioning enables machines to interpret visual content and generate descriptive text.
The review traces the evolution of image captioning models to the latest cutting-edge solutions.
The paper also delves into the application of image captioning in the medical domain.
arXiv Detail & Related papers (2024-08-28T11:21:23Z)
- Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review [0.0]
This paper explores AI-assistive deep learning image annotation systems that provide textual suggestions, captions, or descriptions of the input image to the annotator.
We review various datasets and how they contribute to the training and evaluation of AI-assistive annotation systems.
Despite the promising potential, there is limited publicly available work on AI-assistive image annotation with textual output capabilities.
arXiv Detail & Related papers (2024-06-28T22:56:17Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Deep Learning Approaches on Image Captioning: A Review [0.5852077003870417]
Image captioning aims to generate natural language descriptions for visual content in the form of still images.
Deep learning and vision-language pre-training techniques have revolutionized the field, leading to more sophisticated methods and improved performance.
We address the challenges faced in this field by emphasizing issues such as object hallucination, missing context, illumination conditions, contextual understanding, and referring expressions.
We identify several potential future directions for research in this area, including tackling the information misalignment problem between image and text modalities, mitigating dataset bias, incorporating vision-language pre-training methods to enhance caption generation, and developing improved evaluation tools.
arXiv Detail & Related papers (2022-01-31T00:39:37Z)
- Neural Twins Talk & Alternative Calculations [3.198144010381572]
Inspired by how the human brain employs a higher number of neural pathways when describing a highly focused subject, we show that deep attentive models could be extended to achieve better performance.
Image captioning bridges a gap between computer vision and natural language processing.
arXiv Detail & Related papers (2021-08-05T18:41:34Z)
- From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence.
Research in image captioning has not reached a conclusive answer yet.
This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z)
- Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)
- TextCaps: a Dataset for Image Captioning with Reading Comprehension [56.89608505010651]
Text is omnipresent in human environments and frequently critical to understand our surroundings.
To study how to comprehend text in the context of an image we collect a novel dataset, TextCaps, with 145k captions for 28k images.
Our dataset challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase.
arXiv Detail & Related papers (2020-03-24T02:38:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.