Automatic Geo-alignment of Artwork in Children's Story Books
- URL: http://arxiv.org/abs/2304.01204v1
- Date: Thu, 16 Mar 2023 06:23:06 GMT
- Title: Automatic Geo-alignment of Artwork in Children's Story Books
- Authors: Jakub J. Dylag, Victor Suarez, James Wald, Aneesha Amodini Uvara
- Abstract summary: The project aligns with the company's vision by leveraging the generalisation and scalability of Machine Learning algorithms.
The presented approach can also be adapted to Video and 3D sculpture generation for novel illustrations in digital webbooks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A study was conducted to prove AI software could be used to translate and
generate illustrations without any human intervention. This was done with the
purpose of showing and distributing it to the external customer, Pratham Books.
The project aligns with the company's vision by leveraging the generalisation
and scalability of Machine Learning algorithms, offering significant cost
efficiency increases to a wide range of literary audiences in varied
geographical locations. A comparative study methodology was utilised to
determine the best-performing of the three methods devised: Prompt Augmentation
using Keywords, CLIP Embedding Mask, and Cross Attention Control with Editorial
Prompts. A thorough evaluation process was completed using both quantitative
and qualitative measures. Each method had its own strengths and weaknesses, but
the evaluation found that method 1 yielded the best results.
Promising future advancements may be made to further increase image quality by
incorporating Large Language Models and personalised stylistic models. The
presented approach can also be adapted to Video and 3D sculpture generation for
novel illustrations in digital webbooks.
Related papers
- PaperBanana: Automating Academic Illustration for AI Scientists [58.120067704652314]
PaperBanana is an agentic framework for automated generation of publication-ready academic illustrations.
Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique.
arXiv Detail & Related papers (2026-01-30T18:33:37Z)
- DINOv3 [62.31809406012177]
Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures.
This technical report introduces DINOv3, a major milestone toward realizing this vision by leveraging simple yet effective strategies.
DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks.
arXiv Detail & Related papers (2025-08-13T18:00:55Z)
- LEGO: Self-Supervised Representation Learning for Scene Text Images [32.21085469233465]
We propose a Local Explicit and Global Order-aware self-supervised representation learning method for scene text images.
Inspired by the human cognitive process of learning words, we propose three novel pre-text tasks for LEGO to model sequential, semantic, and structural features.
The LEGO recognizer achieves superior or comparable performance compared to state-of-the-art scene text recognition methods on six benchmarks.
arXiv Detail & Related papers (2024-08-04T14:07:14Z)
- Information Theoretic Text-to-Image Alignment [49.396917351264655]
We present a novel method that relies on an information-theoretic alignment measure to steer image generation.
Our method is on-par or superior to the state-of-the-art, yet requires nothing but a pre-trained denoising network to estimate MI.
arXiv Detail & Related papers (2024-05-31T12:20:02Z)
- A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes [80.20670062509723]
3D dense captioning is an emerging vision-language bridging task that aims to generate detailed descriptions for 3D scenes.
It presents significant potential and challenges due to its closer representation of the real world compared to 2D visual captioning.
Despite the popularity and success of existing methods, there is a lack of comprehensive surveys summarizing the advancements in this field.
arXiv Detail & Related papers (2024-03-12T10:04:08Z)
- Masked Modeling for Self-supervised Representation Learning on Vision and Beyond [69.64364187449773]
Masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training.
We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more.
We conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research.
arXiv Detail & Related papers (2023-12-31T12:03:21Z)
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability [43.984177729641615]
This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models.
We propose several metrics and conduct extensive experiments to investigate their techniques.
The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification.
arXiv Detail & Related papers (2023-07-06T17:05:26Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities.
A novel contrastive learning is developed to equip the deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- Self-Supervised Image Representation Learning: Transcending Masking with Paired Image Overlay [10.715255809531268]
This paper proposes a novel image augmentation technique, overlaying images, which has not been widely applied in self-supervised learning.
The proposed method is evaluated using contrastive learning, a widely used self-supervised learning method that has shown solid performance in downstream tasks.
arXiv Detail & Related papers (2023-01-23T07:00:04Z)
- Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual Archives [81.88384269259706]
We introduce Automatic Image Content Extraction framework for machine learning-based search and analysis of large image archives.
The proposed framework can be applied in several domains in humanities and social sciences.
arXiv Detail & Related papers (2022-04-05T12:19:24Z)
- Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers [5.968260239320591]
Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students.
Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG.
arXiv Detail & Related papers (2022-03-11T13:47:08Z)
- From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence.
Research in image captioning has not reached a conclusive answer yet.
This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform synthesis text-to-image models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.