HaSPeR: An Image Repository for Hand Shadow Puppet Recognition
- URL: http://arxiv.org/abs/2408.10360v1
- Date: Mon, 19 Aug 2024 18:56:24 GMT
- Title: HaSPeR: An Image Repository for Hand Shadow Puppet Recognition
- Authors: Syed Rifat Raiyan, Zibran Zarif Amio, Sabbir Ahmed
- Abstract summary: Hand shadow puppetry, also known as shadowgraphy or ombromanie, is a form of theatrical art and storytelling.
We present a novel dataset consisting of 8,340 images of hand shadow puppets across 11 classes extracted from both professional and amateur hand shadow puppeteer clips.
Our findings show a substantial performance superiority of traditional convolutional models over attention-based transformer architectures.
- Score: 2.048226951354646
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hand shadow puppetry, also known as shadowgraphy or ombromanie, is a form of theatrical art and storytelling where hand shadows are projected onto flat surfaces to create illusions of living creatures. Skilled performers create these silhouettes through hand positioning, finger movements, and dexterous gestures that resemble shadows of animals and objects. Due to a lack of practitioners and a seismic shift in people's entertainment standards, this art form is on the verge of extinction. To facilitate its preservation and bring it to a wider audience, we introduce ${\rm H{\small A}SP{\small E}R}$, a novel dataset consisting of 8,340 images of hand shadow puppets across 11 classes, extracted from clips of both professional and amateur hand shadow puppeteers. We provide a detailed statistical analysis of the dataset and employ a range of pretrained image classification models to establish baselines. Our findings show a substantial performance superiority of traditional convolutional models over attention-based transformer architectures. We also find that lightweight models suited for mobile applications and embedded devices, such as MobileNetV2, perform comparatively well. We surmise that such low-latency architectures can be useful in developing ombromanie teaching tools, and we create a prototype application to explore this surmise. Focusing on the best-performing model, InceptionV3, we conduct comprehensive feature-space, explainability, and error analyses to gain insight into its decision-making process. To the best of our knowledge, this is the first documented dataset and research endeavor to preserve this dying art form for future generations using computer vision approaches. Our code and data are publicly available.
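The abstract describes the baseline protocol only at a high level: pretrained image classifiers are fine-tuned on the 11 puppet classes, with InceptionV3 performing best. Below is a minimal sketch of what such a transfer-learning baseline could look like with PyTorch and torchvision; it is not the authors' released code, and the `hasper/train` directory layout, batch size, learning rate, epoch count, and 0.4 auxiliary-loss weight are all illustrative assumptions.

```python
# Hypothetical fine-tuning sketch for an InceptionV3 baseline on HaSPeR.
# NOT the authors' code; paths and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 11  # hand shadow puppet classes reported in the paper

# InceptionV3 expects 299x299 inputs normalized with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes an ImageFolder-style layout: hasper/train/<class_name>/*.jpg
train_set = datasets.ImageFolder("hasper/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Start from ImageNet weights and swap both classification heads
# (InceptionV3 carries an auxiliary classifier used during training).
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, NUM_CLASSES)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):  # illustrative epoch count
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        # In training mode, InceptionV3 returns main and auxiliary logits.
        logits, aux_logits = model(images)
        loss = criterion(logits, labels) + 0.4 * criterion(aux_logits, labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

The same loop adapts to MobileNetV2 (swap in `models.mobilenet_v2`, replace `model.classifier[-1]`, use 224x224 inputs, and drop the auxiliary loss), which is the kind of lightweight baseline the abstract suggests for a teaching application. The explainability analyses could plausibly be reproduced by applying a saliency method such as Grad-CAM to the fine-tuned model, though the abstract does not specify which techniques the authors used.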
Related papers
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z)
- Learning Physical-Spatio-Temporal Features for Video Shadow Removal [42.95422940263425]
We propose the first data-driven video shadow removal model, exploiting three essential characteristics of video shadows.
Specifically, a dedicated physical branch is established to conduct local illumination estimation, which is more applicable to scenes with complex lighting and textures.
To tackle the lack of paired shadow video datasets, we synthesize a dataset with the aid of the popular game GTAV by toggling shadow rendering on and off.
arXiv Detail & Related papers (2023-03-16T14:55:31Z)
- Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model.
Our method does not require training a dedicated model or a specialized encoder for the task.
We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)
- MagicPony: Learning Articulated 3D Animals in the Wild [81.63322697335228]
We present a new method, dubbed MagicPony, that learns this predictor purely from in-the-wild single-view images of the object category.
At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes.
arXiv Detail & Related papers (2022-11-22T18:59:31Z)
- ArcAid: Analysis of Archaeological Artifacts using Drawings [23.906975910478142]
Archaeology is an intriguing domain for computer vision.
It suffers not only from a shortage of (labeled) data, but also from highly challenging data, as artifacts are often extremely abraded and damaged.
This paper proposes a novel semi-supervised model for classification and retrieval of images of archaeological artifacts.
arXiv Detail & Related papers (2022-11-17T11:57:01Z)
- TAVA: Template-free Animatable Volumetric Actors [29.93065805208324]
We propose TAVA, a method to create Template-free Animatable Volumetric Actors based on neural representations.
Since TAVA does not require a body template, it is applicable to humans as well as other creatures such as animals.
arXiv Detail & Related papers (2022-06-17T17:59:59Z)
- I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches [74.63313641583602]
We propose a method to generate potential grasp configurations relevant to sketch-depicted objects.
Our model is trained and tested in an end-to-end manner, making it easy to implement in real-world applications.
arXiv Detail & Related papers (2022-05-09T04:23:36Z)
- Pose-Guided High-Resolution Appearance Transfer via Progressive Training [65.92031716146865]
We propose a pose-guided appearance transfer network for transferring a given reference appearance to a target pose in unprecedented image resolution.
Our network utilizes dense local descriptors including local perceptual loss and local discriminators to refine details.
Our model produces high-quality images that can be further used in applications such as garment transfer between people.
arXiv Detail & Related papers (2020-08-27T03:18:44Z)
- Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals [56.65062746564091]
We aim to capture the pose of neuroscience model organisms, without using any manual supervision, to study how neural circuits orchestrate behaviour.
Our key contribution is the explicit and independent modeling of appearance, shape and poses in an unpaired image translation framework.
We demonstrate improved pose estimation accuracy on Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm), and Danio rerio (zebrafish).
arXiv Detail & Related papers (2020-01-23T15:34:11Z)