SlideImages: A Dataset for Educational Image Classification
- URL: http://arxiv.org/abs/2001.06823v1
- Date: Sun, 19 Jan 2020 13:11:55 GMT
- Title: SlideImages: A Dataset for Educational Image Classification
- Authors: David Morris, Eric Müller-Budack, Ralph Ewerth
- Abstract summary: We present SlideImages, a dataset for the task of classifying educational illustrations.
We have reserved all the actual educational images as a test dataset.
We present a baseline system using a standard deep neural architecture.
- Score: 8.607440622310904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past few years, convolutional neural networks (CNNs) have achieved
impressive results in computer vision tasks, which however mainly focus on
photos with natural scene content. In contrast, non-sensor-derived images such as
illustrations, data visualizations, and figures are typically used to convey
complex information or to explore large datasets. However, such images have
received little attention in computer vision. CNNs and similar techniques
require large volumes of training data. Currently, many document analysis systems
are trained in part on scene images due to the lack of large datasets of
educational image data. In this paper, we address this issue and present
SlideImages, a dataset for the task of classifying educational illustrations.
SlideImages contains training data collected from various sources, e.g.,
Wikimedia Commons and the AI2D dataset, and test data collected from
educational slides. We have reserved all the actual educational images as a
test dataset in order to ensure that the approaches using this dataset
generalize well to new educational images, and potentially other domains.
Furthermore, we present a baseline system using a standard deep neural
architecture and discuss dealing with the challenge of limited training data.
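The baseline described above is a standard deep classifier trained on limited data. As a minimal sketch (in PyTorch, with a toy CNN standing in for the paper's unspecified architecture and a hypothetical class count), one training step might look like:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 8  # hypothetical; stands in for the SlideImages label set

# A small standard convolutional classifier, a stand-in for the
# "standard deep neural architecture" baseline; in practice one would
# start from a pre-trained backbone to cope with limited training data.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_CLASSES),
)

# One illustrative forward/backward step on a dummy batch of "illustrations".
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, NUM_CLASSES, (4,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
opt.step()
print(tuple(model(x).shape))  # (4, 8)
```

With so little training data, the practical levers are transfer learning from a pre-trained backbone and aggressive augmentation, both of which the paper discusses.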
Related papers
- Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections [0.0]
This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case scenario.
We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs).
The results show that the proposed approach outperforms baseline methods for near-duplicate image detection on the UKBench and an in-house private dataset.
arXiv Detail & Related papers (2024-10-25T09:56:15Z)
- Deep Image Composition Meets Image Forgery [0.0]
Image forgery has been studied for many years.
Deep learning models require large amounts of labeled data for training.
We use state of the art image composition deep learning models to generate spliced images close to the quality of real-life manipulations.
arXiv Detail & Related papers (2024-04-03T17:54:37Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation [66.6546668043249]
ALIA (Automated Language-guided Image Augmentation) is a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains.
To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information.
We show that ALIA surpasses traditional data augmentation and text-to-image generated data on fine-grained classification tasks.
arXiv Detail & Related papers (2023-05-25T17:43:05Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- Image Data Augmentation for Deep Learning: A Survey [8.817690876855728]
We systematically review different image data augmentation methods.
We propose a taxonomy of reviewed methods and present the strengths and limitations of these methods.
We also conduct extensive experiments with various data augmentation methods on three typical computer vision tasks.
arXiv Detail & Related papers (2022-04-19T02:05:56Z)
- Improving Fractal Pre-training [0.76146285961466]
We propose an improved pre-training dataset based on dynamically-generated fractal images.
Our experiments demonstrate that fine-tuning a network pre-trained using fractals attains 92.7-98.1% of the accuracy of an ImageNet pre-trained network.
arXiv Detail & Related papers (2021-10-06T22:39:51Z)
- The Intrinsic Dimension of Images and Its Impact on Learning [60.811039723427676]
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations.
In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning.
arXiv Detail & Related papers (2021-04-18T16:29:23Z)
- Applying convolutional neural networks to extremely sparse image datasets using an image subdivision approach [0.0]
The aim of this work is to demonstrate that convolutional neural networks (CNN) can be applied to extremely sparse image libraries by subdivision of the original image datasets.
arXiv Detail & Related papers (2020-10-25T07:43:20Z)
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
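Several related papers above (e.g., CSP) pair a dual encoder with a contrastive objective. As a minimal sketch, assuming an InfoNCE-style loss and toy encoders that are illustrative stand-ins rather than any paper's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dual-encoder contrastive setup: one encoder for images, one for
# geo-locations, trained so that matching (image, location) pairs score
# highest. All sizes here are arbitrary illustrative choices.
img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
loc_enc = nn.Linear(2, 64)  # (lat, lon) -> embedding

images = torch.randn(8, 3, 32, 32)
locs = torch.randn(8, 2)

z_img = F.normalize(img_enc(images), dim=1)
z_loc = F.normalize(loc_enc(locs), dim=1)

# InfoNCE-style loss: the i-th image should match the i-th location,
# with all other locations in the batch acting as negatives.
logits = z_img @ z_loc.t() / 0.07  # temperature-scaled cosine similarities
targets = torch.arange(8)
loss = F.cross_entropy(logits, targets)
loss.backward()
print(tuple(logits.shape))  # (8, 8)
```

The same batch-wise in-batch-negatives pattern underlies most image-text and image-location contrastive models.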
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.