Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from
Transformers by Self-supervised Learning of Sketch Gestalt
- URL: http://arxiv.org/abs/2005.09159v1
- Date: Tue, 19 May 2020 01:35:44 GMT
- Title: Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from
Transformers by Self-supervised Learning of Sketch Gestalt
- Authors: Hangyu Lin, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue
- Abstract summary: We present a model for learning Sketch Bidirectional Encoder Representation from Transformers (Sketch-BERT).
We generalize BERT to the sketch domain with newly proposed components and pre-training algorithms.
We show that the learned representation of Sketch-BERT improves the performance of the downstream tasks of sketch recognition, sketch retrieval, and sketch gestalt.
- Score: 125.17887147597567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous research on sketches has typically considered sketches in
pixel format and leveraged CNN-based models for sketch understanding.
Fundamentally, however, a sketch is stored as a sequence of data points, a
vector-format representation, rather than a photo-realistic image of pixels.
SketchRNN studied a generative neural representation for sketches in vector
format using Long Short-Term Memory networks (LSTM). Unfortunately, the
representation learned by SketchRNN is suited primarily to generation tasks,
rather than to the other tasks of sketch recognition and retrieval. To this
end, and inspired by the recent BERT model, we present a model for learning
Sketch Bidirectional Encoder Representation from Transformers (Sketch-BERT).
We generalize BERT to the sketch domain with newly proposed components and
pre-training algorithms, including newly designed sketch embedding networks
and self-supervised learning of sketch gestalt. In particular, for the
pre-training task, we present a novel Sketch Gestalt Model (SGM) to help train
Sketch-BERT. Experimentally, we show that the learned representation of
Sketch-BERT improves the performance of the downstream tasks of sketch
recognition, sketch retrieval, and sketch gestalt.
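The abstract leaves the pre-training mechanics implicit, so the following minimal sketch illustrates the general idea of BERT-style masked pre-training on vector sketches. The stroke-5 point format, masking ratio, network sizes, and loss weighting are common-practice assumptions, not details taken from the paper.

```python
# Illustrative sketch of a BERT-style masked pre-training step on vector
# sketches. NOT the authors' code: the stroke-5 format, masking ratio,
# and loss weighting below are common-practice assumptions.
import torch
import torch.nn as nn

# A vector sketch in stroke-5 format: each point is
# (dx, dy, pen_down, pen_up, end_of_sketch).
sketch = torch.randn(1, 96, 5)            # (batch, num_points, 5)

mask_ratio = 0.15                          # BERT-style masking ratio (assumed)
mask = torch.rand(1, 96) < mask_ratio      # points to hide and reconstruct
inputs = sketch.clone()
inputs[mask] = 0.0                         # zero out the masked points

# A point embedding plus Transformer encoder stand in for the paper's
# "sketch embedding networks" and BERT backbone.
embed = nn.Linear(5, 128)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True),
    num_layers=4,
)
head = nn.Linear(128, 5)                   # reconstruct masked points

hidden = encoder(embed(inputs))
pred = head(hidden)

# Regress offsets and classify pen state only on masked positions.
offset_loss = nn.functional.mse_loss(pred[mask][:, :2], sketch[mask][:, :2])
pen_loss = nn.functional.cross_entropy(
    pred[mask][:, 2:], sketch[mask][:, 2:].argmax(dim=-1)
)
loss = offset_loss + pen_loss
loss.backward()
```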
Related papers
- SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation [6.39528707908268]
There continues to be a lack of large-scale paired datasets for scene sketches.
We propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch.
We contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets.
arXiv Detail & Related papers (2024-05-29T06:43:49Z)
- SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation [111.2195741547517]
We present a method to generate controlled sketches using a text-conditioned diffusion model trained on pixel representations of images.
Our objective is to empower non-professional users to create sketches and, through a series of optimisation processes, transform a narrative into a storyboard.
arXiv Detail & Related papers (2023-08-27T19:44:44Z)
- SENS: Part-Aware Sketch-based Implicit Neural Shape Modeling [124.3266213819203]
We present SENS, a novel method for generating and editing 3D models from hand-drawn sketches.
SENS analyzes the sketch and encodes its parts into ViT patch encoding.
SENS supports refinement via part reconstruction, allowing for nuanced adjustments and artifact removal.
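As a rough illustration of the ViT patch encoding mentioned above, the snippet below patchifies a rasterized sketch part with a strided convolution and pools the encoded tokens; every size here is an assumption, not SENS's actual configuration.

```python
# Minimal illustration of encoding a rasterized sketch part with a
# ViT-style patch embedding; hyperparameters are assumed, not SENS's.
import torch
import torch.nn as nn

part = torch.rand(1, 1, 224, 224)          # one rasterized sketch part

# Patchify with a strided convolution: 16x16 patches -> 196 tokens.
patch_embed = nn.Conv2d(1, 192, kernel_size=16, stride=16)
tokens = patch_embed(part).flatten(2).transpose(1, 2)   # (1, 196, 192)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=4, batch_first=True),
    num_layers=2,
)
part_code = encoder(tokens).mean(dim=1)    # pooled part embedding, (1, 192)
```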
arXiv Detail & Related papers (2023-06-09T17:50:53Z)
- Sketch Less Face Image Retrieval: A New Challenge [9.703239229149261]
Drawing a complete face sketch requires skill and takes time, which hinders its widespread applicability in practice.
In this study, we propose sketch-less face image retrieval (SLFIR), in which retrieval is carried out at each stroke, aiming to retrieve the target face photo using a partial sketch with as few strokes as possible.
Experiments indicate that the new framework can finish the retrieval using a partial or poorly drawn sketch.
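A minimal sketch of the stroke-by-stroke retrieval loop the SLFIR setting implies; the encoder, gallery features, and stopping threshold below are stand-ins, not the authors' implementation.

```python
# Illustrative stroke-by-stroke retrieval loop for the SLFIR setting:
# rank the gallery after every new stroke and stop once the top match
# is confident. Encoder and data are toy stand-ins (assumptions).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
gallery = F.normalize(torch.randn(1000, 128), dim=1)       # photo features
incoming_strokes = [torch.randn(10, 2) for _ in range(5)]  # toy strokes

def embed_partial(strokes):
    """Placeholder for a partial-sketch encoder; returns a 128-d feature."""
    flat = torch.cat(strokes).mean(dim=0)               # crude pooling, (2,)
    return F.normalize(flat.repeat(64), dim=0)          # lift 2 -> 128 dims

drawn = []
for stroke in incoming_strokes:
    drawn.append(stroke)
    scores = gallery @ embed_partial(drawn)             # cosine similarity
    best = scores.argmax().item()
    print(f"after {len(drawn)} strokes -> photo {best} "
          f"({scores[best].item():.3f})")
    if scores[best] > 0.9:                              # assumed stop rule
        break
```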
arXiv Detail & Related papers (2023-02-11T02:36:00Z)
- I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches [74.63313641583602]
We propose a method to generate a potential grasp configuration relevant to the sketch-depicted objects.
Our model is trained and tested in an end-to-end manner, which is easy to implement in real-world applications.
arXiv Detail & Related papers (2022-05-09T04:23:36Z)
- FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context [112.07988211268612]
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO.
Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information, drawn by 100 non-expert individuals.
We study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions.
arXiv Detail & Related papers (2022-03-04T03:00:51Z)
- SketchLattice: Latticed Representation for Sketch Manipulation [30.092468954557468]
A key challenge in designing a sketch representation lies in handling the abstract and iconic nature of sketches.
We propose a lattice structured sketch representation that not only removes the bottleneck of requiring vector data but also preserves the structural cues that vector data provides.
Our lattice representation can be effectively encoded using a graph model that uses significantly fewer model parameters (13.5 times fewer) than the existing state of the art.
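As a toy illustration of a latticed representation, the snippet below samples sketch points onto a coarse grid and connects neighbouring occupied cells into a graph; the grid size and 4-neighbour connectivity are assumptions, not the paper's specification.

```python
# Minimal illustration of a latticed sketch representation (assumed
# details, not the SketchLattice code): quantize sketch points to a
# coarse grid and link neighbouring occupied cells into a graph.
import numpy as np

points = np.random.rand(200, 2)             # toy sketch points in [0, 1]^2
grid = 8                                     # 8x8 lattice (assumed size)

cells = np.unique((points * grid).astype(int).clip(0, grid - 1), axis=0)

# Edges between lattice nodes that are 4-neighbours on the grid.
occupied = {tuple(c) for c in cells}
edges = [
    ((x, y), (x + dx, y + dy))
    for (x, y) in occupied
    for dx, dy in ((1, 0), (0, 1))
    if (x + dx, y + dy) in occupied
]
# `cells` (node positions) and `edges` would feed a graph encoder (GNN).
```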
arXiv Detail & Related papers (2021-08-26T08:02:21Z)
- BézierSketch: A generative model for scalable vector sketches [132.5223191478268]
We present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution.
We first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke as its best-fit Bézier curve.
This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches.
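The stroke embedding above targets a best-fit Bézier curve; the snippet below shows the underlying least-squares fit of one stroke to a cubic Bézier under an assumed chord-length parameterization (the paper instead trains an encoder to predict such fits).

```python
# Illustrative least-squares fit of one stroke to a cubic Bezier curve,
# the kind of stroke embedding BezierSketch's encoder learns to predict.
# Chord-length parameterization is an assumption for this sketch.
import numpy as np

stroke = np.cumsum(np.random.randn(30, 2) * 0.1, axis=0)  # toy stroke

# Parameter t in [0, 1] from cumulative chord length.
d = np.linalg.norm(np.diff(stroke, axis=0), axis=1)
t = np.concatenate([[0.0], np.cumsum(d) / d.sum()])

# Cubic Bernstein basis matrix, shape (num_points, 4).
B = np.stack([
    (1 - t) ** 3,
    3 * t * (1 - t) ** 2,
    3 * t ** 2 * (1 - t),
    t ** 3,
], axis=1)

# Solve B @ P = stroke for the four control points P (least squares).
control_points, *_ = np.linalg.lstsq(B, stroke, rcond=None)
print(control_points)        # (4, 2): the stroke's Bezier embedding
```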
arXiv Detail & Related papers (2020-07-04T21:30:52Z)
- Sketchformer: Transformer-based Representation for Sketched Structure [12.448155157592895]
Sketchformer is a transformer-based representation for encoding free-hand sketch input in vector form.
We report several variants exploring continuous and tokenized input representations, and contrast their performance.
Our learned embedding, driven by a dictionary-learning tokenization scheme, yields state-of-the-art performance in classification and image retrieval tasks.
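As a hedged illustration of dictionary-learning tokenization, the snippet below builds a k-means codebook over point offsets and maps a sketch to discrete token ids; the codebook size and clustering method are assumptions, not the paper's exact scheme.

```python
# Illustrative dictionary-based tokenization for vector sketches, in the
# spirit of Sketchformer's tokenized input variant. The 256-entry
# codebook and plain k-means are assumptions, not the paper's spec.
import numpy as np
from sklearn.cluster import KMeans

offsets = np.random.randn(5000, 2)           # (dx, dy) from many sketches
codebook = KMeans(n_clusters=256, n_init=10).fit(offsets)

sketch = np.random.randn(80, 2)              # one sketch as point offsets
tokens = codebook.predict(sketch)            # discrete token ids in [0, 256)
# `tokens` can now feed a standard Transformer via an embedding table.
```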
arXiv Detail & Related papers (2020-02-24T17:11:53Z)