The Visual Language of Fabrics
- URL: http://arxiv.org/abs/2307.13681v1
- Date: Tue, 25 Jul 2023 17:39:39 GMT
- Title: The Visual Language of Fabrics
- Authors: Valentin Deschaintre, Julia Guerrero-Viu, Diego Gutierrez, Tamy
Boubekeur, Belen Masia
- Abstract summary: We introduce text2fabric, a novel dataset that links free-text descriptions to various fabric materials.
The dataset comprises 15,000 natural language descriptions associated with 3,000 corresponding images of fabric materials.
- Score: 14.926030595313447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce text2fabric, a novel dataset that links free-text descriptions
to various fabric materials. The dataset comprises 15,000 natural language
descriptions associated with 3,000 corresponding images of fabric materials.
Traditionally, material descriptions come in the form of tags/keywords, which
limits their expressivity, presumes prior knowledge of the appropriate
vocabulary, and ultimately leads to a fragmented description system. Therefore,
we study the use of free text as a more appropriate way to describe material
appearance, taking fabrics as a use case since they are common items that
non-experts often deal with. Based on an analysis of the dataset, we identify a
compact lexicon, a set of attributes, and a key structure that emerge from the
descriptions. This allows us to accurately understand how people describe
fabrics and to draw directions for generalizing to other types of materials. We
also show that our dataset enables specializing large vision-language models
such as CLIP, creating a meaningful latent space for fabric appearance, and
significantly improving applications such as fine-grained material retrieval
and automatic captioning.
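The claim about specializing CLIP suggests standard contrastive fine-tuning on
(fabric image, free-text description) pairs. Below is a minimal sketch of that
idea using Hugging Face's CLIP implementation; the record layout and loader are
illustrative assumptions, not the text2fabric release format, and the
hyperparameters are placeholders.
```python
# Minimal sketch: contrastive fine-tuning of CLIP on (image, description) pairs.
# ASSUMPTIONS: the record layout below is hypothetical, not the text2fabric
# release format; learning rate and batch size are placeholders.
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from transformers import CLIPModel, CLIPProcessor

class FabricPairs(Dataset):
    """Yields (description, PIL image) pairs from hypothetical records."""
    def __init__(self, records):
        self.records = records  # e.g. [{"image": "fabric_0001.jpg", "text": "..."}]
    def __len__(self):
        return len(self.records)
    def __getitem__(self, i):
        r = self.records[i]
        return r["text"], Image.open(r["image"]).convert("RGB")

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def collate(batch):
    texts, images = zip(*batch)
    return processor(text=list(texts), images=list(images),
                     return_tensors="pt", padding=True, truncation=True)

records = []  # fill with text2fabric (description, image-path) records
loader = DataLoader(FabricPairs(records), batch_size=32, shuffle=True,
                    collate_fn=collate)

model.train()
for batch in loader:
    # return_loss=True computes CLIP's symmetric image-text contrastive loss
    # over the batch, pulling matching pairs together in the joint space.
    out = model(**batch, return_loss=True)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
Fine-tuning this way reshapes the joint embedding space so that fabric images
and free-text descriptions align, which is what enables the fine-grained
retrieval mentioned above: encode a query description and rank images by
cosine similarity.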
Related papers
- MatText: Do Language Models Need More than Text & Scale for Materials Modeling? [5.561723952524538]
MatText is a suite of benchmarking tools and datasets designed to systematically evaluate the performance of language models in modeling materials.
MatText provides essential tools for training and benchmarking the performance of language models in the context of materials science.
arXiv Detail & Related papers (2024-06-25T05:45:07Z)
- DOCCI: Descriptions of Connected and Contrasting Images [58.377060316967864]
Descriptions of Connected and Contrasting Images (DOCCI) is a dataset with long, human-annotated English descriptions for 15k images.
We instruct human annotators to create comprehensive descriptions for each image.
We show that DOCCI is a useful testbed for text-to-image generation.
arXiv Detail & Related papers (2024-04-30T17:56:24Z)
- Text2Scene: Text-driven Indoor Scene Stylization with Part-aware Details [12.660352353074012]
We propose Text2Scene, a method to automatically create realistic textures for virtual scenes composed of multiple objects.
Our pipeline adds detailed texture to labeled 3D geometries in the room so that the generated colors respect the hierarchical structure of semantic parts, which are often composed of similar materials.
arXiv Detail & Related papers (2023-08-31T17:37:23Z)
- Leveraging Language Representation for Material Recommendation, Ranking, and Exploration [0.0]
We introduce a material discovery framework that uses natural language embeddings derived from language models as representations of compositional and structural features.
By applying the framework to thermoelectrics, we demonstrate diversified recommendations of prototype structures and identify under-studied high-performance material spaces (see the embedding-similarity sketch after this list).
arXiv Detail & Related papers (2023-05-01T21:58:29Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end framework for information extraction from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose TeKo, a novel text-rich graph neural network with external knowledge.
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge: structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
- Leveraging Textures in Zero-shot Understanding of Fine-Grained Domains [34.848408203825194]
We study the effectiveness of large-scale language and vision models (e.g., CLIP) at recognizing texture attributes in natural images.
We first conduct a systematic study of CLIP on texture datasets where we find that it has good coverage for a wide range of texture terms.
We then show how these attributes allow for zero-shot fine-grained categorization on existing datasets (see the zero-shot sketch after this list).
arXiv Detail & Related papers (2022-03-22T04:07:20Z)
- Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)
- Describing Textures using Natural Language [32.076605062485605]
Textures in natural images can be characterized by color, shape, periodicity of elements within them, and other attributes that can be described using natural language.
We study the problem of describing visual attributes of texture on a novel dataset containing rich descriptions of textures.
We present visualizations of several fine-grained domains and show that texture attributes learned on our dataset offer improvements over expert-designed attributes on the Caltech-UCSD Birds dataset.
arXiv Detail & Related papers (2020-08-03T20:37:35Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features from text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
- TextCaps: a Dataset for Image Captioning with Reading Comprehension [56.89608505010651]
Text is omnipresent in human environments and frequently critical to understanding our surroundings.
To study how to comprehend text in the context of an image, we collect a novel dataset, TextCaps, with 145k captions for 28k images.
Our dataset challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase.
arXiv Detail & Related papers (2020-03-24T02:38:35Z)
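As referenced in the material recommendation entry above, the core idea of
using language-model embeddings as material representations can be illustrated
with a simple similarity ranking. This sketch uses a generic
sentence-embedding model; the model choice, textual material encodings, and
query are illustrative assumptions rather than that paper's actual setup.
```python
# Minimal sketch: language-model embeddings as material representations for
# similarity-based recommendation. ASSUMPTIONS: the embedding model, textual
# material encodings, and query are illustrative, not the paper's setup.
import numpy as np
from sentence_transformers import SentenceTransformer

candidates = [  # hypothetical composition/structure descriptions
    "Bi2Te3, rhombohedral, layered structure",
    "PbTe, cubic rock-salt structure",
    "SnSe, orthorhombic, layered structure",
]
query = "layered telluride with low lattice thermal conductivity"

model = SentenceTransformer("all-MiniLM-L6-v2")
cand_emb = model.encode(candidates, normalize_embeddings=True)
query_emb = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
scores = cand_emb @ query_emb
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {candidates[i]}")
```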
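The zero-shot texture categorization referenced in the "Leveraging Textures"
entry boils down to CLIP's standard zero-shot protocol: score an image against
prompted attribute names. A minimal sketch follows; the attribute list and
prompt template are illustrative assumptions.
```python
# Minimal sketch: zero-shot texture-attribute recognition with CLIP.
# ASSUMPTIONS: the attribute list and prompt template are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

ATTRIBUTES = ["striped", "dotted", "woven", "fibrous", "knitted", "lacelike"]
PROMPT = "a photo of a {} texture"

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

def texture_scores(image: Image.Image) -> dict:
    """Rank texture attributes for one image by CLIP image-text similarity."""
    inputs = processor(text=[PROMPT.format(a) for a in ATTRIBUTES],
                       images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return dict(zip(ATTRIBUTES, probs.tolist()))

# Usage: texture_scores(Image.open("sample.jpg").convert("RGB"))
```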
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.