T-METASET: Task-Aware Generation of Metamaterial Datasets by
Diversity-Based Active Learning
- URL: http://arxiv.org/abs/2202.10565v1
- Date: Mon, 21 Feb 2022 22:46:49 GMT
- Title: T-METASET: Task-Aware Generation of Metamaterial Datasets by
Diversity-Based Active Learning
- Authors: Doksoo Lee, Yu-Chin Chan, Wei (Wayne) Chen, Liwei Wang, Anton van
Beek, Wei Chen
- Abstract summary: We propose t-METASET: an intelligent data acquisition framework for task-aware dataset generation.
We validate the proposed framework in three hypothetical deployment scenarios, which encompass general use, task-aware use, and tailorable use.
- Score: 14.668178146934588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by the recent success of deep learning in diverse domains,
data-driven metamaterials design has emerged as a compelling design paradigm to
unlock the potential of multiscale architecture. However, existing
model-centric approaches lack principled methodologies dedicated to
high-quality data generation. Resorting to space-filling design in shape
descriptor space, existing metamaterial datasets suffer from property
distributions that are either highly imbalanced or at odds with design tasks of
interest. To this end, we propose t-METASET: an intelligent data acquisition
framework for task-aware dataset generation. We seek a solution to a
commonplace yet frequently overlooked scenario at early design stages: when a
massive ($\sim O(10^4)$) shape library has been prepared with no properties
evaluated. The key idea is to exploit a data-driven shape descriptor learned
from generative models, fit a sparse regressor as the start-up agent, and
leverage diversity-related metrics to drive data acquisition to areas that help
designers fulfill design goals. We validate the proposed framework in three
hypothetical deployment scenarios, which encompass general use, task-aware use,
and tailorable use. Two large-scale shape-only mechanical metamaterial datasets
are used as test datasets. The results demonstrate that t-METASET can
incrementally grow task-aware datasets. Applicable to general design
representations, t-METASET can boost future advancements of not only
metamaterials but data-driven design in other domains.
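The diversity-driven acquisition step described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it swaps in a plain greedy max-min distance rule as the diversity criterion, uses random 2-D points in place of learned shape descriptors, and omits the regressor and the task-aware weighting entirely. The names `greedy_diverse_batch`, `library`, and `seed_set` are hypothetical.

```python
import math
import random


def greedy_diverse_batch(candidates, evaluated, batch_size):
    """Return indices of `batch_size` candidates chosen by greedy max-min distance.

    candidates: descriptor vectors whose properties are not yet evaluated
    evaluated:  descriptor vectors already in the dataset (may be empty)
    """
    if batch_size > len(candidates):
        raise ValueError("batch_size exceeds number of candidates")

    # Distance from each candidate to its nearest already-selected point.
    if evaluated:
        nearest = [min(math.dist(c, e) for e in evaluated) for c in candidates]
    else:
        nearest = [math.inf] * len(candidates)

    chosen = []
    for _ in range(batch_size):
        # Pick the unchosen candidate farthest from everything selected so far.
        remaining = (i for i in range(len(candidates)) if i not in chosen)
        best = max(remaining, key=nearest.__getitem__)
        chosen.append(best)
        # The new pick may now be the nearest selected point for other candidates.
        for i, c in enumerate(candidates):
            nearest[i] = min(nearest[i], math.dist(c, candidates[best]))
    return chosen


# Stand-in for a shape library: random 2-D "descriptors" (an assumption for
# illustration; the paper learns descriptors from generative models).
random.seed(0)
library = [(random.random(), random.random()) for _ in range(200)]
seed_set = [library[0]]
batch = greedy_diverse_batch(library[1:], seed_set, batch_size=5)
```

Each round, the shapes indexed by `batch` would be sent for property evaluation and appended to the evaluated set before the next batch is selected. A task-aware variant would need to score candidates in property space as well, which this sketch does not attempt.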
Related papers
- Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
- Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization [0.0]
We introduce a new approach for representation learning on tabular data based on the work of Tomoharu Iwata and Atsutoshi Kumagai.
We show that general representations may not suffice for some meta-tasks where requirements are not explicitly considered during extraction.
arXiv Detail & Related papers (2024-03-07T18:16:29Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- Data-Driven Design for Metamaterials and Multiscale Systems: A Review [15.736695579155047]
Metamaterials are artificial materials designed to exhibit effective material parameters that go beyond those found in nature.
A compelling paradigm that could bring the full potential of metamaterials to fruition is emerging: data-driven design.
We organize existing research into data-driven modules, encompassing data acquisition, machine learning-based unit cell design, and data-driven multiscale optimization.
arXiv Detail & Related papers (2023-07-01T22:36:40Z)
- A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design [63.30166298698985]
Structure-based drug design (SBDD) utilizes the three-dimensional geometry of proteins to identify potential drug candidates.
Recent developments in geometric deep learning, focusing on the integration and processing of 3D geometric data, have greatly advanced the field of structure-based drug design.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Improving the Performance of Fine-Grain Image Classifiers via Generative Data Augmentation [0.5161531917413706]
We develop Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN).
DAPPER GAN is an ML analytics support tool that automatically generates novel views of training images.
We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy.
arXiv Detail & Related papers (2020-08-12T15:29:11Z)
- METASET: Exploring Shape and Property Spaces for Data-Driven Metamaterials Design [20.272835126269374]
We show that a smaller yet diverse set of unit cells leads to scalable search and unbiased learning.
Our flexible method can distill unique subsets regardless of the metric employed.
Our diverse subsets are provided publicly for use by any designer.
arXiv Detail & Related papers (2020-06-01T03:36:37Z)