T-METASET: Task-Aware Generation of Metamaterial Datasets by
Diversity-Based Active Learning
- URL: http://arxiv.org/abs/2202.10565v1
- Date: Mon, 21 Feb 2022 22:46:49 GMT
- Title: T-METASET: Task-Aware Generation of Metamaterial Datasets by
Diversity-Based Active Learning
- Authors: Doksoo Lee, Yu-Chin Chan, Wei (Wayne) Chen, Liwei Wang, Anton van
Beek, Wei Chen
- Abstract summary: We propose t-METASET: an intelligent data acquisition framework for task-aware dataset generation.
We validate the proposed framework in three hypothetical deployment scenarios, which encompass general use, task-aware use, and tailorable use.
- Score: 14.668178146934588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by the recent success of deep learning in diverse domains,
data-driven metamaterials design has emerged as a compelling design paradigm to
unlock the potential of multiscale architecture. However, existing
model-centric approaches lack principled methodologies dedicated to
high-quality data generation. Resorting to space-filling design in shape
descriptor space, existing metamaterial datasets suffer from property
distributions that are either highly imbalanced or at odds with design tasks of
interest. To this end, we propose t-METASET: an intelligent data acquisition
framework for task-aware dataset generation. We seek a solution to a
commonplace yet frequently overlooked scenario at early design stages: when a
massive ($\sim O(10^4)$) shape library has been prepared with no properties
evaluated. The key idea is to exploit a data-driven shape descriptor learned
from generative models, fit a sparse regressor as the start-up agent, and
leverage diversity-related metrics to drive data acquisition to areas that help
designers fulfill design goals. We validate the proposed framework in three
hypothetical deployment scenarios, which encompass general use, task-aware use,
and tailorable use. Two large-scale shape-only mechanical metamaterial datasets
are used as test datasets. The results demonstrate that t-METASET can
incrementally grow task-aware datasets. Applicable to general design
representations, t-METASET can boost future advancements of not only
metamaterials but data-driven design in other domains.
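The diversity-driven acquisition step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which diversity metric t-METASET uses, so this example swaps in a simple greedy max-min (farthest-point) rule as a stand-in; it is not the authors' method. The function name `greedy_diverse_batch`, the array layout, and the use of Euclidean distance in descriptor space are all assumptions made for illustration.

```python
import numpy as np

def greedy_diverse_batch(candidates, selected, batch_size):
    """Pick a diverse batch of unevaluated shapes (hypothetical helper).

    candidates: (n, d) array of shape descriptors with no properties evaluated
    selected:   (m, d) array of descriptors already evaluated, m >= 1
    Returns a list of indices into `candidates`.
    """
    # Distance from each candidate to its nearest already-selected point.
    diffs = candidates[:, None, :] - selected[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)

    chosen = []
    for _ in range(batch_size):
        # Take the candidate farthest from everything chosen so far.
        i = int(np.argmax(nearest))
        chosen.append(i)
        # The new pick may now be the nearest neighbour of other candidates.
        to_new = np.linalg.norm(candidates - candidates[i], axis=1)
        nearest = np.minimum(nearest, to_new)
    return chosen
```

In an active-learning loop, the returned indices would be the shapes sent for property evaluation in the next round. A task-aware variant would presumably also weight candidates by predicted properties, which this sketch leaves out.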
Related papers
- Metadata-based Data Exploration with Retrieval-Augmented Generation for Large Language Models [3.7685718201378746]
This research introduces a new architecture for data exploration which employs a form of Retrieval-Augmented Generation (RAG) to enhance metadata-based data discovery.
The proposed framework offers a new method for evaluating semantic similarity among heterogeneous data sources.
arXiv Detail & Related papers (2024-10-05T17:11:37Z)
- Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
- Rethinking of Encoder-based Warm-start Methods in Hyperparameter Optimization [0.0]
We introduce a new approach for representation learning on tabular data based on the work of Tomoharu Iwata and Atsutoshi Kumagai.
We show that general representations may not suffice for some meta-tasks where requirements are not explicitly considered during extraction.
arXiv Detail & Related papers (2024-03-07T18:16:29Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- Data-Driven Design for Metamaterials and Multiscale Systems: A Review [15.736695579155047]
Metamaterials are artificial materials designed to exhibit effective material parameters that go beyond those found in nature.
A compelling paradigm that could bring the full potential of metamaterials to fruition is emerging: data-driven design.
We organize existing research into data-driven modules, encompassing data acquisition, machine learning-based unit cell design, and data-driven multiscale optimization.
arXiv Detail & Related papers (2023-07-01T22:36:40Z)
- Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- METASET: Exploring Shape and Property Spaces for Data-Driven Metamaterials Design [20.272835126269374]
We show that a smaller yet diverse set of unit cells leads to scalable search and unbiased learning.
Our flexible method can distill unique subsets regardless of the metric employed.
Our diverse subsets are provided publicly for use by any designer.
arXiv Detail & Related papers (2020-06-01T03:36:37Z)