Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis
- URL: http://arxiv.org/abs/2111.15186v1
- Date: Tue, 30 Nov 2021 07:51:12 GMT
- Title: Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis
- Authors: Albert Tseng, Jennifer J. Sun, Yisong Yue
- Abstract summary: AutoSWAP is a framework for automatically synthesizing data-efficient task-level labeling functions.
We show that AutoSWAP is an effective way to automatically generate labeling functions that can significantly reduce expert effort for behavior analysis.
- Score: 37.077883083886114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obtaining annotations for large training sets is expensive, especially in
behavior analysis settings where domain knowledge is required for accurate
annotations. Weak supervision has been studied to reduce annotation costs by
using weak labels from task-level labeling functions to augment ground truth
labels. However, domain experts are still needed to hand-craft labeling
functions for every studied task. To reduce expert effort, we present AutoSWAP:
a framework for automatically synthesizing data-efficient task-level labeling
functions. The key to our approach is to efficiently represent expert knowledge
in a reusable domain specific language and domain-level labeling functions,
with which we use state-of-the-art program synthesis techniques and a small
labeled dataset to generate labeling functions. Additionally, we propose a
novel structural diversity cost that allows for direct synthesis of diverse
sets of labeling functions with minimal overhead, further improving labeling
function data efficiency. We evaluate AutoSWAP in three behavior analysis
domains and demonstrate that AutoSWAP outperforms existing approaches using
only a fraction of the data. Our results suggest that AutoSWAP is an effective
way to automatically generate labeling functions that can significantly reduce
expert effort for behavior analysis.
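To make the pipeline described in the abstract concrete, the sketch below is a minimal, hypothetical illustration, not the authors' implementation. It assumes a toy DSL of per-frame behavior features (speed, dist_to_other), enumerates candidate labeling functions over that DSL as a stand-in for program synthesis, scores them on a small labeled set, and greedily selects a structurally diverse subset; the structural-distance penalty is only a simplified proxy for the paper's structural diversity cost.

```python
import itertools
import numpy as np

# --- Hypothetical domain-specific language (DSL) ---------------------------
# Domain-level primitives an expert might expose for behavior analysis.
# Feature names and thresholds are illustrative assumptions, not AutoSWAP's DSL.
FEATURES = ["speed", "dist_to_other"]          # per-frame scalar features
COMPARATORS = [np.greater, np.less]            # primitive predicates
THRESHOLDS = [0.25, 0.5, 0.75]                 # quantile-style cut points

def make_lf(feature_idx, comparator, threshold):
    """Build a task-level labeling function from DSL primitives.

    Returns 1 (behavior present) or 0 (absent) per sample.
    """
    def lf(X):
        return comparator(X[:, feature_idx], threshold).astype(int)
    # Record the program structure so structural diversity can be measured.
    lf.structure = (feature_idx, comparator.__name__, threshold)
    return lf

def enumerate_candidates():
    """Stand-in for program synthesis: exhaustively enumerate the tiny DSL."""
    for f, c, t in itertools.product(range(len(FEATURES)), COMPARATORS, THRESHOLDS):
        yield make_lf(f, c, t)

def accuracy(lf, X_small, y_small):
    """Fitness of a candidate labeling function on the small labeled dataset."""
    return float((lf(X_small) == y_small).mean())

def structural_distance(a, b):
    """Toy structural-diversity measure: number of differing DSL components."""
    return sum(x != y for x, y in zip(a.structure, b.structure))

def select_diverse_lfs(X_small, y_small, k=3, diversity_weight=0.1):
    """Greedily pick k labeling functions, trading off accuracy and diversity."""
    candidates = list(enumerate_candidates())
    selected = []
    for _ in range(k):
        def score(lf):
            div = min((structural_distance(lf, s) for s in selected), default=0)
            return accuracy(lf, X_small, y_small) + diversity_weight * div
        best = max((lf for lf in candidates if lf not in selected), key=score)
        selected.append(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_small = rng.random((40, len(FEATURES)))       # tiny labeled set
    y_small = (X_small[:, 0] > 0.5).astype(int)     # synthetic ground truth
    for lf in select_diverse_lfs(X_small, y_small):
        print(lf.structure, accuracy(lf, X_small, y_small))
```

In a full weak supervision pipeline, the synthesized labeling functions would typically feed a label model or aggregation step rather than being used as final labels directly.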
Related papers
- FIND: A Function Description Benchmark for Evaluating Interpretability Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
- Towards Zero-Shot Frame Semantic Parsing with Task Agnostic Ontologies and Simple Labels [0.9236074230806577]
OpenFSP is a framework for easy creation of new domains from simple labels.
Our approach relies on creating a small, but expressive, set of domain agnostic slot types.
Our model outperforms strong baselines in this simple labels setting.
arXiv Detail & Related papers (2023-05-05T18:47:18Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- AutoWS: Automated Weak Supervision Framework for Text Classification [1.748907524043535]
We propose a novel framework for increasing the efficiency of the weak supervision process while decreasing the dependency on domain experts.
Our method requires a small set of labeled examples per label class and automatically creates a set of labeling functions to assign noisy labels to numerous unlabeled data.
arXiv Detail & Related papers (2023-02-07T07:12:05Z)
- SepLL: Separating Latent Class Labels from Weak Supervision Noise [4.730767228515796]
In weakly supervised learning, labeling functions automatically assign, often noisy, labels to data samples.
In this work, we provide a method for learning from weak labels by separating two types of complementary information.
Our model is competitive with the state-of-the-art, and yields a new best average performance.
arXiv Detail & Related papers (2022-10-25T10:33:45Z)
- LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds.
Our method co-designs an efficient labeling process with semi/weakly supervised learning.
Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z)
- TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration [1.4050836886292872]
Data programming has so far been accessible only to users who know how to program.
We build a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming.
arXiv Detail & Related papers (2021-06-24T04:49:42Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data.
Meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions [2.338938629983582]
We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples.
arXiv Detail & Related papers (2020-09-03T04:25:08Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
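Several of the papers above (e.g., AutoWS, TagRuler, and Data Programming by Demonstration) center on the same weak supervision pattern referenced in the abstract: labeling functions cast noisy votes on unlabeled data, and those votes are aggregated into training labels. The sketch below is a generic, assumed illustration of that aggregation step using a simple majority vote with abstentions; the labeling functions and thresholds are hypothetical, and real systems typically learn a probabilistic label model instead.

```python
import numpy as np

ABSTAIN = -1  # convention: a labeling function may decline to vote

def apply_lfs(lfs, X):
    """Stack each labeling function's votes into an (n_samples, n_lfs) matrix."""
    return np.stack([lf(X) for lf in lfs], axis=1)

def majority_vote(votes, n_classes=2):
    """Aggregate noisy votes into weak labels; ties or all-abstain stay ABSTAIN."""
    labels = np.full(votes.shape[0], ABSTAIN)
    for i, row in enumerate(votes):
        row = row[row != ABSTAIN]
        if row.size == 0:
            continue
        counts = np.bincount(row, minlength=n_classes)
        if (counts == counts.max()).sum() == 1:  # unique winner only
            labels[i] = counts.argmax()
    return labels

# Two toy labeling functions over a 1-D feature (names and cutoffs are assumptions).
lf_high = lambda X: np.where(X[:, 0] > 0.6, 1, ABSTAIN)
lf_low = lambda X: np.where(X[:, 0] < 0.2, 0, ABSTAIN)

X_unlabeled = np.random.default_rng(1).random((10, 1))
weak_labels = majority_vote(apply_lfs([lf_high, lf_low], X_unlabeled))
print(weak_labels)  # -1 marks samples the vote leaves unlabeled
```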