I Know Therefore I Score: Label-Free Crafting of Scoring Functions using
Constraints Based on Domain Expertise
- URL: http://arxiv.org/abs/2203.10085v1
- Date: Fri, 18 Mar 2022 17:51:20 GMT
- Title: I Know Therefore I Score: Label-Free Crafting of Scoring Functions using
Constraints Based on Domain Expertise
- Authors: Ragja Palakkadavath, Sarath Sivaprasad, Shirish Karande, Niranjan
Pedanekar
- Abstract summary: We introduce a label-free practical approach to learn a scoring function from multi-dimensional numerical data.
The approach incorporates insights and business rules from domain experts in the form of easily observable and specifiable constraints.
We convert such constraints into loss functions that are optimized simultaneously while learning the scoring function.
- Score: 6.26476800426345
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Several real-life applications require crafting concise, quantitative scoring
functions (also called rating systems) from measured observations. For example,
an effectiveness score needs to be created for advertising campaigns using a
number of engagement metrics. Experts often need to create such scoring
functions in the absence of labelled data, where the scores need to reflect
business insights and rules as understood by the domain experts. Without a way
to capture these inputs systematically, this becomes a time-consuming process
involving trial and error. In this paper, we introduce a label-free practical
approach to learn a scoring function from multi-dimensional numerical data. The
approach incorporates insights and business rules from domain experts in the
form of easily observable and specifiable constraints, which are used as weak
supervision by a machine learning model. We convert such constraints into loss
functions that are optimized simultaneously while learning the scoring
function. We examine the efficacy of the approach using a synthetic dataset as
well as four real-life datasets, and compare its performance with that of
supervised learning models.
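The abstract's central idea (expert constraints acting as weak supervision via loss terms) can be sketched in a minimal form. The code below is an illustrative assumption, not the paper's implementation: it uses a linear scoring function and a single hinge-style penalty encoding one hypothetical expert rule, "the score must increase with metric 0." The perturbation size `delta`, margin, and learning rate are all made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))  # unlabeled engagement metrics

def monotonicity_loss(w, X, feature=0, delta=0.1, margin=0.05):
    """Hinge penalty that is positive whenever raising `feature`
    fails to raise the score by at least `margin`."""
    X_hi = X.copy()
    X_hi[:, feature] += delta            # expert says: more of this metric
    s_lo, s_hi = X @ w, X_hi @ w
    viol = margin + s_lo - s_hi          # > 0 means the rule is violated
    loss = np.mean(np.maximum(0.0, viol))
    # subgradient of the hinge with respect to w
    active = (viol > 0).astype(float)[:, None]
    grad = np.mean(active * (X - X_hi), axis=0)
    return loss, grad

# Learn the scoring weights from the constraint alone -- no labels.
w = np.zeros(3)
for _ in range(300):
    loss, grad = monotonicity_loss(w, X)
    w -= 1.0 * grad                      # plain (sub)gradient step

# The learned score now respects the expert's rule:
x = np.array([0.2, 0.5, 0.5])
x_up = np.array([0.4, 0.5, 0.5])
assert x_up @ w > x @ w
```

In practice, each business rule (monotonicity, saturation, relative importance of metrics) would contribute its own loss term, and the terms would be summed and minimized jointly, which is the "optimized simultaneously" step the abstract describes.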
Related papers
- An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
(arXiv, 2024-01-12)
- Label-Efficient Interactive Time-Series Anomaly Detection [17.799924009674694]
We propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system.
To achieve this goal, the system integrates weak supervision and active learning collaboratively.
We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions.
(arXiv, 2022-12-30)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
(arXiv, 2022-10-19)
- Firenze: Model Evaluation Using Weak Signals [5.723905680436377]
We introduce Firenze, a novel framework for comparative evaluation of machine learning models' performance.
We show that markers computed and combined over select subsets of samples called regions of interest can provide a robust estimate of their real-world performances.
(arXiv, 2022-07-02)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
(arXiv, 2022-02-25)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning typically assumes the incoming data are fully labeled, which may not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
(arXiv, 2021-01-02)
- Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling [19.24454872492008]
Weak supervision offers a promising alternative for producing labeled datasets without ground truth labels.
We develop the first framework for interactive weak supervision, in which a method proposes heuristics iteratively and learns from user feedback.
Our experiments demonstrate that only a small amount of user feedback is needed to train models that achieve highly competitive test set performance.
(arXiv, 2020-12-11)
- Task Programming: Learning Data Efficient Behavior Representations [44.244695150594815]
We present TREBA: a method to learn annotation-sample efficient trajectory embedding for behavior analysis.
The tasks in our method can be efficiently engineered by domain experts through a process we call "task programming".
We present experimental results in three datasets across two domains: mice and fruit flies.
(arXiv, 2020-11-27)
- Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions [2.338938629983582]
We propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users.
DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics.
We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples.
(arXiv, 2020-09-03)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
(arXiv, 2020-07-07)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
(arXiv, 2020-06-10)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.