Classification Protocols with Minimal Disclosure
- URL: http://arxiv.org/abs/2209.02690v1
- Date: Tue, 6 Sep 2022 17:57:52 GMT
- Title: Classification Protocols with Minimal Disclosure
- Authors: Jinshuo Dong, Jason Hartline, Aravindan Vijayaraghavan
- Abstract summary: We consider multi-party protocols for classification motivated by applications such as e-discovery in court proceedings.
We identify a protocol that guarantees that the requesting party receives all responsive documents and the sending party discloses the minimal amount of non-responsive documents.
This protocol can be embedded in a machine learning framework that enables automated labeling of points.
- Score: 12.308957254601243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider multi-party protocols for classification that are motivated by
applications such as e-discovery in court proceedings. We identify a protocol
that guarantees that the requesting party receives all responsive documents and
the sending party discloses the minimal amount of non-responsive documents
necessary to prove that all responsive documents have been received. This
protocol can be embedded in a machine learning framework that enables automated
labeling of points and the resulting multi-party protocol is equivalent to the
standard one-party classification problem (if the one-party classification
problem satisfies a natural independence-of-irrelevant-alternatives property).
Our formal guarantees focus on the case where there is a linear classifier that
correctly partitions the documents.
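To make the linearly separable setting above concrete, here is a minimal, purely illustrative sketch of the disclosure intuition: if a hyperplane correctly separates responsive from non-responsive documents, the sending party need only reveal the few non-responsive documents that pin the separator down (approximated below by the non-responsive support vectors of a hard-margin linear SVM), and the requesting party can check that the disclosed set is consistent with some linear separator. This certificate construction and all names in the code are assumptions for illustration, not the paper's exact protocol or guarantee.

```python
# Illustrative sketch only: approximates the "minimal disclosure" intuition in the
# linearly separable setting.  The certificate used here (non-responsive support
# vectors of a hard-margin SVM) is an assumption for illustration, not the paper's
# exact protocol.  All names are hypothetical.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic, linearly separable corpus: label 1 = responsive, 0 = non-responsive.
responsive = rng.normal(loc=+2.0, scale=0.5, size=(40, 2))
non_responsive = rng.normal(loc=-2.0, scale=0.5, size=(60, 2))
X = np.vstack([responsive, non_responsive])
y = np.array([1] * len(responsive) + [0] * len(non_responsive))

# Sending party: fit a (near) hard-margin linear separator.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Disclose all responsive documents plus only the non-responsive documents that
# pin down the separator (its non-responsive support vectors).
disclosed_non_responsive = [i for i in clf.support_ if y[i] == 0]
print("non-responsive documents disclosed:", len(disclosed_non_responsive),
      "out of", int((y == 0).sum()))

# Requesting party (plausibility check only): verify that the received responsive
# set together with the disclosed non-responsive certificate is consistent with
# some linear separator.
check = SVC(kernel="linear", C=1e6).fit(
    np.vstack([responsive, X[disclosed_non_responsive]]),
    np.array([1] * len(responsive) + [0] * len(disclosed_non_responsive)),
)
print("certificate consistent with a linear separator:",
      bool((check.predict(responsive) == 1).all()))
```

In this toy run only a handful of the non-responsive documents are disclosed, which mirrors the minimal-disclosure goal stated in the abstract.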
Related papers
- DT-SIM: Property-Based Testing for MPC Security [2.0308771704846245]
Property-based testing is effective for detecting security bugs in secure protocols.
We specifically target Secure Multi-Party Computation (MPC) and devise a test that can detect various flaws in a bit-level implementation of an MPC protocol.
arXiv Detail & Related papers (2024-03-08T02:02:24Z)
- Error-Tolerant E-Discovery Protocols [18.694850127330973]
We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022).
Based on a request for production from the requesting party, the responding party is required to provide documents that are responsive to the request except for those that are legally privileged.
Our goal is to find a protocol that verifies that the responding party sends almost all responsive documents while minimizing the disclosure of non-responsive documents.
arXiv Detail & Related papers (2024-01-31T15:59:16Z)
- Beyond Document Page Classification: Design, Datasets, and Challenges [32.94494070330065]
This paper highlights the need to bring document classification benchmarking closer to real-world applications.
We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations.
arXiv Detail & Related papers (2023-08-24T16:16:47Z)
- A Universal Unbiased Method for Classification from Aggregate Observations [115.20235020903992]
This paper presents a novel universal method for classification from aggregate observations (CFAO), which admits an unbiased estimator of the classification risk for arbitrary losses.
The proposed method not only guarantees risk consistency, thanks to the unbiased risk estimator, but is also compatible with arbitrary losses.
arXiv Detail & Related papers (2023-06-20T07:22:01Z)
- A Security Verification Framework of Cryptographic Protocols Using Machine Learning [0.0]
We propose a security verification framework for cryptographic protocols using machine learning.
We create arbitrarily large datasets by automatically generating random protocols and assigning security labels to them.
We evaluate the proposed method by applying it to verification of practical cryptographic protocols.
arXiv Detail & Related papers (2023-04-26T02:37:43Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Specialized Document Embeddings for Aspect-based Similarity of Research Papers [4.661692753666685]
We treat aspect-based similarity as a classical vector similarity problem in aspect-specific embedding spaces.
We represent a document not as a single generic embedding but as multiple specialized embeddings.
Our approach mitigates potential risks arising from implicit biases by making them explicit.
arXiv Detail & Related papers (2022-03-28T07:35:26Z)
- Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories.
We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for document representation with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
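The last related paper above builds on randomized smoothing against label-flipping (data poisoning) attacks. As a rough illustration of the general smoothing idea only, not the paper's certified procedure (the function and parameters below are hypothetical), one can train many base classifiers on randomly label-flipped copies of the training set and take a majority vote:

```python
# Toy sketch only: majority vote over classifiers trained on randomly
# label-flipped copies of the training set, illustrating the general idea of
# randomized smoothing against label-flipping poisoning.  This is NOT the
# certified procedure of the paper above; names and parameters are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def smoothed_predict(X_train, y_train, X_test, flip_prob=0.1, n_samples=200, seed=0):
    """Majority vote of base classifiers trained on noisy-label copies of the data."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_samples):
        flips = rng.random(len(y_train)) < flip_prob        # flip each label w.p. flip_prob
        y_noisy = np.where(flips, 1 - y_train, y_train)
        base = LogisticRegression().fit(X_train, y_noisy)
        votes += base.predict(X_test)
    return (votes / n_samples > 0.5).astype(int)            # majority vote

# Tiny synthetic example with two Gaussian classes.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.array([[-1.0, -1.0], [1.0, 1.0]])
print(smoothed_predict(X_train, y_train, X_test))
```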
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.