Detecting Privileged Documents by Ranking Connected Network Entities
- URL: http://arxiv.org/abs/2512.08073v1
- Date: Mon, 08 Dec 2025 22:16:54 GMT
- Title: Detecting Privileged Documents by Ranking Connected Network Entities
- Authors: Jianping Zhang, Han Qin, Nathaniel Huber-Fliflet
- Abstract summary: This paper presents a link analysis approach for identifying privileged documents by constructing a network of human entities derived from email header metadata. The core assumption is that individuals with frequent interactions with lawyers are more likely to participate in privileged communications. Experimental results demonstrate the algorithm's effectiveness in ranking legal entities for privileged document detection.
- Score: 6.208621325426645
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a link analysis approach for identifying privileged documents by constructing a network of human entities derived from email header metadata. Entities are classified as either counsel or non-counsel based on a predefined list of known legal professionals. The core assumption is that individuals with frequent interactions with lawyers are more likely to participate in privileged communications. To quantify this likelihood, an algorithm assigns a score to each entity within the network. By utilizing both entity scores and the strength of their connections, the method enhances the identification of privileged documents. Experimental results demonstrate the algorithm's effectiveness in ranking legal entities for privileged document detection.
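
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, under stated assumptions: edges are undirected sender-recipient pairs weighted by email count, a non-counsel entity's score is the share of its interaction weight that involves known counsel, known counsel are fixed at 1.0, and a document score averages the sender's score with a strength-weighted mean of the recipients' scores. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_network(emails):
    """Count undirected sender-recipient interactions from header metadata."""
    weights = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            if recipient != sender:
                weights[frozenset((sender, recipient))] += 1
    return weights


def score_entities(weights, counsel):
    """Assumed score: share of an entity's interaction weight involving counsel.

    Known counsel are pinned to 1.0 (an assumption of this sketch).
    """
    total = defaultdict(int)
    with_counsel = defaultdict(int)
    for pair, weight in weights.items():
        a, b = tuple(pair)
        for entity, other in ((a, b), (b, a)):
            total[entity] += weight
            if other in counsel:
                with_counsel[entity] += weight
    scores = {e: with_counsel[e] / total[e] for e in total}
    for entity in counsel:
        scores[entity] = 1.0
    return scores


def score_document(sender, recipients, scores, weights):
    """Assumed combination of entity scores and connection strength."""
    pairs = [(r, weights[frozenset((sender, r))]) for r in recipients if r != sender]
    strength = sum(w for _, w in pairs)
    sender_score = scores.get(sender, 0.0)
    if strength == 0:
        return sender_score
    linked = sum(w * scores.get(r, 0.0) for r, w in pairs) / strength
    return (sender_score + linked) / 2


# Toy header metadata: (sender, [recipients]); purely illustrative.
emails = [
    ("alice", ["lawyer1"]),
    ("alice", ["lawyer1", "bob"]),
    ("bob", ["carol"]),
    ("carol", ["bob"]),
]
counsel = {"lawyer1"}  # the predefined list of known legal professionals

weights = build_network(emails)
scores = score_entities(weights, counsel)
ranking = sorted(scores, key=scores.get, reverse=True)
doc_score = score_document("alice", ["lawyer1", "bob"], scores, weights)
```

In this toy data, `alice` exchanges most of her email with known counsel and so ranks directly below `lawyer1`, while `bob` and `carol`, who never interact with counsel, score zero. A real system would need entity normalization (aliases, distribution lists) before any such scoring is meaningful.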
Related papers
- Knowledge Augmented Entity and Relation Extraction for Legal Documents with Hypergraph Neural Network [1.446271016723962]
We propose an entity and relation extraction algorithm based on hypergraph neural network (Legal-KAHRE) for drug-related judgment documents. We construct a legal dictionary with judicial domain knowledge and integrate it into text encoding representation. Experimental results on the CAIL2022 information extraction dataset demonstrate that our method significantly outperforms existing baseline models.
arXiv Detail & Related papers (2026-02-09T05:46:11Z) - Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$).
GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z) - Knowledge-Driven Cross-Document Relation Extraction [3.868708275322908]
Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task.
We propose a novel approach, KXDocRE, that embeds domain knowledge of entities with input text for cross-document RE.
arXiv Detail & Related papers (2024-05-22T11:30:59Z) - On the Detection of Reviewer-Author Collusion Rings From Paper Bidding [71.43634536456844]
Collusion rings pose a major threat to the peer-review systems of computer science conferences.
One approach to solve this problem would be to detect the colluding reviewers from their manipulated bids.
No research has yet established that detecting collusion rings is even possible.
arXiv Detail & Related papers (2024-02-12T18:12:09Z) - DREQ: Document Re-Ranking Using Entity-based Query Understanding [6.675805308519988]
DREQ is an entity-oriented dense document re-ranking model.
We emphasize the query-relevant entities within a document's representation while simultaneously attenuating the less relevant ones.
We show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods.
arXiv Detail & Related papers (2024-01-11T14:27:12Z) - Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd.
arXiv Detail & Related papers (2023-01-24T08:48:12Z) - Identity Documents Authentication based on Forgery Detection of Guilloche Pattern [2.606834301724095]
An authentication model for identity documents based on forgery detection of guilloche patterns is proposed.
Experiments are conducted in order to analyze and identify the most proper parameters to achieve higher authentication performance.
arXiv Detail & Related papers (2022-06-22T11:37:10Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z) - Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z) - Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity [9.007583099505954]
All prior network-based similarity methods considered a precedent citation network among case documents only (PCNet).
We propose to augment the PCNet with the hierarchy of legal statutes, to form a heterogeneous network Hier-SPCNet.
Experiments over a set of Indian Supreme Court case documents show that our proposed heterogeneous network enables significantly better document similarity estimation.
arXiv Detail & Related papers (2020-07-07T06:30:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.