Detecting Relevant Information in High-Volume Chat Logs: Keyphrase
Extraction for Grooming and Drug Dealing Forensic Analysis
- URL: http://arxiv.org/abs/2311.04905v1
- Date: Fri, 15 Sep 2023 03:18:31 GMT
- Title: Detecting Relevant Information in High-Volume Chat Logs: Keyphrase
Extraction for Grooming and Drug Dealing Forensic Analysis
- Authors: Jeovane Honório Alves, Horácio A. C. G. Pedroso, Rafael Honorio
Venetikides, Joel E. M. Köster, Luiz Rodrigo Grochocki, Cinthia O. A.
Freitas, Jean Paul Barddal
- Abstract summary: This paper presents a supervised keyphrase extraction approach to detect relevant information in high-volume chat logs involving grooming and drug dealing.
The proposed method, JointKPE++, builds upon the JointKPE keyphrase extractor by employing improvements to handle longer texts effectively.
- Score: 2.1638802483603987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing use of digital communication platforms has given rise to various
criminal activities, such as grooming and drug dealing, which pose significant
challenges to law enforcement and forensic experts. This paper presents a
supervised keyphrase extraction approach to detect relevant information in
high-volume chat logs involving grooming and drug dealing for forensic
analysis. The proposed method, JointKPE++, builds upon the JointKPE keyphrase
extractor by employing improvements to handle longer texts effectively. We
evaluate JointKPE++ using BERT-based pre-trained models on grooming and drug
dealing datasets, including BERT, RoBERTa, SpanBERT, and BERTimbau. The results
show significant improvements over traditional approaches and demonstrate the
potential for JointKPE++ to aid forensic experts in efficiently detecting
keyphrases related to criminal activities.
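The abstract does not say which changes let JointKPE++ cope with longer texts. As a generic illustration only, and not the authors' method, the sketch below shows one common way to run a length-limited extractor over a long chat log: split the token sequence into overlapping windows, score candidate n-grams inside each window, and keep each phrase's best score across windows. The names `sliding_windows`, `candidate_ngrams`, `toy_score`, and `rank_keyphrases` are hypothetical, and `toy_score` is a lexicon-overlap stand-in for a learned scorer such as a fine-tuned BERT model.

```python
from collections import defaultdict


def sliding_windows(tokens, size=8, stride=4):
    """Yield (start, window) pairs that cover `tokens` with overlap."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    last_start = max(len(tokens) - size, 0)
    start = 0
    while True:
        yield start, tokens[start:start + size]
        if start >= last_start:
            break
        # Clamp the final window so the tail of the text is always covered.
        start = min(start + stride, last_start)


def candidate_ngrams(window, max_n=3):
    """Enumerate every n-gram (as a tuple) of length 1..max_n in the window."""
    for n in range(1, max_n + 1):
        for i in range(len(window) - n + 1):
            yield tuple(window[i:i + n])


def toy_score(phrase, lexicon):
    """Hypothetical scorer: fraction of phrase tokens found in the lexicon."""
    return sum(tok in lexicon for tok in phrase) / len(phrase)


def rank_keyphrases(tokens, lexicon, size=8, stride=4, max_n=3, top_k=5):
    """Score candidates per window and keep each phrase's maximum score."""
    best = defaultdict(float)
    for _, window in sliding_windows(tokens, size, stride):
        for phrase in candidate_ngrams(window, max_n):
            best[phrase] = max(best[phrase], toy_score(phrase, lexicon))
    # Sort by score, then prefer longer phrases, then alphabetical order.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return [(" ".join(p), s) for p, s in ranked[:top_k]]


if __name__ == "__main__":
    chat = "meet me at the park bring the cash tonight ok".split()
    print(rank_keyphrases(chat, lexicon={"park", "cash"}, top_k=2))
```

Taking the maximum over windows means a phrase is judged by its strongest occurrence anywhere in the log, which is one simple way to combine window-level scores; averaging or voting would be equally plausible choices.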
Related papers
- Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z)
- Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction [9.307602861891926]
Keyphrase extraction is an important task in Natural Language Processing.
In this study, we propose Diff-KPE to guide the text diffusion process for generating enhanced keyphrase representations.
Experiments show that Diff-KPE outperforms existing KPE methods on a large open domain keyphrase extraction benchmark, OpenKP, and a scientific domain dataset, KP20K.
arXiv Detail & Related papers (2023-08-17T02:26:30Z)
- Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media [30.791563171321062]
We propose an analytical framework to compose knowledge-informed prompts, which serve as the interface through which humans can interact with and use LLMs to perform the detection task.
Our experimental findings demonstrate that the proposed framework outperforms other baseline language models in terms of drug trafficking detection accuracy.
The implications of our research extend to social networks, emphasizing the importance of incorporating prior knowledge and scenario-based prompts into analytical tools to improve online security and public safety.
arXiv Detail & Related papers (2023-07-07T16:15:59Z)
- HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models [30.279621764192843]
Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions.
Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors.
We propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization.
arXiv Detail & Related papers (2023-03-28T07:54:54Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate information about document content in a highly compact, concise, and meaningful form, and are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g., a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Keyword Extraction for Improved Document Retrieval in Conversational Search [10.798537120200006]
Mixed-initiative conversational search provides enormous advantages.
However, incorporating additional information provided by the user from the conversation poses some challenges.
We have collected two conversational keyword extraction datasets and propose an end-to-end document retrieval pipeline incorporating them.
arXiv Detail & Related papers (2021-09-13T13:55:37Z)
- Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters [52.725200145600624]
We propose KnowExpert to bypass the retrieval process by injecting prior knowledge into the pre-trained language models with lightweight adapters.
Experimental results show that KnowExpert performs comparably with the retrieval-based baselines.
arXiv Detail & Related papers (2021-05-13T12:33:23Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
- Capturing Global Informativeness in Open Domain Keyphrase Extraction [40.57116173502994]
Open-domain KeyPhrase Extraction (KPE) aims to extract keyphrases from documents without domain or quality restrictions.
This paper presents JointKPE, an open-domain KPE architecture built on pre-trained language models.
JointKPE learns to rank keyphrases by estimating their informativeness in the entire document and is jointly trained on the keyphrase chunking task.
arXiv Detail & Related papers (2020-04-28T16:34:35Z)