Catch Me If You Can: Identifying Fraudulent Physician Reviews with Large
Language Models Using Generative Pre-Trained Transformers
- URL: http://arxiv.org/abs/2304.09948v1
- Date: Wed, 19 Apr 2023 19:59:26 GMT
- Title: Catch Me If You Can: Identifying Fraudulent Physician Reviews with Large
Language Models Using Generative Pre-Trained Transformers
- Authors: Aishwarya Deep Shukla, Laksh Agarwal, Jie Mein (JM) Goh, Guodong
(Gordon) Gao, Ritu Agarwal
- Abstract summary: The proliferation of fake reviews of doctors has potentially detrimental consequences for patient well-being.
This study utilizes a novel pre-labeled dataset of 38,048 physician reviews to establish the effectiveness of large language models in classifying reviews.
- Score: 1.0499611180329804
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The proliferation of fake reviews of doctors has potentially detrimental
consequences for patient well-being and has prompted concern among consumer
protection groups and regulatory bodies. Yet despite significant advancements
in the fields of machine learning and natural language processing, there
remains limited comprehension of the characteristics differentiating fraudulent
from authentic reviews. This study utilizes a novel pre-labeled dataset of
38,048 physician reviews to establish the effectiveness of large language models
in classifying reviews. Specifically, we compare the performance of traditional
ML models, such as logistic regression and support vector machines, to
generative pre-trained transformer models. Furthermore, we use GPT-4, the
newest model in the GPT family, to uncover the key dimensions along which fake
and genuine physician reviews differ. Our findings reveal significantly
superior performance of GPT-3 over traditional ML models in this context.
Additionally, our analysis indicates that GPT-3 requires a smaller training
sample than traditional models, suggesting its suitability for tasks with
scarce training data. Moreover, GPT-3's performance advantage grows in the
cold-start context, i.e., when a doctor has no prior reviews. Finally, we
employ GPT-4 to reveal the crucial dimensions that distinguish fake physician
reviews from genuine ones. In sharp contrast to previous findings in the
literature, which were obtained using simulated data, our findings from a
real-world dataset show that fake reviews are generally more clinically
detailed and more reserved in sentiment, and exhibit better structure and
grammar than authentic ones.
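To make the comparison concrete, below is a minimal sketch of the kind of
traditional ML baseline the abstract names: TF-IDF features feeding a logistic
regression classifier, written in Python with scikit-learn. The reviews,
labels, and prompt wording are illustrative placeholders only; the paper's
actual preprocessing, prompts, and 38,048-review dataset are not reproduced
here.

    # Hedged sketch of the "traditional ML" baseline the paper compares
    # against GPT models: TF-IDF features + logistic regression.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical toy reviews; 1 = fake, 0 = authentic (label scheme assumed).
    train_reviews = [
        "Dr. Lee performed a thorough cardiac workup and explained every step.",
        "Waited two hours, front desk was rude, doctor seemed rushed.",
        "Outstanding physician, meticulous documentation of my chronic condition.",
        "Nice enough visit but parking was a nightmare and billing was confusing.",
        "Impeccable bedside manner and a detailed, well-structured treatment plan.",
        "He misdiagnosed me twice and never returned my calls.",
    ]
    train_labels = [1, 0, 1, 0, 1, 0]

    # The pipeline keeps the vectorizer and classifier together so raw text
    # can be passed straight to fit() and predict().
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_reviews, train_labels)

    new_review = "The doctor conducted a comprehensive exam with flawless notes."
    print(model.predict([new_review]))        # predicted label
    print(model.predict_proba([new_review]))  # class probabilities

    # For the GPT side of the comparison, the paper's exact prompt is not
    # given here; a classification prompt might look like this (assumption):
    gpt_prompt = (
        "Classify the following physician review as FAKE or AUTHENTIC.\n"
        f"Review: {new_review}\n"
        "Label:"
    )

The toy examples loosely mirror the abstract's finding that fake reviews tend
to be more clinically detailed and better structured than authentic ones.
Swapping LogisticRegression for sklearn.svm.LinearSVC would give the SVM
baseline also named in the abstract, though LinearSVC does not expose
predict_proba.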
Related papers
- High-Throughput Phenotyping of Clinical Text Using Large Language Models [0.0]
GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs.
GPT-4 results in high performance and generalizability across several phenotyping tasks.
arXiv Detail & Related papers (2024-08-02T12:00:00Z) - Optimal path for Biomedical Text Summarization Using Pointer GPT [21.919661430250798]
GPT models have a tendency to generate factual errors, lack context, and oversimplify words.
To address these limitations, we replaced the attention mechanism in the GPT model with a pointer network.
The effectiveness of the Pointer-GPT model was evaluated using the ROUGE score.
arXiv Detail & Related papers (2024-03-22T02:13:23Z) - DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT
Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z) - An Empirical Analysis of Parameter-Efficient Methods for Debiasing
Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than BERT, and that the parameter-efficient methods are less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z) - HuatuoGPT, towards Taming Language Model to Be a Doctor [67.96794664218318]
HuatuoGPT is a large language model (LLM) for medical consultation.
We leverage both distilled data from ChatGPT and real-world data from doctors in the supervised fine-tuning stage.
arXiv Detail & Related papers (2023-05-24T11:56:01Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review
and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - Do We Still Need Clinical Language Models? [15.023633270864675]
We show that relatively small specialized clinical models substantially outperform all in-context learning approaches.
We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
arXiv Detail & Related papers (2023-02-16T05:08:34Z) - Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z) - News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these also do not suffer from common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z) - Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again [24.150464908060112]
We present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs.
Our results show that GPT-3 still significantly underperforms compared with simply fine-tuning a smaller PLM using the same small training set.
arXiv Detail & Related papers (2022-03-16T05:56:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.