An Interpretable Deep Learning System for Automatically Scoring Request for Proposals
- URL: http://arxiv.org/abs/2008.02347v1
- Date: Wed, 5 Aug 2020 20:21:35 GMT
- Title: An Interpretable Deep Learning System for Automatically Scoring Request for Proposals
- Authors: Subhadip Maji, Anudeep Srivatsav Appe, Raghav Bali, Veera Raghavendra Chikka, Arijit Ghosh Chowdhury and Vamsi M Bhandaru
- Abstract summary: We propose a novel Bi-LSTM based regression model, and provide deeper insight into phrases which latently impact scoring of responses.
We also qualitatively assess the impact of important phrases using human evaluators.
Finally, we introduce a novel problem statement that can be used to further improve the state of the art in NLP based automatic scoring systems.
- Score: 3.244940746423378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Managed Care system within Medicaid (US Healthcare) uses Request For
Proposals (RFP) to award contracts for various healthcare and related services.
RFP responses are very detailed documents (hundreds of pages) submitted by
competing organisations to win contracts. Subject matter expertise and domain
knowledge play an important role in preparing RFP responses along with analysis
of historical submissions. Automated analysis of these responses through
Natural Language Processing (NLP) systems can reduce the time and effort needed
to explore historical responses and assist in writing better ones. Our
work draws parallels between scoring RFPs and essay scoring models, while
highlighting new challenges and the need for interpretability. Typical scoring
models focus on word level impacts to grade essays and other short write-ups.
We propose a novel Bi-LSTM based regression model, and provide deeper insight
into phrases which latently impact the scoring of responses. We demonstrate
the merits of our proposed methodology through extensive quantitative
experiments. We also qualitatively assess the impact of important phrases
using human evaluators.
Finally, we introduce a novel problem statement that can be used to further
improve the state of the art in NLP based automatic scoring systems.
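
As a rough illustration of the modelling idea described in the abstract, the PyTorch sketch below pairs a Bi-LSTM regressor with a simple additive attention layer whose weights can be inspected for token-level (and, by aggregating over n-grams, phrase-level) importance. This is a minimal sketch under assumed hyperparameters (`embed_dim`, `hidden_dim`) and an assumed attention-based pooling scheme; it is not the authors' actual architecture, which the abstract does not specify.

```python
import torch
import torch.nn as nn


class BiLSTMRegressor(nn.Module):
    """Bi-LSTM regressor with additive attention over token states.

    The attention weights give a rough view of which tokens drive the
    predicted score; phrase importance can be read off by aggregating
    weights over spans. All hyperparameters here are illustrative.
    """

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # scores each token state
        self.head = nn.Linear(2 * hidden_dim, 1)  # maps pooled state to a score

    def forward(self, token_ids: torch.Tensor):
        h, _ = self.lstm(self.embedding(token_ids))           # (B, T, 2H)
        weights = torch.softmax(self.attn(h).squeeze(-1), 1)  # (B, T)
        pooled = (weights.unsqueeze(-1) * h).sum(1)           # (B, 2H)
        return self.head(pooled).squeeze(-1), weights         # score, importance


# Toy usage: two "responses" of 12 token ids each, vocabulary of 1,000.
model = BiLSTMRegressor(vocab_size=1000)
tokens = torch.randint(1, 1000, (2, 12))
scores, weights = model(tokens)
loss = nn.functional.mse_loss(scores, torch.tensor([0.7, 0.4]))
loss.backward()  # in a real setup, targets would be expert-assigned RFP scores
```

In a setting like the paper's, the per-token `weights` returned alongside each score would be the starting point for surfacing phrases that latently influence the predicted grade.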
Related papers
- Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs [0.41436032949434404]
We develop and rigorously evaluate new detection methods for issue framing and narrative analysis within large text datasets.
We show that issue framing can be reliably and efficiently detected in large corpora with only a few examples of either perspective on a given issue.
arXiv Detail & Related papers (2024-08-19T07:14:15Z)
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z)
- Transformer-based Joint Modelling for Automatic Essay Scoring and Off-Topic Detection [3.609048819576875]
We propose an unsupervised technique that jointly scores essays and detects off-topic essays.
Our proposed method outperforms the baseline we created and earlier conventional methods on two essay-scoring datasets.
arXiv Detail & Related papers (2024-03-24T21:44:14Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Towards LLM-based Autograding for Short Textual Answers [4.853810201626855]
This manuscript evaluates a large language model for autograding short textual answers.
Our findings suggest that while "out-of-the-box" LLMs provide a valuable tool, their readiness for independent automated grading remains a work in progress.
arXiv Detail & Related papers (2023-09-09T22:25:56Z)
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture the above dimensions.
We propose a new LLM-based evaluation framework that comprehensively compares generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.