RLAS-BIABC: A Reinforcement Learning-Based Answer Selection Using the
BERT Model Boosted by an Improved ABC Algorithm
- URL: http://arxiv.org/abs/2301.02807v1
- Date: Sat, 7 Jan 2023 08:33:05 GMT
- Title: RLAS-BIABC: A Reinforcement Learning-Based Answer Selection Using the
BERT Model Boosted by an Improved ABC Algorithm
- Authors: Hamid Gharagozlou, Javad Mohammadzadeh, Azam Bastanfard and Saeed
Shiry Ghidary
- Abstract summary: Answer selection (AS) is a critical subtask of the open-domain question answering (QA) problem.
The present paper proposes a method called RLAS-BIABC for AS, which is built on attention-mechanism-based long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) word embeddings.
- Score: 6.82469220191368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Answer selection (AS) is a critical subtask of the open-domain question
answering (QA) problem. The present paper proposes a method called RLAS-BIABC
for AS, which is built on attention-mechanism-based long short-term memory
(LSTM) and bidirectional encoder representations from transformers (BERT) word
embeddings, enriched by an improved artificial bee colony (ABC) algorithm for
pretraining and a reinforcement learning-based algorithm, built on
backpropagation (BP), for training. BERT can be incorporated into downstream
tasks and fine-tuned as a unified task-specific architecture, and the
pretrained BERT model can capture different linguistic properties. Existing algorithms typically
train the AS model with positive-negative pairs for a two-class classifier. A
positive pair contains a question and a genuine answer, while a negative one
includes a question and a fake answer. The output should be one for positive
and zero for negative pairs. Typically, negative pairs outnumber positive ones,
leading to an imbalanced classification problem that drastically reduces system
performance. To address this, we formulate classification as a sequential
decision-making process in which the agent takes a sample at each step and
classifies it. For each classification operation, the agent receives a reward,
where the reward for the majority class is smaller than that for the minority
minority class. Ultimately, the agent finds the optimal value for the policy
weights. We initialize the policy weights with the improved ABC algorithm. The
initialization technique helps prevent problems such as getting stuck in a
local optimum. Although ABC performs well on most tasks, it has a weakness: the
standard algorithm disregards the fitness of related pairs of individuals when
discovering a neighboring food source position.
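
To make the two-stage scheme concrete, here is a minimal, self-contained sketch: a simplified ABC search initializes the weights of a policy, which is then fine-tuned with REINFORCE-style updates under class-weighted rewards. Everything here is illustrative, not the paper's implementation: a logistic policy over random features stands in for the BERT + attention-LSTM encoder, and the reward ratio, the exact fitness-aware perturbation, and all hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the BERT + attention-LSTM features of (question, answer)
# pairs; the real model would encode each pair into a vector.
DIM = 16
N_POS, N_NEG = 40, 360                   # imbalanced: negatives dominate
X = rng.normal(size=(N_POS + N_NEG, DIM))
y = np.array([1] * N_POS + [0] * N_NEG)  # 1 = genuine answer, 0 = fake

LAMBDA = N_POS / N_NEG                   # assumed majority-class reward scale

def policy(w, x):
    """Probability of labeling pair(s) x as positive (logistic policy)."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def episode_return(w):
    """Total reward for one greedy pass over the data."""
    a = (policy(w, X) > 0.5).astype(int)
    scale = np.where(y == 1, 1.0, LAMBDA)        # minority earns/pays more
    return float(np.sum(np.where(a == y, scale, -scale)))

def abc_init(n_sources=20, iters=50):
    """Simplified employed-bee phase of ABC to pretrain the policy weights."""
    sources = rng.normal(scale=0.1, size=(n_sources, DIM))
    fitness = np.array([episode_return(s) for s in sources])
    for _ in range(iters):
        for i in range(n_sources):
            k = rng.integers(n_sources - 1)
            k = k if k < i else k + 1            # random partner != i
            # Fitness-aware step (assumed form of the paper's improvement):
            # a larger fitness gap to the partner gives a larger perturbation,
            # unlike standard ABC's pure uniform(-1, 1) factor.
            gap = np.tanh(abs(fitness[k] - fitness[i]))
            phi = rng.uniform(-1.0, 1.0) * (0.5 + gap)
            cand = sources[i] + phi * (sources[i] - sources[k])
            f = episode_return(cand)
            if f > fitness[i]:                   # greedy replacement
                sources[i], fitness[i] = cand, f
    return sources[np.argmax(fitness)].copy()

def train(w, epochs=30, lr=0.05):
    """REINFORCE-style updates with class-weighted rewards."""
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = policy(w, X[i])
            a = int(rng.random() < p)            # sample action from the policy
            scale = 1.0 if y[i] == 1 else LAMBDA
            r = scale if a == y[i] else -scale
            w += lr * r * (a - p) * X[i]         # r * grad log pi(a|x)
    return w

w = train(abc_init())
print("episode return after training:", episode_return(w))
```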
Related papers
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
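As a generic illustration of the margin idea in the entry above (not the paper's tensor-optimization formulation), the sketch below fits ensemble weights for fixed weak learners by minimizing a smooth, convex margin loss with plain gradient descent; the data, loss choice, and step size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: fixed predictions of M weak learners on N samples, labels in {-1, +1}.
N, M = 200, 15
H = rng.choice([-1.0, 1.0], size=(N, M))     # weak-learner outputs
y = rng.choice([-1.0, 1.0], size=N)

def loss_and_grad(alpha):
    """Smooth convex margin loss: mean log(1 + exp(-y * (H @ alpha)))."""
    margin = y * (H @ alpha)
    loss = np.mean(np.logaddexp(0.0, -margin))
    grad = -(y / (1.0 + np.exp(margin))) @ H / N
    return loss, grad

alpha = np.zeros(M)                          # ensemble weights to optimize
for _ in range(500):                         # plain gradient descent (convex problem)
    loss, grad = loss_and_grad(alpha)
    alpha -= 0.5 * grad
print(f"final margin loss: {loss:.4f}")
```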
- Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization [25.25031447644468]
We propose ABCs, a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL) and counterfactual regret minimization (CFR).
ABCs adaptively chooses what fraction of the environment to explore by measuring the stationarity of the environment's reward and transition dynamics.
In Markov decision processes, ABCs converges to the optimal policy with at most an O(A) factor slowdown compared to BQL, where A is the number of actions in the environment.
arXiv Detail & Related papers (2024-02-19T04:58:39Z)
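To illustrate the Boltzmann Q-learning half of ABCs (the stationarity measurement and the CFR component are not shown), here is a minimal tabular sketch; the toy environment, temperature, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def boltzmann_policy(q_values, temperature):
    """Softmax distribution over actions derived from Q-values."""
    z = q_values / temperature
    z -= z.max()                             # numerical stability
    p = np.exp(z)
    return p / p.sum()

def bql_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning backup for one transition."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

Q = np.zeros((5, 3))                         # toy MDP: 5 states, 3 actions
s = 0
for _ in range(1000):
    a = rng.choice(3, p=boltzmann_policy(Q[s], temperature=0.5))
    # A real environment would supply the reward and next state; this toy
    # one rewards matching the action to the state index modulo 3.
    r, s_next = float(a == s % 3), (s + 1) % 5
    bql_update(Q, s, a, r, s_next)
    s = s_next
print(Q.round(2))
```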
- HARRIS: Hybrid Ranking and Regression Forests for Algorithm Selection [75.84584400866254]
We propose a new algorithm selector leveraging special forests, combining the strengths of both approaches while alleviating their weaknesses.
HARRIS' decisions are based on a forest model whose trees are created from splits optimized on a hybrid ranking and regression loss function.
arXiv Detail & Related papers (2022-10-31T14:06:11Z)
- Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning [85.50033812217254]
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically.
We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle.
arXiv Detail & Related papers (2021-08-19T17:27:29Z)
- SetConv: A New Approach for Learning from Imbalanced Data [29.366843553056594]
We propose a set convolution operation and an episodic training strategy to extract a single representative for each class.
We prove that our proposed algorithm is permutation-invariant with respect to the order of its inputs.
arXiv Detail & Related papers (2021-04-03T22:33:30Z)
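SetConv's actual operator is a learned, class-aware set convolution; the sketch below only illustrates the permutation-invariance property the entry highlights, using a generic Deep-Sets-style encoder (mean pooling over per-element features), with all shapes and data made up.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generic permutation-invariant set encoder (Deep Sets style):
# representative(S) = mean_i f(x_i), so shuffling S cannot change the output.
W = rng.normal(size=(8, 4))                  # toy "f": one linear layer + ReLU

def class_representative(samples):
    feats = np.maximum(samples @ W.T, 0.0)   # f applied element-wise
    return feats.mean(axis=0)                # mean pooling => invariant

majority = rng.normal(size=(300, 4))         # imbalanced class sizes
minority = rng.normal(loc=2.0, size=(12, 4))

rep_maj = class_representative(majority)
rep_min = class_representative(minority)

# Permutation-invariance check: shuffled input gives the same representative.
shuffled = majority[rng.permutation(len(majority))]
assert np.allclose(rep_maj, class_representative(shuffled))
print(rep_maj.round(3), rep_min.round(3))
```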
- Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training [38.81973113564937]
Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training.
In this paper, we reinterpret this label assignment problem as an optimal transportation problem between examples and classes.
We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm.
arXiv Detail & Related papers (2021-02-17T08:23:15Z)
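The core of the optimal-transport view in the entry above is the Sinkhorn scaling loop sketched below; Sinkhorn Label Allocation additionally anneals the regularization and withholds low-confidence labels, which this sketch omits, and the cost matrix and marginals here are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Cost matrix: negative log-probabilities of a model's current predictions
# for N unlabeled examples over K classes.
N, K = 100, 10
probs = rng.dirichlet(np.ones(K), size=N)
C = -np.log(probs + 1e-9)

# Marginals: each example carries one unit of label mass; classes receive
# mass proportional to assumed class frequencies (uniform here).
r = np.ones(N) / N
c = np.ones(K) / K

def sinkhorn(C, r, c, eps=0.1, iters=200):
    """Entropy-regularized OT: alternate row/column scaling of exp(-C/eps)."""
    K_mat = np.exp(-C / eps)
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K_mat.T @ u)
        u = r / (K_mat @ v)
    return u[:, None] * K_mat * v[None, :]   # transport plan

P = sinkhorn(C, r, c)
pseudo_labels = P.argmax(axis=1)             # hard labels from the plan
print(np.bincount(pseudo_labels, minlength=K))  # roughly balanced per class
```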
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
- Iterative Weak Learnability and Multi-Class AdaBoost [0.0]
We construct an efficient ensemble algorithm for the multi-class classification problem inspired by SAMME.
In contrast to SAMME, our algorithm's final hypothesis converges to the correct label with probability 1.
As for the Adaptive Boosting (AdaBoost) algorithm, the generalization error of our algorithm is bounded by the sum of the training error and an additional term that depends only on the sample size.
arXiv Detail & Related papers (2021-01-26T03:30:30Z)
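The entry's algorithm is a modification of SAMME; as a point of reference, here is a compact sketch of vanilla SAMME with threshold stumps as weak learners. The toy data, stump search grid, and round count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 3-class dataset in two dimensions.
K, N = 3, 300
X = rng.normal(size=(N, 2)) + rng.integers(K, size=N)[:, None]
y = (((X[:, 0] + X[:, 1]) // 2).astype(int)) % K

def fit_stump(X, y, w):
    """Best weighted threshold stump: one class on each side of one feature."""
    best_err, best_params = np.inf, None
    for f in range(X.shape[1]):
        for t in np.quantile(X[:, f], np.linspace(0.1, 0.9, 9)):
            left = X[:, f] <= t
            cL = np.argmax(np.bincount(y[left], weights=w[left], minlength=K)) if left.any() else 0
            cR = np.argmax(np.bincount(y[~left], weights=w[~left], minlength=K)) if (~left).any() else 0
            pred = np.where(left, cL, cR)
            err = w[pred != y].sum()
            if err < best_err:
                best_err, best_params = err, (f, t, cL, cR)
    return best_err, best_params

def stump_predict(params, X):
    f, t, cL, cR = params
    return np.where(X[:, f] <= t, cL, cR)

weights = np.ones(N) / N
ensemble = []
for _ in range(20):                          # boosting rounds
    err, params = fit_stump(X, y, weights)
    if err >= 1 - 1 / K:                     # weak-learnability condition fails
        break
    err = max(err, 1e-9)
    alpha = np.log((1 - err) / err) + np.log(K - 1)  # SAMME's extra log(K-1)
    weights *= np.exp(alpha * (stump_predict(params, X) != y))
    weights /= weights.sum()
    ensemble.append((alpha, params))

votes = np.zeros((N, K))                     # final hypothesis: weighted vote
for alpha, params in ensemble:
    votes[np.arange(N), stump_predict(params, X)] += alpha
print("training accuracy:", (votes.argmax(axis=1) == y).mean())
```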
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)