KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for
Comprehension And Generation
- URL: http://arxiv.org/abs/2005.11768v2
- Date: Fri, 24 Jul 2020 06:39:14 GMT
- Title: KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for
Comprehension And Generation
- Authors: Jiajing Wan and Xinting Huang
- Abstract summary: We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbones for three subtasks.
The results show that our evidence-searching approach improves model performance on the commonsense explanation task.
- Score: 4.94950858749529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our strategies in SemEval 2020 Task 4: Commonsense
Validation and Explanation. We propose a novel way to search for evidence and
choose different large-scale pre-trained models as the backbones for the three
subtasks. The results show that our evidence-searching approach improves model
performance on the commonsense explanation task. Our team ranks 2nd in subtask C
according to the human evaluation score.
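The abstract does not describe the evidence-search mechanism itself; below is a minimal sketch of one plausible reading, retrieving supporting sentences with BM25 and prepending them to the model input. The corpus, query construction, and use of the rank_bm25 package are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: retrieve evidence sentences for a commonsense
# statement with BM25 (rank_bm25), then prepend them as context for a
# pre-trained backbone. Corpus and query construction are assumptions.
from rank_bm25 import BM25Okapi

corpus = [
    "People put food in a refrigerator to keep it fresh.",
    "An elephant is far too large to fit inside a refrigerator.",
    "Refrigerators are kitchen appliances for cooling food.",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

statement = "He put an elephant into the fridge."
scores = bm25.get_scores(statement.lower().split())

# Keep the top-2 sentences as evidence for the explanation model.
top_evidence = [corpus[i] for i in sorted(
    range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]]
model_input = " ".join(top_evidence) + " </s> " + statement
print(model_input)
```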
Related papers
- SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection [51.99159169107426]
We present our novel systems developed for the SemEval-2024 hallucination detection task.
Our investigation spans a range of strategies for comparing model predictions with reference standards.
We introduce three distinct methods that exhibit strong performance.
arXiv Detail & Related papers (2024-04-09T09:03:44Z)
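The abstract does not specify how predictions are compared with references; one generic instance of this idea is scoring semantic similarity between the two, as sketched below with sentence-transformers. The checkpoint and the use of a similarity score are assumptions, not the SmurfCat systems.

```python
# Hypothetical sketch: flag a prediction as a possible hallucination when
# it is semantically far from the reference. The model choice is an
# illustrative assumption, not one of the paper's three methods.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def hallucination_score(prediction: str, reference: str) -> float:
    """Return 1 - cosine similarity: higher means less supported."""
    emb = model.encode([prediction, reference], convert_to_tensor=True)
    return 1.0 - util.cos_sim(emb[0], emb[1]).item()

print(hallucination_score("The capital of France is Lyon.",
                          "Paris is the capital of France."))
```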
- InternLM2 Technical Report [159.70692271378581]
This paper introduces InternLM2, an open-source Large Language Model (LLM) that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks.
The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types.
InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in the pre-training and fine-tuning stages.
arXiv Detail & Related papers (2024-03-26T00:53:24Z)
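A toy sketch of the staged context-length schedule the summary describes; the dataclass, stage names, and token budgets are illustrative, not InternLM2's actual configuration.

```python
# Hypothetical sketch of a two-stage context-length curriculum: train
# first at 4k tokens, then continue at 32k. Budgets are placeholders.
from dataclasses import dataclass

@dataclass
class TrainStage:
    name: str
    seq_len: int    # maximum sequence length in tokens
    tokens: float   # training-token budget for this stage

schedule = [
    TrainStage("short-context pre-training", seq_len=4_096, tokens=1e12),
    TrainStage("long-context extension", seq_len=32_768, tokens=1e11),
]

for stage in schedule:
    print(f"{stage.name}: seq_len={stage.seq_len}, budget={stage.tokens:.0e}")
```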
- NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic Statistical Models and Pre-trained Language Models [4.329463429688995]
This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023.
For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models.
We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classic statistical models and pre-trained Language Models.
arXiv Detail & Related papers (2023-09-16T18:32:15Z)
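The abstract names learning-to-rank feature consolidation but not a specific model; a minimal sketch of one common choice follows, combining per-document features (e.g., a BM25 score and a PLM relevance score) with LightGBM's ranker. The feature set and data are assumptions.

```python
# Hypothetical sketch: consolidate features from several models with a
# learning-to-rank model. Features and labels are illustrative only.
import numpy as np
from lightgbm import LGBMRanker

# Each row: [bm25_score, plm_relevance_score] for one (query, doc) pair.
X = np.array([[12.3, 0.91], [8.7, 0.40], [5.1, 0.75],
              [14.0, 0.30], [9.9, 0.88], [2.2, 0.10]])
y = np.array([2, 0, 1, 1, 2, 0])   # graded relevance labels
group = [3, 3]                     # two queries, three candidates each

ranker = LGBMRanker(objective="lambdarank", n_estimators=50,
                    min_child_samples=1)
ranker.fit(X, y, group=group)

# Rank new candidates for a query by the consolidated score.
print(ranker.predict(np.array([[10.0, 0.85], [11.5, 0.20]])))
```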
- Zero-shot Visual Question Answering with Language Model Feedback [83.65140324876536]
We propose a language model guided captioning approach, LAMOC, for knowledge-based visual question answering (VQA).
Our approach employs the captions generated by a captioning model as the context of an answer prediction model, which is a pre-trained language model (PLM).
arXiv Detail & Related papers (2023-05-26T15:04:20Z)
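A minimal sketch of the caption-as-context idea: caption the image, then let a PLM answer the question given the caption. The specific checkpoints and the image path are illustrative stand-ins, not LAMOC's components, and this omits the paper's feedback mechanism.

```python
# Hypothetical sketch: generated caption serves as context for a PLM
# answer predictor. Checkpoints and "kitchen.jpg" are placeholders.
from transformers import pipeline

captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")
answerer = pipeline("text2text-generation", model="google/flan-t5-base")

def answer(image_path: str, question: str) -> str:
    caption = captioner(image_path)[0]["generated_text"]
    prompt = f"Context: {caption}\nQuestion: {question}\nAnswer:"
    return answerer(prompt, max_new_tokens=10)[0]["generated_text"]

print(answer("kitchen.jpg", "What appliance is on the counter?"))
```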
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
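A minimal sketch of what such a prompt-engineered classifier can look like; the prompt wording is illustrative, not the template from the case study, and `llm_complete` is a hypothetical stand-in for any completion API.

```python
# Hypothetical sketch of a zero-shot classification prompt for this task.
# `llm_complete` is a stand-in for any chat/completion API; the prompt
# wording is illustrative, not the study's actual template.
PROMPT = """You are screening job postings.
Posting: {posting}
Question: Is this posting appropriate for a graduate or entry-level
candidate? Answer strictly with GRADUATE or NOT_GRADUATE."""

def classify(posting: str, llm_complete) -> bool:
    reply = llm_complete(PROMPT.format(posting=posting))
    return reply.strip().upper().startswith("GRADUATE")

# Example with a trivial stand-in "model" that always says GRADUATE:
print(classify("Junior analyst, no experience required.",
               lambda p: "GRADUATE"))
```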
- ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results.
In the pre-training stage, we adopt a masked language modeling (MLM) task, a classification task, and a contrastive learning task.
In the fine-tuning stage, we use confident learning, the exponential moving average method (EMA), adversarial training (FGM), and the regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z)
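Of the fine-tuning techniques listed, R-Drop is easy to show in isolation: run two dropout-perturbed forward passes and penalize the KL divergence between their output distributions. The toy model and the alpha weight below are illustrative assumptions, not the team's setup.

```python
# Hypothetical sketch of the R-Drop regularizer: two stochastic forward
# passes, symmetric KL between them added to the task loss.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(),
    torch.nn.Dropout(0.1), torch.nn.Linear(32, 3))

def rdrop_loss(x, labels, alpha=1.0):
    logits1, logits2 = model(x), model(x)   # dropout differs per pass
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    p, q = F.log_softmax(logits1, -1), F.log_softmax(logits2, -1)
    kl = (F.kl_div(p, q, log_target=True, reduction="batchmean")
          + F.kl_div(q, p, log_target=True, reduction="batchmean")) / 2
    return ce / 2 + alpha * kl

x = torch.randn(8, 16)
labels = torch.randint(0, 3, (8,))
print(rdrop_loss(x, labels))
```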
- Neural Coreference Resolution based on Reinforcement Learning [53.73316523766183]
Coreference resolution systems need to solve two subtasks.
One is to detect all of the potential mentions; the other is to learn to link each mention to its antecedent.
We propose an actor-critic-based neural coreference resolution system trained with reinforcement learning.
arXiv Detail & Related papers (2022-12-18T07:36:35Z)
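A toy sketch of the actor-critic idea applied to antecedent linking: the actor scores candidate antecedents, an action is sampled, and the critic's value estimate serves as a baseline. Shapes, features, and the reward are illustrative; this is not the paper's architecture.

```python
# Hypothetical toy actor-critic step for linking one mention to one of
# its candidate antecedents. All tensors are random placeholders.
import torch

actor = torch.nn.Linear(8, 1)    # scores one (mention, antecedent) pair
critic = torch.nn.Linear(8, 1)   # state value of the mention

pair_feats = torch.randn(5, 8)   # 5 candidate antecedents
mention_feat = torch.randn(8)

logits = actor(pair_feats).squeeze(-1)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()           # pick one antecedent

reward = torch.tensor(1.0)       # e.g., +1 if the chosen link is correct
advantage = reward - critic(mention_feat)

actor_loss = -dist.log_prob(action) * advantage.detach()
critic_loss = advantage.pow(2)
loss = (actor_loss + critic_loss).sum()
loss.backward()
```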
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
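A toy illustration of "objectives cast as one another": a single span-corruption routine whose span length and corruption rate recover different denoising objectives. The parameter values and sentinel format are illustrative, not UL2's exact denoiser settings.

```python
# Hypothetical sketch: one corruption routine, different settings give
# short-span (T5-like) or long-span denoisers. Values are placeholders.
import random

def corrupt(tokens, span_len=3, rate=0.3, seed=0):
    """Mask contiguous spans; returns (corrupted_input, targets)."""
    rng = random.Random(seed)
    out, targets, i, sid = [], [], 0, 0
    while i < len(tokens):
        if rng.random() < rate:
            out.append(f"<extra_{sid}>")
            targets.append(tokens[i:i + span_len])
            i, sid = i + span_len, sid + 1
        else:
            out.append(tokens[i])
            i += 1
    return out, targets

toks = "the quick brown fox jumps over the lazy dog".split()
print(corrupt(toks, span_len=1, rate=0.3))   # short-span denoiser
print(corrupt(toks, span_len=3, rate=0.3))   # longer-span denoiser
```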
- ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We explain the algorithms used to learn our models, the process of tuning them, and how the best model was selected.
Inspired by the similarity between the ReCAM task and language model pre-training, we propose a simple yet effective technique, namely negative augmentation with a language model.
Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2, with an accuracy of 87.9% and an accuracy of 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z)
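One plausible reading of negative augmentation with a language model is to use a masked LM's top predictions (minus the gold answer) as extra negative options for a cloze question, sketched below. The checkpoint and the number of negatives are assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch: mine plausible-but-wrong cloze options from a
# masked LM to serve as additional negatives.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

def augment_negatives(cloze: str, gold: str, k: int = 3):
    """cloze must contain the model's mask token, e.g. '<mask>'."""
    candidates = fill(cloze, top_k=k + 1)
    return [c["token_str"].strip() for c in candidates
            if c["token_str"].strip().lower() != gold.lower()][:k]

question = "The loss of the match left the team in a state of <mask>."
print(augment_negatives(question, gold="despair"))
```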
- IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation [8.971288666318719]
This paper introduces our systems for all three subtasks of SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning.
We design many simple and effective approaches adapted to the backbone model (RoBERTa).
Experimental results show that our approaches achieve significant improvements over the baseline systems.
arXiv Detail & Related papers (2021-02-25T10:51:48Z)
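Of the techniques named in the title, back translation is straightforward to sketch: translate English through a pivot language and back to obtain a paraphrase for data augmentation. The pivot language and checkpoints below are assumptions, not the paper's setup.

```python
# Hypothetical sketch of back-translation augmentation via German.
from transformers import pipeline

en2de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de2en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def back_translate(text: str) -> str:
    german = en2de(text)[0]["translation_text"]
    return de2en(german)[0]["translation_text"]

print(back_translate("The committee reached a state of consensus."))
```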
- QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model [2.728575246952532]
In this paper, we present the language model system submitted to the SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation".
We implement transfer learning using pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tune them on this task.
The ensembled model solves this problem better, reaching 95.9% accuracy on subtask A, only 3% below human accuracy.
arXiv Detail & Related papers (2020-09-06T05:12:50Z)
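The summary above names ensembling but not the combination rule; a minimal sketch of one common choice follows, averaging class probabilities across fine-tuned models. The two-model setup and the logits are illustrative placeholders, not the submitted system.

```python
# Hypothetical sketch: ensemble fine-tuned LM classifiers for the binary
# validation subtask by averaging their softmax probabilities.
import torch

def ensemble_predict(all_logits: list[torch.Tensor]) -> torch.Tensor:
    """Average probabilities across models, then take the argmax."""
    probs = torch.stack([torch.softmax(l, dim=-1) for l in all_logits])
    return probs.mean(dim=0).argmax(dim=-1)

# Stand-in logits from, e.g., fine-tuned BERT and RoBERTa classifiers
# over a batch of two statements (2 classes: makes sense / nonsense).
bert_logits = torch.tensor([[2.1, -0.3], [0.2, 1.7]])
roberta_logits = torch.tensor([[1.4, 0.1], [-0.5, 2.0]])
print(ensemble_predict([bert_logits, roberta_logits]))
```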