CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought
- URL: http://arxiv.org/abs/2309.11143v4
- Date: Thu, 20 Jun 2024 12:34:48 GMT
- Title: CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought
- Authors: Bowen Zhang, Kehua Chang, Chunping Li
- Abstract summary: This paper presents CoT-BERT, an innovative method that harnesses the progressive thinking of Chain-of-Thought reasoning.
We develop an advanced contrastive learning loss function and propose a novel template denoising strategy.
- Score: 3.0566617373924325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised sentence representation learning aims to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the reliance on labeled data. Recent strides within this domain have been significantly propelled by breakthroughs in contrastive learning and prompt engineering. Despite these advancements, the field has reached a plateau, leading some researchers to incorporate external components to enhance the quality of sentence embeddings. Such integration, though beneficial, complicates solutions and inflates demands for computational resources. In response to these challenges, this paper presents CoT-BERT, an innovative method that harnesses the progressive thinking of Chain-of-Thought reasoning to tap into the latent potential of pre-trained models like BERT. Additionally, we develop an advanced contrastive learning loss function and propose a novel template denoising strategy. Rigorous experimentation demonstrates that CoT-BERT surpasses a range of well-established baselines by relying exclusively on the intrinsic strengths of pre-trained models.
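As a rough illustration of the prompt-plus-contrastive-learning recipe the abstract describes, the sketch below embeds sentences through a BERT prompt template and trains with InfoNCE over in-batch negatives (SimCSE-style). The template wording, temperature, and dropout-two-views trick are illustrative assumptions, not CoT-BERT's exact design.

```python
# Sketch of prompt-based contrastive sentence embedding; the template,
# temperature, and dropout-views trick are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.train()  # keep dropout on so two passes give two distinct "views"

# Hypothetical prompt; CoT-BERT's staged templates are defined in the paper.
TEMPLATE = 'The sentence "{sent}" means [MASK].'

def embed(sentences):
    batch = tokenizer([TEMPLATE.format(sent=s) for s in sentences],
                      padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # [B, T, H]
    mask_pos = batch["input_ids"] == tokenizer.mask_token_id
    return hidden[mask_pos]                                # [MASK] vectors, [B, H]

def info_nce(z1, z2, tau=0.05):
    # InfoNCE over in-batch negatives, as in SimCSE-style training.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / tau
    return F.cross_entropy(sim, torch.arange(z1.size(0)))

sents = ["A man is playing guitar.", "The weather is nice today."]
loss = info_nce(embed(sents), embed(sents))  # two dropout-noised views
```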
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle semi-supervised visual grounding by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts [14.632649933582648]
EquiPrompt is a novel method employing Chain of Thought (CoT) reasoning to reduce biases in text-to-image generative models.
It integrates iterative bootstrapping and bias-aware selection to balance creativity and ethical responsibility.
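The iterative-bootstrapping control flow can be pictured as a small generate-critique-revise loop. The sketch below is schematic: generate, critique, and revise are hypothetical callables injected by the caller, standing in for the text-to-image model, a chain-of-thought bias judgment, and a bias-aware prompt rewrite; EquiPrompt's actual procedure and selection criteria are in the paper.

```python
from typing import Callable, List, Tuple

# Schematic generate-critique-revise loop; all three callables are
# hypothetical stand-ins supplied by the caller.
def debias_prompt(prompt: str,
                  generate: Callable[[str], List[object]],
                  critique: Callable[[List[object], str], Tuple[bool, str]],
                  revise: Callable[[str, str], str],
                  rounds: int = 3) -> str:
    for _ in range(rounds):
        images = generate(prompt)                      # text-to-image backend
        balanced, feedback = critique(images, prompt)  # CoT bias judgment
        if balanced:                                   # stop once outputs look balanced
            return prompt
        prompt = revise(prompt, feedback)              # bias-aware rewrite
    return prompt
```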
arXiv Detail & Related papers (2024-06-13T12:55:10Z)
- Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss [3.435381469869212]
This paper presents an innovative regression framework and proposes two simple yet effective loss functions: Translated ReLU and Smooth K2 Loss.
Our method achieves convincing performance across seven established STS benchmarks, especially when supplemented with task-specific training data.
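The two loss names suggest margin-style penalties on the predicted-versus-gold similarity gap. The sketch below assumes one plausible reading: zero cost within a margin and a linear penalty outside it for Translated ReLU, with a smooth squared counterpart for Smooth K2 Loss; consult the paper for the exact definitions.

```python
import torch
import torch.nn.functional as F

def translated_relu_loss(pred, target, margin=0.1):
    # Assumed form: no penalty inside the margin, linear penalty outside it.
    return F.relu((pred - target).abs() - margin).mean()

def smooth_k2_loss(pred, target, margin=0.1):
    # Assumed form: a smooth, squared counterpart of the same margin penalty.
    return F.relu((pred - target).abs() - margin).pow(2).mean()

pred = torch.tensor([0.82, 0.40, 0.95])   # predicted similarity scores
gold = torch.tensor([0.90, 0.35, 0.60])   # gold STS annotations (rescaled)
print(translated_relu_loss(pred, gold).item(), smooth_k2_loss(pred, gold).item())
```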
arXiv Detail & Related papers (2024-06-08T02:52:43Z)
- Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [53.363888563647976]
We develop a consistent and theoretically grounded approach to annotating decompositional entailment datasets.
We find that our resulting dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in a modern neuro-symbolic reasoning engine significantly improves results.
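The knowledge-distillation step can be pictured with the standard temperature-softened KL objective; the sketch below shows that generic recipe, not the paper's exact training setup for the RDTE-oriented classifier.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Classic KD: KL divergence between temperature-softened distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```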
arXiv Detail & Related papers (2024-02-22T18:55:17Z)
- Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction [0.45060992929802207]
Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data.
This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception.
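The Barlow Twins objective itself is compact enough to state directly: standardize two embedding views, compute their cross-correlation matrix, and push it toward the identity, so the diagonal enforces invariance and the off-diagonal enforces redundancy reduction. A minimal sketch, with an illustrative trade-off weight:

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    # Standardize each embedding dimension across the batch, then drive the
    # cross-correlation matrix toward identity: the diagonal term enforces
    # invariance between views, the off-diagonal term reduces redundancy.
    n = z1.size(0)
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()
    return on_diag + lam * off_diag
```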
arXiv Detail & Related papers (2023-09-07T10:23:59Z)
- Implicit Counterfactual Data Augmentation for Deep Neural Networks [3.6397924689580745]
Machine-learning models are prone to capturing spurious correlations between non-causal attributes and classes.
This study proposes an implicit counterfactual data augmentation method to remove spurious correlations and make stable predictions.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can stem from biases in data acquisition rather than from causal relationships.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
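The hybrid discriminative-generative idea can be pictured as a shared encoder feeding both a classifier head and a decoder. The sketch below shows only that generic skeleton, with placeholder dimensions; the paper's nuisance-extended objective adds adversarial and mutual-information terms on top.

```python
import torch.nn as nn
import torch.nn.functional as F

class HybridAE(nn.Module):
    # Shared encoder feeding a classifier head (discriminative) and a
    # decoder (generative); dimensions and architecture are placeholders.
    def __init__(self, in_dim=784, hid=128, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.dec = nn.Linear(hid, in_dim)
        self.cls = nn.Linear(hid, n_classes)

    def loss(self, x, y, beta=1.0):
        z = self.enc(x)
        ce = F.cross_entropy(self.cls(z), y)   # discriminative term
        rec = F.mse_loss(self.dec(z), x)       # generative (reconstruction) term
        return ce + beta * rec
```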
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
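To see why sigmoid activations are awkward for verifiers, and what counter-example-guided refinement buys, consider bounding the curve between parallel lines and splitting the input interval wherever the relaxation is too loose. The sketch below is an illustrative toy, not the paper's meta-algorithm: the line offsets are estimated by dense sampling (a real verifier would compute them exactly), and the refinement trigger is a simple width test rather than a spurious counterexample.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_bounds(l, u, samples=100):
    # Bound sigmoid on [l, u] between two parallel lines with the secant's
    # slope; offsets estimated by sampling for illustration only.
    slope = (sigmoid(u) - sigmoid(l)) / (u - l) if u > l else 0.0
    xs = [l + (u - l) * i / samples for i in range(samples + 1)]
    offsets = [sigmoid(x) - slope * x for x in xs]
    return slope, min(offsets), max(offsets)

def refine(l, u, tol=1e-3, depth=0, max_depth=12):
    # Abstraction refinement by bisection: wherever the relaxation is looser
    # than tol, split the interval and tighten each half recursively.
    slope, lo, hi = linear_bounds(l, u)
    if hi - lo <= tol or depth >= max_depth:
        return [(l, u, slope, lo, hi)]
    m = (l + u) / 2
    return (refine(l, m, tol, depth + 1, max_depth)
            + refine(m, u, tol, depth + 1, max_depth))

print(len(refine(-6.0, 6.0)), "linear pieces cover sigmoid on [-6, 6]")
```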
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
- Robust Dialogue State Tracking with Weak Supervision and Sparse Data [2.580163308334609]
Generalising dialogue state tracking (DST) to new data is challenging due to the strong reliance on abundant and fine-grained supervision during training.
Sample sparsity, distributional shift and the occurrence of new concepts and topics frequently lead to severe performance degradation during inference.
We propose a training strategy to build extractive DST models without the need for fine-grained manual span labels.
arXiv Detail & Related papers (2022-02-07T16:58:12Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
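A common way to realize such a regularizer is to pair the usual cross-entropy on real inputs with a term that pushes predictions on synthetic noise inputs toward maximum entropy. The sketch below takes that reading; the noise construction and weighting are assumptions rather than NoiER's exact formulation.

```python
import torch
import torch.nn.functional as F

def noier_style_loss(model, x, y, lam=0.5):
    # Cross-entropy on real inputs, plus a term pushing predictions on
    # pure-noise inputs toward maximum entropy (uniform). The noise
    # construction and weighting here are assumptions.
    ce = F.cross_entropy(model(x), y)
    noise = torch.randn_like(x)                 # synthetic OOD surrogate
    log_p = F.log_softmax(model(noise), dim=-1)
    entropy = -(log_p.exp() * log_p).sum(-1).mean()
    return ce - lam * entropy                   # minimizing -entropy maximizes it
```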
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
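For contrast with the standard focal loss, the sketch below includes an assumed Anti-Focal variant that inverts the focal weighting so low-confidence (often rare-token) predictions receive less relative emphasis; the paper's exact formulation may differ.

```python
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    # Standard focal loss: (1 - p)^gamma down-weights confident predictions.
    log_p_t = F.log_softmax(logits, -1).gather(-1, target.unsqueeze(-1)).squeeze(-1)
    p_t = log_p_t.exp()
    return (-(1 - p_t) ** gamma * log_p_t).mean()

def anti_focal_loss(logits, target, gamma=1.0):
    # Assumed inversion of the focal weighting: (1 + p)^gamma gives
    # low-confidence (often rare-token) predictions less relative emphasis.
    # The paper's exact formulation may differ.
    log_p_t = F.log_softmax(logits, -1).gather(-1, target.unsqueeze(-1)).squeeze(-1)
    p_t = log_p_t.exp()
    return (-(1 + p_t) ** gamma * log_p_t).mean()
```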
arXiv Detail & Related papers (2020-10-10T07:00:57Z)