AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives
- URL: http://arxiv.org/abs/2510.04983v3
- Date: Mon, 03 Nov 2025 21:48:02 GMT
- Title: AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives
- Authors: Khalid Mehtab Khan, Anagha Kulkarni
- Abstract summary: AWARE is a framework that attempts to improve a transformer model's awareness for this nuanced task. We show that by making the model explicitly aware of the properties of the input, AWARE outperforms a strong baseline by 2.1 percentage points in Macro-F1. This work provides a robust and generalizable methodology for any text classification task in which meaning depends on the context of the narrative.
- Score: 0.5514573274011145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying cultural capital (CC) themes in student reflections can offer valuable insights that help foster equitable learning environments in classrooms. However, themes such as aspirational goals or family support are often woven into narratives, rather than appearing as direct keywords. This makes them difficult to detect for standard NLP models that process sentences in isolation. The core challenge stems from a lack of awareness, as standard models are pre-trained on general corpora, leaving them blind to the domain-specific language and narrative context inherent to the data. To address this, we introduce AWARE, a framework that systematically attempts to improve a transformer model's awareness for this nuanced task. AWARE has three core components: 1) Domain Awareness, adapting the model's vocabulary to the linguistic style of student reflections; 2) Context Awareness, generating sentence embeddings that are aware of the full essay context; and 3) Class Overlap Awareness, employing a multi-label strategy to recognize the coexistence of themes in a single sentence. Our results show that by making the model explicitly aware of the properties of the input, AWARE outperforms a strong baseline by 2.1 percentage points in Macro-F1 and shows considerable improvements across all themes. This work provides a robust and generalizable methodology for any text classification task in which meaning depends on the context of the narrative.
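To make the framework's components concrete, here is a minimal sketch (not the authors' implementation) of the two ideas the abstract states most explicitly: sentence embeddings pooled from a transformer pass over the full essay (Context Awareness) and a sigmoid multi-label head trained with binary cross-entropy so that several themes can be active in one sentence (Class Overlap Awareness). The model sizes, toy tokenization, and all variable names are illustrative assumptions.

```python
# Minimal sketch, not the AWARE implementation: multi-label theme classification
# over context-aware sentence embeddings. A transformer encoder reads the whole
# essay; each sentence embedding is pooled only from the tokens it owns, so the
# representation still reflects the surrounding narrative.
import torch
import torch.nn as nn

class ContextAwareMultiLabelClassifier(nn.Module):
    def __init__(self, vocab_size=5000, d_model=128, num_themes=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_themes)            # one logit per CC theme

    def forward(self, token_ids, sentence_ids):
        # token_ids:    (1, seq_len) token ids for the full essay
        # sentence_ids: (1, seq_len) index of the sentence each token belongs to
        hidden = self.encoder(self.embed(token_ids))           # (1, seq_len, d_model)
        num_sents = int(sentence_ids.max().item()) + 1
        sent_embs = [hidden[0, sentence_ids[0] == s].mean(dim=0) for s in range(num_sents)]
        return self.head(torch.stack(sent_embs))               # (num_sents, num_themes)

model = ContextAwareMultiLabelClassifier()
token_ids = torch.randint(0, 5000, (1, 40))                    # toy 40-token essay
sentence_ids = torch.tensor([[i // 10 for i in range(40)]])    # four 10-token sentences
logits = model(token_ids, sentence_ids)

targets = torch.zeros_like(logits)
targets[1, 2] = targets[1, 4] = 1.0                            # themes may co-occur in a sentence
loss = nn.BCEWithLogitsLoss()(logits, targets)                 # multi-label objective
loss.backward()
```

The first component, Domain Awareness, is not shown here; under the abstract's description it would amount to adapting the tokenizer and vocabulary and continuing pre-training on student reflections before fitting a head like this one.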
Related papers
- Vision Large Language Models Are Good Noise Handlers in Engagement Analysis [54.397912827957164]
We propose a framework leveraging Vision Large Language Models (VLMs) to refine annotations and guide the training process. Our framework uses a questionnaire to extract behavioral cues and split data into high- and low-reliability subsets. We demonstrate that classical computer vision models trained on refined high-reliability subsets and enhanced with our curriculum strategy show improvements.
arXiv Detail & Related papers (2025-11-18T18:50:26Z)
- SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space [11.534994345027362]
Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation. We introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the original query meaning while degrading segmentation performance. We introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder.
arXiv Detail & Related papers (2025-10-28T14:09:05Z)
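The SPARTA entry above describes a black-box, sentence-level search in the latent space of a text autoencoder. The sketch below shows the shape of such a loop under that reading; the encoder, decoder, semantic-similarity filter, and victim scorer are toy placeholders (assumptions, not the authors' components) so that the search runs end to end.

```python
# Illustrative sketch of black-box adversarial paraphrasing via latent-space search.
# Every model below is a stand-in; a real setup would plug in a trained text
# autoencoder, a sentence-similarity model, and the victim segmentation model.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 16

def encode(sentence: str) -> np.ndarray:
    # Placeholder: a real autoencoder maps the sentence to a latent vector.
    return np.random.default_rng(abs(hash(sentence)) % 2**32).normal(size=LATENT_DIM)

def decode(z: np.ndarray) -> str:
    # Placeholder: a real decoder generates a fluent paraphrase from z.
    return f"paraphrase{z[:2].round(2).tolist()}"

def semantic_similarity(a: str, b: str) -> float:
    return 1.0  # placeholder: in practice, reject candidates that change the meaning

def victim_segmentation_score(query: str) -> float:
    # Placeholder for the black-box victim's segmentation quality (e.g., IoU).
    return float(np.clip(0.8 + 0.1 * np.sin(len(query)), 0.0, 1.0))

query = "segment the mug closest to the person holding a book"
z0 = encode(query)
best_query, best_score = query, victim_segmentation_score(query)

for _ in range(200):                              # simple random search in latent space
    candidate = decode(z0 + 0.3 * rng.normal(size=LATENT_DIM))
    if semantic_similarity(query, candidate) < 0.9:
        continue                                  # keep only meaning-preserving paraphrases
    score = victim_segmentation_score(candidate)
    if score < best_score:                        # lower score = segmentation degraded more
        best_query, best_score = candidate, score

print(best_query, best_score)
```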
- Sigma: Semantically Informative Pre-training for Skeleton-based Sign Language Understanding [47.469519895247366]
Pre-training has proven effective for learning transferable features in sign language understanding tasks. We propose Sigma, a unified skeleton-based SLU framework featuring: 1) a sign-aware early fusion mechanism that facilitates deep interaction between visual and textual modalities, enriching visual features with linguistic context; 2) a hierarchical alignment learning strategy that jointly maximises agreements across different levels of paired features from different modalities, effectively capturing both fine-grained details and high-level semantic relationships; and 3) a unified pre-training framework that combines contrastive learning, text matching and language modelling to promote semantic consistency and generalisation.
arXiv Detail & Related papers (2025-09-25T14:28:34Z)
- Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval [15.126709823382539]
This work advances Contrastive Language-Image Pre-training (CLIP) for person representation learning. We develop a noise-resistant data construction pipeline that leverages the in-context learning capabilities of MLLMs. We introduce the GA-DMS framework, which improves cross-modal alignment by adaptively masking noisy textual tokens.
arXiv Detail & Related papers (2025-09-11T03:06:22Z)
- Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models [26.91963265869296]
This work investigates the challenging task of identifying narrative roles in Internet memes. It builds on an annotated dataset originally skewed toward the 'Other' class. Comprehensive lexical and structural analyses highlight the nuanced, culture-specific, and context-rich language used in real memes.
arXiv Detail & Related papers (2025-06-29T07:12:11Z)
- Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings [5.257719744958367]
This thesis explores three challenging settings in text classification by leveraging the intrinsic knowledge of pretrained language models (PLMs).
We develop models that utilize features based on contextualized word representations from PLMs, achieving performance that rivals or surpasses human accuracy.
Lastly, we tackle the sensitivity of large language models to in-context learning prompts by selecting effective demonstrations.
arXiv Detail & Related papers (2024-08-28T09:07:30Z)
- Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer [15.21084337999065]
Grounded situation recognition (GSR) requires the model to detect all semantic roles that participate in the action.
This complex task usually involves three steps: verb recognition, semantic role grounding, and noun recognition.
We introduce a new approach for zero-shot GSR via Language EXplainer (LEX).
arXiv Detail & Related papers (2024-04-24T10:17:13Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language [85.9019051663368]
data2vec is a framework that uses the same learning method for either speech, NLP or computer vision.
The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup.
Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance.
arXiv Detail & Related papers (2022-02-07T22:52:11Z)
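The data2vec entry above centres on masked-view self-distillation: a teacher network, maintained as an exponential moving average of the student, encodes the full input, and the student regresses those latent targets from a masked view. The sketch below is a minimal, single-modality illustration under that reading; the actual method additionally averages representations from the top layers and uses modality-specific masking and featurization, which are omitted here, and all dimensions and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of masked self-distillation in the spirit of data2vec; not the
# official implementation. The teacher sees the full input, the student a masked
# view, and the student regresses the teacher's latent representations.
import copy
import torch
import torch.nn as nn

d_model, seq_len, batch = 64, 32, 8
student = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)                     # teacher is never trained by backprop

def ema_update(teacher, student, tau=0.999):
    # Teacher weights track the student via an exponential moving average.
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(tau).add_(sp, alpha=1 - tau)

x = torch.randn(batch, seq_len, d_model)        # a batch of already-embedded inputs
mask = torch.rand(batch, seq_len) < 0.15        # positions hidden from the student
masked_x = x.clone()
masked_x[mask] = 0.0                            # crude mask token: zeroed embedding

with torch.no_grad():
    targets = teacher(x)                        # latent targets from the FULL input
preds = student(masked_x)                       # student only sees the masked view
loss = nn.functional.smooth_l1_loss(preds[mask], targets[mask])
loss.backward()                                 # (an optimizer.step() would follow in training)
ema_update(teacher, student)                    # teacher slowly follows the student
```

Because the targets are continuous latent representations rather than modality-specific tokens or pixels, the same objective can in principle be shared across speech, vision, and text, which is the entry's central claim.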