BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
- URL: http://arxiv.org/abs/2510.06811v1
- Date: Wed, 08 Oct 2025 09:39:40 GMT
- Title: BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
- Authors: Philipp Mondorf, Mingyang Wang, Sebastian Gerstner, Ahmad Dawar Hakimi, Yihong Liu, Leonor Veloso, Shijia Zhou, Hinrich Schütze, Barbara Plank,
- Abstract summary: We investigate whether ensembling two or more circuit localization methods can improve performance. In parallel ensembling, we combine attribution scores assigned to each edge by different methods. In the sequential ensemble, we use edge attribution scores obtained via EAP-IG as a warm start for a more expensive but more precise circuit identification method.
- Score: 64.5040037515574
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Circuit Localization track of the Mechanistic Interpretability Benchmark (MIB) evaluates methods for localizing circuits within large language models (LLMs), i.e., subnetworks responsible for specific task behaviors. In this work, we investigate whether ensembling two or more circuit localization methods can improve performance. We explore two variants: parallel and sequential ensembling. In parallel ensembling, we combine attribution scores assigned to each edge by different methods, e.g., by averaging or taking the minimum or maximum value. In the sequential ensemble, we use edge attribution scores obtained via EAP-IG as a warm start for a more expensive but more precise circuit identification method, namely edge pruning. We observe that both approaches yield notable gains on the benchmark metrics, leading to a more precise circuit identification approach. Finally, we find that taking a parallel ensemble over various methods, including the sequential ensemble, achieves the best results. We evaluate our approach in the BlackboxNLP 2025 MIB Shared Task, comparing ensemble scores to official baselines across multiple model-task combinations.
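The parallel ensemble described above reduces to a per-edge aggregation of attribution scores. A minimal sketch, assuming each method's output is represented as a dict mapping edge identifiers to scores (the function name and data layout are illustrative, not the authors' code; in practice, scores from different methods may also need normalization before combining):

```python
import numpy as np

def parallel_ensemble(score_dicts, mode="mean"):
    """Combine per-edge attribution scores from several circuit
    localization methods into a single score per edge.

    score_dicts: list of {edge_id: score} dicts, one per method.
    mode: 'mean', 'min', or 'max', as in the paper's parallel variants.
    """
    # Only edges scored by every method can be combined.
    edges = set.intersection(*(set(d) for d in score_dicts))
    combine = {"mean": np.mean, "min": np.min, "max": np.max}[mode]
    return {e: float(combine([d[e] for d in score_dicts])) for e in edges}
```

The combined scores can then be thresholded or ranked to select the final circuit, exactly as with a single method's attribution scores.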
Related papers
- Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models [56.73385658981886]
Mechanistic interpretability (MI) seeks to uncover how language models (LMs) implement specific behaviors. The recently released Mechanistic Interpretability Benchmark (MIB) provides a framework for evaluating circuit and causal variable localization. The BlackboxNLP 2025 Shared Task extends MIB into a community-wide reproducible comparison of MI techniques.
arXiv Detail & Related papers (2025-11-23T11:33:59Z) - BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection [35.326040728422576]
We propose three key improvements to circuit discovery. First, we use bootstrapping to identify edges with consistent attribution scores. Second, we introduce a simple ratio-based selection strategy to prioritize strong positive-scoring edges. Third, we replace the standard greedy selection with an integer linear programming formulation.
arXiv Detail & Related papers (2025-10-28T15:49:34Z) - PREMISE: Matching-based Prediction for Accurate Review Recommendation [25.506776502317436]
PREMISE is a new architecture for matching-based learning in the multimodal review helpfulness task. It computes multi-scale and multi-field representations, filters duplicated semantics, and then obtains a set of matching scores as feature vectors for the downstream recommendation task. Experimental results on two publicly available datasets show that PREMISE achieves promising performance at lower computational cost.
arXiv Detail & Related papers (2025-05-02T13:23:13Z) - Self-Supervised Any-Point Tracking by Contrastive Random Walks [17.50529887238381]
We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks.
Our method achieves strong performance on the TapVid benchmarks, outperforming previous self-supervised tracking methods.
arXiv Detail & Related papers (2024-09-24T17:59:56Z) - Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery. Our method finds circuits in GPT-2 that use less than half as many edges as circuits found by previous methods. Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z) - Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the performance of the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z) - A Comparative Evaluation of Quantification Methods [2.802657211770274]
Quantification is the problem of estimating the distribution of class labels on unseen data. In this work, we compare 24 different methods on more than 40 data sets, considering binary as well as multiclass quantification settings. No single algorithm generally outperforms all competitors, but we identify a group of strong methods, including the threshold selection-based Median Sweep and TSMax methods. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the HDx method, the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM
arXiv Detail & Related papers (2021-03-04T18:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.