Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
- URL: http://arxiv.org/abs/2408.05147v2
- Date: Mon, 19 Aug 2024 07:51:05 GMT
- Title: Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
- Authors: Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca Dragan, Rohin Shah, Neel Nanda
- Abstract summary: Sparse autoencoders (SAEs) are an unsupervised method for learning a sparse decomposition of a neural network's latent representations into seemingly interpretable features.
In this work, we introduce Gemma Scope, an open suite of JumpReLU SAEs trained on all layers and sub-layers of Gemma 2 2B and 9B.
We evaluate the quality of each SAE on standard metrics and release these results.
- Score: 11.169778211035826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse autoencoders (SAEs) are an unsupervised method for learning a sparse decomposition of a neural network's latent representations into seemingly interpretable features. Despite recent excitement about their potential, research applications outside of industry are limited by the high cost of training a comprehensive suite of SAEs. In this work, we introduce Gemma Scope, an open suite of JumpReLU SAEs trained on all layers and sub-layers of Gemma 2 2B and 9B and select layers of Gemma 2 27B base models. We primarily train SAEs on the Gemma 2 pre-trained models, but additionally release SAEs trained on instruction-tuned Gemma 2 9B for comparison. We evaluate the quality of each SAE on standard metrics and release these results. We hope that by releasing these SAE weights, we can help make more ambitious safety and interpretability research easier for the community. Weights and a tutorial can be found at https://huggingface.co/google/gemma-scope and an interactive demo can be found at https://www.neuronpedia.org/gemma-scope
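For readers who want a concrete picture of what is being released: a JumpReLU SAE is a single-hidden-layer autoencoder whose nonlinearity zeroes any encoder pre-activation below a learned per-feature threshold, so the feature activations are sparse by construction. The PyTorch sketch below illustrates the forward pass only; the class name, parameter names, and sizes are illustrative assumptions rather than the official Gemma Scope loading code (see the Hugging Face tutorial linked above for that).

```python
# Minimal JumpReLU SAE forward pass (illustrative sketch only).
# Parameter names, shapes, and the threshold convention are assumptions
# based on the JumpReLU formulation, not the exact Gemma Scope release format.
import torch
import torch.nn as nn


class JumpReLUSAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.zeros(d_model, d_sae))
        self.W_dec = nn.Parameter(torch.zeros(d_sae, d_model))
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # Learned per-feature threshold used by the JumpReLU nonlinearity.
        self.threshold = nn.Parameter(torch.zeros(d_sae))

    def encode(self, acts: torch.Tensor) -> torch.Tensor:
        pre = acts @ self.W_enc + self.b_enc
        # JumpReLU: keep a pre-activation only if it exceeds its threshold.
        return pre * (pre > self.threshold)

    def decode(self, features: torch.Tensor) -> torch.Tensor:
        return features @ self.W_dec + self.b_dec

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(acts))


# Hypothetical sizes: a residual-stream width and a 16K-feature SAE.
sae = JumpReLUSAE(d_model=2304, d_sae=16384)
acts = torch.randn(8, 2304)          # stand-in for activations from one layer
recon = sae(acts)
```

In practice the activations passed to `encode` would be taken from a chosen layer or sub-layer of Gemma 2, and quantities such as reconstruction error and the number of active features per token (L0) are among the standard SAE metrics the paper evaluates and releases.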
Related papers
- Gemma 3 Technical Report [198.3299202423321]
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models.
This version introduces vision understanding abilities, a wider coverage of languages and longer context.
We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context.
arXiv Detail & Related papers (2025-03-25T15:52:34Z) - Low-Rank Adapting Models for Sparse Autoencoders [6.932760557251821]
We use low-rank adaptation (LoRA) to finetune the language model itself around a previously trained SAE.
We analyze our method across SAE sparsity, SAE width, language model size, LoRA rank, and model layer on the Gemma Scope family of SAEs.
arXiv Detail & Related papers (2025-01-31T18:59:16Z) - Training Software Engineering Agents and Verifiers with SWE-Gym [89.55822534364727]
SWE-Gym is the first environment for training real-world software engineering (SWE) agents.
SWE-Gym contains 2,438 real-world Python task instances.
arXiv Detail & Related papers (2024-12-30T18:15:39Z) - Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders [115.34050914216665]
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models.
We introduce a suite of 256 SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features.
We assess the generalizability of SAEs trained on base models to longer contexts and fine-tuned models.
arXiv Detail & Related papers (2024-10-27T17:33:49Z) - Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small [6.306964287762374]
We evaluate whether SAEs trained on hidden representations of GPT-2 small have sets of features that mediate knowledge of which country a city is in and which continent it is in.
Our results show that SAEs struggle to reach the neuron baseline, and none come close to the DAS skyline.
arXiv Detail & Related papers (2024-09-05T18:00:37Z) - Gemma 2: Improving Open Language Models at a Practical Size [118.04200128754249]
We introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models.
We apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions.
The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger.
arXiv Detail & Related papers (2024-07-31T19:13:07Z) - Linear-Complexity Self-Supervised Learning for Speech Processing [17.360059094663182]
Self-supervised learning (SSL) models usually require weeks of pre-training with dozens of high-end GPUs.
This paper studies a linear-complexity context encoder for SSL for the first time.
arXiv Detail & Related papers (2024-07-18T10:34:33Z) - Gemma: Open Models Based on Gemini Research and Technology [128.57714343844074]
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.
Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety.
arXiv Detail & Related papers (2024-03-13T06:59:16Z) - GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z) - REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z) - Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning [82.41415008107502]
Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels.
It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments).
We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions.
arXiv Detail & Related papers (2020-03-31T23:36:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.