reproduce_evolution_of_concepts
README.md
analyze_pythia_sae.py
analyze_pythia_sae_with_pre_generated_activations.py
generate_pythia_activation_1d.py
generate_pythia_activation_2d.py
load_hf_model.py
load_saelens_model.py
train_pythia_clt_topk.py
train_pythia_lorsa_topk.py
train_pythia_sae_batchtopk.py
train_pythia_sae_jumprelu.py
train_pythia_sae_topk.py
train_pythia_sae_with_pre_generated_activations.py

Example setups of Language-Model-SAEs

The standard SAE-based pipeline for mechanistically interpreting the internal representations of language models consists of the following steps: Generating activations (optional) -> Training SAEs -> Analyzing SAEs -> Visualizing analyses.

This directory presents example setups for generating activations, training SAEs, and analyzing them, covering variants of SAE architecture and activation function, with and without pre-generated activations.

Use on-the-fly model activations

SAE training requires a stream of model activations at certain hook points (i.e., specific locations in the model's internal representations). Model activations can either be cached on disk ahead of time or produced on the fly.

With on-the-fly activations, the Generating activations step can be skipped, which simplifies the overall pipeline. To launch experiments on Pythia, refer to train_pythia_sae_topk, analyze_pythia_sae, and the other scripts without a with_pre_generated_activations suffix. Note that analyzing requires a running MongoDB instance (defaults to mongodb://localhost:27017) to save the analysis results.
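Because the analyzing step depends on MongoDB, it can help to confirm that something is listening at the expected address before starting a long run. The following is a minimal sketch using only the Python standard library; the helper name mongo_port_open is hypothetical and not part of Language-Model-SAEs, and it only tests TCP reachability rather than speaking the MongoDB protocol.

```python
import socket
from urllib.parse import urlparse

# Default address quoted in this README.
DEFAULT_MONGO_URI = "mongodb://localhost:27017"


def mongo_port_open(uri: str = DEFAULT_MONGO_URI, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds.

    Hypothetical helper: it only shows that some process accepts connections
    at that address, not that the process is actually MongoDB.
    """
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 27017  # MongoDB's conventional default port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling mongo_port_open() with no arguments checks the default address; pass a different URI if your MongoDB instance runs elsewhere.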

Use cached activations

Cached activations are the more common choice in practical SAE training and analysis. Caching enables efficient hyperparameter sweeps by reusing generated activations, and it also enables parallelized training and analyzing (DP/TP). However, it requires a non-trivial amount of disk space; e.g., caching 800M tokens of activations from one layer of Pythia 160M requires about 6TB.
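To budget disk space for a different token count or model, a back-of-envelope estimate of the raw activation payload can be a useful starting point. The sketch below assumes d_model = 768 for Pythia 160M and treats the storage dtype as an open parameter, since this README does not state it. It counts only the raw values of a single layout at a single hook point, so it is a lower bound: it does not account for storing both 1d and 2d layouts, token metadata, or file-format overhead, and it should not be read as a breakdown of the 6TB figure above.

```python
def activation_bytes(n_tokens: int, d_model: int, bytes_per_value: int) -> int:
    """Raw size in bytes of an (n_tokens, d_model) activation tensor."""
    return n_tokens * d_model * bytes_per_value


N_TOKENS = 800_000_000  # token count quoted in this README
D_MODEL = 768           # hidden size of Pythia 160M

# The storage dtype is an assumption; compare two common choices.
for name, width in [("float32", 4), ("float16/bfloat16", 2)]:
    size_tb = activation_bytes(N_TOKENS, D_MODEL, width) / 1e12
    print(f"{name}: {size_tb:.2f} TB")
# float32: 2.46 TB
# float16/bfloat16: 1.23 TB
```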

To launch experiments with cached activations, first generate activations in 1d shape ((batch, d_model), for training) and 2d shape ((batch, n_context, d_model), for analyzing) by running generate_pythia_activation_1d and generate_pythia_activation_2d. Then use train_pythia_sae_with_pre_generated_activations and analyze_pythia_sae_with_pre_generated_activations to run training and analyzing respectively, with the path to the pre-generated activations specified. Note that analyzing still requires a running MongoDB instance.
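The difference between the two layouts can be illustrated with a small NumPy sketch. It assumes that the 1d layout is a token-level flattening of the 2d one, which is an assumption for illustration only; the actual generation scripts may additionally shuffle or filter tokens. The sizes are small hypothetical values, not those used for Pythia.

```python
import numpy as np

# Small hypothetical sizes chosen for illustration.
batch, n_context, d_model = 4, 16, 8

rng = np.random.default_rng(0)

# 2d layout: one row of d_model values per (sequence, position) pair,
# so context structure is preserved for analysis.
acts_2d = rng.standard_normal((batch, n_context, d_model)).astype(np.float32)

# 1d layout (assumed here to be a plain flattening): one row per token,
# with the sequence and position axes merged.
acts_1d = acts_2d.reshape(-1, d_model)

print(acts_2d.shape)  # (4, 16, 8)
print(acts_1d.shape)  # (64, 8)

# Under this flattening, row i * n_context + j is position j of sequence i.
assert np.array_equal(acts_1d[1 * n_context + 3], acts_2d[1, 3])
```

An SAE trained on single token vectors needs only the 1d layout, whereas analysis that ties a feature activation back to its position within a sequence needs the context axis kept by the 2d layout.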
