9 changes: 6 additions & 3 deletions .github/workflows/run_jupyter_notebooks.yml
@@ -90,8 +90,11 @@ jobs:
PYTHONPATH: "${{ github.workspace }}/src"
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
MAXTEXT_REPO_ROOT=$(pwd)
MAXTEXT_NOTEBOOKS_ROOT="$MAXTEXT_REPO_ROOT/src/maxtext/examples"
source .venv/bin/activate

export MAXTEXT_REPO_ROOT=$(pwd)
export MAXTEXT_PKG_DIR=$(pwd)/src/maxtext
export MAXTEXT_NOTEBOOKS_ROOT="$MAXTEXT_REPO_ROOT/src/maxtext/examples"

for notebook in "$MAXTEXT_NOTEBOOKS_ROOT"/{sft,rl}*.ipynb; do
filename=$(basename "$notebook")
@@ -101,7 +104,7 @@
echo "Running $filename ..."
echo "------------------------------------------------------"

.venv/bin/papermill "$notebook" "$output_name" -k maxtext_venv
papermill "$notebook" "$output_name" -k maxtext_venv
done
- name: Record Commit IDs
shell: bash
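For reference, the loop above runs each `sft*`/`rl*` example notebook through papermill using the registered `maxtext_venv` kernel. A minimal Python sketch of the equivalent flow (the output-file naming below is an assumption, since that line sits outside the visible hunk):

```python
# Sketch of the notebook-execution loop from the workflow above.
# Assumes papermill is installed in the active environment and that a
# Jupyter kernel named "maxtext_venv" is registered.
import glob
import os

import papermill as pm

repo_root = os.getcwd()
notebooks_root = os.path.join(repo_root, "src", "maxtext", "examples")

# Mirrors the workflow's "{sft,rl}*.ipynb" brace expansion.
notebooks = sorted(
    glob.glob(os.path.join(notebooks_root, "sft*.ipynb"))
    + glob.glob(os.path.join(notebooks_root, "rl*.ipynb"))
)

for notebook in notebooks:
    filename = os.path.basename(notebook)
    # Assumed naming convention; the workflow's actual output_name line is not shown.
    output_name = filename.replace(".ipynb", "_output.ipynb")
    print(f"Running {filename} ...")
    pm.execute_notebook(notebook, output_name, kernel_name="maxtext_venv")
```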
2 changes: 1 addition & 1 deletion .github/workflows/run_pathways_tests.yml
@@ -100,7 +100,7 @@ jobs:
export MAXTEXT_REPO_ROOT=$(pwd)
export MAXTEXT_ASSETS_ROOT=$(pwd)/src/maxtext/assets
export MAXTEXT_TEST_ASSETS_ROOT=$(pwd)/tests/assets
export MAXTEXT_PKG_DIR=$(pwd)/src/MaxText
export MAXTEXT_PKG_DIR=$(pwd)/src/maxtext
# TODO(b/454659463): Enable test_default_hlo_match after volume mount is supported.
.venv/bin/python3 -m pytest ${{ inputs.pytest_addopts }} -v -m "${FINAL_PYTEST_MARKER}" -k "not AotHloIdenticalTest and not CompileThenLoad" --durations=0
env:
2 changes: 1 addition & 1 deletion .github/workflows/run_tests_against_package.yml
@@ -110,7 +110,7 @@ jobs:
export MAXTEXT_REPO_ROOT=$(pwd)
export MAXTEXT_ASSETS_ROOT=$(pwd)/src/maxtext/assets
export MAXTEXT_TEST_ASSETS_ROOT=$(pwd)/tests/assets
export MAXTEXT_PKG_DIR=$(pwd)/src/MaxText
export MAXTEXT_PKG_DIR=$(pwd)/src/maxtext
# omit this libtpu init args for gpu tests
if [ "${{ inputs.device_type }}" != "cuda12" ]; then
export LIBTPU_INIT_ARGS='--xla_tpu_scoped_vmem_limit_kib=65536'
8 changes: 4 additions & 4 deletions .vscode/launch.json
@@ -9,7 +9,7 @@
"justMyCode": false,
"python": "python3",
"module": "maxtext.decode",
"args": ["src/MaxText/configs/base.yml",
"args": ["src/maxtext/configs/base.yml",
"run_name=runner_$(date +%Y-%m-%d-%H-%M)",
"base_output_directory=gs://test-maxtext-output",
"dataset_path=gs://test-maxtext-dataset",
@@ -36,7 +36,7 @@
"justMyCode": false,
"python": "python3",
"module": "maxtext.decode",
"args": ["src/MaxText/configs/base.yml",
"args": ["src/maxtext/configs/base.yml",
"run_name=runner_$(date +%Y-%m-%d-%H-%M)",
"base_output_directory=gs://test-maxtext-output",
"dataset_path=gs://test-maxtext-dataset",
@@ -52,7 +52,7 @@
"justMyCode": false,
"python": "python3",
"module": "MaxText.train",
"args": ["src/MaxText/configs/base.yml",
"args": ["src/maxtext/configs/base.yml",
"run_name=runner_$(date +%Y-%m-%d-%H-%M)",
"base_output_directory=gs://test-maxtext-output",
"dataset_path=gs://test-maxtext-dataset",
@@ -68,7 +68,7 @@
"python": "python3",
"module": "maxtext.inference.inference_microbenchmark",
"args": [
"src/MaxText/configs/base.yml",
"src/maxtext/configs/base.yml",
"model_name=llama2-7b",
"tokenizer_path=src/maxtext/assets/tokenizers/tokenizer.llama2",
"weight_dtype=bfloat16",
8 changes: 4 additions & 4 deletions PREFLIGHT.md
@@ -7,12 +7,12 @@ Before you run ML workload on Multihost with GCE or GKE, simply apply `bash pref

Here is an example for GCE:
```
bash preflight.sh PLATFORM=GCE && python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$YOUR_JOB_NAME
bash preflight.sh PLATFORM=GCE && python3 -m MaxText.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
```

Here is an example for GKE:
```
bash preflight.sh PLATFORM=GKE && python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$YOUR_JOB_NAME
bash preflight.sh PLATFORM=GKE && python3 -m MaxText.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
```

# Optimization 2: Numa binding (You can only apply this to v4 and v5p)
@@ -22,14 +22,14 @@ For GCE,
[preflight.sh](https://github.com/google/maxtext/blob/main/preflight.sh) will help you install `numactl` dependency, so you can use it directly, here is an example:

```
bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$YOUR_JOB_NAME
bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m MaxText.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
```

For GKE,
`numactl` should be built into your docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example

```
bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m MaxText.train src/MaxText/configs/base.yml run_name=$YOUR_JOB_NAME
bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m MaxText.train src/maxtext/configs/base.yml run_name=$YOUR_JOB_NAME
```

1. `numactl`: This is the command-line tool used for controlling NUMA policy for processes or shared memory. It's particularly useful on multi-socket systems where memory locality can impact performance.
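A rough Python equivalent of the GCE command above, for anyone wrapping the launch programmatically. This is a sketch only; it assumes `numactl` has already been installed by `preflight.sh`, that NUMA node 0 is the correct node for the host, and that the `run_name` value is a placeholder:

```python
# Sketch: pin CPU and memory to NUMA node 0, then launch training, mirroring
# the numactl invocation shown above.
import subprocess

subprocess.run(
    [
        "numactl", "--membind", "0", "--cpunodebind=0",
        "python3", "-m", "MaxText.train",
        "src/maxtext/configs/base.yml",
        "run_name=my_job",  # placeholder run name
    ],
    check=True,
)
```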
6 changes: 3 additions & 3 deletions benchmarks/api_server/README.md
@@ -33,7 +33,7 @@ export HF_TOKEN=<your_hugging_face_token>

The primary way to launch the API server is by using the `start_server.sh` script. This script ensures that the server is run from the project's root directory, which is necessary for the Python interpreter to find all the required modules.

The script takes the path to a base configuration file (e.g., `MaxText/configs/base.yml`) followed by any number of model-specific configuration overrides.
The script takes the path to a base configuration file (e.g., `maxtext/configs/base.yml`) followed by any number of model-specific configuration overrides.

### Benchmarking Configuration

@@ -56,7 +56,7 @@ Here is an example of how to launch the server with a `qwen3-30b-a3b` model, con
# Make sure you are in the root directory of the maxtext project.

bash benchmarks/api_server/start_server.sh \
MaxText/configs/base.yml \
maxtext/configs/base.yml \
model_name="qwen3-30b-a3b" \
tokenizer_path="Qwen/Qwen3-30B-A3B-Thinking-2507" \
load_parameters_path="<path_to_your_checkpoint>" \
@@ -135,7 +135,7 @@ CMD="export HF_TOKEN=${HF_TOKEN} && \
pip install --upgrade pip && \
pip install -r benchmarks/api_server/requirements.txt && \
bash benchmarks/api_server/start_server.sh \
MaxText/configs/base.yml \
maxtext/configs/base.yml \
model_name="${MODEL_NAME}" \
tokenizer_path="${TOKENIZER_PATH}" \
load_parameters_path="${LOAD_PARAMETERS_PATH}" \
2 changes: 1 addition & 1 deletion benchmarks/api_server/launch_gke_server.sh.template
@@ -53,7 +53,7 @@ CMD="export HF_TOKEN=${HF_TOKEN} && \
pip install --upgrade pip && \
pip install -r benchmarks/api_server/requirements.txt && \
bash benchmarks/api_server/start_server.sh \
MaxText/configs/base.yml \
maxtext/configs/base.yml \
model_name=\"${MODEL_NAME}\" \
tokenizer_path=\"${TOKENIZER_PATH}\" \
load_parameters_path=\"${LOAD_PARAMETERS_PATH}\" \
2 changes: 1 addition & 1 deletion benchmarks/api_server/start_server.sh
@@ -20,7 +20,7 @@
#
# Example:
# bash benchmarks/api_server/start_server.sh \
# MaxText/configs/base.yml \
# maxtext/configs/base.yml \
# model_name="qwen3-30b-a3b" \
# tokenizer_path="Qwen/Qwen3-30B-A3B-Thinking-2507" \
# load_parameters_path="<path_to_your_checkpoint>" \
5 changes: 4 additions & 1 deletion benchmarks/globals.py
@@ -25,7 +25,10 @@
r if os.path.isdir(os.path.join(r := os.path.dirname(os.path.dirname(__file__)), ".git")) else MAXTEXT_PKG_DIR,
)

# This is the configs root: with "base.yml"; "models/"; &etc.
MAXTEXT_CONFIGS_DIR = os.environ.get("MAXTEXT_CONFIGS_DIR", os.path.join(MAXTEXT_REPO_ROOT, "src", "maxtext", "configs"))

# This is the assets root: with "tokenizers/"; &etc.
MAXTEXT_ASSETS_ROOT = os.environ.get("MAXTEXT_ASSETS_ROOT", os.path.join(MAXTEXT_REPO_ROOT, "src", "maxtext", "assets"))

__all__ = ["MAXTEXT_ASSETS_ROOT", "MAXTEXT_PKG_DIR", "MAXTEXT_REPO_ROOT"]
__all__ = ["MAXTEXT_ASSETS_ROOT", "MAXTEXT_CONFIGS_DIR", "MAXTEXT_PKG_DIR", "MAXTEXT_REPO_ROOT"]
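The new `MAXTEXT_CONFIGS_DIR` follows the same pattern as the other roots: an environment-variable override with a repo-relative default. A small usage sketch (the call site is illustrative):

```python
# Sketch of resolving the configs root after this change. Because the value is
# captured at import time, any MAXTEXT_CONFIGS_DIR override must be exported
# before benchmarks.globals is imported.
import os

from benchmarks.globals import MAXTEXT_CONFIGS_DIR

base_config = os.path.join(MAXTEXT_CONFIGS_DIR, "base.yml")
print(base_config)  # <repo_root>/src/maxtext/configs/base.yml by default
```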
8 changes: 4 additions & 4 deletions benchmarks/maxtext_xpk_runner.py
@@ -35,7 +35,7 @@
import omegaconf

import benchmarks.maxtext_trillium_model_configs as model_configs
from benchmarks.globals import MAXTEXT_PKG_DIR
from benchmarks.globals import MAXTEXT_CONFIGS_DIR
from benchmarks.command_utils import run_command_with_updates
import benchmarks.xla_flags_library as xla_flags
from benchmarks.disruption_management.disruption_handler import DisruptionConfig
@@ -107,7 +107,7 @@ class WorkloadConfig:
generate_metrics_and_upload_to_big_query: bool = True
hardware_id: str = "v6e"
metrics_gcs_file: str = ""
base_config: str = os.path.join(MAXTEXT_PKG_DIR, "configs", "base.yml")
base_config: str = os.path.join(MAXTEXT_CONFIGS_DIR, "base.yml")
topology: str = dataclasses.field(init=False)
num_devices_per_slice: int = dataclasses.field(init=False)
db_project: str = ""
@@ -354,7 +354,7 @@ def _build_args_from_config(wl_config: WorkloadConfig) -> dict:
"xla_flags": f"'{xla_flags_str}'",
"dataset": dataset,
"run_type": "maxtext-xpk",
"config_file": os.path.join(MAXTEXT_PKG_DIR, "configs", "base.yml"),
"config_file": os.path.join(MAXTEXT_CONFIGS_DIR, "base.yml"),
"topology": wl_config.topology,
"tuning_params": f"'{tuning_params_str}'",
"db_project": wl_config.db_project,
@@ -440,7 +440,7 @@ def build_user_command(
f"export JAX_PLATFORMS={jax_platforms} &&",
"export ENABLE_PJRT_COMPATIBILITY=true &&",
"export MAXTEXT_ASSETS_ROOT=/deps/src/maxtext/assets MAXTEXT_PKG_DIR=/deps/src/MaxText MAXTEXT_REPO_ROOT=/deps &&"
f'{hlo_dump} python3 -m MaxText.train {os.path.join(MAXTEXT_PKG_DIR, "configs", "base.yml")}',
f'{hlo_dump} python3 -m MaxText.train {os.path.join(MAXTEXT_CONFIGS_DIR, "base.yml")}',
f"{config_tuning_params}",
f"steps={wl_config.num_steps}",
f"model_name={wl_config.model.model_type}",
6 changes: 3 additions & 3 deletions benchmarks/mmlu/mmlu_eval.py
@@ -20,21 +20,21 @@

To run the MMLU benchmark:
# Default is zero-shot prompting
python3 -m benchmarks.mmlu.mmlu_eval src/MaxText/configs/base.yml \
python3 -m benchmarks.mmlu.mmlu_eval src/maxtext/configs/base.yml \
tokenizer_path=src/maxtext/assets/tokenizer_llama3.tiktoken \
load_parameters_path=check_point_path model_name=llama3.1-8b \
max_prefill_predict_length=1024 max_target_length=2048 ici_tensor_parallelism=4 per_device_batch_size=1

# Example of using the prompt_template flag for Chain-of-Thought (CoT) prompting:
python3 -m benchmarks.mmlu.mmlu_eval src/MaxText/configs/base.yml \
python3 -m benchmarks.mmlu.mmlu_eval src/maxtext/configs/base.yml \
tokenizer_path=src/maxtext/assets/tokenizer_llama3.tiktoken \
load_parameters_path=check_point_path model_name=llama3.1-8b \
max_prefill_predict_length=1024 max_target_length=2048 ici_tensor_parallelism=4 per_device_batch_size=1 \
prompt_template="The following are multiple choice questions (with answers) about {subject}.\n\n{question}\n
{choices}\nAnswer: Let's think step by step."

# Example of using the prompt_template flag for 5-shot prompting (replace with actual examples):
python3 -m benchmarks.mmlu.mmlu_eval src/MaxText/configs/base.yml \
python3 -m benchmarks.mmlu.mmlu_eval src/maxtext/configs/base.yml \
tokenizer_path=src/maxtext/assets/tokenizer_llama3.tiktoken \
load_parameters_path=check_point_path model_name=llama3.1-8b \
max_prefill_predict_length=1024 max_target_length=2048 ici_tensor_parallelism=4 per_device_batch_size=1 \
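The `prompt_template` flag shown in the docstring above is an ordinary Python format string with `{subject}`, `{question}`, and `{choices}` placeholders. A short sketch of how the chain-of-thought template expands (the question and choices are invented for illustration):

```python
# Illustrative expansion of the CoT prompt_template from the docstring above.
template = (
    "The following are multiple choice questions (with answers) about {subject}.\n\n"
    "{question}\n{choices}\nAnswer: Let's think step by step."
)

prompt = template.format(
    subject="college physics",
    question="What is the SI unit of force?",
    choices="A. joule\nB. newton\nC. watt\nD. pascal",
)
print(prompt)
```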
10 changes: 5 additions & 5 deletions docs/guides/checkpointing_solutions/convert_checkpoint.md
@@ -66,7 +66,7 @@ export LAZY_LOAD_TENSORS=<Flag to lazy load> # True to use lazy load, False to u
Finally, run below command to complete the conversion

```bash
python3 -m MaxText.utils.ckpt_conversion.to_maxtext MaxText/configs/base.yml \
python3 -m MaxText.utils.ckpt_conversion.to_maxtext maxtext/configs/base.yml \
model_name=${HF_MODEL} \
hf_access_token=${HF_TOKEN} \
base_output_directory=${MODEL_CHECKPOINT_DIRECTORY} \
@@ -104,7 +104,7 @@ Use the `to_huggingface.py` script to convert a MaxText checkpoint into the Hugg
The following command converts a MaxText checkpoint and saves it locally, to GCS, or uploads it directly to the Hugging Face Hub.

```bash
python3 -m MaxText.utils.ckpt_conversion.to_huggingface src/MaxText/configs/base.yml \
python3 -m MaxText.utils.ckpt_conversion.to_huggingface src/maxtext/configs/base.yml \
model_name=<MODEL_NAME> \
load_parameters_path=<path-to-maxtext-checkpoint> \
base_output_directory=<path-to-save-converted-checkpoint> \
@@ -131,7 +131,7 @@ To ensure the conversion was successful, you can use the `tests/utils/forward_pa
### Usage

```bash
python3 -m tests.utils.forward_pass_logit_checker src/MaxText/configs/base.yml \
python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml \
tokenizer_path=assets/<tokenizer> \
load_parameters_path=<path-to-maxtext-checkpoint> \
model_name=<MODEL_NAME> \
@@ -216,8 +216,8 @@ To extend conversion support to a new model architecture, you must define its sp
- In [`utils/param_mapping.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/param_mapping.py), add the `hook_fn` logic (`def {MODEL}_MAXTEXT_TO_HF_PARAM_HOOK_FN`). This is the transformation needed per layer.

2. **Add Hugging Face weights Shape**: In [`utils/hf_shape.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/hf_shape.py), define the tensor shape of Hugging Face format (`def {MODEL}_HF_WEIGHTS_TO_SHAPE`). This is used to ensure the tensor shape is matched after to_huggingface conversion.
1. **Register model key**: In [`utils/utils.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/utils.py), add the new model key in `HF_IDS`.
1. **Add transformer config**: In [`utils/hf_model_configs.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/hf_model_configs.py), add the `transformers.Config` object, describing the Hugging Face model configuration (defined in ['src/MaxText/configs/models'](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/MaxText/configs/models)). **Note**: This configuration must precisely match the MaxText model's architecture.
3. **Register model key**: In [`utils/utils.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/utils.py), add the new model key in `HF_IDS`.
4. **Add transformer config**: In [`utils/hf_model_configs.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/utils/ckpt_conversion/utils/hf_model_configs.py), add the `transformers.Config` object, describing the Hugging Face model configuration (defined in ['src/maxtext/configs/models'](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/maxtext/configs/models)). **Note**: This configuration must precisely match the MaxText model's architecture.

Here is an example [PR to add support for gemma3 multi-modal model](https://github.com/AI-Hypercomputer/maxtext/pull/1983)
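As a rough illustration of the registration steps above (not code copied from the MaxText tree), the new entries might look like the following, assuming `HF_IDS` maps MaxText model names to Hugging Face repo ids:

```python
# Hypothetical sketch only: the keys, names, and config values below are
# illustrative and are not copied from the MaxText codebase.
import transformers

# Step 3 (utils/utils.py): assume HF_IDS maps a MaxText model name to the
# Hugging Face repo id used for the converted checkpoint.
HF_IDS = {
    "llama3.1-8b": "meta-llama/Llama-3.1-8B",
    # "my-new-model": "my-org/my-new-model",  # new entry for the added model
}

# Step 4 (utils/hf_model_configs.py): a transformers config object whose values
# must match the MaxText model config under src/maxtext/configs/models.
my_new_model_config = transformers.LlamaConfig(
    vocab_size=128256,
    hidden_size=4096,
    intermediate_size=14336,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,
)
```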
