Description
Hi Team,
I’m facing an issue while containerizing a DeepSpeed-MII deployment using the persistent server mode.
Steps Performed:
- I created a `mii_server.py` script following the [DeepSpeed-MII documentation], using persistent mode:

```python
import mii

MODEL_PATH = "./llama-3-70b-finetuned"  # LoRA + base model, merged
DEPLOYMENT_NAME = "test-deepspeed"

client = mii.serve(
    model_name_or_path=MODEL_PATH,
    deployment_name=DEPLOYMENT_NAME,
    tensor_parallel=2,
    enable_restful_api=True,
    restful_api_port=8084,
    max_length=2048,
)
```
- This script runs successfully on my local machine using 2 GPUs.
- I built a Docker image based on the same setup and ran it using:

```shell
docker run --gpus all --shm-size=10g -e CUDA_VISIBLE_DEVICES=0,1 -p 8084:8084 <image>
```
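As a sanity check (a hypothetical snippet I could add before calling `mii.serve`, not something from the MII docs), the number of devices listed in `CUDA_VISIBLE_DEVICES` can be compared against the tensor-parallel degree:

```python
import os

def visible_gpu_count(env=None):
    """Count GPUs exposed via CUDA_VISIBLE_DEVICES ('' or unset counts as 0)."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in visible.split(",") if d.strip()])

# The docker run above passes CUDA_VISIBLE_DEVICES=0,1, so inside the
# container this should report 2; if it reports fewer, tensor_parallel=2
# cannot shard the model across both GPUs.
print(visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1"}))  # prints 2
```

If this reports 2 inside the container, the device mapping itself is probably fine and the problem lies elsewhere in the MII launch.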
However, inside the container the model consistently hits CUDA OOM errors. Each GPU reports roughly a 10 GiB memory shortfall, which matches what I'd expect from running the model on a single GPU without tensor parallelism. This leads me to believe that tensor parallelism isn't being applied correctly in the containerized environment, even though it works as expected locally.
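For reference, a back-of-the-envelope calculation (my own rough numbers, assuming fp16 weights and ignoring activations and KV cache) shows why the observed footprint looks like the un-sharded case:

```python
# Rough per-GPU weight memory for a 70B-parameter model in fp16.
# Illustrative only: activations and KV cache are not counted.
PARAMS = 70e9
BYTES_PER_PARAM = 2  # fp16

def weight_mem_gib(tensor_parallel):
    """Approximate per-GPU weight memory in GiB when sharded TP ways."""
    return PARAMS * BYTES_PER_PARAM / tensor_parallel / 2**30

tp2 = weight_mem_gib(2)  # ~65 GiB per GPU when TP=2 is actually applied
tp1 = weight_mem_gib(1)  # ~130 GiB per GPU if each rank loads the full model
```

If each rank were loading the full ~130 GiB of weights instead of its ~65 GiB shard, an OOM on the order of the one I'm seeing is exactly what would result.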
Environment Details
- Model: LLaMA-3 70B (merged LoRA)
- CUDA Version: 12.1.1
- Python Version: 3.10
- Requirements: `deepspeed-mii`, `numpy==2.1.3`, `triton==3.3.1`
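For context, this is a simplified sketch of the kind of Dockerfile I'm building from (the base image matches the CUDA 12.1.1 environment above; paths are illustrative, not my exact build):

```dockerfile
# Simplified sketch of my build, for illustration only.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3.10 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
# deepspeed-mii, numpy==2.1.3, triton==3.3.1
RUN python3 -m pip install -r requirements.txt

COPY mii_server.py .
EXPOSE 8084
CMD ["python3", "mii_server.py"]
```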
Is there any additional configuration or consideration required when deploying DeepSpeed-MII in Docker to ensure tensor parallelism is honored?
Any guidance or recommended best practices for dockerizing the MII persistent server with large models would be highly appreciated.
Thanks,
Inderjeet Vishnoi