6 GPU and ML

06_gpu_and_ml

Author

Ryan Wesslen

Let’s get to what we’re all here for: GPUs!

6.1 import_torch.py

6.1.1 PyTorch with CUDA GPU support

This example shows how to use CUDA GPUs in Modal with a minimal PyTorch image. You request a GPU through the gpu argument of the app.function decorator; gpu="any" accepts whichever GPU type Modal has available.

import_torch.py

import time

import modal

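app = modal.App(
    "example-import-torch",
    # find_links adds the PyTorch cu116 wheel page as an extra source for pip.
    # Note that in the run below pip still resolves torch 2.3.1 (cu121)
    # from the default package index.
    image=modal.Image.debian_slim().pip_install(
        "torch", find_links="https://download.pytorch.org/whl/cu116"
    ),
)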


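@app.function(gpu="any")
def gpu_function():
    # Imported inside the function because they are used in the remote container.
    import subprocess

    import torch

    # Print the driver and device table so the assigned GPU shows up in the logs.
    subprocess.run(["nvidia-smi"])
    print("Torch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA device count:", torch.cuda.device_count())


# Runs only when the file is executed directly with `python import_torch.py`;
# `modal run` imports the module and calls gpu_function itself.
if __name__ == "__main__":
    t0 = time.time()
    with app.run():
        gpu_function.remote()
    print("Full time spent:", time.time() - t0)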

Let’s run it:

$ modal run 06_gpu_and_ml/import_torch.py 
✓ Initialized. View run at https://modal.com/charlotte-llm/main/apps/ap-xxxxxxxxxx
Building image im-q9v0dExl8NyFXzmsp0RKxA

=> Step 0: FROM base

=> Step 1: RUN python -m pip install torch --find-links https://download.pytorch.org/whl/cu116
Looking in indexes: http://pypi-mirror.modal.local:5555/simple
Looking in links: https://download.pytorch.org/whl/cu116
Collecting torch
  Downloading http://pypi-mirror.modal.local:5555/simple/torch/torch-2.3.1-cp310-cp310-manylinux1_x86_64.whl (779.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 779.1/779.1 MB 249.1 MB/s eta 0:00:00

...

Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.7.0
    Uninstalling typing_extensions-4.7.0:
      Successfully uninstalled typing_extensions-4.7.0
Successfully installed MarkupSafe-2.1.5 filelock-3.15.1 fsspec-2024.6.0 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.5.40 nvidia-nvtx-cu12-12.1.105 sympy-1.12.1 torch-2.3.1 triton-2.3.1 typing-extensions-4.12.2

[notice] A new release of pip is available: 23.1.2 -> 24.0
[notice] To update, run: pip install --upgrade pip
Creating image snapshot...
Finished snapshot; took 11.22s

Built image im-q9v0dExl8NyFXzmsp0RKxA in 106.04s
✓ Created objects.
├── 🔨 Created mount /Users/ryan/modal-examples/06_gpu_and_ml/import_torch.py
└── 🔨 Created function gpu_function.
Fri Jun 14 20:37:32 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10                     On  |   00000000:CA:00.0 Off |                 ERR! |
|  0%   30C ERR!              15W /  150W |       0MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Torch version: 2.3.1+cu121
CUDA available: True
CUDA device count: 1
Stopping app - local entrypoint completed.
✓ App completed. View run at https://modal.com/charlotte-llm/main/apps/ap-xxxxxxxxxx
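
One detail in this log is worth a second look: the image asks pip to look at the cu116 wheel page, yet torch is downloaded from the package mirror and the function reports 2.3.1+cu121. Presumably pip treats the find_links page as one more source and picks the newest version it can find across all of them. The sketch below is a small, hypothetical helper (cuda_tag is not part of Modal or PyTorch) that pulls the CUDA tag out of a wheel URL or a torch version string so that a mismatch like this one is easy to spot.

```python
import re
from typing import Optional


def cuda_tag(text: str) -> Optional[str]:
    """Return the last 'cuNNN' tag found in a version string or URL, if any."""
    matches = re.findall(r"cu\d+", text)
    return matches[-1] if matches else None


# Values copied from the example and its log above.
requested = cuda_tag("https://download.pytorch.org/whl/cu116")
installed = cuda_tag("2.3.1+cu121")

if requested != installed:
    print(f"requested {requested}, but the installed build is {installed}")
```

Inside gpu_function, the same check could be applied to torch.__version__ if you want the run to flag an unexpected build.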

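6.2 Stable Diffusion via Hugging Face

For this example, you'll need to create a Modal secret named huggingface that holds your Hugging Face access token. The function reads the token from the HF_TOKEN environment variable.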

stable_diffusion.py

import io
import os

import modal

app = modal.App()


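@app.function(
    image=modal.Image.debian_slim().pip_install(
        "torch", "diffusers[torch]", "transformers", "ftfy"
    ),
    # Exposes the Hugging Face token to the container as HF_TOKEN.
    secrets=[modal.Secret.from_name("huggingface")],
    gpu="any",
)
async def run_stable_diffusion(prompt: str):
    from diffusers import StableDiffusionPipeline

    # Download the model weights and move the pipeline to the GPU.
    # Recent diffusers releases take `token`; older ones used `use_auth_token`.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        token=os.environ["HF_TOKEN"],
    ).to("cuda")

    # Few inference steps keep the run fast at the cost of image quality.
    image = pipe(prompt, num_inference_steps=10).images[0]

    # Serialize the PIL image to PNG bytes so it can be returned to the caller.
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    img_bytes = buf.getvalue()

    return img_bytes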


@app.local_entrypoint()
def main():
    prompt = (
        "Tri-color beagle riding a bike in Paris, wearing a black beret, "
        "and a baguette in a bag in the bike's front basket."
    )
    # Generate the image remotely, then write the returned PNG bytes locally.
    img_bytes = run_stable_diffusion.remote(prompt)
    with open("/tmp/parisian-beagle.png", "wb") as f:
        f.write(img_bytes)
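
Because run_stable_diffusion returns raw bytes, a quick local check can catch a bad payload before it lands on disk. The following is a minimal sketch, assuming a hypothetical helper named save_png (it is not part of the original example or of Modal): it checks for the standard 8-byte PNG signature and then writes the file.

```python
from pathlib import Path

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(img_bytes: bytes, path: str) -> Path:
    """Write img_bytes to path after checking that it looks like a PNG."""
    if not img_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with the PNG signature")
    out = Path(path)
    # Create the target directory if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(img_bytes)
    return out
```

In main, the open/write pair could then be swapped for save_png(img_bytes, "/tmp/parisian-beagle.png").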

Let’s run it!
