vLLM

vLLM is an open-source inference and serving engine for large language models. You will need to deploy vLLM for your HPC compute project; the easiest way is to use the official container image.

Use one of the official images depending on your GPU vendor:

Container Image

Pull the image directly on an HPC node and convert it with Apptainer:

# vLLM image for NVIDIA GPUs
apptainer pull vllm-openai-nightly.sif docker://vllm/vllm-openai:nightly
# vLLM image for AMD ROCm GPUs (note the distinct output filename)
apptainer pull vllm-rocm.sif docker://rocm/vllm

Make sure the resulting .sif image is then placed in <your_project_path>/containers.
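
Once the image is in place, you can start vLLM's OpenAI-compatible server through Apptainer. The following is a minimal sketch rather than a definitive recipe: it assumes an NVIDIA node (use --rocm and the ROCm image on AMD), that the container's default entrypoint launches the API server (as in the official vllm/vllm-openai image), and an example model name and port that you should replace with your own.

# Expose the GPU to the container (--nv for NVIDIA, --rocm for AMD);
# arguments after the image name are passed to the vLLM server
apptainer run --nv \
    <your_project_path>/containers/vllm-openai-nightly.sif \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --port 8000

Once the server is up, it answers OpenAI-compatible requests, for example:

curl http://localhost:8000/v1/models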