vLLM
vLLM is an open-source inference and serving engine for large language models. You will need to deploy vLLM for your HPC compute project; the easiest way is to use an official container image.
Use one of the official images, depending on your GPU vendor:
- NVIDIA CUDA: the officially supported container (vllm/vllm-openai)
- AMD ROCm: the AMD-maintained build (rocm/vllm)
Container Image
Pull the image and convert it to a .sif file directly on an HPC node using Apptainer:
# vLLM image for NVIDIA
apptainer pull vllm-openai-nightly.sif docker://vllm/vllm-openai:nightly
# vLLM image for AMD ROCm (use a distinct output name so it does not overwrite the NVIDIA image)
apptainer pull vllm-rocm.sif docker://rocm/vllm
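As a quick sanity check before moving the image, you can confirm that vLLM is importable inside the container. This is a minimal sketch; on some setups the import may expect a GPU, in which case run it on a GPU node and add --nv (or --rocm on AMD nodes):
# Verify the converted image by printing the vLLM version
apptainer exec vllm-openai-nightly.sif python3 -c "import vllm; print(vllm.__version__)"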
Then make sure the resulting .sif image is placed in <your_project_path>/containers.
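A minimal sketch of placing the image and starting vLLM's OpenAI-compatible API server, assuming the NVIDIA image and a recent vLLM that provides the vllm serve entrypoint; the model name and port below are placeholders, and on AMD nodes you would use --rocm instead of --nv:
# Move the image into the project containers directory
mkdir -p <your_project_path>/containers
mv vllm-openai-nightly.sif <your_project_path>/containers/
# Start the OpenAI-compatible API server on a GPU node
# (meta-llama/Llama-3.1-8B-Instruct and port 8000 are example values)
apptainer exec --nv <your_project_path>/containers/vllm-openai-nightly.sif \
    vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000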