Architecture

The system architecture is centered on the HEAppE Middleware, whose main purpose is to manage HPC jobs and to provide an HTTP message tunnel to an HTTP service running on the compute node. The actual job orchestration and periodic monitoring logic is implemented by the Job Daemon.
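One monitoring pass of the Job Daemon could look roughly like the sketch below. It is only an illustration under stated assumptions: `JobState`, `TrackedJob`, `monitor_once`, and the `fetch_state` callback are hypothetical names, and the states do not mirror HEAppE's actual job state enumeration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class JobState(Enum):
    # Hypothetical states for illustration; not HEAppE's real enumeration.
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class TrackedJob:
    job_id: int
    state: JobState = JobState.QUEUED


def monitor_once(
    jobs: Iterable[TrackedJob],
    fetch_state: Callable[[int], JobState],
) -> List[int]:
    """Refresh the state of every tracked job and return the IDs of jobs that have ended."""
    ended = []
    for job in jobs:
        # fetch_state stands in for a status query sent to the middleware (assumption).
        job.state = fetch_state(job.job_id)
        if job.state in (JobState.FINISHED, JobState.FAILED):
            ended.append(job.job_id)
    return ended
```

A daemon built this way would call `monitor_once` on a timer and decide, for each returned ID, whether to submit a replacement job.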

The inference job on the compute node runs an HTTP server, which acts as a gateway for sending inference requests, obtaining runtime metrics, managing models, and other use cases.
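The gateway role of the compute-node server can be pictured as a small dispatch table that maps a method and path to a handler. The sketch below uses only the standard library; the route paths (`/infer`, `/metrics`, `/models`), the handler names, and the response shapes are assumptions made for illustration, not the project's actual API.

```python
import json
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]

# In-memory counter standing in for real runtime metrics (assumption).
_stats = {"requests_served": 0}


def infer(payload: dict) -> dict:
    # Placeholder for a real model call: echoes the prompt back.
    _stats["requests_served"] += 1
    return {"output": "echo: " + str(payload.get("prompt", ""))}


def metrics(payload: dict) -> dict:
    return dict(_stats)


def list_models(payload: dict) -> dict:
    return {"models": ["example-model"]}  # hypothetical model name


ROUTES: Dict[Tuple[str, str], Handler] = {
    ("POST", "/infer"): infer,
    ("GET", "/metrics"): metrics,
    ("GET", "/models"): list_models,
}


def dispatch(method: str, path: str, body: bytes = b"") -> Tuple[int, bytes]:
    """Route one request and return (HTTP status code, JSON-encoded response body)."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, json.dumps({"error": "unknown route"}).encode()
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"}).encode()
    return 200, json.dumps(handler(payload)).encode()
```

Keeping the routing in a plain function makes it easy to test without opening a socket; an actual HTTP server would call `dispatch` from its request handler.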

The service is exposed through a FastAPI server that accepts inference requests from authenticated users. The server automatically selects a relevant and healthy HPC job for each request. The frontend is implemented using the Streamlit framework, which provides a simple user interface for sending requests and displaying the results as well-formatted HTML.
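The job selection step could be sketched as follows. This is a minimal illustration, assuming that "relevant" means the job serves the requested model and that ties between healthy jobs are broken by current load; `InferenceJob`, `select_job`, and the `in_flight` field are hypothetical names, not part of the actual server.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class InferenceJob:
    job_id: int
    model: str          # model served by this HPC job
    healthy: bool       # outcome of the most recent health check
    in_flight: int = 0  # requests currently being processed (assumed metric)


def select_job(jobs: Iterable[InferenceJob], model: str) -> Optional[InferenceJob]:
    """Return the least-loaded healthy job serving `model`, or None if there is none."""
    candidates = [job for job in jobs if job.model == model and job.healthy]
    # min() keeps the first candidate among equally loaded jobs.
    return min(candidates, key=lambda job: job.in_flight, default=None)
```

When `select_job` returns `None`, the server would have to reject the request or wait until a suitable job becomes available; which of the two the real server does is not specified here.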

Architecture Diagram