Job Daemon

The job daemon is an asynchronously running component of the frontend service. Its purpose is to periodically retrieve the state of HEAppE resources, such as job statuses, SSH tunnels to job nodes, and cluster health. Using this information, the job daemon ensures that the inference server on the compute node remains operational by establishing or re-establishing SSH tunnels, reloading AI models, or replacing compute jobs when their walltime is close to expiring. The refresh period is configurable; by default, the daemon runs every 30 seconds.
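A periodic refresh of this kind can be sketched with asyncio. This is a minimal illustration, not the daemon's actual implementation: `run_periodically`, `refresh`, and `max_cycles` are hypothetical names, and only the 30-second default comes from the description above.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_periodically(
    refresh: Callable[[], Awaitable[None]],
    period_seconds: float = 30.0,       # default refresh period from the text
    max_cycles: Optional[int] = None,   # hypothetical: bound the loop for testing
) -> int:
    """Call refresh() repeatedly, sleeping period_seconds between calls."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        await refresh()  # e.g. fetch job statuses, tunnels, cluster health
        cycles += 1
        # Skip the final sleep when the cycle budget is exhausted.
        if max_cycles is None or cycles < max_cycles:
            await asyncio.sleep(period_seconds)
    return cycles
```

With `max_cycles=None` the loop runs until the task is cancelled, which is how a long-lived background task would typically be stopped.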

The sequence diagram below illustrates the main workflow of the daemon:

[Figure: Job daemon sequence diagram]

After the main workflow completes, the daemon executes a series of routines that operate on the latest status of the inference job. These routines are designed to asynchronously carry out decision-making tasks, such as replacing jobs nearing their walltime limit or launching additional inference servers in response to increased user demand.
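One way to run such routines concurrently after each refresh is `asyncio.gather`. The sketch below assumes each routine is a coroutine function that takes the daemon as its only argument; `run_routines` is a hypothetical helper, and the daemon's real dispatch logic may differ.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

Routine = Callable[[Any], Awaitable[None]]


async def run_routines(daemon: Any, routines: List[Routine]) -> List[BaseException]:
    """Run all routines concurrently and return any exceptions they raised."""
    # return_exceptions=True keeps one failing routine from cancelling the rest.
    results = await asyncio.gather(
        *(routine(daemon) for routine in routines),
        return_exceptions=True,
    )
    return [r for r in results if isinstance(r, BaseException)]
```

Collecting exceptions instead of raising them lets the caller log failures and still proceed to the next refresh cycle.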

Daemon states

The daemon operates in one of three states:

Idle state — Entered when no requests have been made to the inference service for a specified period (default: 10 minutes). In this state, no inference jobs are submitted.
Active state — The daemon actively manages job submission and replacement for all registered AI models.
Active state with auto-scaling — The daemon allocates multiple inference jobs for models experiencing high demand and deallocates them when demand decreases.

(Note: Auto-scaling is not yet implemented.)
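The transition between the idle and active states can be illustrated as a simple timeout check. This is a sketch under stated assumptions: `DaemonState` and `select_state` are hypothetical names, timestamps are in seconds, and only the 10-minute default comes from the description above. Auto-scaling is left out because it is not yet implemented.

```python
import enum

IDLE_TIMEOUT_SECONDS = 10 * 60  # default idle period from the text


class DaemonState(enum.Enum):
    IDLE = "idle"
    ACTIVE = "active"


def select_state(
    last_request_at: float,
    now: float,
    idle_timeout: float = IDLE_TIMEOUT_SECONDS,
) -> DaemonState:
    """Return IDLE once no request has arrived for idle_timeout seconds."""
    if now - last_request_at >= idle_timeout:
        return DaemonState.IDLE
    return DaemonState.ACTIVE
```

In practice both timestamps would come from the same clock, for example `time.monotonic()`, so that wall-clock adjustments do not trigger spurious transitions.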

Job replacement strategy

The daemon will submit a new inference job for each registered AI model if no such job is available or if the last existing job is near its walltime limit.

"Near" is defined as a percentage of the current job's walltime; by default, a job is considered near its limit once 90% of its walltime has elapsed. For example, when a 120-minute inference job has only 12 minutes remaining, a replacement job is submitted.
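The threshold check amounts to comparing the elapsed fraction of the walltime against the configured percentage. The function below is a minimal sketch; `needs_replacement` and its parameter names are hypothetical and not part of the daemon's API.

```python
def needs_replacement(
    walltime_minutes: float,
    elapsed_minutes: float,
    threshold: float = 0.9,  # default: 90% of walltime elapsed
) -> bool:
    """Return True once the elapsed share of walltime reaches the threshold."""
    if walltime_minutes <= 0:
        raise ValueError("walltime_minutes must be positive")
    return elapsed_minutes / walltime_minutes >= threshold
```

For the 120-minute job in the example, the check becomes true at 108 elapsed minutes, which leaves 12 minutes for the replacement job to start.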

Add custom routines

Routines have access to the most recent status of running and queued jobs. An example routine:

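async def my_special_routine(daemon: InferenceJobDaemon) -> None:
    # The daemon passes itself in, exposing the latest job status.
    for job in daemon.running_jobs:
        # An entry under 'Tunnels' indicates a live inference server.
        if job.get('Tunnels'):
            # Do something with the live inference server.
            pass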

Then add it to the daemon instance:

daemon = InferenceJobDaemon(client, config=config)
daemon.add_routine(my_special_routine)