Cluster Status
LOCAL_NODE
Real-time monitoring of local inference instances and hardware telemetry.
Active Instances
PID 8824 — deepseek-r1:671b (Q4_K_M, CTX: 128K) — INFERENCE
PID 4102 — llama-3.3:70b (FP16) — IDLE
PID -- — mistral-large:latest (Q4_0) — STOPPED
Throughput: 2,405 (+12.5%)
Avg Latency: 42ms (token generation)
VRAM Usage (8x H100 SXM5): 404.8 GB / 640 GB allocated
Temp: 67°C · Power: 580W · Fan: 45%
api_test.sh
➜ ~ curl -X POST /api/generate
// Response stream
{"model":"deepseek-r1",
"response":"Thinking Process:\n1. Analyze user request\n2. Retrieve context..."}
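The terminal widget above elides the request details. A minimal sketch of what a complete call to an Ollama-style `/api/generate` endpoint could look like follows; the host/port (`localhost:11434`), the prompt text, and the choice of model tag are assumptions, not values taken from the dashboard.

```shell
#!/usr/bin/env bash
# Build the JSON payload for a hypothetical /api/generate request.
# Model tag matches the INFERENCE instance shown above; prompt is illustrative.
read -r -d '' PAYLOAD <<'EOF' || true
{"model": "deepseek-r1:671b", "prompt": "Explain your reasoning step by step.", "stream": true}
EOF

# Show the payload that would be sent.
echo "$PAYLOAD"

# To actually send it, a local server must be running (port is an assumption):
# curl -X POST http://localhost:11434/api/generate -d "$PAYLOAD"
```

With `"stream": true`, such an endpoint typically returns newline-delimited JSON chunks like the `"response"` fragment shown in the widget, rather than a single JSON object.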