OLLAMA SERVE_V0.5.7
System Operational

Cluster Status

LOCAL_NODE

Real-time monitoring of local inference instances and hardware telemetry.

Active Instances
PID    STATE      MODEL                  QUANT   CTX    VRAM
8824   INFERENCE  deepseek-r1:671b       Q4_K_M  128K   382GB
4102   IDLE       llama-3.3:70b          FP16    --     38GB
--     STOPPED    mistral-large:latest   Q4_0    --     0GB
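
The same list can be pulled from a running server. A minimal sketch, assuming the server is reachable on Ollama's default local port 11434; ollama ps and GET /api/ps are the standard interfaces for listing loaded models, though their output fields differ slightly from the table above.

~ ollama ps                              # loaded models, size, and time until unload
~ curl http://localhost:11434/api/ps     # same information as JSON ("models": [...])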
Throughput     2,405 (+12.5%)
Avg Latency    42ms (token generation)
VRAM Usage     8x H100 SXM5
Allocated      404.8 GB / 640 GB
Temp           67°C
Power          580W
Fan            45%
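
The hardware readings above correspond to standard NVIDIA telemetry. A rough equivalent from the host shell, assuming nvidia-smi is available (field names are NVML's, and fan.speed typically reports N/A on SXM modules, which are chassis-cooled):

~ nvidia-smi --query-gpu=index,memory.used,memory.total,temperature.gpu,power.draw,fan.speed --format=csv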
api_test.sh
~ curl -X POST http://localhost:11434/api/generate \
    -d '{"model":"deepseek-r1","prompt":"Why is the sky blue?"}'
# Response stream
{"model":"deepseek-r1","response":"Thinking Process:\n1. Analyze user request\n2. Retrieve context..."}

System Status

Detailed health check of cluster components.

All Systems Operational

Last check: Just now

v0.5.7
Inference Engine   OPERATIONAL
API Gateway        OPERATIONAL
Model Registry     OPERATIONAL
GPU Bus Link       OPERATIONAL
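
The overall check can be reproduced from the command line, assuming the default local port 11434; the root endpoint and /api/version are standard Ollama routes, while the per-component statuses above are reported by the dashboard itself.

~ curl http://localhost:11434/             # -> "Ollama is running"
~ curl http://localhost:11434/api/version  # -> {"version":"0.5.7"}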

API Documentation

Endpoints for interacting with the inference engine.

POST /api/generate

Generate a response for a given prompt.

Request Body application/json
{
  "model": "deepseek-r1",
  "prompt": "Why is the sky blue?",
  "stream": false
}
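A matching curl call, assuming the server is reachable on the default local port 11434:

~ curl -X POST http://localhost:11434/api/generate \
    -d '{ "model": "deepseek-r1", "prompt": "Why is the sky blue?", "stream": false }'

With "stream": false the full response arrives as a single JSON object; omit the field to receive newline-delimited chunks instead, as in the api_test.sh trace above.
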
POST /api/chat

Generate the next message in a chat conversation.

Request Body application/json
{
  "model": "llama-3.3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
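
The equivalent call for the chat endpoint, again against the default local port:

~ curl -X POST http://localhost:11434/api/chat \
    -d '{ "model": "llama-3.3", "messages": [{ "role": "user", "content": "Hello!" }] }'

By default the reply streams assistant-message chunks; add "stream": false to the body for a single JSON response.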

Privacy Policy

How we handle your data.

Data stays local

This instance of Ollama is running entirely on your local infrastructure. No prompts, generated text, or embedding data is sent to Ollama Inc. or any third-party cloud services. All inference happens on your hardware (PID: 8824).

Telemetry

We collect anonymous usage statistics (such as token throughput and error rates) to help improve performance. This can be disabled by setting OLLAMA_NOPRUNE=1 in the server's environment.
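
A minimal sketch of applying that setting for a shell-launched server, using the variable named above (for a systemd-managed install, the variable belongs in the service unit's environment instead):

~ export OLLAMA_NOPRUNE=1    # variable name as given above
~ ollama serve               # restart so the setting takes effect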