OLLAMA SERVE_V0.5.7
System Operational

Cluster Status

LOCAL_NODE

Real-time monitoring of local inference instances and hardware telemetry.

Active Instances
PID    STATE      MODEL                  QUANT   CTX    VRAM
8824   INFERENCE  deepseek-r1:671b       Q4_K_M  128K   382GB
4102   IDLE       llama-3.3:70b          FP16    --     38GB
--     STOPPED    mistral-large:latest   Q4_0    --     0GB
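
The same list can be pulled from a running server. A minimal sketch, assuming the server is reachable on Ollama's default local port 11434; ollama ps and GET /api/ps are the standard interfaces for listing loaded models, though their output fields differ slightly from the table above.

~ ollama ps                              # loaded models, size, and time until unload
~ curl http://localhost:11434/api/ps     # same information as JSON ("models": [...])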
Throughput     2,405 (+12.5%)
Avg Latency    42ms (token generation)
VRAM Usage     8x H100 SXM5
Allocated      404.8 GB / 640 GB
Temp           67°C
Power          580W
Fan            45%
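
The hardware readings above correspond to standard NVIDIA telemetry. A rough equivalent from the host shell, assuming nvidia-smi is available (field names are NVML's, and fan.speed typically reports N/A on SXM modules, which are chassis-cooled):

~ nvidia-smi --query-gpu=index,memory.used,memory.total,temperature.gpu,power.draw,fan.speed --format=csv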
api_test.sh
~ curl -X POST http://localhost:11434/api/generate \
    -d '{"model":"deepseek-r1","prompt":"Why is the sky blue?"}'
# Response stream
{"model":"deepseek-r1","response":"Thinking Process:\n1. Analyze user request\n2. Retrieve context..."}

System Status

Detailed health check of cluster components.

All Systems Operational

Last check: Just now

v0.5.7
Inference Engine   OPERATIONAL
API Gateway        OPERATIONAL
Model Registry     OPERATIONAL
GPU Bus Link       OPERATIONAL
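
The overall check can be reproduced from the command line, assuming the default local port 11434; the root endpoint and /api/version are standard Ollama routes, while the per-component statuses above are reported by the dashboard itself.

~ curl http://localhost:11434/             # -> "Ollama is running"
~ curl http://localhost:11434/api/version  # -> {"version":"0.5.7"}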

API Documentation

Endpoints for interacting with the inference engine.

POST /api/generate

Generate a response for a given prompt.

Request Body application/json
{
  "model": "deepseek-r1",
  "prompt": "Why is the sky blue?",
  "stream": false
}
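A matching curl call, assuming the server is reachable on the default local port 11434:

~ curl -X POST http://localhost:11434/api/generate \
    -d '{ "model": "deepseek-r1", "prompt": "Why is the sky blue?", "stream": false }'

With "stream": false the full response arrives as a single JSON object; omit the field to receive newline-delimited chunks instead, as in the api_test.sh trace above.
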
POST /api/chat

Generate the next message in a chat conversation.

Request Body application/json
{
  "model": "llama-3.3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
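
The equivalent call for the chat endpoint, again against the default local port:

~ curl -X POST http://localhost:11434/api/chat \
    -d '{ "model": "llama-3.3", "messages": [{ "role": "user", "content": "Hello!" }] }'

By default the reply streams assistant-message chunks; add "stream": false to the body for a single JSON response.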

Privacy Policy

How we handle your data.

Data stays local

This instance of Ollama is running entirely on your local infrastructure. No prompts, generated text, or embedding data is sent to Ollama Inc. or any third-party cloud services. All inference happens on your hardware (PID: 8824).

Telemetry

We collect anonymous usage statistics (such as token throughput and error rates) to help improve performance. This can be disabled by setting OLLAMA_NOPRUNE=1 in the server's environment.
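
A minimal sketch of applying that setting for a shell-launched server, using the variable named above (for a systemd-managed install, the variable belongs in the service unit's environment instead):

~ export OLLAMA_NOPRUNE=1    # variable name as given above
~ ollama serve               # restart so the setting takes effect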