Freelens Extension  ·  v0.4.0  ·  MIT License

Talk to your Kubernetes cluster.
Without leaving Freelens.

An AI-powered SRE assistant embedded directly in Freelens. It sees your live cluster state — pods, deployments, nodes, events — and adapts its answers to what you're actually asking. Works with Ollama, LocalAI, OpenAI, or any OpenAI-compatible endpoint. Just paste your URL and go — provider auto-detected.

Ollama & OpenAI-compatible · Kubernetes-native · Tool Calling · Streaming · Human-in-the-Loop

Screenshot: K8s SRE Assistant chat UI inside Freelens

  • ~85% token reduction vs v0.1.0 on large clusters
  • 9 built-in Kubernetes inspection tools
  • 6 SRE workflow modes with intent auto-detection
  • 4–9B parameter models fully supported

Everything an SRE needs, built in

Designed to work well on small (4–9B) models against real-world mid-size clusters.

Conversational Chat Interface

Ask SRE questions in plain English. Responses stream in with full Markdown rendering — tables, code blocks, and lists render correctly even mid-stream. Session history is persisted per cluster and namespace.

Live Cluster Awareness

The model sees your cluster in real time via direct KubeApi.list() calls — not a stale cache. Pods, deployments, services, nodes, warning events, replica mismatches. Click Refresh to force a new scan.
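As a rough sketch of what that looks like with Lens-style APIs (the podsApi / deploymentsApi / eventApi handles and the snapshot shape are assumptions, not the extension's actual code):

```ts
import { Renderer } from "@freelensapp/extensions";

// Sketch only: assumes Freelens exposes Lens-style KubeApi instances on
// Renderer.K8sApi; the exact handle names may differ.
const { podsApi, deploymentsApi, eventApi } = Renderer.K8sApi as any;

export async function collectClusterSnapshot(namespace?: string) {
  // KubeApi.list() hits the live API server, so the snapshot is never stale.
  const [pods, deployments, events] = await Promise.all([
    podsApi.list({ namespace }),
    deploymentsApi.list({ namespace }),
    eventApi.list({ namespace }),
  ]);

  return {
    pods,
    deployments,
    // Only warning events earn a place in the prompt.
    warnings: events.filter((e: any) => e.type === "Warning"),
  };
}
```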

Engine Agnostic — Any OpenAI-compatible Backend

Works with Ollama, LocalAI, LM Studio, OpenAI, or any /v1/chat/completions endpoint. All API calls use Node.js http/https — no browser mixed-content issues. Provider is auto-detected from the URL; optional API key for endpoints that require auth.
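Because requests go through Node's http/https modules rather than browser fetch, a call to any compatible backend reduces to a plain request. A minimal non-streaming sketch (endpoint, model, and key are placeholders):

```ts
import https from "node:https";
// For plain-HTTP endpoints such as a local Ollama, use node:http instead.

function chat(baseUrl: string, model: string, content: string, apiKey?: string) {
  return new Promise<string>((resolve, reject) => {
    const body = JSON.stringify({ model, messages: [{ role: "user", content }] });
    const req = https.request(`${baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // API key is optional: only sent for endpoints that require auth.
        ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {}),
      },
    }, (res) => {
      let data = "";
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => resolve(JSON.parse(data).choices[0].message.content));
    });
    req.on("error", reject);
    req.end(body);
  });
}
```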

Tool Calling with Human-in-the-Loop

The assistant drills into specific resources on demand during a conversation. Every tool call shows an Approve / Deny card — colour-coded by sensitivity. You stay in control of what the model is allowed to inspect.
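Mechanically, the gate can be as simple as holding the tool call behind a promise that the Approve / Deny card resolves. A sketch with hypothetical renderApprovalCard / executeTool helpers:

```ts
type ToolCall = { id: string; name: string; args: Record<string, unknown> };

declare function renderApprovalCard(call: ToolCall): void;      // hypothetical UI helper
declare function executeTool(call: ToolCall): Promise<unknown>; // hypothetical dispatcher

const pending = new Map<string, (approved: boolean) => void>();

// Called when the user clicks Approve or Deny on the card.
export function resolveApproval(id: string, approved: boolean) {
  pending.get(id)?.(approved);
  pending.delete(id);
}

// The model's requested call is held until the user decides.
export async function runToolCall(call: ToolCall) {
  renderApprovalCard(call);
  const approved = await new Promise<boolean>((res) => pending.set(call.id, res));
  return approved ? executeTool(call) : { error: "Denied by user" };
}
```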

Smart Context Pipeline

ChunkManager → BM25 Retriever → SummaryManager → ContextBuilder. Only the 15 most query-relevant pods/deployments are injected per message, plus all anomalous resources — so even large clusters fit small models.
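The BM25 stage is standard lexical ranking over one-line resource summaries. A compact sketch of the scoring (the constants and the 15-doc cap mirror the description above; this is not the extension's exact code):

```ts
// Returns the indices of the 15 most query-relevant docs by BM25 score.
function bm25TopK(query: string, docs: string[], k1 = 1.5, b = 0.75): number[] {
  const tokenize = (s: string) => s.toLowerCase().split(/\W+/).filter(Boolean);
  const docTokens = docs.map(tokenize);
  const avgLen = docTokens.reduce((sum, d) => sum + d.length, 0) / docs.length;

  const scores = docs.map((_, i) => {
    const tokens = docTokens[i];
    let score = 0;
    for (const term of new Set(tokenize(query))) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docTokens.filter((d) => d.includes(term)).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * tokens.length / avgLen));
    }
    return score;
  });

  return scores.map((s, i) => [s, i] as const)
    .sort((a, z) => z[0] - a[0]).slice(0, 15).map(([, i]) => i);
}
```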

Intent-Aware Response Format

Every query is classified as write, investigate, explain, or general. Response structure adapts: full Evidence → Correlation → Hypotheses for investigations, direct manifest for YAML requests, clean prose for explanations.

Canvas Relationship Graphs

Dependency diagrams render as native Canvas — zero npm dependencies, no renderer crashes. K8s colour-coded nodes, bezier edges, BFS layout. Expand to full screen or download as PNG.
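The rendering itself needs nothing beyond the built-in 2D context. For example, one node and a bezier edge:

```ts
// Sketch: a colour-coded node and a bezier edge with the Canvas 2D API;
// a BFS layout just assigns x by depth and y by sibling order.
function drawEdge(
  ctx: CanvasRenderingContext2D,
  from: { x: number; y: number },
  to: { x: number; y: number },
) {
  const midX = (from.x + to.x) / 2;
  ctx.strokeStyle = "#6e7681";
  ctx.beginPath();
  ctx.moveTo(from.x, from.y);
  // Horizontal-out control points give the typical left-to-right flow.
  ctx.bezierCurveTo(midX, from.y, midX, to.y, to.x, to.y);
  ctx.stroke();
}

function drawNode(ctx: CanvasRenderingContext2D, x: number, y: number, label: string, colour: string) {
  ctx.fillStyle = colour; // e.g. green for Ready, red for CrashLoopBackOff
  ctx.beginPath();
  ctx.arc(x, y, 8, 0, Math.PI * 2);
  ctx.fill();
  ctx.fillStyle = "#e6edf3";
  ctx.fillText(label, x + 12, y + 4);
}
```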

One-Click Workload Analysis

Robot-icon button injected into every workload's toolbar, context menu, and detail drawer. Clicking opens a 640 px floating side panel with an analysis already running — no page navigation, no typing needed.
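For context, Lens-derived apps register such entries declaratively. Assuming Freelens keeps Lens's kubeObjectMenuItems shape, the wiring would look roughly like this (AnalyzeMenuItem and openAnalysisPanel are hypothetical):

```tsx
import React from "react";
import { Renderer } from "@freelensapp/extensions";

declare function openAnalysisPanel(workload: unknown): void; // hypothetical panel opener

// Hypothetical menu entry that kicks off the one-click analysis.
const AnalyzeMenuItem = ({ object }: { object: unknown }) => (
  <li onClick={() => openAnalysisPanel(object)}>Analyze workload</li>
);

export default class SreAssistantRenderer extends Renderer.LensExtension {
  // Shape assumed from Lens's extension API; Freelens may differ.
  kubeObjectMenuItems = [
    {
      kind: "Deployment",
      apiVersions: ["apps/v1"],
      components: { MenuItem: AnalyzeMenuItem },
    },
  ];
}
```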

What can you ask it?

Example queries across the six SRE domains it understands natively.

| Category | Example queries |
|---|---|
| Cluster health | "What's the overall health of my cluster?" · "Are there any pods in CrashLoopBackOff?" |
| Troubleshooting | "Why is my deployment not rolling out?" · "Help me debug this crashing pod" |
| YAML authoring | "Write a simple nginx deployment" · "Add a liveness probe to this deployment" |
| Optimization | "Are there pods without resource limits?" · "Suggest HPA configs for my deployments" |
| Security | "Check for pods running as root" · "Review my RBAC configuration" |
| Operations | "How do I scale this deployment?" · "Generate a NetworkPolicy for namespace isolation" |

Format adapts to your intent

Every query is classified automatically (see the sketch after the table). The correlated-signals block (warning events, crash pods, replica mismatches) is injected only for investigative queries — skipped entirely for write, explain, and general.

| Intent | Triggered by | Response format |
|---|---|---|
| write | "write a deployment", "give me a YAML", "create a…" | Direct manifest + one-line RISK rating + verification step |
| investigate | "why is…", "debug…", "crashloop", "not working" | Evidence → Correlation → Hypotheses → Checks → Actions |
| explain | "what is…", "how does…", "explain…" | Clean prose explanation |
| general | Everything else | Concise direct answer |
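A keyword router in the spirit of this table can be very small. The patterns below are illustrative, not the extension's exact rule set:

```ts
type Intent = "write" | "investigate" | "explain" | "general";

export function classifyIntent(query: string): Intent {
  const q = query.toLowerCase();
  // Patterns follow the trigger phrases in the table above.
  if (/\bwrite a\b|\bgive me a yaml\b|\bcreate a\b/.test(q)) return "write";
  if (/\bwhy is\b|\bdebug\b|crashloop|not working/.test(q)) return "investigate";
  if (/\bwhat is\b|\bhow does\b|\bexplain\b/.test(q)) return "explain";
  return "general";
}
```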

Nine inspection tools, full control

The assistant inspects cluster resources on demand during a conversation. All tools require an explicit Approve / Deny before execution, and each tool can be individually toggled in Preferences. (The sketch after the table shows how such tools are declared.)

| Category | Tool | Description |
|---|---|---|
| Inspect | get_namespace_detail | Full pod / deployment / service list for a specific namespace |
| Inspect | get_pod_detail | Container states, restart counts, exit codes, termination reasons |
| Inspect | get_resource_events | Recent warning events for any named K8s resource |
| Inspect | get_deployment_detail | Replica status and pod states for a deployment |
| Inspect | get_nodes | All cluster nodes with Ready / NotReady status |
| Inspect | get_resource_chain | Full upstream / downstream graph: owner controller, PVCs, missing Secrets/ConfigMaps, HPA, Ingress chain |
| List | list_resources | Full inventory of any resource kind: pods, deployments, services, nodes, secrets, configmaps, ingresses, statefulsets, daemonsets, jobs, cronjobs, pvcs |
| Sensitive | get_pod_logs | Last 30 log lines (signal-filtered). Gated behind a dedicated approval requiring the model to state its rationale first. |
| Sensitive | get_container_logs | Container-specific log access with auto-resolved container name from context. |
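Under the hood, tools like these are surfaced to the model as ordinary OpenAI-style tool declarations: the schema that tool-calling /v1/chat/completions backends accept. A sketch for one of them (the parameter names are illustrative):

```ts
// get_pod_detail expressed in the OpenAI tool-calling schema.
const getPodDetailTool = {
  type: "function" as const,
  function: {
    name: "get_pod_detail",
    description:
      "Container states, restart counts, exit codes, termination reasons",
    parameters: {
      type: "object",
      properties: {
        namespace: { type: "string", description: "Pod namespace" },
        name: { type: "string", description: "Pod name" },
      },
      required: ["namespace", "name"],
    },
  },
};
```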

Six workflow presets, one selector

Override intent auto-detection to lock the model into a specific analytical frame.

| Mode | Behaviour |
|---|---|
| Auto | Intent detected from query text — recommended for general use |
| Troubleshoot | Always full investigation format: Evidence → Correlation → Hypotheses → Checks → Actions |
| Security | RBAC, PodSecurity, NetworkPolicy, image vulnerabilities, secret exposure risks |
| Cost | Waste reduction, right-sizing recommendations, autoscaling configuration |
| Capacity | Saturation signals, scheduling pressure, scaling strategy, resource headroom |
| YAML | Direct manifest output — no analysis preamble, no investigation sections |

Token-efficient on large clusters

The context pipeline is designed to avoid "lost-in-the-middle" failures on small models. Non-blocking summarisation adds zero latency — compression runs after the response is delivered (see the sketch after the table).

| | v0.1.0 | v0.2.0+ |
|---|---|---|
| System prompt cluster section (180 pods) | ~6,500 tokens | ~1,000 tokens |
| Reduction | — | ~85% |
| Pods injected per message | All | 15 most relevant + all anomalies |
| History compression | None | After 20 pairs → background summary |
| Added latency from compression | — | Zero (post-response) |
| Anomaly sorting | None | CrashLoop / OOMKilled / NotReady floated to top |
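The zero-latency claim follows from ordering alone: the reply streams to the UI first, and compression is fired afterwards without being awaited. A sketch with a hypothetical summarizeHistory:

```ts
type Message = { role: "user" | "assistant"; content: string };

declare function summarizeHistory(h: Message[]): Promise<string>; // hypothetical

// Runs after each reply has been delivered to the UI.
export function onTurnComplete(history: Message[]) {
  if (history.length >= 40) { // 20 user/assistant pairs
    // Fire-and-forget: the next prompt uses the summary if it's ready.
    void summarizeHistory(history).catch(() => { /* fall back to raw history */ });
  }
}
```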

Up and running in four steps

1. Start your AI backend

Option A — Ollama (local)

```sh
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
ollama pull qwen2.5:7b   # recommended
```

Option B — OpenAI-compatible endpoint (LocalAI, LM Studio, OpenAI, etc.)

```sh
# LocalAI example
docker run -p 8080:8080 localai/localai:latest

# Or use any hosted endpoint:
# https://api.openai.com  (API key required)
```

Models smaller than 4B parameters are not reliably supported. Tool-calling and multi-step reasoning require sufficient model capacity.

2. Build the extension

```sh
git clone https://github.com/b-iurea/freelens-ollama-extension.git
cd freelens-ollama-extension
pnpm install && pnpm build && pnpm pack
```
3. Load into Freelens

Open Freelens → Extensions → Add Local Extension → select the generated .tgz file.

4. Connect and chat

The K8s SRE Assistant entry will appear in the left cluster sidebar. Click it to open the chat, open the Connection Panel, paste your endpoint URL, and optionally enter an API key. The provider is auto-detected.

Requirements
  • Freelens ≥ 1.4.0
  • Ollama running locally or on the network, or any OpenAI-compatible AI backend (LocalAI, LM Studio, OpenAI, etc.)
  • At least one model available on your endpoint (7B+ recommended for tool-calling workflows)
  • Node.js / pnpm (build only — not required at runtime)