Merged
3 changes: 3 additions & 0 deletions .env.example
@@ -3,6 +3,9 @@
# .env is gitignored and must NEVER be committed.

# ── Elasticsearch ────────────────────────────────────────────────────────────
# Optional Elastic Stack image tag for local Docker Compose.
ELK_VERSION=9.4.2

# Password for the built-in 'elastic' superuser.
# Must be at least 6 characters. No default is provided — the stack will
# refuse to start without it.
4 changes: 2 additions & 2 deletions .env.poc
@@ -24,7 +24,7 @@

# ── Elastic Stack ─────────────────────────────────────────────
# Latest Elastic Stack — drives every Elastic image in the compose file.
ELK_VERSION=9.4.0
ELK_VERSION=9.4.2
ELASTIC_PASSWORD=dataobs_poc_elastic
KIBANA_PASSWORD=dataobs_poc_kibana

@@ -48,7 +48,7 @@ APM_SERVER_URL=elastic-agent:8200
# The pipeline bulk-indexes pipeline records and dataobs-* observability
# documents straight into Elasticsearch. APM telemetry follows the APM
# Server route above.
# Single-node POC cluster (discovery.type=single-node, ES 9.4.0).
# Single-node POC cluster (discovery.type=single-node, ES 9.4.2).
# Production / AWS reference architecture is multi-node with transport TLS;
# see docs/poc-setup.md for the upgrade path.
ELASTICSEARCH_URL=http://es01:9200
6 changes: 3 additions & 3 deletions README.md
@@ -281,13 +281,13 @@ See [`integrations/grafana-alloy/docs/GUIDE.md`](integrations/grafana-alloy/docs

```
fatal exception while booting Elasticsearch
error.message: cannot upgrade a node from version [8.13.0] directly to version [9.3.0],
error.message: cannot upgrade a node from version [8.13.0] directly to version [9.x],
upgrade to version [8.19.0] first.
```

**Root cause**

Elasticsearch stores its originating version in node metadata inside the `esdata` Docker volume. When you previously ran the stack with an older image (e.g. `8.13.0`) and later pulled `9.3.0`, the new process reads the stale metadata and hard-blocks the start because Elastic enforces a **mandatory stepping-stone upgrade path**: you cannot skip directly from 8.x to 9.x — you must first pass through the last minor release of 8.x (`8.19.0`). Since this is a local POC with no production data, the simplest fix is to delete the stale volume.
Elasticsearch stores its originating version in node metadata inside the `esdata` Docker volume. When you previously ran the stack with an older image (e.g. `8.13.0`) and later pulled the current 9.x POC image, the new process reads the stale metadata and hard-blocks the start because Elastic enforces a **mandatory stepping-stone upgrade path**: you cannot skip directly from 8.x to 9.x — you must first pass through the last minor release of 8.x (`8.19.0`). Since this is a local POC with no production data, the simplest fix is to delete the stale volume.
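
The stepping-stone rule described above can be sketched as a small check. This is a simplified illustration only: the function name `blocked_by_stepping_stone` and the helper `parse_version` are hypothetical, and the `8.19.0` threshold is taken from the error message quoted above rather than from Elasticsearch's own code. A `False` result means only that this one rule does not apply, not that the existing volume is otherwise compatible.

```python
def parse_version(text):
    """Turn a version string such as '8.13.0' into a comparable tuple (8, 13, 0)."""
    return tuple(int(part) for part in text.split("."))


# Assumption from the error message above: 8.19.0 is the required
# intermediate release before any 9.x image can read the data.
STEPPING_STONE = (8, 19, 0)


def blocked_by_stepping_stone(stored, target):
    """Return True if data written by `stored` (8.x) is too old to boot `target` (9.x)."""
    old, new = parse_version(stored), parse_version(target)
    return old[0] == 8 and new[0] == 9 and old < STEPPING_STONE


print(blocked_by_stepping_stone("8.13.0", "9.4.2"))  # the case from the error above
print(blocked_by_stepping_stone("8.19.0", "9.4.2"))  # already on the stepping stone
```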

**Fix — delete the stale `esdata` volume**

@@ -317,7 +317,7 @@ docker logs dataobs-es01 -f

# Confirm the running version
curl -s -u elastic:<your-password> http://localhost:9200 | jq .version.number
# Expected output: "9.3.0"
# Expected output: "9.4.2"
```

Kibana will be available at `http://localhost:5601` once the `es-setup` init container completes its one-shot password bootstrap and the `service_completed_successfully` health gate opens for the Kibana service.
2 changes: 1 addition & 1 deletion config/otel-collector-poc.yaml
@@ -1,6 +1,6 @@
# ============================================================
# OpenTelemetry Collector — POC Configuration (macOS-safe)
# otel-collector-contrib:0.99.0 / Elastic Stack 9.4.0
# otel-collector-contrib:0.99.0 / Elastic Stack 9.4.2
#
# Notes on Elastic 9.4 + EDOT (Elastic Distribution of OpenTelemetry):
# - From Elastic 9.2 the Elastic Agent ships an embedded EDOT collector
2 changes: 1 addition & 1 deletion docker-compose.poc.yml
@@ -3,7 +3,7 @@
# macOS-safe runtime profile for Docker Desktop
#
# FIX-SUMMARY (fix/poc-elastic-otel-data-ingest):
# 1. Elastic stack bumped to 9.4.2 (latest); single ELK_VERSION variable
# 1. Elastic stack bumped to 9.4.2; single ELK_VERSION variable
# drives every Elastic image so future bumps are one-line changes.
# 2. Single-node Elasticsearch (service `es01`, discovery.type=single-node)
# for local POC simplicity. ES 9.x requires transport TLS once the
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -20,7 +20,7 @@ services:
# BACKEND A — Elasticsearch 9.x AIOps (default profile)
# ════════════════════════════════════════════════════════════════════════════
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:9.4.2
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION:-9.4.2}
    container_name: dataobs-es01
    environment:
      - node.name=dataobs-es01
@@ -110,7 +110,7 @@ services:
    restart: "no"

  kibana:
    image: docker.elastic.co/kibana/kibana:9.4.2
    image: docker.elastic.co/kibana/kibana:${ELK_VERSION:-9.4.2}
    container_name: dataobs-kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
46 changes: 39 additions & 7 deletions docs/poc-setup.md
@@ -1,6 +1,6 @@
# DataObs POC — Setup & Run Guide

> **Elastic Stack:** 9.4.0 (latest stable, April 2026)
> **Elastic Stack:** 9.4.2
> **APM Server:** Built into Elasticsearch 9.x via Elastic Agent — **no separate `apm-server` container**.

This guide gets DataObs running end-to-end locally in **under 10 minutes** using three dedicated files that don't touch the main stack config at all:
@@ -66,6 +66,24 @@ No Python, Java, or Spark installation needed — everything runs inside Docker.

---

### Validate Elastic image availability

Before changing `ELK_VERSION` or troubleshooting a failed pull, verify that
the matching Elasticsearch, Kibana, Fleet Server, and Elastic Agent images
exist in Elastic's registry:

```bash
docker manifest inspect docker.elastic.co/elasticsearch/elasticsearch:9.4.2
docker manifest inspect docker.elastic.co/kibana/kibana:9.4.2
docker manifest inspect docker.elastic.co/elastic-agent/elastic-agent:9.4.2
```

If any manifest check fails, stop and resolve the image/tag availability issue
before editing compose files or starting the POC. Do not mix Elastic Stack
versions across Elasticsearch, Kibana, Fleet Server, and Elastic Agent.
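
The manual manifest checks above could also be scripted so that all images are verified against one version string. The following is a minimal sketch, not part of the repo: `image_refs` and `missing_images` are hypothetical names, and it assumes Fleet Server runs from the `elastic-agent` image, which is why only three repositories are listed (matching the three commands above).

```python
import subprocess

# Repositories copied from the commands above. Assumption: Fleet Server
# uses the elastic-agent image, so it has no separate entry.
REPOSITORIES = (
    "docker.elastic.co/elasticsearch/elasticsearch",
    "docker.elastic.co/kibana/kibana",
    "docker.elastic.co/elastic-agent/elastic-agent",
)


def image_refs(version):
    """Build one image reference per repository, all pinned to the same tag."""
    return [f"{repo}:{version}" for repo in REPOSITORIES]


def missing_images(version, runner=subprocess.run):
    """Return the image references whose `docker manifest inspect` call fails.

    `runner` is injectable so the logic can be exercised without Docker.
    """
    missing = []
    for ref in image_refs(version):
        result = runner(
            ["docker", "manifest", "inspect", ref],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            missing.append(ref)
    return missing
```

Calling `missing_images("9.4.2")` on a machine with Docker installed should return an empty list when every tag exists. A non-empty list names the images to investigate before touching `ELK_VERSION`.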

---

## Step 1 — Use the committed env file

The repo ships `.env.poc` with safe POC defaults — every command in this
@@ -96,7 +114,7 @@ docker compose -f docker-compose.poc.yml --env-file .env.poc.local up -d
## Step 2 — Start the infrastructure

The POC boots a **single-node Elasticsearch** instance (`es01`) running ES
9.4.0 with `discovery.type=single-node`. Single-node mode skips the cluster
9.4.2 with `discovery.type=single-node`. Single-node mode skips the cluster
bootstrap check that — on ES 9.x with security enabled — would otherwise
require transport TLS / certificate setup, which is overkill for a local
POC. Pipeline, Kibana and Fleet all write to this one node.
@@ -123,9 +141,9 @@ es01 (single-node ES, discovery.type=single-node)
└─► otel-collector
```

> **Upgrade gotcha:** if you previously ran the POC on Elastic 8.x, 9.3
> or the multi-node 9.4 variant, the old `es*_data` Docker volumes are
> incompatible. Wipe them with
> **Upgrade gotcha:** if you previously ran the POC on Elastic 8.x, an earlier 9.x
> image, or the multi-node local variant, the old `es*_data` Docker
> volumes can be incompatible. Wipe them with
> `docker compose -f docker-compose.poc.yml down -v` before starting.

Wait until all services show **healthy** (~60–90 s on first pull):
@@ -136,6 +154,20 @@ docker compose -f docker-compose.poc.yml --env-file .env.poc ps

---

### Upgrading local POC volumes from 9.4.0 to 9.4.2

Users moving an existing local POC from Elastic Stack 9.4.0 to 9.4.2 should
reset the local demo volumes, pull the refreshed images, and then start the
stack again:

```bash
./scripts/demo_reset.sh
docker compose -f docker-compose.poc.yml --env-file .env.poc pull
./scripts/demo_up.sh
```

---

## Step 3 — Run the pipeline

```bash
@@ -295,8 +327,8 @@ docker compose -f docker-compose.poc.yml --env-file .env.poc.local \

| Service | RAM (approx) |
|---|---|
| Elasticsearch 9.4.0 | 1.5 GB |
| Kibana 9.4.0 | 1 GB |
| Elasticsearch 9.4.2 | 1.5 GB |
| Kibana 9.4.2 | 1 GB |
| Fleet Server | 256 MB |
| Elastic Agent (APM) | 512 MB |
| OTel Collector | 256 MB |
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,7 +1,7 @@
# DataObs Python dependencies
# Pin to minor versions; update deliberately.

# Elasticsearch client — 9.x series (matches ES 9.3 cluster)
# Elasticsearch client — 9.x series (compatible with ES 9.4.2 POC cluster)
# The 9.x client drops the deprecated body= parameter entirely.
# All ES API calls in this codebase already use keyword arguments
# (fixed in PR #20) so this upgrade is a straight pin bump.
52 changes: 52 additions & 0 deletions tests/test_poc_pipeline.py
@@ -66,3 +66,55 @@ def _bad_scenario(*_args, **_kwargs):
    runner = DataObsPipelineRunner()
    with pytest.raises(RuntimeError, match='missing required output indices'):
        runner.run()


def test_poc_elastic_stack_version_pins_are_consistent():
    repo = Path(__file__).resolve().parents[1]
    env_text = (repo / '.env.poc').read_text()
    compose_text = (repo / 'docker-compose.poc.yml').read_text()
    root_compose_text = (repo / 'docker-compose.yml').read_text()
    example_env_text = (repo / '.env.example').read_text()

    assert 'ELK_VERSION=9.4.2' in env_text
    assert 'ELK_VERSION=9.4.2' in example_env_text
    assert 'docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION:-9.4.2}' in compose_text
    assert 'docker.elastic.co/kibana/kibana:${ELK_VERSION:-9.4.2}' in compose_text
    assert 'docker.elastic.co/elastic-agent/elastic-agent:${ELK_VERSION:-9.4.2}' in compose_text
    assert 'ELK_VERSION: ${ELK_VERSION:-9.4.2}' in compose_text
    assert 'docker.elastic.co/elasticsearch/elasticsearch:${ELK_VERSION:-9.4.2}' in root_compose_text
    assert 'docker.elastic.co/kibana/kibana:${ELK_VERSION:-9.4.2}' in root_compose_text


def test_poc_elastic_stack_has_no_stale_active_version_references():
    repo = Path(__file__).resolve().parents[1]
    allowed_history = {
        'docs/poc-setup.md',
    }
    stale_tokens = ('9.' + '4.0', '9.' + '3.0', '9.' + '3.3')
    offenders = []

    for path in repo.rglob('*'):
        if path.is_dir() or '.git' in path.parts or path.suffix in {'.pyc', '.pyo'}:
            continue
        try:
            text = path.read_text(errors='ignore')
Comment on lines +96 to +100

P2: Limit stale-version scan to tracked files

This repo-wide `rglob` also reads gitignored/untracked files, so the test fails for a documented local workflow if a developer has an older `.env.poc.local` with `ELK_VERSION=9.4.0` (or any local notes/cache/venv file containing the stale tokens), even though the committed files are correct. Since `.env.poc.local` is explicitly gitignored and called out in the POC docs as an optional local override, this check should restrict itself to tracked files or exclude ignored paths.

        except OSError:
            continue
        rel = path.relative_to(repo).as_posix()
        if rel in allowed_history:
            continue
        if any(token in text for token in stale_tokens):
            offenders.append(rel)

    assert offenders == []
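
One way the review comment on this test could be addressed is to enumerate files with `git ls-files` instead of `rglob`, so that gitignored files such as `.env.poc.local` are never read. The sketch below is illustrative and not taken from the PR: `tracked_files` and `find_stale_references` are hypothetical helpers, and it assumes `git` is on `PATH` and that the test runs inside a checkout.

```python
import subprocess
from pathlib import Path


def tracked_files(repo):
    """List files tracked by git, as POSIX-style paths relative to `repo`."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],  # -z gives NUL-separated, unquoted names
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return [name for name in result.stdout.split("\0") if name]


def find_stale_references(repo, stale_tokens, allowed=frozenset(), files=None):
    """Return tracked files (outside `allowed`) that contain any stale token.

    `files` can be passed explicitly to bypass git, e.g. in unit tests.
    """
    names = tracked_files(repo) if files is None else files
    offenders = []
    for rel in names:
        if rel in allowed:
            continue
        try:
            text = (Path(repo) / rel).read_text(errors="ignore")
        except OSError:
            continue  # deleted-but-tracked or unreadable files are skipped
        if any(token in text for token in stale_tokens):
            offenders.append(rel)
    return offenders
```

With helpers like these, the test body would reduce to `assert find_stale_references(repo, stale_tokens, allowed_history) == []`. The trade-off is that the test then depends on `git` being available where it runs.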


def test_fleet_package_policies_resolve_semver_versions_not_latest_literals():
    repo = Path(__file__).resolve().parents[1]
    compose_text = (repo / 'docker-compose.poc.yml').read_text()

    assert 'APM_VER=$$(' in compose_text
    assert 'DOCKER_VER=$$(' in compose_text
    assert r'\"version\":\"$${APM_VER}\"' in compose_text
    assert r'\"version\":\"$${DOCKER_VER}\"' in compose_text
    assert r'\"version\":\"latest\"' not in compose_text