🔭 OpenSearch Observability Stack

Observability Stack is an open-source stack designed for modern distributed systems. Built on OpenTelemetry, OpenSearch, and Prometheus, Observability Stack provides a complete, pre-configured infrastructure for monitoring microservices, web applications, and AI agents—with first-class support for agent observability through OpenTelemetry Gen-AI Semantic Conventions.

Components

OpenTelemetry Collector: Receives OTLP data and routes it to appropriate backends
Data Prepper: Transforms and enriches logs and traces before storage
OpenSearch: Stores and indexes logs and traces for search and analysis
Prometheus: Stores time-series metrics data
OpenSearch Dashboards: Provides web-based visualization and exploration

🚀 Quickstart

Option 1: One-Command Install (Recommended)

Use our interactive installer for the best experience:

curl -fsSL https://raw.githubusercontent.com/opensearch-project/observability-stack/main/install.sh | bash

The installer will:

✅ Check system requirements (Docker/Finch, Git, memory)
🎨 Guide you through configuration with a beautiful TUI
📦 Pull and start all services automatically
🔐 Optionally set custom OpenSearch credentials
📊 Display credentials and access points

Installer flags:

Flag	Description
`--simulate`	Preview the installer output without actually installing
`--skip-pull`	Skip pulling container images (uses cached images)
`--help`	Show help message

To run the installer locally (e.g. after cloning):

./install.sh                # Full install
./install.sh --simulate     # Dry run
./install.sh --skip-pull    # Skip image pulls (useful for re-installs)

Installation takes 8-15 minutes. After completion, access:

Service	URL	Credentials
OpenSearch Dashboards	http://localhost:5601	admin / My_password_123!@#
Prometheus	http://localhost:9090	(none)
OpenSearch API	https://localhost:9200	admin / My_password_123!@#

Option 2: Manual Setup

To get started manually with Docker Compose:

1️⃣ Clone the repository:

git clone https://github.com/opensearch-project/observability-stack.git
cd observability-stack

Optional: Configure stack

The .env file contains all configurable parameters:

Example services: Included by default via INCLUDE_COMPOSE_EXAMPLES=docker-compose.examples.yml. Comment out to run only the core stack.
OpenTelemetry Demo: Not enabled by default. Uncomment INCLUDE_COMPOSE_OTEL_DEMO=docker-compose.otel-demo.yml to add the full OpenTelemetry Demo microservices app for realistic e-commerce telemetry (~2GB additional memory required).

See Configuration section for more details.

2️⃣ Start the stack:

docker compose up -d

This starts all services including example services (multi-agent travel planner, weather-agent, events-agent, and canary) that generate sample telemetry data.

3️⃣ View your Logs and Traces in OpenSearch Dashboards

👉 Navigate to http://localhost:5601
Username and password can be retrieved from .env file:

grep -E '^OPENSEARCH_(USER|PASSWORD)=' .env

Destroying the Stack

To stop the stack while preserving your data:

docker compose down

To stop the stack and remove all data volumes:

docker compose down -v

Instrumenting Your Agent

Observability Stack accepts telemetry data via the OpenTelemetry Protocol (OTLP) and follows the OpenTelemetry Gen-AI Semantic Conventions for standardized attribute naming and structure for AI agents.

OTLP Endpoint Configuration

The OTel Collector exposes two OTLP endpoints — choose the one that matches your SDK's protocol:

Port	Protocol	Endpoint	Used By
4317	gRPC	`http://localhost:4317`	OpenTelemetry SDK (default), most language SDKs
4318	HTTP/protobuf	`http://localhost:4318`	Strands SDK (`setup_otlp_exporter()`), HTTP-based exporters

Configure via environment variables:

# For gRPC-based exporters (OpenTelemetry SDK default)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

# For HTTP/protobuf exporters (Strands SDK, OTLP HTTP exporters)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"

Note: The Strands Agents SDK's StrandsTelemetry().setup_otlp_exporter() uses HTTP/protobuf, which requires port 4318. Using port 4317 with Strands will silently fail. When using OTLPSpanExporter directly with gRPC (as in the examples below), use port 4317.

Example: Manual Instrumentation with OpenTelemetry

For complete example, see examples/plain-agents/weather-agent

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure OTLP exporter (gRPC — port 4317)
tracer_provider = TracerProvider()
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(tracer_provider)

# Create tracer
tracer = trace.get_tracer("my-agent")

# Instrument agent invocation
with tracer.start_as_current_span("invoke_agent") as span:
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.agent.name", "Weather Assistant")
    span.set_attribute("gen_ai.request.model", "gpt-4")
    
    # Your agent logic here
    result = agent.run("What's the weather in Paris?")
    
    span.set_attribute("gen_ai.usage.input_tokens", 150)
    span.set_attribute("gen_ai.usage.output_tokens", 75)

Example: Instrument with StrandsTelemetry

For complete example, see examples/strands/code-assistant

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from strands import Agent
from strands.models import BedrockModel
from strands.telemetry import StrandsTelemetry

# 1. Initialize StrandsTelemetry (auto-instruments with GenAI semantic conventions)
telemetry = StrandsTelemetry()

# 2. Configure OTLP exporter to send traces to your observability stack
#    Note: This example uses gRPC (port 4317) with OTLPSpanExporter directly.
#    If using setup_otlp_exporter() instead, it uses HTTP/protobuf (port 4318).
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
telemetry.tracer_provider.add_span_processor(BatchSpanProcessor(exporter))

# 3. Use your agent normally - telemetry happens automatically!
agent = Agent(
    system_prompt="You are a helpful assistant",
    model=BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0"),
    tools=[your_tools]
)

# Every agent call is automatically traced with GenAI semantic conventions
agent("What's the weather like?")

Managing Services

Common Commands

# View logs
docker compose logs -f

# View logs for specific service
docker compose logs -f opensearch

# Stop services (keeps data)
docker compose down

# Stop and remove all data
docker compose down -v

# Restart services
docker compose restart

# Restart specific service
docker compose restart opensearch

# Check service status
docker compose ps

Ports

Port	Service	Protocol	Description
4317	OTel Collector	gRPC	OTLP gRPC receiver — used by most OpenTelemetry SDKs
4318	OTel Collector	HTTP	OTLP HTTP receiver — used by Strands SDK, browser-based exporters
5601	OpenSearch Dashboards	HTTP	Web UI for logs, traces, and dashboards
9090	Prometheus	HTTP	Prometheus Web UI and API
9200	OpenSearch	HTTPS	REST API (self-signed cert, use `curl -k`)
21890	Data Prepper	gRPC	Internal OTLP receiver (from OTel Collector)

When example services are enabled:

Port	Service	Description
8000	weather-agent	Weather lookup API with fault injection
8002	events-agent	Local events lookup API
8003	travel-planner	Multi-agent orchestrator

When OpenTelemetry Demo is enabled:

Port	Service	Description
8080	frontend-proxy	Demo telescope web store
8080/loadgen/	load-generator	Load generator dashboard
8080/feature	feature-flag	Feature flag management UI

Configuration

Environment Variables

The .env file contains all configurable parameters. Edit this file before starting the stack to customize your deployment.

Including Example Services

By default, the stack includes example services (multi-agent travel planner, weather-agent, events-agent, and canary) via the INCLUDE_COMPOSE_EXAMPLES variable in .env:

INCLUDE_COMPOSE_EXAMPLES=docker-compose.examples.yml

Example services:

travel-planner (port 8003): Multi-agent orchestrator demonstrating distributed tracing
weather-agent (port 8000): Weather lookup with fault injection
events-agent (port 8002): Local events lookup
canary: Generates test traffic with fault injection

To run without examples:

Comment out the INCLUDE_COMPOSE_EXAMPLES line in .env
Restart the stack: docker compose down && docker compose up -d

Running with OpenTelemetry Demo

Observability Stack can run alongside the OpenTelemetry Demo application, a full microservices e-commerce app that generates realistic telemetry data.

To enable OpenTelemetry Demo, uncomment in .env:

INCLUDE_COMPOSE_OTEL_DEMO=docker-compose.otel-demo.yml

Then restart the stack:

docker compose down && docker compose up -d

Access points when running with OTel Demo:

Web Store: http://localhost:8080/
Load Generator UI: http://localhost:8080/loadgen/
Feature Flags UI: http://localhost:8080/feature

Note: Running with OTel Demo significantly increases resource requirements. See Resource Requirements below.

OpenSearch Credentials

The default credentials are admin / My_password_123!@# (development only).

Setting credentials during install:

The interactive installer prompts "Customize OpenSearch credentials?" — enter Y to set a custom username and password. The installer writes them to .env, and all services pick them up automatically.

Changing credentials after install:

Edit .env file (single source of truth):

OPENSEARCH_USER=your-new-username
OPENSEARCH_PASSWORD=your-new-password

Restart the stack (remove volumes to clear stale credentials):
```
docker compose down -v
docker compose up -d
```

How it works: .env is the single source of truth for credentials. OpenSearch, Dashboards, and the init script read from .env via environment variables. Data Prepper uses a template with OPENSEARCH_USER/OPENSEARCH_PASSWORD placeholders that are injected via sed at container startup — no manual config edits needed. OpenSearch uses HTTPS with self-signed certificates, so use -k flag with curl commands.

Resource Requirements

Configuration	Memory Usage	Recommended Minimum
Core stack only	~1.1 GB	4 GB RAM
Core + OTel Demo	~3.0 GB	8 GB RAM

Core services (~1.1 GB total):

OpenSearch: ~1.6 GB
Data Prepper: ~650 MB
OpenSearch Dashboards: ~230 MB
OTel Collector: ~100 MB
Prometheus: ~40 MB
Example services (weather-agent, canary): ~100 MB

OpenTelemetry Demo adds (~1.9 GB total):

Kafka: ~500 MB
Java services (fraud-detection, ad, accounting): ~540 MB
Frontend, load-generator, and other services: ~850 MB

Check resource usage:

docker stats --no-stream

# For Finch users
finch stats --no-stream

Production Readiness

⚠️ Observability Stack is NOT production-ready out of the box. The default configuration prioritizes ease of use for development and testing. Before deploying to production, you must address the following:

Security Hardening Required

Authentication: Enable OpenSearch security plugin and configure user authentication
Authorization: Implement role-based access control (RBAC)
Encryption: Enable TLS/SSL for all HTTP endpoints and encrypt data at rest
Network Security: Implement network policies, firewalls, and limit exposed ports
Secrets Management: Use secure secret storage instead of default passwords

Operational Requirements

High Availability: Configure multi-node OpenSearch cluster and redundant services
Backup and Recovery: Implement automated backup procedures and test recovery
Monitoring: Set up monitoring and alerting for the observability stack itself
Resource Limits: Configure appropriate CPU, memory, and disk quotas
Data Lifecycle: Implement production-appropriate retention and archival policies

Security Considerations

The default configuration includes these development-friendly settings that are NOT secure:

OpenSearch security is enabled but uses default credentials (admin/My_password_123!@#)
Self-signed TLS certificates with verification disabled
Permissive CORS settings
All services exposed without network isolation

Never deploy the default configuration to production or expose it to untrusted networks.

Troubleshooting

Services Won't Start

Check Docker resource allocation:

docker stats

View service logs:

docker compose logs <service-name>

Data Not Appearing

Verify OpenTelemetry Collector is receiving data:

docker compose logs otel-collector

Check Data Prepper pipeline status:

docker compose logs data-prepper

Verify OpenSearch indices:

curl -k -u admin:My_password_123!@# https://localhost:9200/_cat/indices?v

Performance Issues

Check resource usage:

docker stats

Adjust resource limits in docker-compose.yml or values.yaml for Helm.

Network Removal Error on Shutdown

If docker compose down fails with an error like:

failed to remove network observability-stack-network: Error response from daemon: error while removing network: network observability-stack-network id ab129adaabcd7ab35cddb1fbe8dc2a68b3c730b9fb9384c5c1e7f5ca015c27d9 has active endpoints

This typically occurs when containers from other compose files are still running. Try:

docker compose down --remove-orphans

Or stop all containers using the network first:

docker network inspect observability-stack-network --format '{{range .Containers}}{{.Name}} {{end}}' | xargs -r docker stop
docker compose down

For more troubleshooting guidance, see TROUBLESHOOTING.md.

Project Status: 🚧 Alpha

Note: As the OpenSearch agent observability ecosystem grows, this repository may eventually be consolidated into a unified "container-recipes" repository alongside other quickstart setups. This would provide a centralized location for all OpenSearch deployment patterns. However we'll communicate any such changes through the repository's issue tracker and release notes.

Temporary Workarounds

The current configuration includes a custom OpenSearch Dockerfile (docker-compose/opensearch/Dockerfile) that removes some plugins facing issues during OpenSearch 3.5.0 development. This workaround will be removed once OpenSearch 3.5.0 is officially released and stabilized. At that point, we'll switch back to using the standard OpenSearch Docker image directly.

Track progress: OpenSearch 3.5.0 Release

Documentation

AGENTS.md - AI-optimized repository documentation
CONTRIBUTING.md - Development workflow and contribution guidelines
examples/ - Language-specific instrumentation examples
docs/ - Additional documentation and guides

Development

When making changes to example services or other components, rebuild and restart with:

docker compose up -d --build

This rebuilds any modified containers and restarts them with the new changes.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on:

Development workflow
Testing requirements
Code style conventions
Pull request process

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support

Issues: Report bugs or request features via GitHub Issues
Discussions: Ask questions in GitHub Discussions
Documentation: See the docs/ directory

Acknowledgments

Observability Stack is built on top of excellent open-source projects:

Remember: Observability Stack is for development and testing. Harden security and operations before production use.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github		.github
docker-compose		docker-compose
docs		docs
examples		examples
.env		.env
.gitignore		.gitignore
.whitesource		.whitesource
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.txt		LICENSE.txt
MAINTAINERS.md		MAINTAINERS.md
README.md		README.md
docker-compose.examples.yml		docker-compose.examples.yml
docker-compose.otel-demo.yml		docker-compose.otel-demo.yml
docker-compose.yml		docker-compose.yml
install.sh		install.sh

Folders and files

Latest commit

History

Repository files navigation

🔭 OpenSearch Observability Stack

Components

🚀 Quickstart

Option 1: One-Command Install (Recommended)

Option 2: Manual Setup

1️⃣ Clone the repository:

Optional: Configure stack

2️⃣ Start the stack:

3️⃣ View your Logs and Traces in OpenSearch Dashboards

Destroying the Stack

Instrumenting Your Agent

OTLP Endpoint Configuration

Example: Manual Instrumentation with OpenTelemetry

Example: Instrument with StrandsTelemetry

Managing Services

Common Commands

Ports

Configuration

Environment Variables

Including Example Services

Running with OpenTelemetry Demo

OpenSearch Credentials

Resource Requirements

Production Readiness

Security Hardening Required

Operational Requirements

Security Considerations

Troubleshooting

Services Won't Start

Data Not Appearing

Performance Issues

Network Removal Error on Shutdown

Project Status: 🚧 Alpha

Temporary Workarounds

Documentation

Development

Contributing

License

Support

Acknowledgments

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages