Adding Observability to the Infrastructure

Level 3 – Monitoring, Logging, and Operational Visibility

In the previous articles, we built a progressively more realistic infrastructure.

First, in Level 1, we deployed a simple WordPress stack on a single virtual machine.

Then, in Level 2, we evolved the architecture toward a multi-server infrastructure with:

  • a dedicated reverse proxy VM (Caddy)

  • multiple WordPress backend VMs

  • dynamic routing managed by Ansible

At this stage, the infrastructure is functional and scalable.
However, an essential aspect of real-world systems is still missing:

observability.

In Level 3, we add monitoring and logging capabilities to understand what happens inside the system.

Why Observability Matters

In small personal projects, it is common to deploy applications and assume everything will work.

In production environments, this approach quickly becomes problematic.

Engineers must be able to answer questions like:

  • Is the server running correctly?

  • Is the reverse proxy receiving traffic?

  • Are users encountering errors?

  • Is the system running out of resources?

Without proper monitoring and logs, diagnosing problems becomes extremely difficult.

Observability provides the tools to answer these questions.

In this project, we introduce three core components:

  • Prometheus – metrics collection

  • Grafana – metrics visualization

  • Promtail + Loki – centralized logging

Level 3 Architecture

With the addition of observability, the architecture becomes slightly more complex.

          Internet
             │
             ▼
      VM-PROXY (Caddy)
     ┌───────┴───────┐
     ▼               ▼
  VM-WP1          VM-WP2
WordPress Site 1  WordPress Site 2

Monitoring Stack

      VM-MONITORING
Prometheus + Grafana + Loki

A new machine is introduced:

VM-MONITORING

This server runs the monitoring stack and collects data from the infrastructure.

Monitoring with Prometheus

Prometheus is responsible for collecting metrics from servers and services.

Metrics include information such as:

  • CPU usage

  • memory consumption

  • container status

  • network activity

Prometheus works by scraping HTTP endpoints exposed by exporters.

Prometheus Deployment with Docker

In this project, Prometheus is deployed as a Docker container.

Example configuration:

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

The prometheus.yml file defines which services should be monitored.

Example:

scrape_configs:
  - job_name: caddy
    static_configs:
      - targets: ['vm-proxy:2019']

  - job_name: wordpress
    static_configs:
      - targets: ['10.0.0.11:9100', '10.0.0.12:9100']

Prometheus periodically queries these endpoints to collect metrics.
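The targets on port 9100 above correspond to node_exporter, the standard Prometheus exporter for host-level metrics. One way to run it on each WordPress VM is as a Docker service; a minimal sketch (image tag and port mapping are assumptions, not part of the original setup):

```yaml
# Sketch: node_exporter on each backend VM, exposing host metrics
# on its default port 9100 for Prometheus to scrape.
services:
  node_exporter:
    image: prom/node-exporter
    ports:
      - "9100:9100"
```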

Visualizing Metrics with Grafana

While Prometheus stores metrics efficiently, it is not designed for rich visual dashboards.

This is where Grafana comes in.

Grafana allows engineers to create dashboards showing:

  • CPU usage over time

  • network traffic

  • container health

  • service availability

Example Docker service:

grafana:
  image: grafana/grafana
  ports:
    - "3000:3000"

Once running, Grafana connects to Prometheus as a data source.

From there, dashboards can be created to visualize system behavior.
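Rather than adding the data source by hand in the UI, Grafana can pick it up automatically through its provisioning mechanism. A minimal sketch, assuming Prometheus runs in the same Compose project under the service name prometheus on its default port 9090:

```yaml
# Assumed location: /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # service name assumed from the compose file
    isDefault: true
```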

Centralized Logging with Loki

Metrics alone are not enough to understand system behavior.

Logs provide detailed information about application activity.

For example:

  • HTTP requests handled by Caddy

  • WordPress application logs

  • container logs

Instead of reading logs manually on each server, we centralize them using Loki.
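Loki itself can run as another service in the monitoring Compose file. A minimal sketch, assuming the default Loki port:

```yaml
# Sketch: Loki added to the monitoring stack, listening on its
# default port 3100 (image tag and port mapping are assumptions).
loki:
  image: grafana/loki
  ports:
    - "3100:3100"
```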

Log Collection with Promtail

Promtail is a lightweight agent that collects logs and sends them to Loki.

In this infrastructure, Promtail reads log files from the reverse proxy.

Example configuration:

scrape_configs:
  - job_name: caddy_logs
    static_configs:
      - targets: [localhost]
        labels:
          job: caddy
          host: vm-proxy
          __path__: /var/log/caddy/*.log   # log path assumed; adjust to your setup

Promtail continuously scans log files and pushes them to Loki.
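Besides scrape_configs, a working Promtail configuration also needs to know where to push logs and where to record how far it has read in each file. A sketch of these two sections, assuming the monitoring VM is reachable as vm-monitoring:

```yaml
# Assumed Promtail client and positions sections; vm-monitoring is
# the hostname of the VM running Loki in this infrastructure.
clients:
  - url: http://vm-monitoring:3100/loki/api/v1/push

positions:
  filename: /tmp/positions.yaml
```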

Log Exploration in Grafana

Once logs are stored in Loki, they can be explored directly from Grafana.

This makes it possible to search logs by:

  • time range

  • server

  • service

  • log content

For example, we can quickly identify:

  • HTTP errors

  • spikes in traffic

  • suspicious requests

This dramatically improves troubleshooting capabilities.

Automating the Monitoring Stack with Ansible

Just like the application infrastructure, the monitoring stack is deployed using Ansible.

A dedicated role can be used:

roles/
└── monitoring_stack

This role typically includes:

roles/monitoring_stack
├── tasks
│   └── main.yml
└── templates
    └── prometheus.yml.j2

The tasks handle:

  • creating the monitoring directory

  • deploying the Docker Compose configuration

  • generating Prometheus configuration files

Example task:

- name: Deploy monitoring stack
  copy:
    src: compose.yml
    dest: /opt/monitoring/compose.yml

After deployment, the monitoring stack runs automatically.
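In the same spirit, the role can render the Prometheus configuration from its Jinja2 template and start the stack. A sketch of the remaining tasks (module choices and paths are assumptions):

```yaml
- name: Generate Prometheus configuration
  template:
    src: prometheus.yml.j2
    dest: /opt/monitoring/prometheus.yml

- name: Start the monitoring stack
  command: docker compose up -d
  args:
    chdir: /opt/monitoring
```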

Benefits of Level 3

Adding observability transforms the project into a real DevOps infrastructure.

Key improvements include:

Operational visibility

Engineers can see what is happening inside the system.

Faster troubleshooting

Logs and metrics make debugging significantly easier.

Infrastructure monitoring

Performance issues can be detected early.

Production readiness

Monitoring and logging are essential in any serious environment.