Adding Observability to the Infrastructure
Level 3 – Monitoring, Logging, and Operational Visibility
In the previous articles, we built a progressively more realistic infrastructure.
First, in Level 1, we deployed a simple WordPress stack on a single virtual machine.
Then, in Level 2, we evolved the architecture toward a multi-server infrastructure with:
- a dedicated reverse proxy VM (Caddy)
- multiple WordPress backend VMs
- dynamic routing managed by Ansible
At this stage, the infrastructure is functional and scalable.
However, an essential aspect of real-world systems is still missing:
observability.
In Level 3, we add monitoring and logging capabilities to understand what happens inside the system.
Why Observability Matters
In small personal projects, it is common to deploy applications and assume everything will work.
In production environments, this approach quickly becomes problematic.
Engineers must be able to answer questions like:
- Is the server running correctly?
- Is the reverse proxy receiving traffic?
- Are users encountering errors?
- Is the system running out of resources?
Without proper monitoring and logs, diagnosing problems becomes extremely difficult.
Observability provides the tools to answer these questions.
In this project, we introduce three core components:
- Prometheus – metrics collection
- Grafana – metrics visualization
- Promtail + Loki – centralized logging
Level 3 Architecture
With the addition of observability, the architecture becomes slightly more complex.
              Internet
                 │
                 ▼
          VM-PROXY (Caddy)
                 │
      ┌──────────┴──────────┐
      ▼                     ▼
   VM-WP1                VM-WP2
WordPress Site 1    WordPress Site 2

           Monitoring Stack
                 │
                 ▼
           VM-MONITORING
    Prometheus + Grafana + Loki
A new machine is introduced:
VM-MONITORING
This server runs the monitoring stack and collects data from the infrastructure.
Monitoring with Prometheus
Prometheus is responsible for collecting metrics from servers and services.
Metrics include information such as:
- CPU usage
- memory consumption
- container status
- network activity
Prometheus works by scraping HTTP endpoints exposed by exporters.
Prometheus Deployment with Docker
In this project, Prometheus is deployed as a Docker container.
Example configuration:
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
The prometheus.yml file defines which services should be monitored.
Example:
scrape_configs:
  - job_name: caddy
    static_configs:
      - targets: ['vm-proxy:2019']
  - job_name: wordpress
    static_configs:
      - targets: ['10.0.0.11:9100', '10.0.0.12:9100']
Prometheus periodically queries these endpoints to collect metrics.
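To make the scraping model concrete, here is a small Python sketch that parses a few lines of the Prometheus text exposition format, the plain-text format exporters serve over HTTP. The sample metrics are illustrative, not taken from this infrastructure, and the parser deliberately ignores labels and timestamps:

```python
def parse_exposition(text):
    """Parse simple Prometheus exposition lines into {metric_name: value}.

    Handles only the basic case: no labels, no timestamps.
    Comment lines (# HELP / # TYPE) are skipped.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics

# Output similar in shape to what node_exporter exposes on :9100/metrics
sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_free_bytes 1073741824
"""

print(parse_exposition(sample))
```

A real exporter exposes hundreds of such series, with labels and type metadata that Prometheus interprets; the point here is simply that "scraping" means fetching and parsing plain text like this.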
Visualizing Metrics with Grafana
While Prometheus stores metrics efficiently, it is not designed for rich visual dashboards.
This is where Grafana comes in.
Grafana allows engineers to create dashboards showing:
- CPU usage over time
- network traffic
- container health
- service availability
Example Docker service:
grafana:
  image: grafana/grafana
  ports:
    - "3000:3000"
Once running, Grafana connects to Prometheus as a data source.
From there, dashboards can be created to visualize system behavior.
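The data source can also be declared at startup through Grafana's file-based provisioning. A minimal sketch, assuming Prometheus is reachable under the Compose service name `prometheus` (adapt the URL to your network):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
```

Provisioning the data source this way keeps the Grafana setup reproducible instead of relying on manual clicks in the UI.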
Centralized Logging with Loki
Metrics alone are not enough to understand system behavior.
Logs provide detailed information about application activity.
For example:
- HTTP requests handled by Caddy
- WordPress application logs
- container logs
Instead of reading logs manually on each server, we centralize them using Loki.
Log Collection with Promtail
Promtail is a lightweight agent that collects logs and sends them to Loki.
In this infrastructure, Promtail reads log files from the reverse proxy.
Example configuration:
scrape_configs:
  - job_name: caddy_logs
    static_configs:
      - targets: [localhost]
        labels:
          job: caddy
          host: vm-proxy
Promtail continuously scans log files and pushes them to Loki.
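A working Promtail configuration also needs a `clients` section pointing at Loki and a `__path__` label telling it which files to read. A fuller sketch, where the port numbers and log path are assumptions to adapt to your setup:

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://vm-monitoring:3100/loki/api/v1/push

scrape_configs:
  - job_name: caddy_logs
    static_configs:
      - targets: [localhost]
        labels:
          job: caddy
          host: vm-proxy
          __path__: /var/log/caddy/*.log
```

The `positions` file lets Promtail remember how far it has read in each file, so logs are not re-sent after a restart.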
Log Exploration in Grafana
Once logs are stored in Loki, they can be explored directly from Grafana.
This makes it possible to search logs by:
- time range
- server
- service
- log content
For example, we can quickly identify:
- HTTP errors
- spikes in traffic
- suspicious requests
This dramatically improves troubleshooting capabilities.
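As an illustration, here are a few LogQL queries one might run in Grafana's Explore view. The `job` and `host` labels are those attached by the Promtail configuration; the filter strings are examples, not patterns guaranteed to match your log format:

```logql
# All Caddy logs from the proxy VM
{job="caddy", host="vm-proxy"}

# Only lines containing the word "error"
{job="caddy"} |= "error"

# Log line rate over 5-minute windows (useful for spotting traffic spikes)
rate({job="caddy"}[5m])
```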
Automating the Monitoring Stack with Ansible
Just like the application infrastructure, the monitoring stack is deployed using Ansible.
A dedicated role can be used:
roles/
└── monitoring_stack
This role typically includes:
roles/monitoring_stack
├── tasks
│   └── main.yml
└── templates
    └── prometheus.yml.j2
The tasks handle:
- creating the monitoring directory
- deploying the Docker Compose configuration
- generating Prometheus configuration files
Example task:
– name: Deploy monitoring stack
copy:
src: compose.yml
dest: /opt/monitoring/compose.yml
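The Prometheus configuration itself would typically be rendered from the `prometheus.yml.j2` template, and the stack then started with Docker Compose. A sketch of what such tasks could look like, assuming the `community.docker` collection is installed:

```yaml
- name: Generate Prometheus configuration
  template:
    src: prometheus.yml.j2
    dest: /opt/monitoring/prometheus.yml

- name: Start the monitoring stack
  community.docker.docker_compose_v2:
    project_src: /opt/monitoring
```

Using a template for `prometheus.yml` means scrape targets can be filled in from the Ansible inventory instead of being hard-coded.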
After deployment, the monitoring stack runs automatically.
Benefits of Level 3
Adding observability transforms the project into a real DevOps infrastructure.
Key improvements include:
Operational visibility
Engineers can see what is happening inside the system.
Faster troubleshooting
Logs and metrics make debugging significantly easier.
Infrastructure monitoring
Performance issues can be detected early.
Production readiness
Monitoring and logging are essential in any serious environment.