A Casual Look at Prometheus

If you’re in the Kubernetes or cloud-native world, you can’t throw a stone without hitting someone who is using Prometheus for monitoring. It’s the de facto standard, the default, the “just install it already” tool that everyone seems to rely on.

So, what is it? In simple terms, Prometheus is an open-source monitoring toolkit that was built for the chaotic, dynamic world of microservices. It’s not like old-school monitoring tools that wait for you to push data to them. Oh no. Prometheus is aggressive—it pulls (or “scrapes”) metrics from your apps, stores them in its own time-series database, and then lets you query them with a special language called PromQL.

It’s the beating heart of observability for many, many setups. But like any superstar, it has its quirks.

My Lab Configuration

For this test, I used the following:

Physical
- 2 x Raspberry Pi 5
- 2 x Raspberry Pi 4
- MSI NUC Mini PC: 32 cores, 32GB RAM, and 250GB NVMe
- Router
- Dedicated 1Gbps network switch

Logical
- 4 (2 VMs) master/control plane nodes, running Ubuntu with k3s
- 4 (2 VMs) worker nodes, running Ubuntu with k3s
- Proxmox K8s cluster

Workloads/Services
- Cert Manager
- Nginx
- Prometheus
- ArgoCD

The Good Stuff (The Advantages)

There’s a reason it’s on top of the monitoring mountain.

The Pull Model is Smart: Instead of every single app needing to know where the monitoring server is, Prometheus just goes out and scrapes each app’s /metrics endpoint over HTTP. This is great for service discovery. A new pod spins up, Kubernetes tells Prometheus about it, and bam, it’s being scraped. No fuss.
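To make the scrape side less abstract, here is a minimal sketch of what an app exposes at /metrics, using only the Python standard library. The metric name http_requests_total, its labels, the hard-coded counts, and port 8000 are all assumptions for illustration. A real app would normally use a Prometheus client library instead of formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counters; a real app would update these per request.
REQUESTS = {("GET", "200"): 42, ("GET", "500"): 3}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (arbitrary) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/metrics in a browser shows the same plain text Prometheus would pull on every scrape. The app never needs to know where Prometheus lives.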

PromQL is (Eventually) Awesome: PromQL is the query language you use to actually look at your data. It has a learning curve (more on that later), but it’s incredibly powerful for slicing, dicing, and aggregating time-series data. Calculating a 5-minute error rate across your entire fleet? That’s a one-liner.
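As a rough illustration of that one-liner, the sketch below keeps a plausible PromQL query as a string and then does the same arithmetic in plain Python on two counter snapshots. The metric name http_requests_total, the status label, and the sample numbers are assumptions, not something taken from my cluster. The Python version also ignores counter resets, which the real rate() handles.

```python
# PromQL for reference; the metric and label names are assumptions.
ERROR_RATIO_QUERY = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / "
    "sum(rate(http_requests_total[5m]))"
)


def error_ratio(start, end):
    """Fraction of requests that failed between two counter snapshots.

    `start` and `end` map a status code to the cumulative request count.
    """
    total = sum(end.values()) - sum(start.values())
    errors = sum(end[s] - start.get(s, 0) for s in end if s.startswith("5"))
    return errors / total if total else 0.0


# Hypothetical readings taken five minutes apart.
before = {"200": 1000, "500": 10}
after = {"200": 1570, "500": 40}
print(error_ratio(before, after))  # 30 errors out of 600 requests
```

Because both rates use the same five-minute window, dividing them is the same as dividing the raw increases, which is all the Python function does.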

Kube-Native King: It was the second project to join the CNCF (right after Kubernetes). It’s built for this stuff. It understands services, pods, and labels at a deep level, making it a natural fit for monitoring your cluster.

The Ecosystem is Massive: You’re never alone with Prometheus. Need to get metrics from a database? There’s an exporter for that. Need to get metrics from your network switch? There’s probably an exporter for that, too.

Alerting is Built-In: Prometheus evaluates your alerting rules itself and hands anything that fires to Alertmanager, a companion component from the same project. Alertmanager is solid. It deduplicates alerts, groups them (so you don’t get 1,000 “pod is down” messages), and routes them to Slack, PagerDuty, email, or whatever.
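The grouping idea is easy to sketch. This is not Alertmanager’s code, just a toy version of the concept: alerts that share the same values for a chosen set of labels end up in one notification. The alert shape, the label names, and the group_by list are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the `group_by` labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# 1,000 hypothetical "pod is down" alerts from one namespace.
alerts = [
    {"labels": {"alertname": "PodDown", "namespace": "shop", "pod": f"api-{i}"}}
    for i in range(1000)
]
notifications = group_alerts(alerts, group_by=["alertname", "namespace"])
print(len(notifications))  # 1
```

Grouping by pod as well would bring back all 1,000 messages, so the choice of grouping labels is where most of the tuning happens.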

The Not-So-Good Stuff (The Drawbacks)

This is what people are usually complaining about on Slack at 2 AM.

That PromQL Learning Curve: I said it was powerful, not easy. Your first day with PromQL will be confusing. You’ll be wondering why your query returns nothing, or why rate() is so weird. It takes a bit to “click,” and it can be intimidating.
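Two of the usual rate() surprises can be shown with a simplified stand-in. The function below is a sketch under stated assumptions: it takes (timestamp, value) samples from one window, and it skips the extrapolation to the window edges that the real rate() performs. It does keep the two behaviours that trip people up: fewer than two samples in the window gives no result, and a counter that drops is treated as a reset rather than a negative rate.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a window of (timestamp, value) samples.

    Simplified: no extrapolation to the window edges, unlike the real rate().
    """
    if len(samples) < 2:
        return None  # fewer than two samples in the window: no result
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter reset (for example, the pod restarted),
        # so the new value counts as the increase since the reset.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


steady = [(0, 100), (15, 130), (30, 160)]
restarted = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(simple_rate(steady))       # 2.0
print(simple_rate(restarted))    # positive, despite the drop at t=30
print(simple_rate([(0, 100)]))   # None
```

The last case is the classic “why does my query return nothing”: if the range in square brackets is shorter than about two scrape intervals, there is often only one sample to work with.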

“High Cardinality” is a Cursed Phrase: This is the big one. Prometheus identifies data using labels (like app="my-api", pod="api-1234"), and every unique combination of label values becomes its own time series. If you create labels with unbounded values, like a user ID or a full request ID, you’ve just created “high cardinality.” This can destroy your Prometheus server, blowing up its memory usage and grinding it to a halt. You have to be careful about what you use as a label.
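The arithmetic behind the curse is just multiplication. As a back-of-the-envelope sketch, the worst-case number of series for one metric is the product of the distinct values each label can take. The label names and counts below are made up for illustration.

```python
from math import prod


def worst_case_series(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_values.values())


# Hypothetical label sets: distinct values per label.
safe = {"app": 5, "status": 5, "method": 4}
risky = {**safe, "user_id": 100_000}

print(worst_case_series(safe))   # 100
print(worst_case_series(risky))  # 10000000
```

One extra label turned a hundred series into ten million, and that is for a single metric name.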

Scaling and Long-Term Storage: Out of the box, Prometheus is a single, stateful server. It stores data on its own disk. This is simple, but what about high availability? Or storing data for more than a few weeks (local retention defaults to 15 days)? This is where it gets complicated. The official answer is “federation” (having a Prometheus scrape other Prometheuses), but the real answer for most people is a separate project like Thanos or Cortex, which adds a whole new layer of complexity.
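To get a feel for why long retention on a single disk gets uncomfortable, here is a rough sizing sketch. It follows the rule of thumb from the Prometheus storage documentation (retention time times ingested samples per second times bytes per sample, at roughly 1 to 2 bytes per sample). The series count and scrape interval are hypothetical, so treat the output as an estimate only.

```python
def disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds * samples/s * bytes/sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical homelab load: 10,000 active series scraped every 15 seconds.
samples_per_second = 10_000 / 15

for days in (15, 365):
    gib = disk_bytes(days, samples_per_second) / 2**30
    print(f"{days} days: {gib:.1f} GiB")
```

The estimate grows linearly with both retention and series count, which is why the cardinality problem and the long-term storage problem tend to show up together.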

It’s Just Metrics: Prometheus is a world-class metrics solution. It is not a logging solution (like ELK) or a tracing solution (like Jaeger). You’ll need other tools for that, which means you’re building an “observability stack,” not just installing one tool.

Verdict

So, is Prometheus worth the hype and the occasional headache?

Yeah, absolutely.

For the vast majority of monitoring use cases, especially in a Kubernetes environment, it’s the right choice. It’s powerful, it’s open-source, and the community is huge.

The “drawbacks” are really just the challenges of scaling. You’ll hit the high cardinality problem eventually. You’ll eventually wonder about long-term storage. But by the time you do, you’ll be in a good position to solve those problems, and you’ll understand why you need to.

Rating

5/5 - Just install it already
