Is observability the sum of monitoring, logging, and tracing?

Cover image credit: David Clode on Unsplash

Seeing Prometheus and Grafana get the top votes in CNCF’s End User Technology Radar on Observability comes as no surprise. However, not seeing Litmus or any other chaos engineering tool is a bit surprising, because these tools are listed in the Observability and Analysis category of the CNCF landscape.

Curiosity led me to watch a recording of one of the meetings hosted by the CNCF SIG for Observability. But questions still linger.

If I learn how to use these cool observability tools really well, will I have a firm handle on the state of any cloud-native system? Also, are chaos engineering tools, such as Litmus, ChaosKube, and PowerfulSeal, sufficient to help me determine the resiliency of such a system?

Is this a case of the tail wagging the dog?

Maybe, but who’s to say that’s not a feasible approach? What if the sum of the parts leads me to the whole? How would I recognize the whole?

As an observability noob, many such questions continue to baffle me.

  • What should be the building blocks of my learning journey?
  • How should I design my learning pathway?
  • Why should I go down a certain path?

Last year, Charity Majors of honeycomb.io wrote a thought-provoking piece that debunked the myth about the three pillars of observability.

She spoke about instrumenting code and capturing details in a way that enables us to answer any question.

Here’s her tweet thread that would bewilder even an observability veteran, let alone a newbie.

My first stumbling block: Arbitrarily-wide structured data blobs

Do I really want to store data in three different ways? Probably not, but what exactly does this alternative she’s suggesting mean?

  • If monitoring helps us figure out only known knowns, how can it help us answer unknown or unanticipated questions?

  • If I am unaware of what I am unaware of, how can I even begin to know what I need to debug, and where?

  • With so many moving parts in a distributed system, how can I even create a mental model of the system?

Based on what I’ve been reading and watching so far, tracing is something I can kinda grasp. At least, through the lens of understanding the flow of events or the journey of a service request. Monitoring and logging seem inapplicable to me when I don’t yet know what I don’t know. Also, how do I account for the silos created by decoupled metrics and lost context?
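To make the "arbitrarily-wide structured data blob" idea a little more concrete, here is a minimal sketch of how I currently picture it: one structured event per request that keeps gaining fields as the work proceeds, instead of a metric here and a log line there. This is only my reading of the idea, not Charity Majors' or Honeycomb's implementation. The service name, endpoint, field names, and the `handle_request` function are all hypothetical.

```python
import json
import time
import uuid


def handle_request(user_id, cart_items):
    """Handle one hypothetical request and return a single wide event for it."""
    start = time.monotonic()

    # Start with identifiers that tie the event to one request.
    event = {
        "request_id": str(uuid.uuid4()),
        "service": "checkout",          # hypothetical service name
        "endpoint": "/cart/checkout",   # hypothetical endpoint
        "user_id": user_id,
    }

    # Keep widening the same event as more context becomes available.
    event["cart_size"] = len(cart_items)
    event["cart_total"] = sum(item["price"] for item in cart_items)

    try:
        if not cart_items:
            raise ValueError("empty cart")
        event["status"] = "ok"
    except ValueError as err:
        # The failure lands in the same record as the rest of the context.
        event["status"] = "error"
        event["error"] = str(err)

    event["duration_ms"] = (time.monotonic() - start) * 1000
    return event


if __name__ == "__main__":
    # Emit the event as one JSON line; a real system would send it somewhere.
    print(json.dumps(handle_request("user-42", [{"sku": "a1", "price": 12.5}])))
```

Because the context travels together in one record, a question I did not anticipate, such as whether failed checkouts cluster around a particular cart size, becomes a filter over events I already have rather than a new metric I have to add first.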

If monitoring helps us figure out known knowns, how can it help us answer unknown or unanticipated questions? And what about process variations? Do we consider common and special causes?

If I am unaware of what I am unaware of, how can I even begin to know what information I’d need to debug failures or glitches? With so many moving parts in a distributed system, will log aggregation help me spot a needle in a needle stack? Scalyr seems to think so, and here are two interesting blogs that showcase their claims:

Choosing the right tool for an unknown job. Where does one begin?

With so much talk about observability, SRE, and chaos engineering, I’m beginning to ponder the foundational elements of reliability and resiliency. While enthusiasm about the tools draws much attention, few are considering the skills, the game plan, and the worldview that lead to the choice of the right tool for the job.

  • What should be the fundamental building blocks of my learning curve?
  • How should I design my learning pathway?
  • Why should I even go down a certain path?

Seems like the primary need in research is to be comfortable with volatility, uncertainty, complexity, and ambiguity (VUCA). I haven’t formed an opinion about monitoring and logging, and I might not do so anytime soon.

To understand resiliency, nature seems like the best anchor for cues and inspiration to grok this labyrinth. I am designing and evaluating the efficacy of my learning journey by asking the what, why, and how of resilient, observable, and distributed systems in nature.
