Simon Willison's Weblog: observability

Did you know about Instruments?

2024-07-26T13:06:38+00:00

Thorsten Ball shows how the macOS Instruments app (installed as part of Xcode) can be used to run a CPU profiler against any application, not just code written in Swift or Objective-C.

I tried this against a Python process running LLM executing a Llama 3.1 prompt with my new llm-gguf plugin and captured this:

Via lobste.rs

Tags: profiling, python, observability

All you need is Wide Events, not “Metrics, Logs and Traces”

2024-02-27T22:57:14+00:00

I’ve heard great things about Scuba, Meta’s internal observability platform. Here’s an explanation from ex-Meta engineer Ivan Burmistrov describing the value it provides and comparing it to the widely used OpenTelemetry stack.

Via Hacker News

Tags: facebook, observability

Roblox Return to Service 10/28-10/31 2021

2022-01-21T16:41:00+00:00

A particularly good example of a public postmortem on an outage. Roblox was down for 72 hours last year, as a result of an extremely complex set of circumstances which took a lot of effort to uncover. It’s interesting to think through what kind of monitoring you would need to have in place to help identify the root cause of this kind of issue.

Via @benbjohnson

Tags: ops, observability, postmortem

Quoting Brendan Gregg

2021-06-08T19:33:16+00:00

When I was a performance consultant I'd show up to random companies who wanted me to fix their computer performance issues. If they trusted me with a login to their production servers, I could help them a lot quicker. To get that trust I knew which tools looked but didn't touch: Which were observability tools and which were experimental tools. "I'll start with observability tools only" is something I'd say at the start of every engagement.

— Brendan Gregg

Tags: observability, performance, brendan-gregg

Quoting Charity Majors

2020-07-19T16:05:08+00:00

Instead of seeing instrumentation as a last-ditch effort of strings and metrics, we must think about propagating the full context of a request and emitting it at regular pulses. No pull request should ever be accepted unless the engineer can answer the question, “How will I know if this breaks?”

— Charity Majors

Tags: observability, charity-majors

Logs vs. metrics: a false dichotomy

2019-08-03T16:46:55+00:00

Nick Stenning discusses the differences between logs and metrics: most notably that metrics can be derived from logs but logs cannot be reconstituted starting with time-series metrics.
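To make that asymmetry concrete, here is a minimal sketch in Python. The log events and their field names (`ts`, `path`, `status`, `ms`) are made up for illustration and are not taken from Nick's article.

```python
from collections import Counter
from datetime import datetime

# Hypothetical structured log events; the field names are illustrative only.
LOGS = [
    {"ts": "2019-08-03T16:46:05", "path": "/search", "status": 200, "ms": 120},
    {"ts": "2019-08-03T16:46:40", "path": "/search", "status": 500, "ms": 900},
    {"ts": "2019-08-03T16:47:10", "path": "/about", "status": 200, "ms": 30},
]


def requests_per_minute(logs):
    """Collapse individual log events into a per-minute request count."""
    counts = Counter()
    for event in logs:
        minute = datetime.fromisoformat(event["ts"]).strftime("%H:%M")
        counts[minute] += 1
    return dict(counts)


def error_rate(logs):
    """Fraction of events that have a 5xx status."""
    errors = sum(1 for event in logs if event["status"] >= 500)
    return errors / len(logs)


print(requests_per_minute(LOGS))
print(error_rate(LOGS))
```

Going from the events to the counts is a simple aggregation. Going the other way is not possible: the per-minute counts no longer say which path failed or how long each request took.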

Via Charity Majors

Tags: logging, observability

Targeted diagnostic logging in production

2019-07-24T05:44:39+00:00

Will Sargent defines diagnostic logging as “debug logging statements with an audience”, and proposes controlling this style of logging via a feature flag system, so that detailed logging can be turned on in production for a selected subset of users to help debug difficult problems. Lots of great background material on the topic of observability here too.
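Here is a rough sketch of the idea using Python's standard `logging` module. It is not Will's implementation: `DIAGNOSTIC_USERS` stands in for a real feature flag system, and `ListHandler` and `handle_request` are hypothetical names used only to keep the example self-contained.

```python
import logging

# Hypothetical flag store: a real system would ask a feature flag service.
DIAGNOSTIC_USERS = {"user-42"}


class DiagnosticFilter(logging.Filter):
    """Pass DEBUG records only for users whose diagnostic flag is on."""

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True  # INFO and above are always kept
        return getattr(record, "user_id", None) in DIAGNOSTIC_USERS


class ListHandler(logging.Handler):
    """Collects messages in memory so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("diagnostic_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
handler.addFilter(DiagnosticFilter())
logger.addHandler(handler)


def handle_request(user_id):
    # "extra" attaches user_id to the log record so the filter can see it.
    logger.debug("cache lookup for %s", user_id, extra={"user_id": user_id})
    logger.info("request served for %s", user_id, extra={"user_id": user_id})


handle_request("user-1")
handle_request("user-42")
print(handler.messages)
```

Only the flagged user produces the debug line, so the extra detail stays cheap enough to leave in production code.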

Via Charity Majors

Tags: logging, observability

Quoting Clint Sharp

2019-02-25T22:15:45+00:00

Metrics are lossily compressed logs. Traces are logs with parent child relationships between entries. The only reason we have three terms is because getting value from them has required different compromises to make them cost effective.

— Clint Sharp
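
One way to picture the quote, as a sketch rather than anything Clint wrote: take a flat list of log entries that carry a parent reference, rebuild the tree to get a trace, then reduce the same entries to a single number to get a metric. The `id`, `parent`, `name` and `ms` fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical log entries; "id" and "parent" are illustrative field names.
ENTRIES = [
    {"id": "a", "parent": None, "name": "GET /search", "ms": 120},
    {"id": "b", "parent": "a", "name": "db query", "ms": 80},
    {"id": "c", "parent": "a", "name": "render", "ms": 30},
    {"id": "d", "parent": "b", "name": "index scan", "ms": 60},
]


def render_trace(entries):
    """Rebuild the parent/child tree and return it as indented lines."""
    children = defaultdict(list)
    for entry in entries:
        children[entry["parent"]].append(entry)

    lines = []

    def walk(parent_id, depth):
        for entry in children[parent_id]:
            lines.append("  " * depth + f'{entry["name"]} ({entry["ms"]}ms)')
            walk(entry["id"], depth + 1)

    walk(None, 0)
    return lines


def slowest_ms(entries):
    """A metric: one number survives, the names and structure do not."""
    return max(entry["ms"] for entry in entries)


print("\n".join(render_trace(ENTRIES)))
print(slowest_ms(ENTRIES))
```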

Tags: observability, logs