Mozilla Telemetry: In-depth Data Pipeline (via) Detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams) but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a “tee” server written in Lua and OpenResty which lets them route some traffic to an alternative backend.
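To make the "tee" idea concrete, here is a minimal Python sketch of one way to decide which requests also get copied to an alternative backend. This is not Mozilla's implementation (theirs is written in Lua on OpenResty), and everything in it is an assumption for illustration: the `should_tee` and `route` functions, the `sample_rate` parameter, the backend names, and the choice of hashing a request ID to pick a stable sample.

```python
import hashlib


def should_tee(request_id: str, sample_rate: float) -> bool:
    """Decide whether a request should also go to the alternative backend.

    Hypothetical helper: hashes the request ID so the same ID always
    gets the same decision, and roughly `sample_rate` of IDs are selected.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


def route(request_id: str, sample_rate: float,
          primary: str = "primary", alternative: str = "alternative") -> list:
    """Return the backends that should receive this request.

    The primary backend always receives it; the alternative backend
    receives a copy only when the request falls inside the sample.
    """
    destinations = [primary]
    if should_tee(request_id, sample_rate):
        destinations.append(alternative)
    return destinations
```

Hashing rather than calling a random number generator keeps the decision repeatable for a given ID, which can make it easier to compare what the two backends received.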