Mozilla Telemetry: In-depth Data Pipeline (via) Detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams) but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a “tee” server written in Lua and OpenResty which lets them route some traffic to alternative backend.
Recent articles
- Highlights from the Claude 4 system prompt - 25th May 2025
- Live blog: Claude 4 launch at Code with Claude - 22nd May 2025
- I really don't like ChatGPT's new memory dossier - 21st May 2025