Mozilla Telemetry: In-depth Data Pipeline

A detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams), but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a "tee" server written in Lua and OpenResty that lets them route some of the traffic to an alternative backend.
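The "tee" idea is easy to picture as code. What follows is a minimal sketch in Python, not Mozilla's Lua/OpenResty implementation. Every request always reaches the primary backend, and a copy of a sampled fraction also goes to an alternative backend. The post doesn't say how that traffic is selected. Hashing a request ID for deterministic sampling is an assumption made here for illustration. The names `should_tee`, `tee` and `fraction` are hypothetical, and the backends are plain callables that stand in for real HTTP upstreams.

```python
import hashlib
from typing import Callable, List

# Hypothetical stand-in for an upstream: anything that accepts a request body.
Backend = Callable[[bytes], None]


def should_tee(request_id: str, fraction: float) -> bool:
    """Deterministically select roughly `fraction` of requests by hashing their ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform-ish value in [0, 1)
    return bucket < fraction


def tee(request_id: str, body: bytes, primary: Backend,
        alternative: Backend, fraction: float) -> bool:
    """Always forward to the primary backend; copy a sample to the alternative one.

    Returns True if the request was also sent to the alternative backend.
    """
    primary(body)
    if should_tee(request_id, fraction):
        try:
            alternative(body)
        except Exception:
            # Assumption: a failing experimental backend must never break the main path.
            pass
        return True
    return False


if __name__ == "__main__":
    primary_log: List[bytes] = []
    alt_log: List[bytes] = []
    teed = sum(tee(f"req-{i}", b"ping", primary_log.append, alt_log.append, 0.1)
               for i in range(1000))
    print(len(primary_log), teed, len(alt_log))  # primary sees all 1000; ~10% are teed
```

Hashing the ID rather than calling a random number generator means a given request always makes the same routing decision. This is handy when comparing what the two backends did with identical input.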