6 items tagged “analytics”
Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask) (via) ClickHouse by Yandex is an open source column-oriented data warehouse, designed to run analytical queries against TBs of data. They’ve loaded the full GitHub Archive of events since 2011 into a public instance, which is a great way of both exploring GitHub activity and trying out ClickHouse. Here’s a query I just ran that shows number of watch events per year, for example: SELECT toYear(created_at) as yyyy, count() FROM github_events WHERE event_type = ’WatchEvent’ group by yyyy # 5th January 2021, 1:02 am
If you are pre-product market fit it’s probably too early to think about event based analytics. If you have a small number of users and are able to talk with all of them, you will get much more meaningful data getting to know them than if you were to set up product analytics. You probably don’t have enough users to get meaningful data from product analytics anyways.
Defining Data Intuition. Ryan T. Harter, Principal Data Scientist at Mozilla defines data intuition as “a resilience to misleading data and analyses”. He also introduces the term “data-stink” as a similar term to “code smell”, where your intuition should lead you to distrust analysis that exhibits certain characteristics without first digging in further. I strongly believe that data reports should include a link the raw methodology and numbers to ensure they can be more easily vetted—so that data-stink can be investigated with the least amount of resistance. # 29th October 2020, 3:14 pm
Mozilla Telemetry: In-depth Data Pipeline (via) Detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams) but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a “tee” server written in Lua and OpenResty which lets them route some traffic to alternative backend. # 12th April 2018, 3:44 pm