Blogmarks
Filters: Sorted by date
Using 6 Page and 2 Page Documents To Make Organizational Decisions (via) I’ve been thinking a lot recently about the challenges of efficiently getting to consensus within a larger organization spread across multiple locations and time zones. This model described by Ian Nowland based on his experience at AWS seems very promising. The goal is to achieve a decision or “disagree and commit” consensus using a max 6 page document and a one hour meeting. The first fifteen minutes of the meeting are dedicated to silently reading the document—if you’ve read it already you are given the option of arriving fifteen minutes late.
Ministry of Silly Runtimes: Vintage Python on Cloud Run (via) Cloud Run is an exciting new hosting service from Google that lets you define a container using a Dockerfile and then run that container in a “scale to zero” environment, so you only pay for time spent serving traffic. It’s similar to the now-deprecated Zeit Now 1.0 which inspired me to create Datasette. Here Dustin Ingram demonstrates how powerful Docker can be as the underlying abstraction by deploying a web app using a 25 year old version of Python 1.x.
Generator Tricks for Systems Programmers (via) David Beazley’s definitive generators tutorial from 2008, updated for Python 3.7 in October 2018.
What is a Self-XSS scam? Facebook link to this page from a console.log message that they display the browser devtools console, specifically warning that “If someone told you to copy-paste something here to enable a Facebook feature or hack someone’s account, it is a scam and will give them access to your Facebook account.”
Colm MacCárthaigh tells the inside story of how AWS responded to Heartbleed. The Heartbleed SSL vulnerability came out five years ago. In this Twitter thread Colm, who was Amazon’s principal engineer for Elastic Load Balancer at the time, describes how the AWS team responded to something that “was scarier than any bug I’d ever seen”. It’s a cracking story.
tsv-utils (via) Powerful collection of CLI tools for processing TSV files, written in D for performance and released by eBay. Includes a csv2tsv conversion tool. You can download an archive of pre-built binaries for Linux and OS X from their releases page: worked fine on my Mac.
csv-diff 0.3.1 (via) I released a minor update to my csv-diff CLI tool today which does a better job of displaying a human-readable representation of rows that have been added or removed from a file—previously they were represented as an ugly JSON dump. My script monitoring changes to the official list of trees in San Francisco has been running for a month now and has captured 23 commits!
The problem with laziness: minimising performance issues caused by Django’s implicit database queries (via) The ability to accidentally execute further database queries by traversing objects from a Django template is a common source of unexpected performance regressions. django-zen-queries is a neat new library which provides a context manager for disabling database queries during a render (or elsewhere), forcing queries to be explicitly executed in view functions.
zson (via) “ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents [...] In some cases ZSON can save half of your disk space and give you about 10% more TPS.”
The Next CEO of Stack Overflow. “Including the Stack Exchange network of 174 sites, we have over 100 million monthly visitors. Every month, over 125,000 wonderful people write answers”—this fits the rule of thumb for user-generated content that only a tiny portion of your audience will actively create content: in this case it’s just 0.125% (one eighth of one percent). I’d love to know how many people are upvoting or performing other more lightweight interactions.
Programmer migration patterns. Avery Pennarun explores the history of modern programming languages and how developers have migrated from one to another over time. Lots of fun insights in this.
VisiData
(via)
Intriguing tool by Saul Pwanson: VisiData is a command-line "textpunk utility" for browsing and manipulating tabular data. pip3 install visidata and then vd myfile.csv (or .json or .xls or SQLite or others) and get an interactive terminal UI for quickly searching through the data, conducting frequency analysis of columns, manipulating it and much more besides. Two tips for if you start playing with it: hit gq to exit, and hit Ctrl+H to view the help screen.
The Cloud and Open Source Powder Keg (via) Stephen O’Grady’s analysis of the Elastic v.s. AWS situation, where Elastic started mixing their open source and non-open source code together and Amazon responded by releasing their own forked “open distribution for Elasticsearch”. World War One analogies included!
What the Hell is Going On? (via) David Perell discusses how the shift from information scarcity to information abundance is reshaping commerce, education, and politics. Long but worthwhile.
Client-side instrumentation for under $1 per month. No servers necessary. (via) Rolling your own analytics used to be too complex and expensive to be worth the effort. Thanks to cloud technologies like Cloudfront, Athena, S3 and Lambda you can now inexpensively implement client-side analytics (via requests to a tracking pixel) that stores detailed logs on S3, then use Amazon Athena to run queries against those logs ($5/TB scanned) to get detailed reporting. This post also introduced me to Snowplow, an open source JavaScript analytics script (released by a commercial analytics platform) which looks very neat—it’s based on piwik.js, the tracker from the open-source Piwik analytics tool.
D3 Projection Comparison (via) Fun Observable notebook that lets you compare any two out of D3’s 96 (!) geographical projections of the world.
datasette-jellyfish. I learned about a handy Python library called Jellyfish which implements approximate and phonetic matching of strings—soundex, metaphone, porter stemming, levenshtein distance and more. I’ve built a simple Datasette plugin which wraps the library and makes each of those algorithms available as a SQL function.
Publish the data behind your stories with SQLite and Datasette. I presented a workshop on Datasette at the IRE and NICAR CAR 2019 data journalism conference yesterday. Here’s the worksheet I prepared for the tutorial.
MySQL: How to get the top N rows for each group. MySQL doesn’t support the row_number() window function that’s available in PostgreSQL (and recent SQLite), which means it can’t easily answer questions like “for each of these authors, give me the most recent three blog entries they have written” in a single query. Only it turns out it can, if you abuse MySQL session variables in a devious way. This isn’t a new feature: MySQL has had this for over a decade, and in my rough testing it works quickly even on tables with millions of rows.
List of Physical Visualizations (via) “A chronological list of physical visualizations and related artifacts, maintained by Pierre Dragicevic and Yvonne Jansen”—327 and counting!
Experiments, growth engineering, and exposing company secrets through your API (via) This is fun: Jon Luca observes that many companies that run A/B tests have private JSON APIs that list all of their ongoing experiments, and uses them to explore tests from Lyft, Airbnb, Pinterest, Amazon and more. Facebook and Instagram use SSL Stapling which makes it harder to spy on their mobile app traffic.
huey. Charles Leifer’s “little task queue for Python”. Similar to Celery, but it’s designed to work with Redis, SQLite or in the parent process using background greenlets. Worth checking out for the really neat design. The project is new to me, but it’s been under active development since 2011 and has a very healthy looking rate of releases.
My Twitter thread collecting behind the scenes content about Spider-Man: Into the Spider-Verse. I absolutely loved Spider-Verse, and I’ve been delighted to discover that many of the artists who created the movie are active on Twitter and have been posting all kinds of fascinating material about their creative process. I’ve been collecting examples in this Twitter thread for a couple of months now. They definitely deserved that Oscar.
Seeking the Productive Life: Some Details of My Personal Infrastructure (via) Stephen Wolfram’s 15,000 word epic about his personal approach to productivity, developed over the past thirty years. This is a fascinating document—I found myself thinking “surely there can’t be more information than this” and then spotting that the scrollbar wasn’t even a third done yet. Very hard to summarize: it turns out if you’re the work-from-home CEO of your own privately held 800 person company you can construct some very opinionated habits.
String length—Rosetta Code
(via)
Calculating the length of a string is surprisingly difficult once Unicode is involved. Here's a fascinating illustration of how that problem can be attached dozens of different programming languages. From that page: the string "J̲o̲s̲é̲" ("J\x{332}o\x{332}s\x{332}e\x{301}\x{332}") has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8.
Lessons from 6 software rewrite stories (via) Herb Caudill takes on the classic idea that rewriting from scratch is “the single worst strategic mistake that any software company can make” and investigates it through the lens of six well-chosen examples: Netscape 6, Basecamp Classic/2/3, Visual Studio/VS Code, Gmail/Inbox, FogBugz/Wasabi/Trello, and finally FreshBooks/BillSpring. Each story has details I had never heard before, and the lessons and conclusions are deeply insightful.
parameterized. I love the @parametrize decorator in pytest, which lets you run the same test multiple times against multiple parameters. The only catch is that the decorator in pytest doesn’t work for old-style unittest TestCase tests, which means you can’t easily add it to test suites that were built using the older model. I just found out about parameterized which works with unittest tests whether or not you are running them using the pytest test runner.
The Eleven Laws of Showrunning (via) Fascinating essay on how to run a modern TV show by Javier Grillo-Marxuach. Being a showrunner basically involves running a 100+ person startup with a 7 digit budget, almost immovable deadlines, high maintenance activist investors and you're still expected to write some of the scripts!
So many useful lessons here about management, creativity and delegation: almost everything in here is relevant to product management, startup founding and engineering management as well.
Are you strong and secure enough in your talent and accomplishment to accept the possibility that other people - properly empowered by you - can actually enhance your genius... or will you cling to the idea that only you can be the source of that genius?
How you answer that question determines the leader you will be.
This one is the "nice" version - a not so nice version is available as well.
Discussion about Altavista on Hacker News. Fascinating thread on Hacker News where Bryant Durrell, a former Director from Altavista provides some insider thoughts on how they lost against Google.
Data science is different now (via) Detailed examination of the current state of the job market for data science. Boot camps and university courses have produced a growing volume of junior data scientists seeking work, but the job market is much more competitive than many expected—especially for those without prior experience. Meanwhile the job itself is much more about data cleanup and software engineering skills: machine learning models and applied statistics end up being a small portion of the actual work.