Simon Willison’s Weblog

37 items tagged “aws”

2022

You should have lots of AWS accounts (via) Richard Crowley makes the case for maintaining multiple AWS accounts within a single company, because “AWS accounts are the most complete form of isolation on offer”. # 3rd October 2022, 6:36 pm

Figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain (via) This took me five hours and 77 issue comments to figure out, but I finally managed to serve an AWS Lambda function running Datasette on a custom subdomain with an HTTPS certificate. I was going to write this up as a TIL but I’m exhausted so I decided to share my private notes thread instead. # 3rd October 2022, 12:29 am

Deploying Python web apps as AWS Lambda functions. After literally years of failed half-hearted attempts, I finally managed to deploy an ASGI Python web application (Datasette) to an AWS Lambda function! Here are my extensive notes. # 19th September 2022, 4:05 am

Over the years, across multiple deployments, DynamoDB has learned that it’s not just the end state and the start state that matter; there could be times when the newly deployed software doesn’t work and needs a rollback. The rolled-back state might be different from the initial state of the software. The rollback procedure is often missed in testing and can lead to customer impact. DynamoDB runs a suite of upgrade and downgrade tests at a component level before every deployment. Then, the software is rolled back on purpose and tested by running functional tests. DynamoDB has found this process valuable for catching issues that otherwise would make it hard to rollback if needed.

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service # 5th September 2022, 6:49 pm

The Amazon Builders’ Library (via) “How Amazon builds and operates software”—an extraordinarily valuable collection of detailed articles about how AWS works and operates under the hood. # 5th September 2022, 5:50 pm

sqlite-comprehend: run AWS entity extraction against content in a SQLite database

I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.

[... 1146 words]

s3-ocr: Extract text from PDF files stored in an S3 bucket

I’ve released s3-ocr, a new tool that runs Amazon’s Textract OCR text extraction against PDF files in an S3 bucket, then writes the resulting text out to a SQLite database with full-text search configured so you can run searches against the extracted data.

[... 1493 words]

Abusing AWS Lambda to make an Aussie Search Engine (via) Ben Boyter built a search engine that only indexes .au Australian websites, with the novel approach of directly compiling the search index into 250 different ~40MB large lambda functions written in Go, then running searches across 12 million pages by farming them out to all of the lambdas and combining the results. His write-up includes all sorts of details about how he built this, including how he ran the indexer and how he solved the surprisingly hard problem of returning good-enough text snippets for the results. # 16th January 2022, 8:52 pm

2021

Weeknotes: git-history, created for a Git scraping workshop

My main project this week was a 90 minute workshop I delivered about Git scraping at Coda.Br 2021, a Brazilian data journalism conference, on Friday. This inspired the creation of a brand new tool, git-history, plus smaller improvements to a range of other projects.

[... 1239 words]

AWS IAM definitions in Datasette (via) As part of my ongoing quest to conquer IAM permissions, I built myself a Datasette instance that lets me run queries against all 10,441 permissions across 280 AWS services. It’s deployed by a build script running in GitHub Actions which downloads a 8.9MB JSON file from the Salesforce policy_sentry repository—policy_sentry itself creates that JSON file by running an HTML scraper against the official AWS documentation! # 6th November 2021, 3:47 am

aws-lambda-adapter. AWS Lambda added support for Docker containers last year, but with a very weird shape: you can run anything on Lambda that fits in a Docker container, but unlike Google Cloud Run your application doesn’t get to speak HTTP: it needs to run code that listens for proprietary AWS lambda events instead. The obvious way to fix this is to run some kind of custom proxy inside the container which turns AWS runtime events into HTTP calls to a regular web application. Serverlessish and re:Web are two open source projects that implemented this, and now AWS have their own implementation of that pattern, written in Rust. # 28th October 2021, 5:04 am

Behind the scenes, AWS Lambda (via) Bruno Schaatsbergen pulled together details about how AWS Lambda works under the hood from a detailed review of the AWS documentation, the Firecracker paper and various talks at AWS re:Invent. # 10th July 2021, 7:40 pm

This teaches us that—when it’s a big enough deal—Amazon will lie to us. And coming from the company that runs the production infrastructure for our companies, stores our data, and has been granted an outsized position of trust based upon having earned it over 15 years, this is a nightmare.

Corey Quinn # 31st March 2021, 4:47 pm

Weeknotes: SpatiaLite 5, Datasette on Azure, more CDC vaccination history

This week I got SpatiaLite 5 working in the Datasette Docker image, improved the CDC vaccination history git scraper, figured out Datasette on Azure and we closed on a new home!

[... 986 words]

2020

New for AWS Lambda – Container Image Support. “You can now package and deploy Lambda functions as container images of up to 10 GB in size”—can’t wait to try this out with Datasette. # 1st December 2020, 5:34 pm

AWS services explained in one line each (via) Impressive effort to summarize all 163(!) AWS services—this helped clarify a whole bunch that I haven’t figured yet. Only a few defeated the author, with a single question mark for the description. I enjoyed Amazon Braket: “Some quantum thing. It’s in preview so I have no idea what it is.” # 26th May 2020, 4:41 pm

Millions of tiny databases. Fascinating, detailed review of a paper that describes Amazon’s Physalia, a distributed configuration store designed to provide extremely high availability coordination for Elastic Block Store replication. My eyebrows raised at “Physalia is designed to offer consistency and high-availability, even under network partitions.” since that’s such a blatant violation of CAP theorem, but it later justifies it like so: “One desirable property therefore, is that in the event of a partition, a client’s Physalia database will be on the same side of the partition as the client. Clever placement of cells across nodes can maximise the chances of this.” # 5th March 2020, 4:37 am

2019

athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3—as CSV, JSON or Apache Parquet files—using SQL. It’s an interesting way of buliding a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources. # 18th December 2019, 9:05 am

Serverless Microservice Patterns for AWS (via) A handy collection of 19 architectural patterns for AWS Lambda collected by Jeremy Daly. # 12th June 2019, 12:13 am

Using 6 Page and 2 Page Documents To Make Organizational Decisions (via) I’ve been thinking a lot recently about the challenges of efficiently getting to consensus within a larger organization spread across multiple locations and time zones. This model described by Ian Nowland based on his experience at AWS seems very promising. The goal is to achieve a decision or “disagree and commit” consensus using a max 6 page document and a one hour meeting. The first fifteen minutes of the meeting are dedicated to silently reading the document—if you’ve read it already you are given the option of arriving fifteen minutes late. # 11th April 2019, 3:46 am

Colm MacCárthaigh tells the inside story of how AWS responded to Heartbleed. The Heartbleed SSL vulnerability came out five years ago. In this Twitter thread Colm, who was Amazon’s principal engineer for Elastic Load Balancer at the time, describes how the AWS team responded to something that “was scarier than any bug I’d ever seen”. It’s a cracking story. # 7th April 2019, 8:32 pm

The Cloud and Open Source Powder Keg (via) Stephen O’Grady’s analysis of the Elastic v.s. AWS situation, where Elastic started mixing their open source and non-open source code together and Amazon responded by releasing their own forked “open distribution for Elasticsearch”. World War One analogies included! # 17th March 2019, 7:08 pm

2018

AWS Ground Station – Ingest and Process Data from Orbiting Satellites. OK this is cool. “Instead of building your own ground station or entering in to a long-term contract, you can make use of AWS Ground Station on an as-needed, pay-as-you-go basis. [...] You don’t need to build or maintain antennas, and can focus on your work or research. We’re starting out with a pair of ground stations today, and will have 12 in operation by mid-2019. Each ground station is associated with a particular AWS Region; the raw analog data from the satellite is processed by our modem digitizer into a data stream (in what is formally known as VITA 49 baseband or VITA 49 RF over IP data streams) and routed to an EC2 instance that is responsible for doing the signal processing to turn it into a byte stream.” # 28th November 2018, 1:04 am

The Free Stack—Running your application for free on AWS (via) Parikshit Agnihotry provides a useful rundown of quite how much you can get done using the first 12 month free tier of AWS API Gateway, Lambda, DynamoDB, S3 and CloudFront. # 25th July 2018, 6:33 pm

2017

Landsat on AWS (via) TIL Amazon make data from the Landsat 8 satellite available for free on S3 (though they are no doubt hoping you’ll pay for EC2 instances to process the data). “All new Landsat 8 scenes are made available each day, often within hours of production. The satellite images the entire Earth every 16 days at a roughly 30 meter resolution”. # 5th November 2017, 7:56 pm

2013

Using AWS, as my cloud, what is left for me to work on? Is it enough for me to just write the html+css code and programming language code (python)? Or do I stil have to work with mysql and backend stuff? I am pretty new at programming, so I hope it i...

Using a cloud server platform like Amazon EC2 unfortunately will not protect you from needing to understand basic server adminstration—it’s not that different from running your own physical server, except that if you screw up the configuration it’s much easier to throw everything away and start from scratch.

[... 134 words]

What kind of website can be run on AWS for 10, 100, 1 thousand, 10 thousand, 100 thousand, 1 million dollars per month?

“But is there a simple way to say that for 10$ per month you can run website on AWS, that has X unique users and Y data transfer...”

[... 166 words]

For a Django application, deployed on Heroku, what are my options for storing user-uploaded media files?

S3 is really a no-brainer for this, it’s extremely inexpensive, very easy to integrate with and unbelievably reliable. It’s so cheap that it will be practically free for testing purposes (expect to spend pennies a month on it).

[... 88 words]

2009

Evidence of OpenID at Amazon. It looks like Amazon are using OpenID for SSO between their different properties—I clicked a link to sign in to AWS and the URL had OpenID query string parameters. # 6th July 2009, 1:25 am

AWS Import/Export: Ship Us That Disk! Andrew Tanenbaum said “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway”, and now you can ship your storage device direct to Amazon and have them load the data in to an S3 bucket for you. # 21st May 2009, 11:22 am