Simon Willison’s Weblog

Subscribe

Quotations tagged data

Filters: Type: quotation × data × Sorted by date


To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in. The process of training the neural network compiles the dataset into the binary — the final neural network. In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

Andrej Karpathy # 24th August 2022, 9:28 pm

With Flickr you can get out, via the API, every single piece of information you put into the system. [...] Asking people to accept anything else is sharecropping. It’s a bad deal. Flickr helped pioneer “Web 2.0″, and personal data ownership is a key piece of that vision. Just because the wider public hasn’t caught on yet to all the nuances around data access, data privacy, data ownership, and data fidelity, doesn’t mean you shouldn’t be embarrassed to be failing to deliver a quality product.

Kellan Elliott-McCrea # 18th May 2010, 6:21 pm