Simon Willison’s Weblog

4 items tagged “datascience”

Things About Real-World Data Science Not Discussed In MOOCs and Thought Pieces (via) Really good article, pointing out that carefully optimizing machine learning models is only a small part of the day-to-day work of a data scientist: cleaning up data, building dashboards, shipping models to production, deciding on trade-offs between performance and production and considering the product design and ethical implementations of what you are doing make up a much larger portion of the job. # 11th December 2018, 8:51 pm

Serverless for data scientists (via) Slides and accompanying notes from a talk by Mike Lee Williams at PyBay, providing an overview of Zappa and diving a bit more deeply into pywren, which makes it trivial to parallelize a function across a set of AWS lambda instances (serverless Python map() execution essentially). I really like this format for sharing presentations—I used something similar for my own PyBay talk. # 25th August 2018, 11:01 pm

Computational and Inferential Thinking: The Foundations of Data Science. Free online textbook written for the UC Berkeley Foundations of Data Science class. The examples are all provided as Jupyter notebooks, using the mybinder web application to allow students to launch interactive notebooks for any of the examples without having to install any software on their own machines. # 25th August 2018, 10:13 pm

Beginner’s Guide to Jupyter Notebooks for Data Science (with Tips, Tricks!) (via) If you haven’t yet got on the Jupyter notebooks bandwagon this should help. It’s the single biggest productivity improvement I’ve made to my workflow in a very long time. # 24th May 2018, 1:58 pm