Simon Willison’s Weblog

Subscribe

38 items tagged “jupyter”

2024

Interesting ideas in Observable Framework

Mike Bostock, Announcing: Observable Framework:

[... 2123 words]

Marimo (via) This is a really interesting new twist on Python notebooks.

The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.

There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors. # 12th January 2024, 9:17 pm

2023

Bottleneck T5 Text Autoencoder (via) Colab notebook by Linus Lee demonstrating his Contra Bottleneck T5 embedding model, which can take up to 512 tokens of text, convert that into a 1024 floating point number embedding vector... and then then reconstruct the original text (or a close imitation) from the embedding again.

This allows for some fascinating tricks, where you can do things like generate embeddings for two completely different sentences and then reconstruct a new sentence that combines the weights from both. # 10th October 2023, 2:12 am

codespaces-jupyter (via) This is really neat. Click “Use this template” -> “Open in a codespace” and you get a full in-browser VS Code interface where you can open existing notebook files (or create new ones) and start playing with them straight away. # 14th April 2023, 10:38 pm

2022

Grokking Stable Diffusion (via) Jonathan Whitaker built this interactive Jupyter notebook that walks through how to use Stable Diffusion from Python step-by-step, and then dives deep into helping understand the different components of the implementation, including how text is encoded, how the diffusion loop works and more. This is by far the most useful tool I’ve seen yet for understanding how this model actually works. You can run Jonathan’s notebook directly on Google Colab, with a GPU. # 4th September 2022, 6:50 pm

storysniffer (via) Ben Welsh built a small Python library that guesses if a URL points to an article on a news website, or if it’s more likely to be a category page or /about page or similar. I really like this as an example of what you can do with a tiny machine learning model: the model is bundled as a ~3MB pickle file as part of the package, and the repository includes the Jupyter notebook that was used to train it. # 1st August 2022, 11:40 pm

2021

Weeknotes: datasette-jupyterlite, s3-credentials and a Python packaging talk

My big project this week was s3-credentials, described yesterday—but I also put together a fun expermiental Datasette plugin bundling JupyterLite and wrote up my PyGotham talk on Python packaging.

[... 476 words]

2020

Proof of concept: sqlite_utils magic for Jupyter (via) Tony Hirst has been experimenting with building a Jupyter “magic” that adds special syntax for using sqlite-utils to insert data and run queries. Query results come back as a Pandas DataFrame, which Jupyter then displays as a table. # 21st October 2020, 5:26 pm

Estimating COVID-19’s Rt in Real-Time. I’m not qualified to comment on the mathematical approach, but this is a really nice example of a Jupyter Notebook explanatory essay by Kevin Systrom. # 20th April 2020, 3:06 pm

Weeknotes: Datasette 0.39 and many other projects

This week’s theme: Well, I’m not going anywhere. So a ton of progress to report on various projects.

[... 806 words]

2019

selenium-demoscraper (via) Really useful minimal example of a Binder project. Click the button to launch a Jupyter notebook in Binder that can take screenshots of URLs using Selenium-controlled headless Firefox. The binder/ folder uses an apt.txt file to install Firefox, requirements.txt to get some Python dependencies and a postBuild Python script to download the Gecko Selenium driver. # 4th November 2019, 3:05 pm

Los Angeles Weedmaps analysis (via) Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file. # 30th May 2019, 4:35 am

2018

Fast Autocomplete Search for Your Website (via) I wrote a tutorial for the 24 ways advent calendar on building fast autocomplete search for a website on top of Datasette and SQLite. I built the demo against 24 ways itself—I used wget to recursively fetch all 330 articles as HTML, then wrote code in a Jupyter notebook to extract the raw data from them (with BeautifulSoup) and load them into SQLite using my sqlite-utils Python library. I deployed the resulting database using Datasette, then wrote some vanilla JavaScript to implement autocomplete using fast SQL queries against the Datasette JSON API. # 19th December 2018, 12:26 am

The _repr_html_ method in Jupyter notebooks (via) Today I learned that if you add a _repr_html_ method returning a string of HTML to any Python class Jupyter notebooks will render that HTML inline to represent that object. # 12th December 2018, 6:09 pm

repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker https://github.com/norvig/pytudes” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm

Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well. # 19th November 2018, 6:25 pm

Tracking Jupyter: Newsletter, the Third... (via) Tony Hirst’s tracking Jupyter newsletter is fantastic. The Jupyter ecosystem is incredibly exciting and fast moving at the moment as more and more groups discover how productive it is, and Tony’s newsletter is a wealth of information on what’s going on out there. # 9th November 2018, 5:42 pm

Computational and Inferential Thinking: The Foundations of Data Science. Free online textbook written for the UC Berkeley Foundations of Data Science class. The examples are all provided as Jupyter notebooks, using the mybinder web application to allow students to launch interactive notebooks for any of the examples without having to install any software on their own machines. # 25th August 2018, 10:13 pm

The Future of Notebooks: Lessons from JupyterCon (via) It sounds like reactive notebooks (where cells keep track of their dependencies on other cells and re-evaluate when those update) were a hot topic at JupyterCon this year. # 25th August 2018, 9:55 pm

In case you missed it: @GoogleColab can open any @ProjectJupyter notebook directly from @github!

To run the notebook, just replace “github.com” with “colab.research.google.com/github/” in the notebook URL, and it will be loaded into Colab.

Jake VanderPlas # 25th August 2018, 3:16 am

I don’t like Jupyter Notebooks—a presentation by Joel Grus (via) Fascinating talk by Joel Grus at the Jupyter conference in New York. He highlights some of the drawbacks of he Jupyter way of working, including the huge confusion that can come from the ability to execute cells out of order (something Observable notebooks solve brilliantly using spreadsheet-style reactive cell associations). He also makes strong arguments that notebooks encourage a way of working that discourages people from producing stable, repeatable and well tested code. # 25th August 2018, 3:04 am

Beyond Interactive: Notebook Innovation at Netflix. Netflix have been investing heavily in their internal Jupyter notebooks infrastructure: it’s now the most popular tool for working with data at Netflix. They also use parameterized notebooks to make it easy to create templates for reusable operations, and scheduled notebooks for recurring tasks. “When a Spark or Presto job executes from the scheduler, the source code is injected into a newly-created notebook and executed. That notebook then becomes an immutable historical record, containing all related artifacts — including source code, parameters, runtime config, execution logs, error messages, and so on.” # 18th August 2018, 5:55 pm

Every day more than 1 trillion events are written into a streaming ingestion pipeline, which is processed and written to a 100PB cloud-native data warehouse. And every day, our users run more than 150,000 jobs against this data, spanning everything from reporting and analysis to machine learning and recommendation algorithms.

Netflix Technology Blog # 18th August 2018, 5:35 pm

At Harvard we’ve built out an infrastructure to allow us to deploy JupyterHub to courses with authentication managed by Canvas. It has allowed us to easily deploy complex set-ups to students so they can do really cool stuff without having to spend hours walking them through setup. Instructors are writing their lectures as IPython notebooks, and distributing them to students, who then work through them in their JupyterHub environment. Our most ambitious so far has been setting up each student in the course with a p2.xlarge machine with cuda and TensorFlow so they could do deep learning work for their final projects. We supported 15 courses last year, and got deployment time for an implementation down to only 2-3 hours.

Chris Rogers # 5th June 2018, 7:37 pm

Beginner’s Guide to Jupyter Notebooks for Data Science (with Tips, Tricks!) (via) If you haven’t yet got on the Jupyter notebooks bandwagon this should help. It’s the single biggest productivity improvement I’ve made to my workflow in a very long time. # 24th May 2018, 1:58 pm

mendoza-trees-workshop (via) Eventbrite Argentina has an academy program to train new Python/Django developers. I presented a workshop there this morning showing how Django and Jupyter can be used together to iterate on a project. Since the session was primarily about demonstrating Jupyter it was mostly live-coding, but the joy of Jupyter is that at the end of a workshop you can go back and add inline commentary to the notebooks that you used. In putting together the workshop I learned about the django_extensions “/manage.py shell_plus --notebook” command—it’s brilliant! It launches Jupyter in a way that lets you directly import your Django models without having to mess around with DJANGO_SETTINGS_MODULE. # 8th May 2018, 5:22 pm

Iodide Notebook: Project Examples (via) Iodide is a very promising looking open source JavaScript notebook project, and these examples do a great job of showing what it can do. It’s not as slick (yet) as Observable but it does run completely independently using just a browser. # 3rd May 2018, 6:42 pm

Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them. # 19th April 2018, 4:05 pm

Scientific results today are as often as not found with the help of computers. That’s because the ideas are complex, dynamic, hard to grab ahold of in your mind’s eye. And yet by far the most popular tool we have for communicating these results is the PDF—literally a simulation of a piece of paper. Maybe we can do better.

James Somers # 8th April 2018, 1:14 pm

Observable Beta (via) Observable just released their beta, and it’s quite something. It’s by Mike Bostock (d3), Jeremy Ashkenas (Backbone, CoffeeScript) and Tom MacWright (Mapbox Studio). The easiest way to describe it is Jupyter notebooks for JavaScript supporting reactive programming—so code is evaluated as you type and you can add interactive widgets (like sliders and canvas views) to construct explorable visualizations on the fly. # 31st January 2018, 4:46 pm