29 items tagged “jupyter”
This week’s theme: Well, I’m not going anywhere. So a ton of progress to report on various projects.[... 806 words]
selenium-demoscraper (via) Really useful minimal example of a Binder project. Click the button to launch a Jupyter notebook in Binder that can take screenshots of URLs using Selenium-controlled headless Firefox. The binder/ folder uses an apt.txt file to install Firefox, requirements.txt to get some Python dependencies and a postBuild Python script to download the Gecko Selenium driver. # 4th November 2019, 3:05 pm
Los Angeles Weedmaps analysis (via) Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file. # 30th May 2019, 4:35 am
The _repr_html_ method in Jupyter notebooks (via) Today I learned that if you add a _repr_html_ method returning a string of HTML to any Python class Jupyter notebooks will render that HTML inline to represent that object. # 12th December 2018, 6:09 pm
repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker https://github.com/norvig/pytudes” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm
Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well. # 19th November 2018, 6:25 pm
Tracking Jupyter: Newsletter, the Third... (via) Tony Hirst’s tracking Jupyter newsletter is fantastic. The Jupyter ecosystem is incredibly exciting and fast moving at the moment as more and more groups discover how productive it is, and Tony’s newsletter is a wealth of information on what’s going on out there. # 9th November 2018, 5:42 pm
Computational and Inferential Thinking: The Foundations of Data Science. Free online textbook written for the UC Berkeley Foundations of Data Science class. The examples are all provided as Jupyter notebooks, using the mybinder web application to allow students to launch interactive notebooks for any of the examples without having to install any software on their own machines. # 25th August 2018, 10:13 pm
The Future of Notebooks: Lessons from JupyterCon (via) It sounds like reactive notebooks (where cells keep track of their dependencies on other cells and re-evaluate when those update) were a hot topic at JupyterCon this year. # 25th August 2018, 9:55 pm
In case you missed it: @GoogleColab can open any @ProjectJupyter notebook directly from @github! To run the notebook, just replace “github.com” with “colab.research.google.com/github/” in the notebook URL, and it will be loaded into Colab.
I don’t like Jupyter Notebooks—a presentation by Joel Grus (via) Fascinating talk by Joel Grus at the Jupyter conference in New York. He highlights some of the drawbacks of he Jupyter way of working, including the huge confusion that can come from the ability to execute cells out of order (something Observable notebooks solve brilliantly using spreadsheet-style reactive cell associations). He also makes strong arguments that notebooks encourage a way of working that discourages people from producing stable, repeatable and well tested code. # 25th August 2018, 3:04 am
Beyond Interactive: Notebook Innovation at Netflix. Netflix have been investing heavily in their internal Jupyter notebooks infrastructure: it’s now the most popular tool for working with data at Netflix. They also use parameterized notebooks to make it easy to create templates for reusable operations, and scheduled notebooks for recurring tasks. “When a Spark or Presto job executes from the scheduler, the source code is injected into a newly-created notebook and executed. That notebook then becomes an immutable historical record, containing all related artifacts — including source code, parameters, runtime config, execution logs, error messages, and so on.” # 18th August 2018, 5:55 pm
Every day more than 1 trillion events are written into a streaming ingestion pipeline, which is processed and written to a 100PB cloud-native data warehouse. And every day, our users run more than 150,000 jobs against this data, spanning everything from reporting and analysis to machine learning and recommendation algorithms.
At Harvard we’ve built out an infrastructure to allow us to deploy JupyterHub to courses with authentication managed by Canvas. It has allowed us to easily deploy complex set-ups to students so they can do really cool stuff without having to spend hours walking them through setup. Instructors are writing their lectures as IPython notebooks, and distributing them to students, who then work through them in their JupyterHub environment. Our most ambitious so far has been setting up each student in the course with a p2.xlarge machine with cuda and TensorFlow so they could do deep learning work for their final projects. We supported 15 courses last year, and got deployment time for an implementation down to only 2-3 hours.
Beginner’s Guide to Jupyter Notebooks for Data Science (with Tips, Tricks!) (via) If you haven’t yet got on the Jupyter notebooks bandwagon this should help. It’s the single biggest productivity improvement I’ve made to my workflow in a very long time. # 24th May 2018, 1:58 pm
mendoza-trees-workshop (via) Eventbrite Argentina has an academy program to train new Python/Django developers. I presented a workshop there this morning showing how Django and Jupyter can be used together to iterate on a project. Since the session was primarily about demonstrating Jupyter it was mostly live-coding, but the joy of Jupyter is that at the end of a workshop you can go back and add inline commentary to the notebooks that you used. In putting together the workshop I learned about the django_extensions “/manage.py shell_plus --notebook” command—it’s brilliant! It launches Jupyter in a way that lets you directly import your Django models without having to mess around with DJANGO_SETTINGS_MODULE. # 8th May 2018, 5:22 pm
Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them. # 19th April 2018, 4:05 pm
Scientific results today are as often as not found with the help of computers. That’s because the ideas are complex, dynamic, hard to grab ahold of in your mind’s eye. And yet by far the most popular tool we have for communicating these results is the PDF—literally a simulation of a piece of paper. Maybe we can do better.
Interactive Workflows for C++ with Jupyter. Whoa, this really works... not just an interactive C++ REPL in a Jupyter notebook, but inline graph plotting support and interactive widgets as well. Scroll to the bottom of the article for Binder links which let you fire up an interactive C++ REPL in your browser and start interacting with it instantly. # 29th November 2017, 9:51 pm
Exploring Line Lengths in Python Packages. Interesting exploration of the impact if the 79 character length limit rule of thumb on various Python packages—and a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib. # 10th November 2017, 3:34 pm
I’ve been writing a few scripts to backfill my blog with content I originally posted elsewhere. So far I’ve imported answers I posted on Quora (background), answers I posted on Ask MetaFilter and content I recovered from the Internet Archive.[... 560 words]
A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more. # 2nd November 2017, 1:23 am
Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas. # 29th October 2017, 4:58 pm
Streaming Dataframes. This is some deep and brilliant magic: Matthew Rocklin’s Streamz Python library provides some elegant abstractions for consuming infinite streams of data and calculating cumulative averages and rolling reductions... and now he’s added an integration with jupyter that lets you embed bokeh graphs and pandas dataframe tables that continue to update in realtime as the stream continues! Check out the animated screenshots, this really is a phenomenal piece of work. # 19th October 2017, 2:25 pm
When I restored my blog last weekend I used the most recent SQL backup of my blog’s database from back in 2010. I thought it had all of my content from before I started my 7 year hiatus, but in watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. Turns out the SQL backup I was working from was missing some content.[... 636 words]
Facets. New open source visualization and data exploration tool from Google (“Disclaimer: This is not an official Google product”, whatever that means). It’s intended for visualizing machine learning datasets but it’s obviously useful outside of ML as well—any time you need to understand a large dataset this looks like it could be extremely useful. Ships with example jupyter notebooks and an easy mechanism for embedding the Facets interactive UI directly inside a notebook cell. # 8th October 2017, 12:21 am