48 items tagged “data-journalism”
2020
Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf
I’ve been preparing for the NICAR 2020 Data Journalism conference this week, which has led me into a flurry of activity across a plethora of different projects and plugins. [... 834 words]
Tracking FARA by deploying a data API using GitHub Actions and Cloud Run
I’m using the combination of GitHub Actions and Google Cloud Run to retrieve data from the U.S. Department of Justice FARA website and deploy it as a queryable API using Datasette. [... 1,599 words]
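As a rough illustration of the pattern described in that post (not its actual code), here is a minimal Python sketch of the fetch-and-build step: download a CSV export, load it into SQLite with sqlite-utils, and let Datasette serve the result. The URL, table name and the publish command in the final comment are placeholders/assumptions.

```python
# Hypothetical sketch: fetch a CSV export and rebuild a SQLite database for
# Datasette to serve. The URL and table name are placeholders, not the real FARA API.
import csv
import io

import httpx
import sqlite_utils

CSV_URL = "https://example.com/fara-registrants.csv"  # placeholder URL


def fetch_rows(url):
    # Download the CSV and parse each row into a dict
    response = httpx.get(url, timeout=60)
    response.raise_for_status()
    return list(csv.DictReader(io.StringIO(response.text)))


def build_database(rows, path="fara.db"):
    # Drop and recreate the table on every run so the data is always fresh
    db = sqlite_utils.Database(path)
    db["registrants"].drop(ignore=True)
    db["registrants"].insert_all(rows)
    return db


if __name__ == "__main__":
    build_database(fetch_rows(CSV_URL))
    # A scheduled GitHub Actions workflow could then deploy it, e.g.:
    #   datasette publish cloudrun fara.db --service=fara
```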
2019
Building tools to bring data-driven reporting to more newsrooms. I wrote about my fellowship project so far and my goals for the next few months for the JSK Medium publication. My next priority: an invite-only hosted version for newsrooms so that figuring out how to install and manage the software isn’t the biggest barrier to entry.
Tracking PG&E outages by scraping to a git repo
PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires. [... 868 words]
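The underlying scrape-to-git pattern is simple enough to sketch in a few lines of Python. The endpoint URL, file name and commit message below are hypothetical, not the actual scraper from the post.

```python
# Hypothetical sketch of the scrape-to-git pattern: fetch a JSON feed, write it
# to a tracked file, and commit only when the contents have changed.
import json
import subprocess

import httpx

OUTAGES_URL = "https://example.com/outages.json"  # placeholder endpoint
TARGET_FILE = "outages.json"


def scrape():
    response = httpx.get(OUTAGES_URL, timeout=30)
    response.raise_for_status()
    # Pretty-print with sorted keys so diffs between runs stay readable
    with open(TARGET_FILE, "w") as fp:
        json.dump(response.json(), fp, indent=2, sort_keys=True)


def commit_if_changed():
    subprocess.run(["git", "add", TARGET_FILE], check=True)
    # "git diff --cached --quiet" exits non-zero when the staged file differs from HEAD
    changed = subprocess.run(
        ["git", "diff", "--cached", "--quiet", "--", TARGET_FILE]
    ).returncode != 0
    if changed:
        subprocess.run(["git", "commit", "-m", "Latest outage data"], check=True)


if __name__ == "__main__":
    scrape()
    commit_if_changed()
```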
My JSK Fellowship: Building an open source ecosystem of tools for data journalism
I started a new chapter of my career last week: I began a year-long fellowship with the John S. Knight Journalism Fellowships program at Stanford. [... 876 words]
Los Angeles Weedmaps analysis (via) Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file.
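For anyone who wants to try those two tricks outside the notebook, a rough geopandas sketch looks something like this. The file names and the neighborhood "name" column are placeholders, not the notebook’s actual inputs.

```python
# Rough sketch of the two geopandas tricks mentioned above; file names and the
# "name" column are placeholders.
import geopandas as gpd
import pandas as pd

# Build a GeoDataFrame of points from a CSV with latitude/longitude columns
df = pd.read_csv("dispensaries.csv")  # placeholder file
points = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.longitude, df.latitude),
    crs="EPSG:4326",  # WGS 84
)

# Spatial-join each point against neighborhood boundary polygons from GeoJSON
neighborhoods = gpd.read_file("la-neighborhoods.geojson")  # placeholder file
joined = gpd.sjoin(
    points, neighborhoods.to_crs("EPSG:4326"), how="left", predicate="within"
)
# Count points per neighborhood (assumes the boundaries file has a "name" column)
print(joined.groupby("name").size().sort_values(ascending=False).head())
```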
VisiData (via) Intriguing tool by Saul Pwanson: VisiData is a command-line "textpunk utility" for browsing and manipulating tabular data. pip3 install visidata and then vd myfile.csv (or .json or .xls or SQLite or others) and get an interactive terminal UI for quickly searching through the data, conducting frequency analysis of columns, manipulating it and much more besides. Two tips for if you start playing with it: hit gq to exit, and hit Ctrl+H to view the help screen.
Generating a commit log for San Francisco’s official list of trees
San Francisco has a neat open data portal (as do an increasingly large number of cities these days). For a few years my favourite file on there has been Street Tree List, a list of all 190,000 trees in the city maintained by the Department of Public Works. [... 1,051 words]
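One way to turn successive snapshots of a CSV like that into a readable commit log is to diff them before committing. Here is a small, hypothetical sketch using the csv-diff library; the file names and key column are placeholders, and this is not necessarily the pipeline from the post.

```python
# Hypothetical sketch: summarize what changed between two snapshots of a CSV,
# e.g. to use in a commit message. File names and key column are placeholders.
import json

from csv_diff import load_csv, compare

previous = load_csv(open("trees-yesterday.csv"), key="TreeID")
current = load_csv(open("trees-today.csv"), key="TreeID")

# compare() returns a dictionary describing added, removed and changed rows
diff = compare(previous, current)
print(json.dumps(diff, indent=2))
```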
Publish the data behind your stories with SQLite and Datasette. I presented a workshop on Datasette at the IRE and NICAR CAR 2019 data journalism conference yesterday. Here’s the worksheet I prepared for the tutorial.
socrata2sql (via) Phenomenal new open source tool released by Andrew Chavez at the Dallas Morning News. Socrata is the open data portal software used by huge numbers of local governments worldwide. socrata2sql is a tool that interacts with the standard Socrata API and can use it to suck down a dataset and save it as a SQLite, PostgreSQL, MySQL or other SQLAlchemy-supported database. I just tried this and it took a single command to create a SQLite database of every police arrest in Dallas in the past five years.
2018
Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well.
How to Instantly Publish Data to the Internet with Datasette
I spoke about my Datasette project at PyBay in August and they’ve just posted the video of my talk. [... 58 words]
Notes from my appearance on the Changelog podcast
After I spoke at Zeit Day SF last weekend I sat down with Adam Stacoviak to record a 25-minute segment for episode 296 of the Changelog podcast, talking about Datasette. We covered a lot of ground! [... 536 words]
Baltimore Sun Public Salary Records (via) The Baltimore Sun have published an interactive search engine for public salaries of Maryland state employees, and it’s powered by Datasette! Since data journalism is one of my key use-cases for Datasette I’m incredibly excited to see this in the wild. They’ve also published the underlying source code (see the via link), which is a really nice example of how to use Datasette’s custom templates and canned query functionality.
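Datasette’s canned query feature mentioned here is configured through a metadata file. As a hedged illustration (not the Baltimore Sun’s actual configuration), a Python script that writes one might look like this; the database name, table and SQL are made up.

```python
# Hypothetical illustration of a Datasette canned query configuration; the
# database name, table and SQL are invented, not the Baltimore Sun's setup.
import json

metadata = {
    "databases": {
        "salaries": {
            "queries": {
                "search": {
                    "sql": "select * from salaries where employee_name like :q",
                    "title": "Search salaries by employee name",
                }
            }
        }
    }
}

with open("metadata.json", "w") as fp:
    json.dump(metadata, fp, indent=2)

# Serve it alongside custom templates:
#   datasette salaries.db --metadata metadata.json --template-dir templates/
```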
2009
Data Is Journalism: MSNBC.com Acquires Everyblock. Congratulations Adrian, Wilson and the team! Brady Forrest reports the acquisition within the larger context of the rise of data-driven journalism.
Learning to Think Like A Programmer. Outstanding advice aimed mainly at journalists, but important to anyone who collects information for a living and might want it to be automatically processed at some point in the future.
Train Crash Leads LA Times to Create Django Database on Deadline. A story from last September. I didn’t know the LA Times used Django. UPDATE: Yes I did, I introduced their panel about it at DjangoCon. Sorry, mind like a sieve sometimes.
Washington Post Update. Peter Harkins summarises the large number of Django-powered database journalism projects released by the Post since September 2007.