Simon Willison’s Weblog

My JSK Fellowship: Building an open source ecosystem of tools for data journalism

I started a new chapter of my career last week: I began a year long fellowship with the John S. Knight Journalism Fellowships program at Stanford.

I’m going to spend the year thinking about and working on tools for data journalism. More details below, but the short version is that I want to help make the kind of data reporting we’re seeing from well funded publications like the New York Times, the Washington Post and the LA Times more accessible to smaller publications that don’t have the budget for full-time software engineers.

I’ve worked with newspapers a few times in the past: I helped create what would later become Django at the Lawrence Journal-World fifteen years ago, and I spent two years working on data journalism projects at the Guardian in London before being sucked into the tech startup world. My Datasette project was inspired by the challenges I saw at the Guardian, and I’m hoping to evolve it (and its accompanying ecosystem) in as useful a way as possible.

This fellowship is a chance for me to get fully embedded back in that world. I could not be more excited about it!

I’m at the Online News Association conference in New Orleans this week: if you’d like to meet up for a chat please drop me a line on Twitter or via email (swillison is my Gmail).

Here’s the part of my fellowship application (written back in January) which describes what I’m hoping to do. The program is extremely flexible and there is plenty of opportunity for me to change my focus if something more useful emerges from my research, but this provides a good indication of where my current thinking lies.

What is your fellowship proposal?

Think of this as your title or headline for your proposal. (25 words or less)

How might we grow an open source ecosystem of tools to help data journalists collect, analyze and publish the data underlying their stories?

Now, tell us more about your proposal. Why is it important to the challenges facing journalism and journalists today? How might it create meaningful change or advance the work of journalists? (600 words or less)

Data journalism is a crucial discipline for discovering and explaining true stories about the modern world—but effective data-driven reporting still requires tools and skills that are still not widely available outside of large, well funded news organizations.

Making these techniques readily available to smaller, local publications can help them punch above their weight, producing more impactful stories that overcome the challenges posed by their constrained resources.

Tools that work for smaller publications can work for larger publications as well. Reducing the time and money needed to produce great data journalism raises all boats and enables journalists to re-invest their improved productivity in ever more ambitious reporting projects.

Academic journals are moving towards publishing both the code and data that underlies their papers, encouraging reproducibility and better sharing of the underlying techniques. I want to encourage the same culture for data journalism, in the hope that “showing your working” can help fight misinformation and improve reader’s trust in the stories that are derived from the data.

I would like to use a JSK fellowship to build an ecosystem of data journalism tools that make data-driven reporting as productive and reproducible as possible, while opening it up to a much wider group of journalists.

At the core of my proposal is my Datasette open source project. I’ve been running this as a side-project for a year with some success: newspapers that have used it include the Baltimore Sun, who used it for their public salary records project: By dedicating myself to the project full-time I anticipate being able to greatly accelerate the pace of development and my ability to spend time teaching news organizations how to take advantage of it.

More importantly, the JSK fellowship would give me high quality access to journalism students, professors and professionals. A large portion of my fellowship would be spent talking to a wide pool of potential users and learning exactly what people need from the project.

I do not intend to be the only developer behind Datasette: I plan to deliberately grow the pool of contributors, both to the Datasette core project but also in developing tools and plugins that enhance the project’s capabilities. The great thing about a plugin ecosystem is that it removes the need for a gatekeeper: anyone can build and release a plugin independent of Datasette core, which both lowers the barriers to entry and dramatically increases the rate at which new functionality becomes available to all Datasette users.

My goal for the fellowship is to encourage the growth of open source tools that can be used by data journalists to increase the impact of their work. My experience at the Guardian lead me to Datasette as a promising avenue for this, but in talking to practitioners and students I hope to find other opportunities for tools that can help. My experience as a startup founder, R&D software engineer and an open source contributor put me in an excellent position to help create these tools in partnership with the wider open source community.