Items tagged projects, datajournalism

Filters: projects, datajournalism. Sorted by date.

12 results

AI for Data Journalism: demonstrating what we can do with this stuff right now

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.

[... 6080 words]

9:04 pm / 17th April 2024 / datajournalism, journalism, projects, talks, ai, datasette, datasettecloud, generativeai, llms, llm, annotatedtalks, enrichments

Running OCR against PDFs and images directly in your browser

I attended the Story Discovery At Scale data journalism conference at Stanford this week. One of the perennial hot topics at any journalism conference concerns data extraction: how can we best get data out of PDFs and images?

[... 2263 words]

5:59 pm / 30th March 2024 / datajournalism, ocr, projects, tesseract, aiassistedprogramming

Weeknotes: Getting ready for NICAR

Next week is NICAR 2024 in Baltimore—the annual data journalism conference hosted by Investigative Reporters and Editors. I’m running a workshop on Datasette, and I plan to spend most of my time in the hallway track talking to people about Datasette, Datasette Cloud and how the Datasette ecosystem can best help support their work.

[... 1390 words]

4:21 am / 27th February 2024 / datajournalism, projects, datasette, weeknotes, datasettecloud, nicar

Measuring traffic during the Half Moon Bay Pumpkin Festival

This weekend was the 50th annual Half Moon Bay Pumpkin Festival.

[... 2693 words]

3:41 pm / 19th October 2022 / datajournalism, projects, sqlite, datasette, gitscraping, githistory, datasettelite, halfmoonbay

git-history: a tool for analyzing scraped data collected using Git and SQLite

I described Git scraping last year: a technique for writing scrapers where you periodically snapshot a source of data to a Git repository in order to record changes to that source over time.

[... 2002 words]

10:32 pm / 7th December 2021 / datajournalism, git, projects, scraping, sqlite, datasette, githistory

Weeknotes: sqlite-transform 1.1, Datasette 0.58.1, datasette-graphql 1.5

Work on Project Pelican inspires new features and improvements across a number of different projects.

[... 1419 words]

9:29 pm / 21st July 2021 / datajournalism, projects, sqlite, graphql, datasette, weeknotes

sba-loans-covid-19-datasette: The treasury department released a bunch of data on the Covid-19 SBA Paycheck Protection Program Loan recipients today; I’ve loaded the most interesting data (the $150,000+ loans) into a Datasette instance.

2:42 am / 7th July 2020

Weeknotes: Covid-19, First Python Notebook, more Dogsheep, Tailscale

My covid-19.datasettes.com project publishes information on COVID-19 cases around the world. The project started out using data from Johns Hopkins CSSE, but last week the New York Times started publishing high quality USA county- and state-level daily numbers to their own repository. Here’s the change that added the NY Times data.

[... 993 words]

8:29 pm / 1st April 2020 / bradfitzpatrick, datajournalism, projects, python, teaching, datasette, dogsheep, weeknotes, tailscale, covid19, benwelsh

Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf

I’ve been preparing for the NICAR 2020 Data Journalism conference this week, which has led me into a flurry of activity across a plethora of different projects and plugins.

[... 834 words]

2:27 am / 4th March 2020 / csrf, datajournalism, ical, plugins, projects, search, security, datasette, asgi, weeknotes, datasettecloud

Tracking FARA by deploying a data API using GitHub Actions and Cloud Run

I’m using the combination of GitHub Actions and Google Cloud Run to retrieve data from the U.S. Department of Justice FARA website and deploy it as a queryable API using Datasette.

[... 1599 words]

7:51 am / 21st January 2020 / continuousdeployment, continuousintegration, datajournalism, github, projects, datasette, cloudrun, githubactions, gitscraping

Tracking PG&E outages by scraping to a git repo

PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.

[... 833 words]

11:32 pm / 10th October 2019 / datajournalism, projects, scraping, sqlite, datasette, gitscraping, digitalocean, sqliteutils

Generating a commit log for San Francisco’s official list of trees

San Francisco has a neat open data portal (as do an increasingly large number of cities these days). For a few years my favourite file on there has been Street Tree List, a list of all 190,000 trees in the city maintained by the Department of Public Works.

[... 1051 words]

2:49 pm / 13th March 2019 / csv, datajournalism, git, projects, sanfrancisco, gitscraping

Simon Willison’s Weblog