<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: politics</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/politics.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-12-09T23:52:05+00:00</updated><author><name>Simon Willison</name></author><entry><title>Under the hood of Canada Spends with Brendan Samek</title><link href="https://simonwillison.net/2025/Dec/9/canada-spends/#atom-tag" rel="alternate"/><published>2025-12-09T23:52:05+00:00</published><updated>2025-12-09T23:52:05+00:00</updated><id>https://simonwillison.net/2025/Dec/9/canada-spends/#atom-tag</id><summary type="html">
    &lt;p&gt;I talked to Brendan Samek about &lt;a href="https://canadaspends.com/"&gt;Canada Spends&lt;/a&gt;, a project from &lt;a href="https://www.buildcanada.com/"&gt;Build Canada&lt;/a&gt; that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; and pieces of LLM-powered PDF extraction.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po"&gt;the video on YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;iframe style="margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/T8xiMgmb8po" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;p&gt;Sections within that video:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=177s"&gt;02:57&lt;/a&gt; Data sources and the PDF problem&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=351s"&gt;05:51&lt;/a&gt; Crowdsourcing financial data across Canada&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=447s"&gt;07:27&lt;/a&gt; Datasette demo: Search and facets&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=753s"&gt;12:33&lt;/a&gt; Behind the scenes: Ingestion code&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=1044s"&gt;17:24&lt;/a&gt; Data quality horror stories&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=1246s"&gt;20:46&lt;/a&gt; Using Gemini to extract PDF data&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=T8xiMgmb8po&amp;amp;t=1524s"&gt;25:24&lt;/a&gt; Why SQLite is perfect for data distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="build-canada-and-canada-spends"&gt;Build Canada and Canada Spends&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.buildcanada.com/"&gt;Build Canada&lt;/a&gt; is a volunteer-driven non-profit that launched in February 2025 - here's &lt;a href="https://www.canadianaffairs.news/2025/09/26/builders-at-the-gate-inside-the-civic-movement-to-jolt-canada-out-of-stagnation/"&gt;some background information&lt;/a&gt; on the organization, which has a strong pro-entrepreneurship and pro-technology angle.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://canadaspends.com/"&gt;Canada Spends&lt;/a&gt; is their project to make Canadian government financial data more accessible and explorable. It includes a tax sources and sinks visualizer and a searchable database of government contracts, plus a collection of tools covering financial data from different levels of government.&lt;/p&gt;
&lt;h4 id="datasette-for-data-exploration"&gt;Datasette for data exploration&lt;/h4&gt;
&lt;p&gt;The project maintains a Datasette instance at &lt;a href="https://api.canadasbuilding.com/"&gt;api.canadasbilding.com&lt;/a&gt; containing the data they have gathered and processed from multiple data sources - currently more than 2 million rows plus a combined search index across a denormalized copy of that data.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/api-canadasbuilding-com-canada-spends.jpg" alt="  Datasette UI for a canada-spends database.  aggregated-contracts-under-10k:  year, contract_goods_number_of, contracts_goods_original_value, contracts_goods_amendment_value, contract_service_number_of, contracts_service_original_value, contracts_service_amendment_value, contract_construction_number_of, contracts_construction_original_value, contracts_construction_amendment_value, acquisition_card_transactions_number_of, acquisition_card_transactions_total_value, owner_org, owner_org_title  487 rows cihr_grants  external_id, title, project_lead_name, co_researchers, institution, province, country, competition_year, award_amount, program, program_type, theme, research_subject, keywords, abstract, duration, source_url  53,420 rows contracts-over-10k:   reference_number, procurement_id, vendor_name, vendor_postal_code, buyer_name, contract_date, economic_object_code, description_en, description_fr, contract_period_start, delivery_date, contract_value, original_value, amendment_value, comments_en, comments_fr, additional_comments_en, additional_comments_fr, agreement_type_code, trade_agreement, land_claims, commodity_type, commodity_code, country_of_vendor, solicitation_procedure, limited_tendering_reason, trade_agreement_exceptions, indigenous_business, indigenous_business_excluding_psib, intellectual_property, potential_commercial_exploitation, former_public_servant, contracting_entity, standing_offer_number, instrument_type, ministers_office, number_of_bids, article_6_exceptions, award_criteria, socioeconomic_indicator, reporting_period, owner_org, owner_org_title  1,172,575 rows global_affairs_grants:   id, projectNumber, dateModified, title, description, status, start, end, countries, executingAgencyPartner, DACSectors, maximumContribution, ContributingOrganization, expectedResults, resultsAchieved, aidType, collaborationType, financeType, flowType, reportingOrganisation, programName, selectionMechanism, policyMarkers, regions, alternameImPositions, budgets, Locations, otherIdentifiers, participatingOrgs, programDataStructure, relatedActivities, transactions  2,378 rows nserc_grants:   title, award_summary, application_id, competition_year, fiscal_year, project_lead_name, institution, department, province, award_amount, installment, program, selection_committee, research_subject, area_of_application, co-researchers, partners, external_id, source_url  701,310 rows sshrc_grants:   id, title, program, fiscal_year, competition_year, applicant, organization, amount, discipline, area_of_research, co_applicant, keywords, source_url  213,085 rows transfers:   FSCL_YR, MINC, MINE, MINF, DepartmentNumber-Numéro-de-Ministère, DEPT_EN_DESC, DEPT_FR_DESC, RCPNT_CLS_EN_DESC, RCPNT_CLS_FR_DESC, RCPNT_NML_EN_DESC, RCPNT_NML_FR_DESC, CTY_EN_NM, CTY_FR_NM, PROVTER_EN, PROVTER_FR, CNTRY_EN_NM, CNTRY_FR_NM, TOT_CY_XPND_AMT, AGRG_PYMT_AMT  357,797 rows  Download SQLite DB: canada-spends.db 2.4 GB Powered by Datasette · Queries took 24.733ms " style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="processing-pdfs"&gt;Processing PDFs&lt;/h4&gt;
&lt;p&gt;The highest quality government financial data comes from the audited financial statements that every Canadian government department is required to publish. As is so often the case with government data, these are usually published as PDFs.&lt;/p&gt;
&lt;p&gt;Brendan has been using Gemini to help extract data from those PDFs. Since this is accounting data the numbers can be summed and cross-checked to help validate the LLM didn't make any obvious mistakes.&lt;/p&gt;
&lt;h4 id="further-reading"&gt;Further reading&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://datasette.io/"&gt;datasette.io&lt;/a&gt;, the official website for Datasette&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils.datasette.io&lt;/a&gt; for more on &lt;code&gt;sqlite-utils&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://canadaspends.com/"&gt;Canada Spends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/BuildCanada/CanadaSpends"&gt;BuildCanada/CanadaSpends&lt;/a&gt; on GitHub&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/youtube"&gt;youtube&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="data-journalism"/><category term="politics"/><category term="sqlite"/><category term="youtube"/><category term="datasette"/><category term="sqlite-utils"/></entry><entry><title>Quoting The Bluesky Team</title><link href="https://simonwillison.net/2025/Aug/22/mississippi/#atom-tag" rel="alternate"/><published>2025-08-22T21:36:24+00:00</published><updated>2025-08-22T21:36:24+00:00</updated><id>https://simonwillison.net/2025/Aug/22/mississippi/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://bsky.social/about/blog/08-22-2025-mississippi-hb1126"&gt;&lt;p&gt;Mississippi's approach would fundamentally change how users access Bluesky. The Supreme Court’s recent &lt;a href="https://www.supremecourt.gov/opinions/24pdf/25a97_5h25.pdf"&gt;decision&lt;/a&gt; leaves us facing a hard reality: comply with Mississippi’s age assurance &lt;a href="https://legiscan.com/MS/text/HB1126/id/2988284"&gt;law&lt;/a&gt;—and make &lt;em&gt;every&lt;/em&gt; Mississippi Bluesky user hand over sensitive personal information and undergo age checks to access the site—or risk massive fines. The law would also require us to identify and track which users are children, unlike our approach in other regions. [...]&lt;/p&gt;
&lt;p&gt;We believe effective child safety policies should be carefully tailored to address real harms, without creating huge obstacles for smaller providers and resulting in negative consequences for free expression. That’s why until legal challenges to this law are resolved, we’ve made the difficult decision to block access from Mississippi IP addresses.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://bsky.social/about/blog/08-22-2025-mississippi-hb1126"&gt;The Bluesky Team&lt;/a&gt;, on why they have blocked access from Mississippi&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/privacy"&gt;privacy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bluesky"&gt;bluesky&lt;/a&gt;&lt;/p&gt;



</summary><category term="politics"/><category term="privacy"/><category term="bluesky"/></entry><entry><title>Quoting Cory Doctorow</title><link href="https://simonwillison.net/2025/Aug/14/cory-doctorow/#atom-tag" rel="alternate"/><published>2025-08-14T20:39:28+00:00</published><updated>2025-08-14T20:39:28+00:00</updated><id>https://simonwillison.net/2025/Aug/14/cory-doctorow/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://pluralistic.net/2025/08/14/bellovin/#wont-someone-think-of-the-cryptographers"&gt;&lt;p&gt;&lt;em&gt;NERD HARDER!&lt;/em&gt; is the answer every time a politician gets a technological idée-fixe about how to solve a social problem by creating a technology that can't exist. It's the answer that EU politicians who backed the catastrophic proposal to require copyright filters for all user-generated content came up with, when faced with objections that these filters would block billions of legitimate acts of speech [...]&lt;/p&gt;
&lt;p&gt;When politicians seize on a technological impossibility as a technological necessity, they flail about and desperately latch onto scholarly work that they can brandish as evidence that their idea &lt;em&gt;could&lt;/em&gt; be accomplished. [...]&lt;/p&gt;
&lt;p&gt;That's just happened, and in relation to one of the scariest, most destructive &lt;em&gt;NERD HARDER!&lt;/em&gt; tech policies ever to be assayed (a stiff competition). I'm talking about the UK Online Safety Act, which imposes a duty on websites to verify the age of people they communicate with before serving them anything that could be construed as child-inappropriate (a category that includes, e.g., much of Wikipedia)&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://pluralistic.net/2025/08/14/bellovin/#wont-someone-think-of-the-cryptographers"&gt;Cory Doctorow&lt;/a&gt;, "Privacy preserving age verification" is bullshit&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cory-doctorow"&gt;cory-doctorow&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/law"&gt;law&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/privacy"&gt;privacy&lt;/a&gt;&lt;/p&gt;



</summary><category term="cory-doctorow"/><category term="law"/><category term="politics"/><category term="privacy"/></entry><entry><title>Tips on prompting ChatGPT for UK technology secretary Peter Kyle</title><link href="https://simonwillison.net/2025/Jun/3/tips-for-peter-kyle/#atom-tag" rel="alternate"/><published>2025-06-03T19:08:57+00:00</published><updated>2025-06-03T19:08:57+00:00</updated><id>https://simonwillison.net/2025/Jun/3/tips-for-peter-kyle/#atom-tag</id><summary type="html">
    &lt;p&gt;Back in March &lt;a href="https://www.newscientist.com/article/2472068-revealed-how-the-uk-tech-secretary-uses-chatgpt-for-policy-advice/"&gt;New Scientist reported on&lt;/a&gt; a successful Freedom of Information request they had filed requesting UK Secretary of State for Science, Innovation and Technology &lt;a href="https://en.wikipedia.org/wiki/Peter_Kyle"&gt;Peter Kyle's&lt;/a&gt; ChatGPT logs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;New Scientist has obtained records of Kyle’s ChatGPT use under the Freedom of Information (FOI) Act, in what is believed to be a world-first test of whether chatbot interactions are subject to such laws.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What a fascinating precedent this could set!&lt;/p&gt;
&lt;p&gt;They picked out some highlights they thought were particularly newsworthy. Personally I'd have loved to see that raw data to accompany the story.&lt;/p&gt;
&lt;h4 id="a-good-example-of-a-poorly-considered-prompt"&gt;A good example of a poorly considered prompt&lt;/h4&gt;
&lt;p&gt;Among the questions Kyle asked of ChatGPT was this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Why is AI adoption so slow in the UK small and medium business community?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(I pinged the New Scientist reporter, Chris Stokel-Walker, to confirm the exact wording here.)&lt;/p&gt;
&lt;p&gt;This provides an irresistible example of the "jagged frontier" of LLMs in action. LLMs are great at some things, terrible at others and the difference between the two is often not obvious at all.&lt;/p&gt;
&lt;p&gt;Experienced prompters will no doubt have the same reaction I did: that's not going to give an accurate response! It's worth digging into why those of us with a firmly developed sense of intuition around LLMs would jump straight to that conclusion.&lt;/p&gt;
&lt;p&gt;The problem with this question is that it assumes a level of omniscience that even the very best LLMs do not possess.&lt;/p&gt;
&lt;p&gt;At the very best, I would expect this prompt to spit out the approximate average of what had been published on that subject in time to be hoovered up by the training data for the GPT-4o training cutoff &lt;a href="https://platform.openai.com/docs/models/gpt-4o"&gt;of September 2023&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(Here's &lt;a href="https://chatgpt.com/share/683f3f94-d51c-8006-aea9-7567d08e2f68"&gt;what I got just now&lt;/a&gt; running it against GPT-4o.)&lt;/p&gt;
&lt;p&gt;This illustrates the first lesson of effective LLM usage: &lt;strong&gt;know your training cutoff dates&lt;/strong&gt;. For many queries these are an essential factor in whether or not the LLM is likely to provide you with a useful answer.&lt;/p&gt;
&lt;p&gt;Given the pace of change in the AI landscape, an answer based on September 2023 training data is unlikely to offer useful insights into the state of things in 2025.&lt;/p&gt;
&lt;p&gt;It's worth noting that there &lt;em&gt;are&lt;/em&gt; tools that might do better at this. OpenAI's Deep Research tool for example can run a barrage of searches against the web for recent information, then spend multiple minutes digesting those results, running follow-up searches and crunching that together into an impressive looking report.&lt;/p&gt;
&lt;p&gt;(I still wouldn't trust it for a question this broad though: the report format looks more credible than it is, and can suffer from &lt;a href="https://simonwillison.net/2025/Feb/25/deep-research-system-card/"&gt;misinformation by omission&lt;/a&gt; which is very difficult to spot.)&lt;/p&gt;
&lt;p&gt;Deep Research only rolled out in February this year, so it is unlikely to be the tool Peter Kyle was using given likely delays in receiving the requested FOIA data.&lt;/p&gt;
&lt;h4 id="what-i-would-do-instead"&gt;What I would do instead&lt;/h4&gt;
&lt;p&gt;Off the top of my head, here are examples of prompts I would use if I wanted to get ChatGPT's help digging into this particular question:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brainstorm potential reasons that UK SMBs might be slow to embrace recent advances in AI&lt;/strong&gt;. This would give me a starting point for my own thoughts about the subject, and may highlight some things I hadn't considered that I should look into further.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identify key stakeholders in the UK SMB community who might have insights on this issue&lt;/strong&gt;. I wouldn't expect anything comprehensive here, but it might turn up some initial names I could reach out to for interviews or further research.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I work in UK Government: which departments should I contact that might have relevant information on this topic&lt;/strong&gt;? Given the size and complexity of the UK government even cabinet ministers could be excused from knowing every department.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suggest other approaches I could take to research this issue&lt;/strong&gt;. Another brainstorming prompt. I like prompts like this where "right or wrong" doesn't particularly matter. LLMs are electric bicycles for the mind.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use your search tool: find recent credible studies on the subject and identify their authors&lt;/strong&gt;. I've been getting some good results from telling LLMs with good search tools - &lt;a href="https://simonwillison.net/2025/Apr/21/ai-assisted-search/#o3-and-o4-mini-are-really-good-at-search"&gt;like o3 and o4-mini&lt;/a&gt; - to evaluate the "credibility" of sources they find. It's a dumb prompting hack but it appears to work quite well - you can watch their reasoning traces and see how they place more faith in papers from well known publications, or newspapers with strong reputations for fact checking.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="prompts-that-do-make-sense"&gt;Prompts that do make sense&lt;/h4&gt;
&lt;p&gt;From the New Scientist article:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As well as seeking this advice, Kyle asked ChatGPT to define various terms relevant to his department: antimatter, quantum and digital inclusion. Two experts &lt;em&gt;New Scientist&lt;/em&gt; spoke to said they were surprised by the quality of the responses when it came to ChatGPT's definitions of quantum. "This is surprisingly good, in my opinion," says &lt;a href="https://profiles.imperial.ac.uk/p.knight"&gt;Peter Knight&lt;/a&gt; at Imperial College London. "I think it's not bad at all," says &lt;a href="https://researchportal.hw.ac.uk/en/persons/cristian-bonato"&gt;Cristian Bonato&lt;/a&gt; at Heriot-Watt University in Edinburgh, UK.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This doesn't surprise me at all. If you ask a good LLM for definitions of terms with strong, well established meanings you're going to get great results almost every time.&lt;/p&gt;
&lt;p&gt;My rule of thumb used to be that if a friend who had just read the Wikipedia page on a subject could answer my question then an LLM will be able to answer it too.&lt;/p&gt;
&lt;p&gt;As the frontier models have grown stronger I've upgraded that rule of thumb. I now expect a good result for any mainstream-enough topic for which there was widespread consensus prior to that all-important training cutoff date.&lt;/p&gt;
&lt;p&gt;Once again, it all comes down to intuition. The only way to get really strong intuition as to what will work with LLMs is to spend a huge amount of time using them, and paying a skeptical eye to everything that they produce.&lt;/p&gt;
&lt;p&gt;Treating ChatGPT as an all knowing Oracle for anything outside of a two year stale Wikipedia version of the world's knowledge is almost always a mistake.&lt;/p&gt;
&lt;p&gt;Treating it as a brainstorming companion and electric bicycle for the mind is, I think, a much better strategy.&lt;/p&gt;
&lt;h4 id="should-the-uk-technology-secretary-be-using-chatgpt-"&gt;Should the UK technology secretary be using ChatGPT?&lt;/h4&gt;
&lt;p&gt;Some of the reporting I've seen around this story has seemed to suggest that Peter Kyle's use of ChatGPT is embarrassing.&lt;/p&gt;
&lt;p&gt;Personally, I think that if the UK's Secretary of State for Science, Innovation and Technology was &lt;em&gt;not&lt;/em&gt; exploring this family of technologies it would be a dereliction of duty!&lt;/p&gt;
&lt;p&gt;The thing we can't tell from these ChatGPT logs is how dependent he was on these results.&lt;/p&gt;
&lt;p&gt;Did he idly throw some questions at ChatGPT out of curiosity to see what came back, then ignore that entirely, engage with his policy team and talk to experts in the field to get a detailed understanding of the issues at hand?&lt;/p&gt;
&lt;p&gt;Or did he prompt ChatGPT, take the results as gospel and make policy decisions based on that sloppy interpretation of a two-year stale guess at the state of the world?&lt;/p&gt;
&lt;p&gt;Those are the questions I'd like to see answered.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/digital-literacy"&gt;digital-literacy&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="politics"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llms"/><category term="ai-ethics"/><category term="digital-literacy"/></entry><entry><title>Quoting Peter Bhat Harkins</title><link href="https://simonwillison.net/2025/Mar/20/peter-bhat-harkins/#atom-tag" rel="alternate"/><published>2025-03-20T16:26:35+00:00</published><updated>2025-03-20T16:26:35+00:00</updated><id>https://simonwillison.net/2025/Mar/20/peter-bhat-harkins/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://lobste.rs/s/ukosa1/uk_users_lobsters_needs_your_help_with#c_xevn8a"&gt;&lt;p&gt;I’ve disabled the pending geoblock of the UK because I now think the risks of the Online Safety Act to this site are low enough to change strategies to only geoblock if directly threatened by the regulator. [...]&lt;/p&gt;
&lt;p&gt;It is not possible for a hobby site to comply with the Online Safety Act. The OSA is written to censor huge commercial sites with professional legal teams, and even understanding one's obligations under the regulations is an enormous project requiring expensive legal advice.&lt;/p&gt;
&lt;p&gt;The law is &lt;a href="https://www.legislation.gov.uk/ukpga/2023/50/data.pdf"&gt;250 pages&lt;/a&gt; and the mandatory "guidance" from Ofcom is more than 3,000 pages of dense, cross-referenced UK-flavoured legalese. To find all the guidance you'll have to &lt;a href="https://www.ofcom.org.uk/information-for-industry/?SelectedTopic=67866"&gt;start here&lt;/a&gt;, click through to each of the 36 pages listed, and expand each page's collapsible sections that might have links to other pages and documents. (Though I can't be sure that leads to all their guidance, and note you'll have to check back regularly &lt;a href="https://www.ofcom.org.uk/siteassets/resources/documents/online-safety/information-for-industry/illegal-harms/overview.pdf?v=387529#page=4"&gt;for planned updates&lt;/a&gt;.)&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://lobste.rs/s/ukosa1/uk_users_lobsters_needs_your_help_with#c_xevn8a"&gt;Peter Bhat Harkins&lt;/a&gt;, site administrator, lobste.rs&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/law"&gt;law&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moderation"&gt;moderation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uk"&gt;uk&lt;/a&gt;&lt;/p&gt;



</summary><category term="law"/><category term="moderation"/><category term="politics"/><category term="uk"/></entry><entry><title>18f.org</title><link href="https://simonwillison.net/2025/Mar/2/18forg/#atom-tag" rel="alternate"/><published>2025-03-02T09:24:37+00:00</published><updated>2025-03-02T09:24:37+00:00</updated><id>https://simonwillison.net/2025/Mar/2/18forg/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://18f.org/"&gt;18f.org&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New site by members of 18F, the team within the US government that were doing some of the most effective work at improving government efficiency.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For over 11 years, 18F has been proudly serving you to make government technology work better. We are non-partisan civil servants. 18F has worked on hundreds of projects, all designed to make government technology not just efficient but effective, and to save money for American taxpayers.&lt;/p&gt;
&lt;p&gt;However, all employees at 18F – a group that the Trump Administration GSA Technology Transformation Services Director called "the gold standard" of civic tech – were terminated today at midnight ET.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;18F was doing exactly the type of work that DOGE claims to want – yet we were eliminated.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The entire team is now on "administrative leave" and locked out of their computers.&lt;/p&gt;
&lt;p&gt;But these are not the kind of civil servants to abandon their mission without a fight:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;We’re not done yet.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We’re still absorbing what has happened. We’re wrestling with what it will mean for ourselves and our families, as well as the impact on our partners and the American people.&lt;/p&gt;
&lt;p&gt;But we came to the government to fix things. And we’re not done with this work yet.&lt;/p&gt;
&lt;p&gt;More to come.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can &lt;a href="https://bsky.app/profile/team18f.bsky.social"&gt;follow @team18f.bsky.social&lt;/a&gt; on Bluesky.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/government"&gt;government&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/political-hacking"&gt;political-hacking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bluesky"&gt;bluesky&lt;/a&gt;&lt;/p&gt;



</summary><category term="government"/><category term="political-hacking"/><category term="politics"/><category term="bluesky"/></entry><entry><title>I Went To SQL Injection Court</title><link href="https://simonwillison.net/2025/Feb/25/i-went-to-sql-injection-court/#atom-tag" rel="alternate"/><published>2025-02-25T22:45:57+00:00</published><updated>2025-02-25T22:45:57+00:00</updated><id>https://simonwillison.net/2025/Feb/25/i-went-to-sql-injection-court/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://sockpuppet.org/blog/2025/02/09/fixing-illinois-foia/"&gt;I Went To SQL Injection Court&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Thomas Ptacek talks about his ongoing involvement as an expert witness in an Illinois legal battle lead by Matt Chapman over whether a SQL schema (e.g. for the CANVAS parking ticket database) should be accessible to Freedom of Information (FOIA) requests against the Illinois state government.&lt;/p&gt;
&lt;p&gt;They eventually lost in the Illinois Supreme Court, but there's still hope in the shape of &lt;a href="https://legiscan.com/IL/bill/SB0226/2025"&gt;IL SB0226&lt;/a&gt;, a proposed bill that would amend the FOIA act to ensure "that the public body shall provide a sufficient description of the structures of all databases under the control of the public body to allow a requester to request the public body to perform specific database queries".&lt;/p&gt;
&lt;p&gt;Thomas &lt;a href="https://news.ycombinator.com/item?id=43175628#43175758"&gt;posted this comment&lt;/a&gt; on Hacker News:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Permit me a PSA about local politics: engaging in national politics is bleak and dispiriting, like being a gnat bouncing off the glass plate window of a skyscraper. Local politics is, by contrast, extremely responsive. I've gotten things done --- including a law passed --- in my spare time and at practically no expense (&lt;em&gt;drastically&lt;/em&gt; unlike national politics).&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=43175628"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/government"&gt;government&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/law"&gt;law&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql-injection"&gt;sql-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/thomas-ptacek"&gt;thomas-ptacek&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="databases"/><category term="government"/><category term="law"/><category term="politics"/><category term="sql"/><category term="sql-injection"/><category term="thomas-ptacek"/></entry><entry><title>Baroness Kidron's speech regarding UK AI legislation</title><link href="https://simonwillison.net/2025/Jan/29/baroness-kidron-speech/#atom-tag" rel="alternate"/><published>2025-01-29T17:25:36+00:00</published><updated>2025-01-29T17:25:36+00:00</updated><id>https://simonwillison.net/2025/Jan/29/baroness-kidron-speech/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://hansard.parliament.uk/Lords%E2%80%8F/2025-01-28/debates/9BEB4E59-CAB1-4AD3-BF66-FE32173F971D/Data(UseAndAccess)Bill(HL)#contribution-9A4614F3-3860-4E8E-BA1E-53E932589CBF"&gt;Baroness Kidron&amp;#x27;s speech regarding UK AI legislation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Barnstormer of a speech by UK film director and member of the House of Lords &lt;a href="https://en.wikipedia.org/wiki/Beeban_Kidron"&gt;Baroness Kidron&lt;/a&gt;. This is the Hansard transcript but you can also &lt;a href="https://parliamentlive.tv/event/index/d7da6908-8663-4412-8840-e6de3e180636?in=16:47:17"&gt;watch the video on parliamentlive.tv&lt;/a&gt;. She presents a strong argument against the UK's proposed copyright and AI reform legislation, which &lt;a href="https://www.theguardian.com/technology/2024/dec/17/uk-proposes-letting-tech-firms-use-copyrighted-work-to-train-ai"&gt;would provide a copyright exemption for AI training&lt;/a&gt; with a weak-toothed opt-out mechanism.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Government are doing this not because the current law does not protect intellectual property rights, nor because they do not understand the devastation it will cause, but because they are hooked on the delusion that the UK's best interests and economic future align with those of Silicon Valley.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;She throws in some cleverly selected numbers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Prime Minister cited an IMF report that claimed that, if fully realised, the gains from AI could be worth up to an average of £47 billion to the UK each year over a decade. He did not say that the very same report suggested that unemployment would increase by 5.5% over the same period. This is a big number—a lot of jobs and a very significant cost to the taxpayer. Nor does that £47 billion account for the transfer of funds from one sector to another. The creative industries contribute £126 billion per year to the economy. I do not understand the excitement about £47 billion when you are giving up £126 billion.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Mentions DeepSeek:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Before I sit down, I will quickly mention DeepSeek, a Chinese bot that is perhaps as good as any from the US—we will see—but which will certainly be a potential beneficiary of the proposed AI scraping exemption. Who cares that it does not recognise Taiwan or know what happened in Tiananmen Square? It was built for $5 million and wiped $1 trillion off the value of the US AI sector. The uncertainty that the Government claim is not an uncertainty about how copyright works; it is uncertainty about who will be the winners and losers in the race for AI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And finishes with this superb closing line:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The spectre of AI does nothing for growth if it gives away what we own so that we can rent from it what it makes.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;According &lt;a href="https://x.com/ednewtonrex/status/1884404480328061231"&gt;to Ed Newton-Rex&lt;/a&gt; the speech was effective:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;She managed to get the House of Lords to approve her amendments to the Data (Use and Access) Bill, which among other things requires overseas gen AI companies to respect UK copyright law if they sell their products in the UK. (As a reminder, it is illegal to train commercial gen AI models on ©️ work without a licence in the UK.)&lt;/p&gt;
&lt;p&gt;What's astonishing is that her amendments passed despite @UKLabour reportedly being whipped to vote against them, and the Conservatives largely abstaining. Essentially, Labour voted against the amendments, and &lt;em&gt;everyone else who voted&lt;/em&gt; voted to protect copyright holders.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(Is it true that in the UK it's currently "illegal to train commercial gen AI models on ©️ work"? From points 44, 45 and 46 of &lt;a href="https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence/copyright-and-artificial-intelligence"&gt;this Copyright and AI: Consultation document&lt;/a&gt; it seems to me that the official answer is "it's complicated".)&lt;/p&gt;
&lt;p&gt;I'm trying to understand if this amendment could make existing products such as ChatGPT, Claude and Gemini illegal to sell in the UK. How about usage of open weight models?

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/danielpemberton/status/1884553564636303610"&gt;@danielpemberton&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/copyright"&gt;copyright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/economics"&gt;economics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/law"&gt;law&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deepseek"&gt;deepseek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="copyright"/><category term="economics"/><category term="ethics"/><category term="law"/><category term="politics"/><category term="ai"/><category term="generative-ai"/><category term="training-data"/><category term="deepseek"/><category term="ai-ethics"/><category term="ai-in-china"/></entry><entry><title>Private School Labeler on Bluesky</title><link href="https://simonwillison.net/2024/Nov/22/private-school-labeler-on-bluesky/#atom-tag" rel="alternate"/><published>2024-11-22T17:44:34+00:00</published><updated>2024-11-22T17:44:34+00:00</updated><id>https://simonwillison.net/2024/Nov/22/private-school-labeler-on-bluesky/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://bsky.app/profile/daddys.cash"&gt;Private School Labeler on Bluesky&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I am utterly delighted by this subversive use of Bluesky's &lt;a href="https://docs.bsky.app/docs/advanced-guides/moderation"&gt;labels feature&lt;/a&gt;, which allows you to subscribe to a custom application that then adds visible labels to profiles.&lt;/p&gt;
&lt;p&gt;The feature was designed for moderation, but this labeler subverts it by displaying labels on accounts belonging to British public figures showing which expensive private school they went to and what the current fees are for that school.&lt;/p&gt;
&lt;p&gt;Here's what it looks like on an account - tapping the label brings up the information about the fees:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Screenshot of a social media profile and post. Profile shows &amp;quot;James O'Brien @mrjamesob.bsky.social&amp;quot; with 166.7K followers, 531 following, 183 posts. Bio reads &amp;quot;Broadcaster &amp;amp; author.&amp;quot; Shows education at Ampleforth School and Private School. Contains a repost from Julia Hines about Rabbi Jeffrey, followed by a label showing &amp;quot;Ampleforth School £46,740/year (2024/2025). This label was applied by Private School Labeller" src="https://static.simonwillison.net/static/2024/bluesky-label.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;These labels are only visible to users who have deliberately subscribed to the labeler. Unsurprisingly, some of those labeled aren't too happy about it!&lt;/p&gt;
&lt;p&gt;In response to a comment about attending on a scholarship, the label creator &lt;a href="https://bsky.app/profile/daddys.cash/post/3lbl43ifho22n"&gt;said&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I'm explicit with the labeller that scholarship pupils, grant pupils, etc, are still included - because it's the later effects that are useful context - students from these schools get a leg up and a degree of privilege, which contributes eg to the overrepresentation in British media/politics&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;On the one hand, there are clearly opportunities for abuse here. But given the opt-in nature of the labelers, this doesn't feel hugely different to someone creating a separate webpage full of information about Bluesky profiles.&lt;/p&gt;
&lt;p&gt;I'm intrigued by the possibilities of labelers. There's a list of others on &lt;a href="https://www.bluesky-labelers.io/"&gt;bluesky-labelers.io&lt;/a&gt;, including another brilliant hack: &lt;a href="https://bsky.app/profile/did:plc:w6yx4bltuzdmiolooi4kd6zt"&gt;Bookmarks&lt;/a&gt;, which lets you "report" a post to the labeler and then displays those reported posts in a custom feed - providing a private bookmarks feature that Bluesky itself currently lacks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; &lt;a href="https://bsky.app/profile/us-gov-funding.bsky.social"&gt;@us-gov-funding.bsky.social&lt;/a&gt; is the inevitable labeler for US politicians showing which companies and industries are their top donors, built &lt;a href="https://bsky.app/profile/hipstersmoothie.com/post/3lbl2lgnq7c2f"&gt;by Andrew Lisowski&lt;/a&gt; (&lt;a href="https://github.com/hipstersmoothie/us-gov-contributions-labeler"&gt;source code here&lt;/a&gt;) using data sourced from &lt;a href="https://www.opensecrets.org/"&gt;OpenScrets&lt;/a&gt;. Here's what it looks like on &lt;a href="https://bsky.app/profile/senatorschumer.bsky.social/post/3lbkvtdc5ik2z"&gt;this post&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Post by Chuck Schumer. Labels show affiliated organizations: Citigroup Inc, Goldman Sachs, Lawyers/Law Firms, Paul, Weiss et al, Real Estate, Securities &amp;amp; Investment. Post text reads &amp;quot;Democracy is in serious trouble, but it's not dead. We all have power, and we can use it together to defend our freedoms.&amp;quot;" src="https://static.simonwillison.net/static/2024/chuck-label.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/moderation"&gt;moderation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/political-hacking"&gt;political-hacking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bluesky"&gt;bluesky&lt;/a&gt;&lt;/p&gt;



</summary><category term="apis"/><category term="moderation"/><category term="political-hacking"/><category term="politics"/><category term="bluesky"/></entry><entry><title>Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities</title><link href="https://simonwillison.net/2024/Nov/16/civic-band/#atom-tag" rel="alternate"/><published>2024-11-16T22:14:01+00:00</published><updated>2024-11-16T22:14:01+00:00</updated><id>https://simonwillison.net/2024/Nov/16/civic-band/#atom-tag</id><summary type="html">
    &lt;p&gt;I interviewed &lt;a href="https://phildini.dev/"&gt;Philip James&lt;/a&gt; about &lt;a href="https://civic.band/"&gt;Civic Band&lt;/a&gt;, his "slowly growing collection of databases of the minutes from civic governments". Philip demonstrated the site and talked through his pipeline for scraping and indexing meeting minutes from many different local government authorities around the USA.&lt;/p&gt;

&lt;iframe style="margin-top: 1.5em; margin-bottom: 1.5em;" width="560" height="315" src="https://www.youtube-nocookie.com/embed/OziYd7xcGzc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="allowfullscreen"&gt; &lt;/iframe&gt;

&lt;p&gt;We recorded this conversation as part of yesterday's Datasette Public Office Hours session.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#civic-band"&gt;Civic Band&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#the-technical-stack"&gt;The technical stack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#scale-and-storage"&gt;Scale and storage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/16/civic-band/#future-plans"&gt;Future plans&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id="civic-band"&gt;Civic Band&lt;/h4&gt;
&lt;p&gt;Philip was inspired to start thinking more about local government after the 2016 US election. He realised that there was a huge amount of information about decisions made by local authorities tucked away in their meeting minutes,but that information was hidden away in thousands of PDF files across many different websites.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There was this massive backlog of basically every decision that had ever been made by one of these bodies. But it was almost impossible to discover because it lives in these systems where the method of exchange is a PDF.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Philip lives in Alameda, which makes its minutes available &lt;a href="https://alameda.legistar.com/Calendar.aspx"&gt;via this portal&lt;/a&gt; powered by &lt;a href="https://granicus.com/product/legistar-agenda-management/"&gt;Legistar&lt;/a&gt;. It turns out there are a small number of vendors that provide this kind of software tool, so once you've written a scraper for one it's likely to work for many others as well.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://alameda.ca.civic.band/"&gt;the Civic Band portal for Alameda&lt;/a&gt;, powered by &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/civic-band-1.jpg" alt="Datasette instance titled Alameda Civic Data, has search box, a note that says  A fully-searchable database of Alameda, CA civic meeting minutes. Last updated: 2024-11-15T20:27:36. See the full list at Civic Band and a meetings database with tables minutes and agendas." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;It's running the &lt;a href="https://github.com/simonw/datasette-search-all"&gt;datasette-search-all&lt;/a&gt; plugin and has both tables configured for full-text search. Here's a &lt;a href="https://alameda.ca.civic.band/-/search?q=housing"&gt;search for housing&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/civic-band-2.jpg" alt="Search all tables - for housing. 43 results in meetings: agendas. Each result shows a meeting, date, page, text and a rendered page image" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="the-technical-stack"&gt;The technical stack&lt;/h4&gt;
&lt;p&gt;The public Civic Band sites all run using Datasette in Docker Containers - one container per municipality. They're hosted on a single &lt;a href="https://www.hetzner.com/"&gt;Hetzner&lt;/a&gt; machine.&lt;/p&gt;
&lt;p&gt;The ingestion pipeline runs separately from the main hosting environment, using a Mac Mini on Philp's desk at home.&lt;/p&gt;
&lt;p&gt;OCR works by breaking each PDF up into images and then running &lt;a href="https://github.com/tesseract-ocr/tesseract"&gt;Tesseract OCR&lt;/a&gt; against them directly on the Mac Mini. This processes in the order of 10,000 or less new pages of documents a day.&lt;/p&gt;
&lt;p&gt;Philip treats PDF as a normalization target, because the pipeline is designed around documents with pages of text. In the rare event that a municipality publishes documents in another format such as &lt;code&gt;.docx&lt;/code&gt; he converts them to PDF before processing.&lt;/p&gt;
&lt;p&gt;PNG images of the PDF pages are served via a CDN, and the OCRd text is written to SQLite database files - one per municipality. &lt;a href="https://sqlite.org/fts5.html"&gt;SQLite FTS&lt;/a&gt; provides full-text search.&lt;/p&gt;
&lt;h4 id="scale-and-storage"&gt;Scale and storage&lt;/h4&gt;
&lt;p&gt;The entire project currently comes to about 265GB on disk.  The PNGs of the pages use about 350GB of CDN storage.&lt;/p&gt;
&lt;p&gt;Most of the individual SQLite databases are very small. The largest is for &lt;a href="https://maui-county.hi.civic.band/"&gt;Maui County&lt;/a&gt; which is around 535MB because that county has professional stenographers taking detailed notes for every one of their meetings.&lt;/p&gt;
&lt;p&gt;Each city adds only a few documents a week so growth is manageable even as the number of cities grows.&lt;/p&gt;
&lt;h4 id="future-plans"&gt;Future plans&lt;/h4&gt;
&lt;p&gt;We talked quite a bit about a goal to allow users to subscribe to updates that match specific search terms.&lt;/p&gt;
&lt;p&gt;Philip has been building out a separate site called Civic Observer to address this need, which will store searches and then execute the periodically using the Datasette JSON API, with a Django app to record state to avoid sending the same alert more than once.&lt;/p&gt;

&lt;p&gt;I've had a long term ambition to build some kind of saved search alerts plugin for Datasette generally, to allow users to subscribe to new results for arbitrary SQL queries. My &lt;a href="https://github.com/simonw/sqlite-chronicle"&gt;sqlite-chronicle&lt;/a&gt; library is part or that effort - it uses SQLite triggers to maintain version numbers for individual rows in a table, allowing you to query just the rows that have been inserted or modified since the version number last time you ran the query.&lt;/p&gt;

&lt;p&gt;Philip is keen to talk to anyone who is interested in using Civic Band or helping expand it to even more cities. You can find him on the &lt;a href="https://datasette.io/discord"&gt;Datasette Discord&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/political-hacking"&gt;political-hacking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="data-journalism"/><category term="political-hacking"/><category term="politics"/><category term="sqlite"/><category term="datasette"/><category term="datasette-public-office-hours"/></entry><entry><title>Visualizing local election results with Datasette, Observable and MapLibre GL</title><link href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#atom-tag" rel="alternate"/><published>2024-11-09T23:32:06+00:00</published><updated>2024-11-09T23:32:06+00:00</updated><id>https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#atom-tag</id><summary type="html">
    &lt;p&gt;Alex Garcia and myself hosted the first &lt;a href="https://simonwillison.net/2024/Nov/7/datasette-public-office-hours/"&gt;Datasette Open Office Hours&lt;/a&gt; on Friday - a live-streamed video session where we hacked on a project together and took questions and tips from community members on Discord.&lt;/p&gt;
&lt;p&gt;We didn't record this one (surprisingly not a feature that Discord offers) but we hope to do more of these and record them in the future.&lt;/p&gt;
&lt;p&gt;This post is a detailed write-up of what we built during the session.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#san-mateo-county-election-results"&gt;San Mateo County election results&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#importing-csv-data-into-datasette"&gt;Importing CSV data into Datasette&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#modifying-the-schema"&gt;Modifying the schema&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#faceting-and-filtering-the-table"&gt;Faceting and filtering the table&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#importing-geospatial-precinct-shapes"&gt;Importing geospatial precinct shapes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#enriching-that-data-to-extract-the-precinct-ids"&gt;Enriching that data to extract the precinct IDs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#running-a-join"&gt;Running a join&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#creating-an-api-token-to-access-the-data"&gt;Creating an API token to access the data&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#getting-cors-working"&gt;Getting CORS working&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#working-with-datasette-in-observable"&gt;Working with Datasette in Observable&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#visualizing-those-with-maplibre-gl"&gt;Visualizing those with MapLibre GL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#observable-plot"&gt;Observable Plot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#bringing-it-all-together"&gt;Bringing it all together&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2024/Nov/9/visualizing-local-election-results/#we-ll-be-doing-this-again"&gt;We'll be doing this again&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h4 id="san-mateo-county-election-results"&gt;San Mateo County election results&lt;/h4&gt;
&lt;p&gt;I live in El Granada, a tiny town just north of Half Moon Bay in San Mateo County, California.&lt;/p&gt;
&lt;p&gt;Every county appears to handle counting and publishing election results differently. For San Mateo County the results are published &lt;a href="https://smcacre.gov/elections/november-5-2024-election-results"&gt;on this page&lt;/a&gt;, and detailed per-precinct and per-candidate breakdowns are made available as a CSV file.&lt;/p&gt;
&lt;p&gt;(I optimistically set up a &lt;a href="https://simonwillison.net/2020/Oct/9/git-scraping/"&gt;Git scraper&lt;/a&gt; for these results in &lt;a href="https://github.com/simonw/scrape-san-mateo-county-election-results-2024"&gt;simonw/scrape-san-mateo-county-election-results-2024&lt;/a&gt; only to learn that the CSV is updated just once a day, not continually as the ballots are counted.)&lt;/p&gt;
&lt;p&gt;I'm particularly invested in the results of the &lt;a href="http://granada.ca.gov/"&gt;Granada Community Services District&lt;/a&gt; board member elections. Our little town of El Granada is in "unincorporated San Mateo County" which means we don't have a mayor or any local officials, so the closest we get to hyper-local government is the officials that run our local sewage and parks organization! My partner Natalie ran &lt;a href="https://til.simonwillison.net/youtube/livestreaming"&gt;the candidate forum event&lt;/a&gt; (effectively the debate) featuring three of the four candidates running for the two open places on the board.&lt;/p&gt;
&lt;p&gt;Let's explore the data for that race using Datasette.&lt;/p&gt;
&lt;h4 id="importing-csv-data-into-datasette"&gt;Importing CSV data into Datasette&lt;/h4&gt;
&lt;p&gt;I ran my part of the demo using &lt;a href="https://www.datasette.cloud/"&gt;Datasette Cloud&lt;/a&gt;, the beta of my new hosted Datasette service.&lt;/p&gt;
&lt;p&gt;I started by using the pre-configured &lt;a href="https://github.com/datasette/datasette-import"&gt;datasette-import&lt;/a&gt; plugin to import the data from the CSV file into a fresh table:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-import-loop.gif" alt="Paste data to create a table - I drag and drop on a CSV file, which produces a preview of the first 100 of 15,589 rows. I click to Upload and a progress bar runs before redirecting me to the resulting table." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="modifying-the-schema"&gt;Modifying the schema&lt;/h4&gt;
&lt;p&gt;The table imported cleanly, but all of the columns from the CSV were still being treated as text. I used the &lt;a href=""&gt;datasette-edit-schema&lt;/a&gt; plugin to switch the relevant columns to integers so that we could run sums and sorts against them.&lt;/p&gt;
&lt;p&gt;(I also noted that I really should add a "detect column types" feature to that plugin!)&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/datasette-edit-schema.jpg" alt="Edit table data/san_mateo_election_results - an option to rename table and then one to change existing columns, where each column is listed in turn and some have their type select box set to integer instead of the default of text" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The resulting 15,589 rows represent counts from individual precincts around the county for each of the races and measures on the ballot, with a row per precinct per candidate/choice per race.&lt;/p&gt;
&lt;h4 id="faceting-and-filtering-the-table"&gt;Faceting and filtering the table&lt;/h4&gt;
&lt;p&gt;Since I'm interested in the Granada Community Services District election, I applied a facet on "Contest_title" and then used that to select that specific race.&lt;/p&gt;
&lt;p&gt;I applied additional facets on "candidate_name" and "Precinct name".&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/facet-candidates.jpg" alt="28 rows where Contest_title = Granada Community Services District Members, Board of Directors. Facets are precinct name (7 choices), candidate name (IRIS GRANT, JANET BRAYER, NANCY MARSH, WANDA BOWLES) and Contest_title" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This looks right to me: we have 7 precincts and 4 candidates for 28 rows in total.&lt;/p&gt;
&lt;h4 id="importing-geospatial-precinct-shapes"&gt;Importing geospatial precinct shapes&lt;/h4&gt;
&lt;p&gt;Those precinct names are pretty non-descriptive! What does 33001 mean?&lt;/p&gt;
&lt;p&gt;To answer that question, I added a new table.&lt;/p&gt;
&lt;p&gt;San Mateo County offers &lt;a href="https://smcacre.gov/elections/precinct-maps-pdf"&gt;precinct maps&lt;/a&gt; in the form of 23 PDF files. Our precincts are in the "Unincorporated Coastside" file:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/precinct-map-from-pdf.jpg" alt="Screenshot from a PDF - label is Unincorporated Coastside, it shows the area north of Half Moon Bay with a bunch of polygons with numeric identifiers." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Thankfully the county &lt;em&gt;also&lt;/em&gt; makes that data available as &lt;a href="https://data.smcgov.org/Government/Election-Precincts/g5sj-6zp8/about_data"&gt;geospatial data&lt;/a&gt;, hosted using Socrata with an option to export as GeoJSON.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/precincts-socrata.jpg" alt="Socrata interface, Election Precincts updated March 7 2022 - 533 views, 72 downloads, and export dataset modal shows a GeoJSON option to export 783 rows." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;datasette-import&lt;/code&gt; plugin can handle JSON files... and if a JSON file contains a top-level object with a key that is an array of objects, it will import those objects as a table.&lt;/p&gt;
&lt;p&gt;Dragging that file into Datasette is enough to import it as a table with a &lt;code&gt;properties&lt;/code&gt; JSON column containing properties and a &lt;code&gt;geometry&lt;/code&gt; JSON columnn with the GeoJSON geometry.&lt;/p&gt;
&lt;p&gt;Here's where another plugin kicks in: &lt;a href="https://datasette.io/plugins/datasette-leaflet-geojson"&gt;datasette-leaflet-geojson&lt;/a&gt; looks for columns that contain valid GeoJSON geometries and... draws them on a map!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/precincts-in-datasette.jpg" alt="Datasette precincts table with 783 rows. The properties column contains JSON keys lastupdate, creationda, prencitid, notes and active - the geometry column renders maps with polygons showing the shape of the precinct." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;So now we can see the shape of the individual geometries.&lt;/p&gt;
&lt;h4 id="enriching-that-data-to-extract-the-precinct-ids"&gt;Enriching that data to extract the precinct IDs&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;precinctid&lt;/code&gt; is present in the data, but it's tucked away in a JSON object in that &lt;code&gt;properties&lt;/code&gt; JSON blob. It would be more convenient if it was a top-level column.&lt;/p&gt;
&lt;p&gt;Datasette's &lt;a href="https://simonwillison.net/2023/Dec/1/datasette-enrichments/"&gt;enrichments feature&lt;/a&gt; provides tools for running operations against every row in a table and adding new columns based on the results.&lt;/p&gt;
&lt;p&gt;My Datasette Cloud instance was missing the &lt;a href="https://github.com/datasette/datasette-enrichments-quickjs"&gt;datasette-enrichments-quickjs plugin&lt;/a&gt; that would let me run JavaScript code against the data. I used my privileged access on Datasette Cloud to add that plugin to my requirements and restarted the instance to install it.&lt;/p&gt;
&lt;p&gt;I used that to run this JavaScript code against every row in the table and saved the output in a new &lt;code&gt;precinct_id&lt;/code&gt; column:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-en"&gt;enrich&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;row&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;parse&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;row&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;properties&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;precinctid&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/enrich-precincts.jpg" alt="Enrich data in precincts. 783 rows selected. JavaScript. Enrich data with a custom JavaScript function. JavaScript function: function enrich(row) { return JSON.stringify(row) + &amp;quot; enriched&amp;quot;; } - Define an enrich(row) JavaScript function taking an object and returning a value. Row keys: properties, geometry. Output mode: store the function result in a single column. Output clumn name: precinct_id. The column to store the output in - will be created if it does not exist. Output column type: text." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This took less than a second to run, adding and populating a new &lt;code&gt;precinct_id&lt;/code&gt; column for the table.&lt;/p&gt;
&lt;h4 id="running-a-join"&gt;Running a join&lt;/h4&gt;
&lt;p&gt;I demonstrated how to run a join between the election results and the precincts table using the Datasette SQL query editor.&lt;/p&gt;
&lt;p&gt;I tried a few different things, but the most interesting query was this one:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt;
  Precinct_name,
  &lt;span class="pl-c1"&gt;precincts&lt;/span&gt;.&lt;span class="pl-c1"&gt;geometry&lt;/span&gt;,
  total_ballots,
  json_group_object(
    candidate_name,
    total_votes
  ) &lt;span class="pl-k"&gt;as&lt;/span&gt; votes_by_candidate
&lt;span class="pl-k"&gt;from&lt;/span&gt;
  election_results 
  &lt;span class="pl-k"&gt;join&lt;/span&gt; precincts &lt;span class="pl-k"&gt;on&lt;/span&gt; &lt;span class="pl-c1"&gt;election_results&lt;/span&gt;.&lt;span class="pl-c1"&gt;Precinct_name&lt;/span&gt; &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;precincts&lt;/span&gt;.&lt;span class="pl-c1"&gt;precinct_id&lt;/span&gt;
&lt;span class="pl-k"&gt;where&lt;/span&gt; 
  Contest_title &lt;span class="pl-k"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;Granada Community Services District Members, Board of Directors&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;group by&lt;/span&gt; 
  Precinct_name,
  &lt;span class="pl-c1"&gt;precincts&lt;/span&gt;.&lt;span class="pl-c1"&gt;geometry&lt;/span&gt;,
  total_ballots;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/joined-precincts.jpg" alt="The SQL query returned four columns: Precinct_name, geometry with a map of the precinct, total_ballots with a number and votes_by_candidate with a JSON object mapping each candidate name to their number of votes." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="creating-an-api-token-to-access-the-data"&gt;Creating an API token to access the data&lt;/h4&gt;
&lt;p&gt;I was nearly ready to hand over to Alex for the second half of our demo, where he would use Observable Notebooks to build some custom visualizations on top of the data.&lt;/p&gt;
&lt;p&gt;A great pattern for this is to host the data in Datasette and then fetch it into Observable via the Datasette JSON API.&lt;/p&gt;
&lt;p&gt;Since Datasette Cloud instances are private by default we would need to create an API token that could do this.&lt;/p&gt;
&lt;p&gt;I used this interface (from the &lt;a href="https://github.com/simonw/datasette-auth-tokens"&gt;datasette-auth-tokens plugin&lt;/a&gt;) to create a new token with read-only access to all databases and tables in the instance:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/create-api-token.jpg" alt="Create an API token interface. This token will allow API access with the same abilities as your current user, swillison .Token will be restricted to: all databases and tables: view-database, all databases and tables: view-table, all databases and tables: execute-sql - token is set to read-only and never expires, a list of possible permissions with checkboxes is listed below." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Since we're running a dedicated instance just for Datasette Public Office Hours there's no reason not to distribute that read-only token in publically accessible code.&lt;/p&gt;
&lt;h4 id="getting-cors-working"&gt;Getting CORS working&lt;/h4&gt;
&lt;p&gt;Embarrassingly, I had forgotten that we would need CORS headers in order to access the data from an Observable notebook. Thankfully we have another plugin for that: &lt;a href="https://datasette.io/plugins/datasette-cors"&gt;datasette-cors&lt;/a&gt;. I installed that quickly and we confirmed that it granted access to the API from Observable as intended.&lt;/p&gt;
&lt;p&gt;I handed over to Alex for the next section of the demo.&lt;/p&gt;
&lt;h4 id="working-with-datasette-in-observable"&gt;Working with Datasette in Observable&lt;/h4&gt;
&lt;p&gt;Alex started by running a SQL query from client-side JavaScript to pull in the joined data for our specific El Granada race:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;sql&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;`&lt;/span&gt;
&lt;span class="pl-s"&gt;select&lt;/span&gt;
&lt;span class="pl-s"&gt;  Precinct_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  precincts.geometry,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Split_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Reporting_flag,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Update_count,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Pct_Id,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Pct_seq_nbr,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Reg_voters,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Turn_Out,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Contest_Id,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Contest_seq_nbr,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Contest_title,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Contest_party_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Selectable_Options,&lt;/span&gt;
&lt;span class="pl-s"&gt;  candidate_id,&lt;/span&gt;
&lt;span class="pl-s"&gt;  candidate_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Candidate_Type,&lt;/span&gt;
&lt;span class="pl-s"&gt;  cand_seq_nbr,&lt;/span&gt;
&lt;span class="pl-s"&gt;  Party_Code,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_ballots,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_votes,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_under_votes,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_over_votes,&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote Centers_ballots],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote Centers_votes],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote Centers_under_votes],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote Centers_over_votes],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote by Mail_ballots],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote by Mail_votes],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote by Mail_under_votes],&lt;/span&gt;
&lt;span class="pl-s"&gt;  [Vote by Mail_over_votes]&lt;/span&gt;
&lt;span class="pl-s"&gt;from&lt;/span&gt;
&lt;span class="pl-s"&gt;  election_results join precincts on election_results.Precinct_name = precincts.precinct_id&lt;/span&gt;
&lt;span class="pl-s"&gt;where "Contest_title" = "Granada Community Services District Members, Board of Directors"&lt;/span&gt;
&lt;span class="pl-s"&gt;limit 101;`&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And in the next cell:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;raw_data&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-s"&gt;`https://datasette-public-office-hours.datasette.cloud/data/-/query.json?_shape=array&amp;amp;sql=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-en"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;    &lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-c1"&gt;headers&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;Authorization&lt;/span&gt;: &lt;span class="pl-s"&gt;`Bearer &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;secret&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;then&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;r&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;r&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;json&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note the &lt;code&gt;?_shape=array&lt;/code&gt; parameter there, which causes Datasette to output the results directly as a JSON array of objects.&lt;/p&gt;
&lt;p&gt;That's all it takes to get the data into Observable. Adding another cell like this confirms that the data is now available:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Inputs&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;table&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;raw_data&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/inputs-table-raw-data.jpg" alt="An Observable cell running Inputs.table(raw_data) and displaying a table of Precinct_name and geometry columns, with GeoJSON" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="visualizing-those-with-maplibre-gl"&gt;Visualizing those with MapLibre GL&lt;/h4&gt;
&lt;p&gt;There are plenty of good options for visualizing GeoJSON data using JavaScript in an Observable notebook.&lt;/p&gt;
&lt;p&gt;Alex started with &lt;a href="https://maplibre.org/maplibre-gl-js/docs/"&gt;MapLibre GL&lt;/a&gt;, using the excellent &lt;a href="https://simonwillison.net/2024/Sep/28/openfreemap/"&gt;OpenFreeMap 3D tiles&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-s1"&gt;viewof&lt;/span&gt; &lt;span class="pl-s1"&gt;map&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-s1"&gt;const&lt;/span&gt; container &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;html&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;span class="pl-kos"&gt;&amp;lt;&lt;/span&gt;&lt;span class="pl-ent"&gt;div&lt;/span&gt; &lt;span class="pl-c1"&gt;style&lt;/span&gt;="&lt;span class="pl-s"&gt;height:800px;&lt;/span&gt;"&lt;span class="pl-kos"&gt;&amp;gt;&lt;/span&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;yield&lt;/span&gt; &lt;span class="pl-s1"&gt;container&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;map&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;container&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;value&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-s1"&gt;maplibregl&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;Map&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
    container&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;zoom&lt;/span&gt;: &lt;span class="pl-c1"&gt;2&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c"&gt;//style: "https://basemaps.cartocdn.com/gl/voyager-gl-style/style.json",&lt;/span&gt;
    &lt;span class="pl-c1"&gt;style&lt;/span&gt;: &lt;span class="pl-s"&gt;"https://tiles.openfreemap.org/styles/liberty"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;scrollZoom&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-k"&gt;yield&lt;/span&gt; &lt;span class="pl-s1"&gt;container&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

  &lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;on&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"load"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;fitBounds&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d3&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;geoBounds&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;data&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;duration&lt;/span&gt;: &lt;span class="pl-c1"&gt;0&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addSource&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"precincts"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;"geojson"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;data&lt;/span&gt;: &lt;span class="pl-s1"&gt;data&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addLayer&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;id&lt;/span&gt;: &lt;span class="pl-s"&gt;"precincts"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;"fill"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;source&lt;/span&gt;: &lt;span class="pl-s"&gt;"precincts"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;paint&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
        &lt;span class="pl-s"&gt;"fill-opacity"&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.4&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
        &lt;span class="pl-s"&gt;"fill-color"&lt;/span&gt;: &lt;span class="pl-kos"&gt;[&lt;/span&gt;
          &lt;span class="pl-s"&gt;"case"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
          &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-s"&gt;"=="&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-s"&gt;"get"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"ratio"&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;null&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"#000000"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
          &lt;span class="pl-kos"&gt;[&lt;/span&gt;
            &lt;span class="pl-s"&gt;"interpolate"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
            &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-s"&gt;"linear"&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
            &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-s"&gt;"get"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"ratio"&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
            &lt;span class="pl-c1"&gt;0.0&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"#0000ff"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
            &lt;span class="pl-c1"&gt;0.5&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"#d3d3d3"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
            &lt;span class="pl-c1"&gt;1.0&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"#ff0000"&lt;/span&gt;
          &lt;span class="pl-kos"&gt;]&lt;/span&gt;
        &lt;span class="pl-kos"&gt;]&lt;/span&gt;
      &lt;span class="pl-kos"&gt;}&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;on&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"click"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s"&gt;"precincts"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; precinct&lt;span class="pl-kos"&gt;,&lt;/span&gt; ratio &lt;span class="pl-kos"&gt;}&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;features&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;0&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;properties&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
      &lt;span class="pl-k"&gt;const&lt;/span&gt; &lt;span class="pl-s1"&gt;description&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;stringify&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;

      &lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-s1"&gt;maplibregl&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;Popup&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
        &lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;setLngLat&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;e&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;lngLat&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
        &lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;setHTML&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;description&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
        &lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;addTo&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-s1"&gt;invalidation&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;then&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;remove&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/maplibre-gl.jpg" alt="An Observable cell showing a map of El Granada - a black shape shows the outlines of the precincts." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;(This is just one of several iterations, I didn't capture detailed notes of every change Alex made to the code.)&lt;/p&gt;
&lt;h4 id="observable-plot"&gt;Observable Plot&lt;/h4&gt;
&lt;p&gt;Observable notebooks come pre-loaded with the excellent Observable Plot charting library - Mike Bostock's high-level charting tool built on top of D3.&lt;/p&gt;
&lt;p&gt;Alex used that to first render the shapes of the precincts directly, without even needing a tiled basemap:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-v"&gt;Plot&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;plot&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
  width&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;600&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;legend&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;projection&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-c1"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;"conic-conformal"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;parallels&lt;/span&gt;: &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;37&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;4&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-c1"&gt;60&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;38&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;26&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-c1"&gt;60&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;rotate&lt;/span&gt;: &lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;120&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-c1"&gt;30&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-c1"&gt;60&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;0&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;domain&lt;/span&gt;: &lt;span class="pl-s1"&gt;data&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;marks&lt;/span&gt;: &lt;span class="pl-kos"&gt;[&lt;/span&gt;
    &lt;span class="pl-v"&gt;Plot&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;geo&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;data&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;strokeOpacity&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.1&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;fill&lt;/span&gt;: &lt;span class="pl-s"&gt;"total_votes"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-en"&gt;title&lt;/span&gt;: &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;stringify&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;properties&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;tip&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-kos"&gt;]&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;parallels&lt;/code&gt; and &lt;code&gt;rotate&lt;/code&gt; options there come from the handy &lt;a href="https://github.com/veltman/d3-stateplane?tab=readme-ov-file#nad83--california-zone-3-epsg26943"&gt;veltman/d3-stateplane&lt;/a&gt; repo, which lists recommended settings for the &lt;a href="https://en.wikipedia.org/wiki/State_Plane_Coordinate_System"&gt;State Plane Coordinate System&lt;/a&gt; used with projections in D3. Those values are for &lt;a href="https://www.conservation.ca.gov/cgs/rgm/state-plane-coordinate-system"&gt;California Zone 3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/color-precincts.jpg" alt="An Observable cell shows six five distinct colored polygons, each for a different precinct. The shape of El Granada is clearly visible despite no other map tiles or labels." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4 id="bringing-it-all-together"&gt;Bringing it all together&lt;/h4&gt;
&lt;p&gt;For the grand finale, Alex combined everything learned so far to build an interactive map allowing a user to select any of the 110 races on the ballot and see a heatmap of results for any selected candidate and option:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2024/select-map.gif" alt="Animated demo. Choose a contest select - picking different contests updates the map at the bottom. For each contest the candidates or options are shown as radio buttons, and selecting those updates the map to show a heatmap of votes for that candidate in different precincts." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;You can try this out in &lt;a href="https://observablehq.com/d/2ed2ad2443d7bbb5"&gt;Alex's notebook&lt;/a&gt;. Here's the relevant code (Observable cells are divided by &lt;code&gt;// ---&lt;/code&gt; comments). Note that Observable notebooks are reactive and allow variables to be referenced out of order.&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-c"&gt;// Select the contest&lt;/span&gt;
&lt;span class="pl-s1"&gt;viewof&lt;/span&gt; &lt;span class="pl-s1"&gt;contest&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Inputs&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;select&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;contests&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;label&lt;/span&gt;: &lt;span class="pl-s"&gt;"Choose a contest"&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-c"&gt;// And the candidate&lt;/span&gt;
&lt;span class="pl-s1"&gt;viewof&lt;/span&gt;&lt;span class="pl-kos"&gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;candidate&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Inputs&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;radio&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-s1"&gt;candidates&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-c1"&gt;label&lt;/span&gt;: &lt;span class="pl-s"&gt;"Choose a candidate"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;value&lt;/span&gt;: &lt;span class="pl-s1"&gt;candidates&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;0&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-c"&gt;// Show the map itself&lt;/span&gt;
&lt;span class="pl-v"&gt;Plot&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;plot&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
  width&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;height&lt;/span&gt;: &lt;span class="pl-c1"&gt;600&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;legend&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;color&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt; &lt;span class="pl-c1"&gt;scheme&lt;/span&gt;: &lt;span class="pl-s"&gt;"blues"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;legend&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;projection&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-c1"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;"mercator"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;domain&lt;/span&gt;: &lt;span class="pl-s1"&gt;data2&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;marks&lt;/span&gt;: &lt;span class="pl-kos"&gt;[&lt;/span&gt;
    &lt;span class="pl-v"&gt;Plot&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;geo&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;data2&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;strokeOpacity&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.1&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;fill&lt;/span&gt;: &lt;span class="pl-s"&gt;"ratio"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;tip&lt;/span&gt;: &lt;span class="pl-c1"&gt;true&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-kos"&gt;]&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
# &lt;span class="pl-c1"&gt;--&lt;/span&gt;&lt;span class="pl-c1"&gt;-&lt;/span&gt;
&lt;span class="pl-s1"&gt;data2&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-c1"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;"FeatureCollection"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-c1"&gt;features&lt;/span&gt;: &lt;span class="pl-s1"&gt;raw_data2&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-c1"&gt;type&lt;/span&gt;: &lt;span class="pl-s"&gt;"Feature"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;properties&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;precinct&lt;/span&gt;: &lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;Precinct_name&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;total_ballots&lt;/span&gt;: &lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;total_ballots&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
      &lt;span class="pl-c1"&gt;ratio&lt;/span&gt;: &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;parse&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;votes_by_candidate&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-s1"&gt;candidate&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt; &lt;span class="pl-c1"&gt;/&lt;/span&gt; &lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;total_ballots&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-c1"&gt;geometry&lt;/span&gt;: &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;parse&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;geometry&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-s1"&gt;raw_data2&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;query&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-s"&gt;`select&lt;/span&gt;
&lt;span class="pl-s"&gt;  Precinct_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  precincts.geometry,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_ballots,&lt;/span&gt;
&lt;span class="pl-s"&gt;  json_grop_object(&lt;/span&gt;
&lt;span class="pl-s"&gt;    candidate_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;    total_votes&lt;/span&gt;
&lt;span class="pl-s"&gt;  ) as votes_by_candidate&lt;/span&gt;
&lt;span class="pl-s"&gt;from&lt;/span&gt;
&lt;span class="pl-s"&gt;  election_results &lt;/span&gt;
&lt;span class="pl-s"&gt;  join precincts on election_results.Precinct_name = precincts.precinct_id&lt;/span&gt;
&lt;span class="pl-s"&gt;where Contest_title = :contest&lt;/span&gt;
&lt;span class="pl-s"&gt;group by &lt;/span&gt;
&lt;span class="pl-s"&gt;  Precinct_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  precincts.geometry,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_ballots;`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt; contest &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-s1"&gt;raw_data2&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;query&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-s"&gt;`select&lt;/span&gt;
&lt;span class="pl-s"&gt;  Precinct_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  precincts.geometry,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_ballots,&lt;/span&gt;
&lt;span class="pl-s"&gt;  json_group_object(&lt;/span&gt;
&lt;span class="pl-s"&gt;    candidate_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;    total_votes&lt;/span&gt;
&lt;span class="pl-s"&gt;  ) as votes_by_candidate&lt;/span&gt;
&lt;span class="pl-s"&gt;from&lt;/span&gt;
&lt;span class="pl-s"&gt;  election_results &lt;/span&gt;
&lt;span class="pl-s"&gt;  join precincts on election_results.Precinct_name = precincts.precinct_id&lt;/span&gt;
&lt;span class="pl-s"&gt;where Contest_title = :contest&lt;/span&gt;
&lt;span class="pl-s"&gt;group by &lt;/span&gt;
&lt;span class="pl-s"&gt;  Precinct_name,&lt;/span&gt;
&lt;span class="pl-s"&gt;  precincts.geometry,&lt;/span&gt;
&lt;span class="pl-s"&gt;  total_ballots;`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
  &lt;span class="pl-kos"&gt;{&lt;/span&gt; contest &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-c"&gt;// Fetch the available contests&lt;/span&gt;
&lt;span class="pl-s1"&gt;contests&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;query&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;"select distinct Contest_title from election_results"&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;then&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;map&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;d&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;Contest_title&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-c"&gt;// Extract available candidates for selected contest&lt;/span&gt;

&lt;span class="pl-s1"&gt;candidates&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-v"&gt;Object&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;keys&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
  &lt;span class="pl-c1"&gt;JSON&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;parse&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;raw_data2&lt;/span&gt;&lt;span class="pl-kos"&gt;[&lt;/span&gt;&lt;span class="pl-c1"&gt;0&lt;/span&gt;&lt;span class="pl-kos"&gt;]&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;votes_by_candidate&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;
&lt;span class="pl-kos"&gt;)&lt;/span&gt;

&lt;span class="pl-c"&gt;// ---&lt;/span&gt;

&lt;span class="pl-k"&gt;function&lt;/span&gt; &lt;span class="pl-en"&gt;query&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;sql&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-s1"&gt;params&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-en"&gt;fetch&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;
    &lt;span class="pl-s"&gt;`https://datasette-public-office-hours.datasette.cloud/data/-/query.json?&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-k"&gt;new&lt;/span&gt; &lt;span class="pl-v"&gt;URLSearchParams&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;      &lt;span class="pl-kos"&gt;{&lt;/span&gt; sql&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-c1"&gt;_shape&lt;/span&gt;: &lt;span class="pl-s"&gt;"array"&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; ...&lt;span class="pl-s1"&gt;params&lt;/span&gt; &lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-s"&gt;&lt;span class="pl-s1"&gt;    &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;toString&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt;
    &lt;span class="pl-kos"&gt;{&lt;/span&gt;
      &lt;span class="pl-c1"&gt;headers&lt;/span&gt;: &lt;span class="pl-kos"&gt;{&lt;/span&gt;
        &lt;span class="pl-c1"&gt;Authorization&lt;/span&gt;: &lt;span class="pl-s"&gt;`Bearer &lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;secret&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;
      &lt;span class="pl-kos"&gt;}&lt;/span&gt;
    &lt;span class="pl-kos"&gt;}&lt;/span&gt;
  &lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;then&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;r&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-c1"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;r&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;json&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="we-ll-be-doing-this-again"&gt;We'll be doing this again&lt;/h4&gt;
&lt;p&gt;This was our first time trying something like this and I think it worked &lt;em&gt;really&lt;/em&gt; well. We're already thinking about ways to improve it next time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I want to record these sessions and make them available on YouTube for people who couldn't be there live&lt;/li&gt;
&lt;li&gt;It would be fun to mix up the format. I'm particularly keen on getting more people involved giving demos - maybe having 5-10 minute lightning demo slots so we can see what other people are working on&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Keep an eye on this blog or on the &lt;a href="https://datasette.io/discord"&gt;Datasette Discord&lt;/a&gt; for news about future sessions.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mapping"&gt;mapping&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-cloud"&gt;datasette-cloud&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/alex-garcia"&gt;alex-garcia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-public-office-hours"&gt;datasette-public-office-hours&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/leaflet"&gt;leaflet&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="geospatial"/><category term="mapping"/><category term="politics"/><category term="projects"/><category term="datasette"/><category term="datasette-cloud"/><category term="alex-garcia"/><category term="datasette-public-office-hours"/><category term="leaflet"/></entry><entry><title>Quoting NY Times Editorial Board</title><link href="https://simonwillison.net/2024/Nov/5/ny-times-editorial-board/#atom-tag" rel="alternate"/><published>2024-11-05T01:33:23+00:00</published><updated>2024-11-05T01:33:23+00:00</updated><id>https://simonwillison.net/2024/Nov/5/ny-times-editorial-board/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.nytimes.com/interactive/2024/11/02/opinion/vote-harris-2024-election.html"&gt;&lt;p&gt;You already &lt;a href="https://www.nytimes.com/interactive/2024/07/18/opinion/trump-presidency-record.html"&gt;know&lt;/a&gt; Donald Trump. He is &lt;a href="https://www.nytimes.com/interactive/2024/07/11/opinion/editorials/donald-trump-2024-unfit.html"&gt;unfit&lt;/a&gt; to lead. &lt;a href="https://www.nytimes.com/2024/10/22/opinion/donald-trump-ezra-klein-podcast.html"&gt;Watch him&lt;/a&gt;. Listen to &lt;a href="https://www.nytimes.com/interactive/2024/09/26/opinion/donald-trump-personality-history.html"&gt;those&lt;/a&gt; who know him &lt;a href="https://www.nytimes.com/interactive/2024/01/18/opinion/trump-cabinet-election-2024.html"&gt;best&lt;/a&gt;. He tried to &lt;a href="https://www.nytimes.com/2023/08/02/opinion/trump-jan-6-indictment.html"&gt;subvert&lt;/a&gt; an election and remains a &lt;a href="https://www.nytimes.com/2024/01/06/opinion/trump-2024-campaign-warning.html"&gt;threat&lt;/a&gt; to democracy. He helped overturn Roe, with terrible &lt;a href="https://www.nytimes.com/interactive/2024/10/17/opinion/dobbs-roe-abortion-stories.html"&gt;consequences&lt;/a&gt;. Mr. Trump's &lt;a href="https://www.nytimes.com/2024/04/10/opinion/trump-presidency-corruption.html"&gt;corruption&lt;/a&gt; and &lt;a href="https://www.nytimes.com/interactive/2022/11/17/opinion/trump-five-major-investigations-dozens-of-ways-out.html"&gt;lawlessness&lt;/a&gt; go &lt;a href="https://www.nytimes.com/2024/04/17/opinion/donald-trump-trial.html"&gt;beyond&lt;/a&gt; &lt;a href="https://www.nytimes.com/2023/08/15/opinion/editorials/trump-indictment-republicans.html"&gt;elections&lt;/a&gt;: It's his whole &lt;a href="https://www.nytimes.com/2024/03/25/opinion/trump-al-capone.html"&gt;ethos&lt;/a&gt;. He &lt;a href="https://www.nytimes.com/2024/07/24/opinion/trump-lies-charts-data.html"&gt;lies&lt;/a&gt; without &lt;a href="https://www.nytimes.com/interactive/2024/10/25/opinion/what-trump-says.html"&gt;limit&lt;/a&gt;. If he's re-elected, the G.O.P. won't &lt;a href="https://www.nytimes.com/2024/07/13/opinion/trump-2024-election.html"&gt;restrain&lt;/a&gt; him. Mr. Trump will use the &lt;a href="https://www.nytimes.com/2024/01/12/opinion/donald-trump-culture-decline.html"&gt;government&lt;/a&gt; to go after &lt;a href="https://www.nytimes.com/2024/03/08/opinion/trump-2025-second-term.html?"&gt;opponents&lt;/a&gt;. He will pursue a &lt;a href="https://www.nytimes.com/2024/09/14/opinion/trump-debate-haitians-pets.html"&gt;cruel&lt;/a&gt; policy of mass &lt;a href="https://www.nytimes.com/2024/10/18/opinion/trump-woodward-milley-mass-deportation.html"&gt;deportations&lt;/a&gt;. He will wreak havoc on the &lt;a href="https://www.nytimes.com/2024/09/09/opinion/harris-trump-debate-issues.html"&gt;poor&lt;/a&gt;, the &lt;a href="https://www.nytimes.com/2024/07/31/opinion/trump-threat-treasury-market.html"&gt;middle class&lt;/a&gt; and &lt;a href="https://www.nytimes.com/2024/06/23/opinion/ceo-trump-republican-support.html"&gt;employers&lt;/a&gt;. Another Trump term will damage the &lt;a href="https://www.nytimes.com/2024/04/20/opinion/trump-biden-climate-election.html"&gt;climate&lt;/a&gt;, shatter &lt;a href="https://www.nytimes.com/2024/09/17/opinion/trump-harris-foreign-policy.html"&gt;alliances&lt;/a&gt; and strengthen &lt;a href="https://www.nytimes.com/2024/05/21/opinion/trump-foreign-policy.html"&gt;autocrats&lt;/a&gt;. Americans should demand &lt;a href="https://www.nytimes.com/2024/09/30/opinion/editorials/kamala-harris-2024-endorsement.html"&gt;better&lt;/a&gt;. Vote.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.nytimes.com/interactive/2024/11/02/opinion/vote-harris-2024-election.html"&gt;NY Times Editorial Board&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/new-york-times"&gt;new-york-times&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;&lt;/p&gt;



</summary><category term="new-york-times"/><category term="politics"/></entry><entry><title>Quoting Question for Department for Science, Innovation and Technology</title><link href="https://simonwillison.net/2024/Nov/1/prompt-injection/#atom-tag" rel="alternate"/><published>2024-11-01T15:14:53+00:00</published><updated>2024-11-01T15:14:53+00:00</updated><id>https://simonwillison.net/2024/Nov/1/prompt-injection/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://questions-statements.parliament.uk/written-questions/detail/2024-10-14/HL1541/"&gt;&lt;p&gt;&lt;strong&gt;Lord Clement-Jones&lt;/strong&gt;: To ask His Majesty's Government what assessment they have made of the cybersecurity risks posed by prompt injection attacks to the processing by generative artificial intelligence of material provided from outside government, and whether any such attacks have been detected thus far.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lord Vallance of Balham&lt;/strong&gt;: Security is central to HMG's &lt;a href="https://www.gov.uk/government/publications/generative-ai-framework-for-hmg/generative-ai-framework-for-hmg-html"&gt;Generative AI Framework&lt;/a&gt;, which was published in January this year and sets out principles for using generative AI safely and responsibly. The risks posed by prompt injection attacks, including from material provided outside of government, have been assessed as part of this framework and are continually reviewed. The published Generative AI Framework for HMG specifically includes Prompt Injection attacks, alongside other AI specific cyber risks.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://questions-statements.parliament.uk/written-questions/detail/2024-10-14/HL1541/"&gt;Question for Department for Science, Innovation and Technology&lt;/a&gt;, UIN HL1541, tabled on 14 Oct 2024&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uk"&gt;uk&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-injection"&gt;prompt-injection&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="politics"/><category term="security"/><category term="uk"/><category term="ai"/><category term="prompt-injection"/><category term="generative-ai"/><category term="llms"/></entry><entry><title>Follow the Crypto</title><link href="https://simonwillison.net/2024/Jul/15/follow-the-crypto/#atom-tag" rel="alternate"/><published>2024-07-15T22:06:29+00:00</published><updated>2024-07-15T22:06:29+00:00</updated><id>https://simonwillison.net/2024/Jul/15/follow-the-crypto/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.followthecrypto.org/"&gt;Follow the Crypto&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Very smart new site from Molly White tracking the huge increase in activity from Cryptocurrency-focused PACs this year. These PACs have already raised $203 million and spent $38 million influencing US elections in 2024.&lt;/p&gt;
&lt;p&gt;Right now &lt;a href="https://www.followthecrypto.org/committees/ranking/super"&gt;Molly's rankings show&lt;/a&gt; that the "Fairshake" cryptocurrency PAC is second only to the Trump-supporting "Make America Great Again Inc" in money raised by Super PACs this year - though it's 9th in &lt;a href="https://www.followthecrypto.org/committees/ranking/all"&gt;the list that includes other types of PAC&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Molly's data comes from the FEC, and the code behind the site &lt;a href="https://github.com/molly/follow-the-crypto"&gt;is all open source&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There's lots more about the project in the latest edition of &lt;a href="https://www.citationneeded.news/follow-the-crypto/"&gt;Molly's newsletter&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Did you know that the cryptocurrency industry has spent more on 2024 elections in the United States than the oil industry? More than the pharmaceutical industry?&lt;/p&gt;
&lt;p&gt;In fact, the cryptocurrency industry has spent more on 2024 elections than the entire energy sector &lt;em&gt;and&lt;/em&gt; the entire health sector. Those industries, both worth hundreds of billions or trillions of dollars, are being outspent by an industry that, even by generous estimates, is worth less than $20 billion.&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://www.citationneeded.news/follow-the-crypto/"&gt;Citation Needed&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-journalism"&gt;data-journalism&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/elections"&gt;elections&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/blockchain"&gt;blockchain&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/molly-white"&gt;molly-white&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-journalism"/><category term="elections"/><category term="politics"/><category term="blockchain"/><category term="molly-white"/></entry><entry><title>Quoting Private Eye</title><link href="https://simonwillison.net/2024/Jul/8/private-eye/#atom-tag" rel="alternate"/><published>2024-07-08T15:20:15+00:00</published><updated>2024-07-08T15:20:15+00:00</updated><id>https://simonwillison.net/2024/Jul/8/private-eye/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/PrivateEyeNews/status/1810327043827249452"&gt;&lt;p&gt;Voters in the Clapham and Brixton Hill constituency can rest easy - despite appearances, their Reform candidate Mark Matlock really does exist. [...] Matlock - based in the South Cotswolds, some 100 miles from the constituency in which he is standing - confirmed: "I am a real person." Although his campaign image is Al-generated, he said this was for lack of a real photo of him wearing a tie in Reform's trademark turquoise.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/PrivateEyeNews/status/1810327043827249452"&gt;Private Eye&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;&lt;/p&gt;



</summary><category term="politics"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Quoting UK House of Lords report on Generative AI</title><link href="https://simonwillison.net/2024/Feb/2/uk-house-of-lords-report-on-generative-ai/#atom-tag" rel="alternate"/><published>2024-02-02T03:54:21+00:00</published><updated>2024-02-02T03:54:21+00:00</updated><id>https://simonwillison.net/2024/Feb/2/uk-house-of-lords-report-on-generative-ai/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://committees.parliament.uk/publications/43172/documents/214762/default/"&gt;&lt;p&gt;LLMs may offer immense value to society. But that does not warrant the violation of copyright law or its underpinning principles. We do not believe it is fair for tech firms to use rightsholder data for commercial purposes without permission or compensation, and to gain vast financial rewards in the process. There is compelling evidence that the UK benefits economically, politically and societally from upholding a globally respected copyright regime.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://committees.parliament.uk/publications/43172/documents/214762/default/"&gt;UK House of Lords report on Generative AI&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/law"&gt;law&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;&lt;/p&gt;



</summary><category term="ethics"/><category term="law"/><category term="politics"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="ai-ethics"/></entry><entry><title>Quoting OpenAI to the Lords Select Committee on LLMs</title><link href="https://simonwillison.net/2024/Jan/8/openai-to-the-lords-select-committee-on-llms/#atom-tag" rel="alternate"/><published>2024-01-08T17:33:31+00:00</published><updated>2024-01-08T17:33:31+00:00</updated><id>https://simonwillison.net/2024/Jan/8/openai-to-the-lords-select-committee-on-llms/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://committees.parliament.uk/writtenevidence/126981/pdf/"&gt;&lt;p&gt;We believe that AI tools are at their best when they incorporate and represent the full diversity and breadth of human intelligence and experience. [...] Because copyright today covers virtually every sort of human expression– including blog posts, photographs, forum posts, scraps of software code, and government documents–it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://committees.parliament.uk/writtenevidence/126981/pdf/"&gt;OpenAI to the Lords Select Committee on LLMs&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/copyright"&gt;copyright&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;&lt;/p&gt;



</summary><category term="copyright"/><category term="politics"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="training-data"/></entry><entry><title>Quoting Arthur Mensch</title><link href="https://simonwillison.net/2023/Nov/16/arthur-mensch-mistral-ai/#atom-tag" rel="alternate"/><published>2023-11-16T11:29:57+00:00</published><updated>2023-11-16T11:29:57+00:00</updated><id>https://simonwillison.net/2023/Nov/16/arthur-mensch-mistral-ai/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://twitter.com/arthurmensch/status/1725076260827566562"&gt;&lt;p&gt;The EU AI Act now proposes to regulate “foundational models”, i.e. the engine behind some AI applications. We cannot regulate an engine devoid of usage. We don’t regulate the C language because one can use it to develop malware. Instead, we ban malware and strengthen network systems (we regulate usage). Foundational language models provide a higher level of abstraction than the C language for programming computer systems; nothing in their behaviour justifies a change in the regulatory framework.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://twitter.com/arthurmensch/status/1725076260827566562"&gt;Arthur Mensch&lt;/a&gt;, Mistral AI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mistral"&gt;mistral&lt;/a&gt;&lt;/p&gt;



</summary><category term="politics"/><category term="ai"/><category term="llms"/><category term="mistral"/></entry><entry><title>I Lived Through A Stupid Coup. America Is Having One Now</title><link href="https://simonwillison.net/2020/Nov/21/i-lived-through-stupid-coup-america-having-one-now/#atom-tag" rel="alternate"/><published>2020-11-21T13:21:50+00:00</published><updated>2020-11-21T13:21:50+00:00</updated><id>https://simonwillison.net/2020/Nov/21/i-lived-through-stupid-coup-america-having-one-now/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://medium.com/indica/i-lived-through-a-coup-america-is-having-one-now-437934b1dac3"&gt;I Lived Through A Stupid Coup. America Is Having One Now&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
If, like me, you have been avoiding the word “coup” since it feels like a clear over-reaction to what’s going on, I challenge you to read this piece and not change your mind.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/simonw/status/1330130649546711040"&gt;Harper Reed&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/trump"&gt;trump&lt;/a&gt;&lt;/p&gt;



</summary><category term="politics"/><category term="trump"/></entry><entry><title>How much can you learn from just two columns?</title><link href="https://simonwillison.net/2020/Jun/15/how-much-can-you-learn-just-two-columns/#atom-tag" rel="alternate"/><published>2020-06-15T18:46:57+00:00</published><updated>2020-06-15T18:46:57+00:00</updated><id>https://simonwillison.net/2020/Jun/15/how-much-can-you-learn-just-two-columns/#atom-tag</id><summary type="html">
    &lt;p&gt;Derek Willis &lt;a href="https://twitter.com/derekwillis/status/1272544740718915585"&gt;shared&lt;/a&gt; an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.&lt;/p&gt;

&lt;p&gt;He &lt;a href="https://official-gop-following.herokuapp.com/following/following"&gt;published it here&lt;/a&gt; using Datasette. It's a single table containing 385,979 rows - each row is a &lt;code&gt;username&lt;/code&gt;, &lt;code&gt;account_name&lt;/code&gt; pair, where &lt;code&gt;username&lt;/code&gt; is the Twitter account that is being followed and &lt;code&gt;account_name&lt;/code&gt; is the congressional Twitter account that's following it.&lt;/p&gt;

&lt;p&gt;Here's some sample data:&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-username" scope="col"&gt;username&lt;/th&gt;&lt;th class="col-account_name" scope="col"&gt;account_name&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;njhotline&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;SenatorTimScott&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;emilykpierce&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;RobWittman&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;jessblevinsoh&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;OHPressSec&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;familylink&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;SenMikeLee&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;howardsnowdon&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;RepThomasMassie&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;pattidomm&lt;/td&gt;
            
                &lt;td class="col-account_name"&gt;SenCapito&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;How much can we learn from just these two columns?&lt;/p&gt;

&lt;h4&gt;Which accounts have the most GOP congressional followers?&lt;/h4&gt;

&lt;p&gt;Let's start with a simple aggregation: which accounts on Twitter have the most GOP congressional followers?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  username,
  count(*) as num_gop_followers
from
  following
group by
  username
order by
  num_gop_followers desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All we're doing here is counting the number of times a unique &lt;code&gt;username&lt;/code&gt; (an account that is being followed) shows up in our table, then sorting by those counts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++username%2C%0D%0A++count%28*%29+as+num_gop_followers%0D%0Afrom%0D%0A++following%0D%0Agroup+by%0D%0A++username%0D%0Aorder+by%0D%0A++num_gop_followers+desc"&gt;Here are the result&lt;/a&gt;. The top ten are:&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-username" scope="col"&gt;username&lt;/th&gt;&lt;th class="col-num_gop_followers" scope="col"&gt;num_gop_followers&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;housegop&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;231&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;gopleader&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;229&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;realdonaldtrump&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;219&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;vp&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;216&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;speakerryan&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;207&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;whitehouse&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;207&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;stevescalise&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;198&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;chadpergram&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;195&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;potus&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;195&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;foxnews&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;187&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4&gt;Adding a "view more" link&lt;/h4&gt;

&lt;p&gt;Wouldn't it be useful if you could see which accounts those 231 followers of @housegop were?&lt;/p&gt;

&lt;p&gt;We can do that in Datasette without a SQL query - we can instead use the form on the table page to construct a filter - or construct a querystring URL directly. Here are the 5 GOP congressional accounts following  @cityofdallas:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=cityofdallas"&gt;https://official-gop-following.herokuapp.com/following/following?username=cityofdallas&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's add that link to our original top-followed query. Datasette automatically links any value that begins with &lt;code&gt;https://&lt;/code&gt;, so we can use SQL concatenation trick (with the &lt;code&gt;||&lt;/code&gt; concatenation operator) to construct that URL as part of the query:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
select
  username,
  count(*) as num_gop_followers,
  'https://official-gop-following.herokuapp.com/following/following?username=' || username as list_of_gop_followers
from
  following
group by
  username
order by
  num_gop_followers desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++username%2C%0D%0A++count%28*%29+as+num_gop_followers%2C%0D%0A++%27https%3A%2F%2Fofficial-gop-following.herokuapp.com%2Ffollowing%2Ffollowing%3Fusername%3D%27+%7C%7C+username+as+list_of_gop_followers%0D%0Afrom%0D%0A++following%0D%0Agroup+by%0D%0A++username%0D%0Aorder+by%0D%0A++num_gop_followers+desc"&gt;Here's that query&lt;/a&gt;. The first five rows look like this:&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-username" scope="col"&gt;username&lt;/th&gt;&lt;th class="col-num_gop_followers" scope="col"&gt;num_gop_followers&lt;/th&gt;&lt;th class="col-list_of_gop_followers" scope="col"&gt;list_of_gop_followers&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;housegop&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;231&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=housegop"&gt;https://official-gop-following.herokuapp.com/following/following?username=housegop&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;gopleader&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;229&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=gopleader"&gt;https://official-gop-following.herokuapp.com/following/following?username=gopleader&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;realdonaldtrump&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;219&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=realdonaldtrump"&gt;https://official-gop-following.herokuapp.com/following/following?username=realdonaldtrump&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;vp&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;216&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=vp"&gt;https://official-gop-following.herokuapp.com/following/following?username=vp&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-username"&gt;speakerryan&lt;/td&gt;
            
                &lt;td class="col-num_gop_followers"&gt;207&lt;/td&gt;
            
                &lt;td class="col-list_of_gop_followers"&gt;&lt;a href="https://official-gop-following.herokuapp.com/following/following?username=speakerryan"&gt;https://official-gop-following.herokuapp.com/following/following?username=speakerryan&lt;/a&gt;&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4&gt;Congressional accounts who aren't following certain accounts&lt;/h4&gt;

&lt;p&gt;Since there are only &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select+count%28distinct+account_name%29+from+following"&gt;279&lt;/a&gt; congressional GOP Twitter accounts, how about seeing who are the 279 - 219 = 60 accounts that aren't following @realdonaldtrump?&lt;/p&gt;

&lt;p&gt;Let's construct a SQL query for this, using a sub-select:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  distinct account_name
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = 'realdonaldtrump'
  )&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://official-gop-following.herokuapp.com/following?sql=select+distinct+account_name%0D%0Afrom%0D%0A++following%0D%0Awhere+account_name+not+in+%28select+account_name+from+following+where+username+%3D+%27realdonaldtrump%27%29%0D%0A"&gt;Here that is in Datasette&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A neat thing we can do here is to parametrize that query. We can swap the hard-coded &lt;code&gt;'realdonaldtrump'&lt;/code&gt; value for a named parameter, &lt;code&gt;:name&lt;/code&gt;, instead:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
    distinct account_name
  from
    following
  where
    account_name not in (
      select
        account_name
      from
        following
      where
        username = :name
    )&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now when we &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++++distinct+account_name%0D%0A++from%0D%0A++++following%0D%0A++where%0D%0A++++account_name+not+in+%28%0D%0A++++++select%0D%0A++++++++account_name%0D%0A++++++from%0D%0A++++++++following%0D%0A++++++where%0D%0A++++++++username+%3D+%3Aname%0D%0A++++%29"&gt;visit that in Datasette&lt;/a&gt; it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2020/datasette-form.png" alt="Screenshot of the SQL for in Datasette, showing a name input value" style="max-width: 100%" /&gt;&lt;/p&gt;

&lt;p&gt;We can &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++++distinct+account_name%0D%0A++from%0D%0A++++following%0D%0A++where%0D%0A++++account_name+not+in+%28%0D%0A++++++select%0D%0A++++++++account_name%0D%0A++++++from%0D%0A++++++++following%0D%0A++++++where%0D%0A++++++++username+%3D+%3Aname%0D%0A++++%29&amp;amp;name=realdonaldtrump"&gt;add ?name=realdonaldtrump&lt;/a&gt; to the URL (or submit the form and save the resulting URL) to link to results for one individual.&lt;/p&gt;

&lt;p&gt;We've essentially created a new mini-application here - complete with an input form - just by bookmarking a URL in Datasette.&lt;/p&gt;

&lt;p&gt;Let's make the query a bit more interesting by including a count of the number of accounts those congress-people ARE following, and sorting by that.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  account_name,
  count(*) as num_accounts_they_follow
from
  following
where
  account_name not in (
    select
      account_name
    from
      following
    where
      username = 'realdonaldtrump'
  )
group by
  account_name
order by
  num_accounts_they_follow desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here are &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++account_name%2C%0D%0A++count%28*%29+as+num_accounts_they_follow%0D%0Afrom%0D%0A++following%0D%0Awhere%0D%0A++account_name+not+in+%28%0D%0A++++select%0D%0A++++++account_name%0D%0A++++from%0D%0A++++++following%0D%0A++++where%0D%0A++++++username+%3D+%3Aname%0D%0A++%29%0D%0Agroup+by%0D%0A++account_name%0D%0Aorder+by%0D%0A++num_accounts_they_follow+desc&amp;amp;name=realdonaldtrump"&gt;the results&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th class="col-account_name" scope="col"&gt;account_name&lt;/th&gt;&lt;th class="col-num_accounts_they_follow" scope="col"&gt;num_accounts_they_follow&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;ChuckGrassley&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;13475&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;VernBuchanan&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;8560&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;CynthiaLummis&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;5793&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;GovAbbott&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;4423&lt;/td&gt;
            
        &lt;/tr&gt;
    
        &lt;tr&gt;
            
                &lt;td class="col-account_name"&gt;SenatorTimScott&lt;/td&gt;
            
                &lt;td class="col-num_accounts_they_follow"&gt;3846&lt;/td&gt;
            
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;@ChuckGrassley follows 13,475 accounts but none of them are the president!&lt;/p&gt;

&lt;h4&gt;Most similar accounts, based on number of shared follows&lt;/h4&gt;

&lt;p&gt;One last query. This time we're going to look at which accounts are "most similar" to each other, based on the largest overlap of follows. Here's the SQL for that:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;select
  :name as representative,
  account_name as similar_representative,
  count(*) as num_shared_follows
from
  following
where
  username in (
    select
      username
    from
      following
    where
      account_name = :name
  )
  and account_name != :name
group by
  account_name
order by
  num_shared_follows desc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Again, we're using a &lt;code&gt;:name&lt;/code&gt; placeholder. Here are the congressional accounts that are &lt;a href="https://official-gop-following.herokuapp.com/following?sql=select%0D%0A++%3Aname+as+representative%2C%0D%0A++account_name+as+similar_representative%2C%0D%0A++count%28*%29+as+num_shared_follows%0D%0Afrom%0D%0A++following%0D%0Awhere%0D%0A++username+in+%28%0D%0A++++select%0D%0A++++++username%0D%0A++++from%0D%0A++++++following%0D%0A++++where%0D%0A++++++account_name+%3D+%3Aname%0D%0A++%29%0D%0A++and+account_name+%21%3D+%3Aname%0D%0Agroup+by%0D%0A++account_name%0D%0Aorder+by%0D%0A++num_shared_follows+desc&amp;amp;name=MikeKellyPA"&gt;most similar to @MikeKellyPA&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;What else can you do?&lt;/h4&gt;

&lt;p&gt;I'm pretty impressed at how much insight can be gained using SQL against just a two column table.&lt;/p&gt;

&lt;p&gt;This post started as a Twitter thread. Charles Arthur &lt;a href="https://twitter.com/charlesarthur/status/1272548372885504001"&gt;suggested&lt;/a&gt; cross-referencing this against other sources such as &lt;a href="https://www.govtrack.us/about/analysis#ideology"&gt;the GovTrack ideology analysis&lt;/a&gt; of congressional candidates. This is a great idea! It's also very feasible, given that much of the data underlying GovTrack is &lt;a href="https://github.com/unitedstates/congress-legislators"&gt;available on GitHub&lt;/a&gt;. Import that into Datasette alongside Derek's follower data and you could construct some very interesting SQL joins indeed.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/derek-willis"&gt;derek-willis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sql"&gt;sql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="derek-willis"/><category term="politics"/><category term="sql"/><category term="twitter"/><category term="datasette"/></entry><entry><title>Quoting Maciej Cegłowski</title><link href="https://simonwillison.net/2019/May/30/maciej-ceglowski/#atom-tag" rel="alternate"/><published>2019-05-30T12:03:57+00:00</published><updated>2019-05-30T12:03:57+00:00</updated><id>https://simonwillison.net/2019/May/30/maciej-ceglowski/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://idlewords.com/2019/05/what_i_learned_trying_to_secure_congressional_campaigns.htm"&gt;&lt;p&gt;Practical campaign security is a wood chipper for your hopes and dreams. It sits at the intersection of 19 kinds of status quo, each more odious than the last. You have to accept the fact that computers are broken, software is terrible, campaign finance is evil, the political parties are inept, the DCCC exists, politics is full of parasites, tech companies are run by arrogant man-children, and so on.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://idlewords.com/2019/05/what_i_learned_trying_to_secure_congressional_campaigns.htm"&gt;Maciej Cegłowski&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/maciej-ceglowski"&gt;maciej-ceglowski&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;&lt;/p&gt;



</summary><category term="maciej-ceglowski"/><category term="politics"/><category term="security"/></entry><entry><title>What the Hell is Going On?</title><link href="https://simonwillison.net/2019/Mar/17/what-the-hell-is-going-on/#atom-tag" rel="alternate"/><published>2019-03-17T16:50:04+00:00</published><updated>2019-03-17T16:50:04+00:00</updated><id>https://simonwillison.net/2019/Mar/17/what-the-hell-is-going-on/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.perell.com/blog/what-the-hell-is-going-on"&gt;What the Hell is Going On?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
David Perell discusses how the shift from information scarcity to information abundance is reshaping commerce, education, and politics. Long but worthwhile.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/patio11/status/1107269333040824320"&gt;@patio11&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/education"&gt;education&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/internet"&gt;internet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;&lt;/p&gt;



</summary><category term="education"/><category term="internet"/><category term="politics"/></entry><entry><title>Analyzing US Election Russian Facebook Ads</title><link href="https://simonwillison.net/2018/Aug/6/russian-facebook-ads/#atom-tag" rel="alternate"/><published>2018-08-06T16:01:18+00:00</published><updated>2018-08-06T16:01:18+00:00</updated><id>https://simonwillison.net/2018/Aug/6/russian-facebook-ads/#atom-tag</id><summary type="html">
    &lt;p&gt;Two interesting data sources have emerged in the past few weeks concerning the Russian impact on the 2016 US elections.&lt;/p&gt;
&lt;p&gt;FiveThirtyEight &lt;a href="https://fivethirtyeight.com/features/why-were-sharing-3-million-russian-troll-tweets/"&gt;published nearly 3 million tweets&lt;/a&gt; from accounts associated with the Russian “Internet Research Agency” - see &lt;a href="https://simonwillison.net/2018/Aug/6/troll-tweets/"&gt;my article and searchable tweet archive here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Separately, the House Intelligence Committee Minority &lt;a href="https://democrats-intelligence.house.gov/social-media-content/"&gt;released 3,517 Facebook ads&lt;/a&gt; that were reported to have been bought by the Russian Internet Research Agency as a set of redacted PDF files.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Exploring_the_Russian_Facebook_Ad_spend_18"&gt;&lt;/a&gt;Exploring the Russian Facebook Ad spend&lt;/h3&gt;
&lt;p&gt;The initial data was released as &lt;a href="https://democrats-intelligence.house.gov/social-media-content/social-media-advertisements.htm"&gt;zip files full of PDFs&lt;/a&gt;, one of the least friendly formats you can use to publish data.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://twitter.com/edsu"&gt;Ed Summers&lt;/a&gt; took on the intimidating task of cleaning that up. &lt;a href="https://github.com/edsu/irads"&gt;His results are incredible&lt;/a&gt;: he used the &lt;a href="https://pypi.org/project/pytesseract/"&gt;pytesseract OCR library&lt;/a&gt; and &lt;a href="https://pypi.org/project/PyPDF2/"&gt;PyPDF2&lt;/a&gt; to extract both the images and the associated metadata and convert the whole lot into a single 3.9MB JSON file.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/simonw/russian-ira-facebook-ads-datasette"&gt;wrote some code&lt;/a&gt; to convert his JSON file to SQLite (more on the details later) and the result can be found here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://russian-ira-facebook-ads.datasettes.com/"&gt;https://russian-ira-facebook-ads.datasettes.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here’s an &lt;a href="https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/display_ads?_search=cops&amp;amp;_sort_desc=spend_usd"&gt;example search for “cops” ordered by the USD equivalent spent on the ad&lt;/a&gt; (some of the spends are in rubles, so I convert those to USD using today’s exchange rate of 0.016).&lt;/p&gt;
&lt;p&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/ads-cops-sorted-by-usd.png" alt="Search ads for cops, order by USD descending" /&gt;&lt;/p&gt;
&lt;p&gt;One of the most interesting things about this data is that it includes the Facebook ad targetting options that were used to promote the ads. I’ve built a separate interface for browsing those - you can see &lt;a href="https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/top_targets"&gt;the most frequently applied targets&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/top-targets.png" alt="Top targets" /&gt;&lt;/p&gt;
&lt;p&gt;And by browsing &lt;a href="https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/faceted-targets?targets=%5B%22d6ade%22%5D"&gt;through the different facets&lt;/a&gt; you can construct e.g. a search for all ads that targeted people interested in both &lt;code&gt;interests:Martin Luther King&lt;/code&gt; and  &lt;code&gt;interests:Police Brutality is a Crime&lt;/code&gt;: &lt;a href="https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/display_ads?_targets_json=%5B%22d6ade%22%2C%2240c27%22%5D"&gt;https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/display_ads?_targets_json=[&amp;quot;d6ade&amp;quot;%2C&amp;quot;40c27&amp;quot;]&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;&lt;a id="New_tooling_under_the_hood_40"&gt;&lt;/a&gt;New tooling under the hood&lt;/h3&gt;
&lt;p&gt;I ended up spinning up several new projects to help process and explore this data.&lt;/p&gt;
&lt;h4&gt;&lt;a id="sqliteutils_44"&gt;&lt;/a&gt;sqlite-utils&lt;/h4&gt;
&lt;p&gt;The first is a new library called &lt;a href="https://sqlite-utils.readthedocs.io/en/latest/"&gt;sqlite-utils&lt;/a&gt;. If data is already in CSV I tend to convert it using csvs-to-sqlite, but if data is in a less tabular format (JSON or XML for example) I have to hand-write code. Here’s &lt;a href="https://github.com/simonw/register-of-members-interests/blob/2baf75956b8b9e93a3985ebeb2259f7f2af760c8/convert_xml_to_sqlite.py"&gt;a script&lt;/a&gt; I wrote to process the XML version of &lt;a href="https://simonwillison.net/2018/Apr/25/register-members-interests/"&gt;the UK Register of Members Interests for example&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My goal with sqlite-utils is to take some of the common patterns from those scripts and make them as easy to use as possible, in particular when running inside a Jupyter notebook. It’s still very early, but &lt;a href="https://github.com/simonw/russian-ira-facebook-ads-datasette/blob/336ba87ef8071e664441ad0a95e3b8d0a33f682a/fetch_and_build_russian_ads.py"&gt;the script I wrote&lt;/a&gt; to process the Russian ads JSON is a good example of the kind of thing I want to do with it.&lt;/p&gt;
&lt;h4&gt;&lt;a id="datasettejsonhtml_50"&gt;&lt;/a&gt;datasette-json-html&lt;/h4&gt;
&lt;p&gt;The second new tool is a new Datasette plugin (and &lt;a href="https://github.com/simonw/datasette/issues/352"&gt;corresponding plugin hook&lt;/a&gt;) called &lt;a href="https://github.com/simonw/datasette-json-html"&gt;datasette-json-html&lt;/a&gt;. I used this to solve the need to display both rendered images and customized links as part of the regular Datasette instance.&lt;/p&gt;
&lt;p&gt;It’s a pretty crazy solution (hence why it’s implemented as a plugin and not part of Datasette core) but it works surprisingly well. The basic idea is to support a mini JSON language which can be detected and rendered as HTML. A couple of examples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &amp;quot;img_src&amp;quot;: &amp;quot;https://raw.githubusercontent.com/edsu/irads/03fb4b/site/images/0771.png&amp;quot;,
  &amp;quot;width&amp;quot;: 200
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Is rendered as an HTML &lt;code&gt;&amp;lt;img src=&amp;quot;&amp;quot;&amp;gt;&lt;/code&gt; element.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[
  {
    &amp;quot;label&amp;quot;: &amp;quot;location:United States&amp;quot;,
    &amp;quot;href&amp;quot;: &amp;quot;/russian-ads/display_ads?_target=ec3ac&amp;quot;
  },
  {
    &amp;quot;label&amp;quot;: &amp;quot;interests:Martin Luther King&amp;quot;,
    &amp;quot;href&amp;quot;: &amp;quot;/russian-ads/display_ads?_target=d6ade&amp;quot;
  },
  {
    &amp;quot;label&amp;quot;: &amp;quot;interests:Jr.&amp;quot;,
    &amp;quot;href&amp;quot;: &amp;quot;/russian-ads/display_ads?_target=8e7b3&amp;quot;
  }
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Is rendered as a comma-separated list of HTML links.&lt;/p&gt;
&lt;p&gt;Why use JSON for this? Because SQLite has some &lt;a href="https://www.sqlite.org/json1.html"&gt;incredibly powerful JSON features&lt;/a&gt;, making it trivial to output JSON as part of the result of a SQL query. Most interestingly of all it has &lt;code&gt;json_group_array()&lt;/code&gt; which can work as an aggregation function to combine a set of related rows into a single JSON array.&lt;/p&gt;
&lt;p&gt;The display_ads page shown above is powered by a SQL view. Here’s the relevant subset of that view:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select ads.id,
    case when image is not null then
        json_object(&amp;quot;img_src&amp;quot;, &amp;quot;https://raw.githubusercontent.com/edsu/irads/03fb4b/site/&amp;quot; || image, &amp;quot;width&amp;quot;, 200)
    else
        &amp;quot;no image&amp;quot;
    end as img,
    json_group_array(
        json_object(
            &amp;quot;label&amp;quot;, targets.name,
            &amp;quot;href&amp;quot;, &amp;quot;/russian-ads/display_ads?_target=&amp;quot;
                || urllib_quote_plus(targets.id)
        )
    ) as targeting
from ads
    join ad_targets on ads.id = ad_targets.ad_id
    join targets on ad_targets.target_id = targets.id
group by ads.id limit 10
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m using SQLite’s JSON functions to dynamically assemble the JSON format that datasette-json-html knows how to render. I’m delighted at how well it works.&lt;/p&gt;
&lt;p&gt;I’ve turned off arbitrary SQL querying against the main Facebook ads Datasette instance, but there’s a copy running at &lt;a href="https://russian-ira-facebook-ads-sql-allowed.now.sh/russian-ads"&gt;https://russian-ira-facebook-ads-sql-allowed.now.sh/russian-ads&lt;/a&gt; if you want to play with these queries.&lt;/p&gt;
&lt;h4&gt;&lt;a id="Weird_implementation_details_106"&gt;&lt;/a&gt;Weird implementation details&lt;/h4&gt;
&lt;p&gt;The full source code for my implementation &lt;a href="https://github.com/simonw/russian-ira-facebook-ads-datasette"&gt;is available on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I ended up using &lt;a href="https://github.com/simonw/datasette/commit/5116c4ec8aed5091e1f75415424b80f613518dc6"&gt;an experimental plugin hook&lt;/a&gt; to enable additional custom filtering on Datasette views in order to support showing ads against multiple m2m targets, but hopefully that will be made unnecessary as work on Datasette’s &lt;a href="https://github.com/simonw/datasette/issues/354"&gt;support for m2m relationships&lt;/a&gt; progresses.&lt;/p&gt;
&lt;p&gt;I also experimented with YAML to generate the &lt;code&gt;metadata.json&lt;/code&gt; file as JSON strings aren’t a great way of &lt;a href="https://github.com/simonw/russian-ira-facebook-ads-datasette/blob/336ba87ef8071e664441ad0a95e3b8d0a33f682a/russian-ads-metadata.yaml"&gt;representing multi-line HTML and SQL&lt;/a&gt;. And if you want to see some &lt;em&gt;really&lt;/em&gt; convoluted SQL have a look at how the &lt;a href="https://github.com/simonw/russian-ira-facebook-ads-datasette/blob/336ba87ef8071e664441ad0a95e3b8d0a33f682a/russian-ads-metadata.yaml#L52-L81"&gt;canned query&lt;/a&gt; for the &lt;a href="https://russian-ira-facebook-ads.datasettes.com/russian-ads-919cbfd/faceted-targets?targets=%5B%22371f0%22%2C%22cc5ed%22%5D"&gt;faceted targeting interface&lt;/a&gt; works.&lt;/p&gt;
&lt;p&gt;This was a really fun project, which further stretched my ideas about what Datasette should be capable of out of the box. I’m hoping that the &lt;a href="https://github.com/simonw/datasette/issues/354"&gt;m2m work&lt;/a&gt; will make a lot of these crazy hacks redundant.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/yaml"&gt;yaml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite-utils"&gt;sqlite-utils&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="politics"/><category term="projects"/><category term="yaml"/><category term="datasette"/><category term="sqlite-utils"/></entry><entry><title>Exploring the UK Register of Members Interests with SQL and Datasette</title><link href="https://simonwillison.net/2018/Apr/25/register-members-interests/#atom-tag" rel="alternate"/><published>2018-04-25T15:49:39+00:00</published><updated>2018-04-25T15:49:39+00:00</updated><id>https://simonwillison.net/2018/Apr/25/register-members-interests/#atom-tag</id><summary type="html">
    &lt;p&gt;Ever wondered which UK Members of Parliament &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+people_name%2C+person_id%2C+count%28*%29+as+n%2C+group_concat%28item%2C+%22+%3A%3A+%22%29+from+%28select+distinct+item%2C+people.name+as+people_name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash%29%0D%0Agroup+by+people_name%2C+person_id%0D%0Aorder+by+n+desc%3B&amp;amp;search=helicopter"&gt;get gifted the most helicopter rides&lt;/a&gt;? How about &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+distinct+item%2C+people.name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash&amp;amp;search=gift+sultan+brunei"&gt;which MPs have been given Christmas hampers&lt;/a&gt; by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+distinct+item%2C+people.name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash&amp;amp;search=gift+sultan+brunei" class="a-img"&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/mp-gifts-sultan.png" alt="Gifts from the Sultan" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.mysociety.org/"&gt;mySociety&lt;/a&gt; have been building incredible civic participation applications like &lt;a href="https://www.theyworkforyou.com/"&gt;TheyWorkForYou&lt;/a&gt; and &lt;a href="https://www.fixmystreet.com/"&gt;FixMyStreet&lt;/a&gt; for nearly 15 years now, and have accumulated all kinds of interesting data along the way.&lt;/p&gt;
&lt;p&gt;They recently launched their own data portal at &lt;a href="https://data.mysociety.org/"&gt;data.mysociety.org&lt;/a&gt; listing all of the information they have available. While exploring it I stumbled across their copy of the &lt;a href="https://data.mysociety.org/datasets/members-interest/"&gt;UK Register of Members Interests&lt;/a&gt;. Every UK Member of Parliament has to &lt;a href="https://www.parliament.uk/mps-lords-and-offices/standards-and-financial-interests/parliamentary-commissioner-for-standards/registers-of-interests/register-of-members-financial-interests/"&gt;register their conflicts of interest and income sources&lt;/a&gt;, and mySociety have an ongoing project to parse that data into a more useful format.&lt;/p&gt;
&lt;p&gt;It won’t surprise you to hear that I couldn’t resist turning their XML files into a SQLite database.&lt;/p&gt;
&lt;p&gt;The result is &lt;a href="https://register-of-members-interests.datasettes.com"&gt;register-of-members-interests.datasettes.com&lt;/a&gt; - a &lt;a href="https://github.com/simonw/datasette"&gt;Datasette&lt;/a&gt; instance running against a SQLite database containing over 1.3 million line-items registered by 1,419 MPs over the course of 18 years.&lt;/p&gt;
&lt;h3&gt;&lt;a id="Some_fun_queries_14"&gt;&lt;/a&gt;Some fun queries&lt;/h3&gt;
&lt;p&gt;A few of my favourites so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which MPs have taken &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+people_name%2C+person_id%2C+count%28*%29+as+n%2C+group_concat%28item%2C+%22+%3A%3A+%22%29+from+%28select+distinct+item%2C+people.name+as+people_name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash%29%0D%0Agroup+by+people_name%2C+person_id%0D%0Aorder+by+n+desc%3B&amp;amp;search=helicopter"&gt;the most donated helicopter rides&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Which MPs have accepted &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+distinct+item%2C+people.name%2C+person_id%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+hash&amp;amp;search=gift+sultan+brunei"&gt;gifts from the Surtan of Brunei&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;A better gifts query with more robust de-duping: here's gifts of &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+item%2C+people.name%2C+max%28date%29+as+max_date%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29%0D%0Agroup+by+item%2C+people.name%2C+person_id%0D%0Aorder+by+max_date+desc&amp;amp;search=hamper"&gt;hampers&lt;/a&gt;, &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+item%2C+people.name%2C+max%28date%29+as+max_date%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29%0D%0Agroup+by+item%2C+people.name%2C+person_id%0D%0Aorder+by+max_date+desc&amp;amp;search=gift+watch"&gt;watches&lt;/a&gt; and &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+item%2C+people.name%2C+max%28date%29+as+max_date%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Awhere+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29%0D%0Agroup+by+item%2C+people.name%2C+person_id%0D%0Aorder+by+max_date+desc&amp;amp;search=glastonbury"&gt;Glastonbury festival tickets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Which MPs own significant shares in &lt;a href="https://register-of-members-interests.datasettes.com/regmem/mps_ranked_by_matching_shareholdings?search=shell"&gt;Shell&lt;/a&gt;, &lt;a href="(https://register-of-members-interests.datasettes.com/regmem/mps_ranked_by_matching_shareholdings?search=apple)"&gt;Apple&lt;/a&gt;, or &lt;a href="https://register-of-members-interests.datasettes.com/regmem/mps_ranked_by_matching_shareholdings?search=bp"&gt;BP&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Who has reported &lt;a href="https://register-of-members-interests.datasettes.com/regmem/mps_ranked_by_matching_overseas_line_items?search=saudi+arabia"&gt;the most overseas trips to Saudi Arabia&lt;/a&gt;? The top hit there was Daniel Kawczynski, and it turns out his &lt;a href="https://en.wikipedia.org/wiki/Daniel_Kawczynski"&gt;Wikipedia page&lt;/a&gt; has a section dedicated to his relationship with the kindgom.&lt;/li&gt;
&lt;li&gt;How much do MPs get paid for &lt;a href="https://register-of-members-interests.datasettes.com/regmem/items?_search=Have+I+Got+News+for+You&amp;amp;_sort_desc=date"&gt;appearances on Have I Got News For You&lt;/a&gt;. The going rate seems to be £1,500 but you can find &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+distinct+item%2C+person_id%2C+people.name+from+items+join+people+on+people.id+%3D+items.person_id%0D%0Awhere+%22item%22+not+like+%22%251%2C500%25%22+and+%22item%22+not+like+%22%251500%22+and+%22item%22+like+%22%25%C2%A3%25%22+and+items.rowid+in+%28select+rowid+from+%5Bitems_fts%5D+where+%5Bitems_fts%5D+match+%3Asearch%29+order+by+date+desc+limit+101&amp;amp;search=Have+I+Got+News+For+You"&gt;some interesting discrepancies&lt;/a&gt; if you exclude that value from the results.&lt;/li&gt;
&lt;li&gt;Which MPs are responsible for &lt;a href="https://register-of-members-interests.datasettes.com/regmem?sql=select+%22person_id%22%2C+people.name%2C+count(*)+as+%22count%22%0D%0Afrom+items+join+people+on+items.person_id+%3D+people.id%0D%0Agroup+by+%22person_id%22+order+by+%22count%22+desc+limit+100"&gt;the most total line items reported&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;a id="Understanding_the_data_model_23"&gt;&lt;/a&gt;Understanding the data model&lt;/h3&gt;
&lt;p&gt;Most of the action takes place in the &lt;a href="https://register-of-members-interests.datasettes.com/regmem/items"&gt;items&lt;/a&gt; table, where each item is a line-item from an MP’s filing. You can search that table by keyword (see helicopter example above) or apply filters to it using the standard Datasette interface. You can also execute your own SQL directly against the database.&lt;/p&gt;
&lt;p&gt;Each item is filed against a &lt;a href="https://register-of-members-interests.datasettes.com/regmem/categories?_sort=type"&gt;category&lt;/a&gt;. There appears to have been quite a bit of churn in the way that the categories are defined over the years, plus the data is pretty untidy - there are no less than 10 ways of spelling “Remunerated employment, office, profession etc.” for example!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://register-of-members-interests.datasettes.com/regmem/categories" class="a-img"&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/mp-categories.png" alt="Categories" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are also a LOT of duplicate items in the set - it appears that MPs frequently list the same item (a rental property for example) every time they fill out the register. SQL DISTINCT clauses can help filter through these, as seen in some of the above examples.&lt;/p&gt;
&lt;p&gt;The data also has the concepts of both &lt;a href="https://register-of-members-interests.datasettes.com/regmem/members"&gt;members&lt;/a&gt; and &lt;a href="https://register-of-members-interests.datasettes.com/regmem/people"&gt;people&lt;/a&gt;. As far as I can tell people are distinct, but members may contain duplicates - presumably to represent MPs who have served more than one term in office. It looks like the member field &lt;a href="https://register-of-members-interests.datasettes.com/regmem/items?_sort_desc=date&amp;amp;member_id__notblank=1"&gt;stopped being populated in March 2015&lt;/a&gt; so analysis is best performed against the people table.&lt;/p&gt;
&lt;p&gt;Once concept I have introduced myself is the &lt;code&gt;record_id&lt;/code&gt;. In the XML documents the items are often grouped together into a related collection, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;regmem personid=&amp;quot;uk.org.publicwhip/person/10001&amp;quot;
    memberid=&amp;quot;uk.org.publicwhip/member/40289&amp;quot; membername=&amp;quot;Diane Abbott&amp;quot; date=&amp;quot;2014-07-14&amp;quot;&amp;gt;
    &amp;lt;category type=&amp;quot;2&amp;quot; name=&amp;quot;Remunerated employment, office, profession etc&amp;quot;&amp;gt;
        &amp;lt;item&amp;gt;Payments from MRL Public Sector Consultants, Pepple House, 8 Broad Street, Great Cambourne, Cambridge CB23 6HJ:&amp;lt;/item&amp;gt;
        &amp;lt;item&amp;gt;26 November 2013, I received a fee of £1,000 for speaking at the 1st African Legislative Summit, National Assembly, Abuja, Nigeria.  Hours: 8 hrs. The cost of my flights, transfers and hotel accommodation in Abuja were also met; estimated value £5,000. &amp;lt;em&amp;gt;(Registered 3 December 2013)&amp;lt;/em&amp;gt;&amp;lt;/item&amp;gt;
        &amp;lt;item&amp;gt;23 July 2013, I received a fee of £5,000 for appearing as a contestant on ITV&amp;amp;#8217;s &amp;amp;#8216;The Chase Celebrity &amp;amp;#8211; Series 3&amp;amp;#8217; television programme.  Address of payer:  ITV Studios Ltd, London Television Centre, Upper Ground, London SE1 9Lt.  Hours: 12 hrs.   &amp;lt;em&amp;gt;(Registered 23 July 2013)&amp;lt;/em&amp;gt;&amp;lt;/item&amp;gt;
    &amp;lt;/category&amp;gt;
&amp;lt;/regmem&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While these items are presented as separate line items, their grouping carries meaning: the first line item here acts as a kind of heading to help provide context to the other items.&lt;/p&gt;
&lt;p&gt;To model this in the simplest way possible, I’ve attempted to preserve the order of these groups using a pair of additional columns: the &lt;code&gt;record_id&lt;/code&gt; and the &lt;code&gt;sort_order&lt;/code&gt;. I construct the &lt;code&gt;record_id&lt;/code&gt; using a collection of other fields - the idea is for it to be sortable, and for each line-item in the same grouping to have the same &lt;code&gt;record_id&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;record_id = &amp;quot;{date}-{category_id}-{person_id}-{record}&amp;quot;.format(
    date=date,
    category_id=category_id,
    person_id=person_id.split(&amp;quot;/&amp;quot;)[
        -1
    ],
    record=record,
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting &lt;code&gt;record_id&lt;/code&gt; might look like this: &lt;code&gt;2018-04-16-70b64e89-24878-0&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;To recreate that particular sequence of line-items, you can search for all items matching that &lt;code&gt;record_id&lt;/code&gt; and then sort them by their &lt;code&gt;sort_order&lt;/code&gt;. Here’s &lt;a href="https://register-of-members-interests.datasettes.com/regmem/items?_sort=sort_order&amp;amp;record_id__exact=2014-07-14-1f359ece-10001-0"&gt;that record from Diane Abbott&lt;/a&gt; shown with its surrounding context.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://register-of-members-interests.datasettes.com/regmem/items?_sort=sort_order&amp;amp;record_id__exact=2014-07-14-1f359ece-10001-0" class="a-img"&gt;&lt;img style="max-width: 100%" src="https://static.simonwillison.net/static/2018/mp-record.png" alt="A single record" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;&lt;a id="How_I_built_it_65"&gt;&lt;/a&gt;How I built it&lt;/h3&gt;
&lt;p&gt;The short version: I downloaded all of the XML files and wrote a Python script which parsed them using &lt;a href="https://docs.python.org/3/library/xml.etree.elementtree.html"&gt;ElementTree&lt;/a&gt; and inserted them into a SQLite database. I’ve &lt;a href="https://github.com/simonw/register-of-members-interests"&gt;put the code on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A couple of fun tricks: firstly, I borrowed some code from &lt;a href="https://github.com/simonw/csvs-to-sqlite"&gt;csvs-to-sqlite&lt;/a&gt; to create the full-text search index and enable searching:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def create_and_populate_fts(conn):
    create_sql = &amp;quot;&amp;quot;&amp;quot;
        CREATE VIRTUAL TABLE &amp;quot;items_fts&amp;quot;
        USING {fts_version} (item, person_name, content=&amp;quot;items&amp;quot;)
    &amp;quot;&amp;quot;&amp;quot;.format(
        fts_version=best_fts_version()
    )
    conn.executescript(create_sql)
    conn.executescript(
        &amp;quot;&amp;quot;&amp;quot;
        INSERT INTO &amp;quot;items_fts&amp;quot; (rowid, item, person_name)
        SELECT items.rowid, items.item, people.name
        FROM items LEFT JOIN people ON items.person_id = people.id
    &amp;quot;&amp;quot;&amp;quot;
    )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;best_fts_version()&lt;/code&gt; function &lt;a href="https://github.com/simonw/register-of-members-interests/blob/6372cfbc0905c4c045714e9092a981668f2d6345/convert_xml_to_sqlite.py#L56-L71"&gt;implements basic feature detection&lt;/a&gt; against SQLite by trying operations in an in-memory database.&lt;/p&gt;
&lt;p&gt;Secondly, I ended up writing my own tiny utility function for inserting records into SQLite. SQLite has useful &lt;code&gt;INSERT OR REPLACE INTO&lt;/code&gt; syntax which allows you to insert a record and will automatically update an existing record if there is a match on the primary key. This meant I could write this utility function and use it for all of my data inserts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def insert_or_replace(conn, table, record):
    pairs = record.items()
    columns = [p[0] for p in pairs]
    params = [p[1] for p in pairs]
    sql = &amp;quot;INSERT OR REPLACE INTO {table} ({column_list}) VALUES ({value_list});&amp;quot;.format(
        table=table,
        column_list=&amp;quot;, &amp;quot;.join(columns),
        value_list=&amp;quot;, &amp;quot;.join([&amp;quot;?&amp;quot; for p in params]),
    )
    conn.execute(sql, params)

# ...

insert_or_replace(
    db,
    &amp;quot;people&amp;quot;,
    {
        &amp;quot;id&amp;quot;: person_id,
        &amp;quot;name&amp;quot;: regmem_el.attrib[&amp;quot;membername&amp;quot;],
    },
)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;a id="What_can_you_find_113"&gt;&lt;/a&gt;What can you find?&lt;/h3&gt;
&lt;p&gt;I’ve really only scratched the surface of what’s in here with my initial queries. What can you find? Send me Datasette query links &lt;a href="https://twitter.com/simonw"&gt;on Twitter&lt;/a&gt; with your discoveries!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mysociety"&gt;mysociety&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/political-hacking"&gt;political-hacking&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="mysociety"/><category term="political-hacking"/><category term="politics"/><category term="projects"/><category term="sqlite"/><category term="xml"/><category term="datasette"/></entry><entry><title>Quoting Baroness Finlay of Llandaff</title><link href="https://simonwillison.net/2010/Mar/31/lords/#atom-tag" rel="alternate"/><published>2010-03-31T16:06:26+00:00</published><updated>2010-03-31T16:06:26+00:00</updated><id>https://simonwillison.net/2010/Mar/31/lords/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://www.publications.parliament.uk/pa/ld200910/ldhansrd/text/100303-0001.htm"&gt;&lt;p&gt;Miss Wilson, when she was a resident superintendent in this Palace, had a cat that apparently caught up to 60 mice a night. The corpses were then swept up in the morning. Finally, does the noble Lord recognise the fire hazard that mice pose, because they eat through insulating cables? It would be a tragedy for this beautiful Palace to burn down for lack of a cat.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://www.publications.parliament.uk/pa/ld200910/ldhansrd/text/100303-0001.htm"&gt;Baroness Finlay of Llandaff&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/cats"&gt;cats&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hansard"&gt;hansard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/houseoflords"&gt;houseoflords&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;&lt;/p&gt;



</summary><category term="cats"/><category term="hansard"/><category term="houseoflords"/><category term="politics"/></entry><entry><title>They Write For You</title><link href="https://simonwillison.net/2010/Feb/2/theywrite/#atom-tag" rel="alternate"/><published>2010-02-02T09:27:40+00:00</published><updated>2010-02-02T09:27:40+00:00</updated><id>https://simonwillison.net/2010/Feb/2/theywrite/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://dharmafly.com/theywriteforyou/"&gt;They Write For You&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
I helped put together this visualisation of stories written by MPs for various newspapers at last Friday’s ’Hackers and Hacks" hack day.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hackday"&gt;hackday&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mps"&gt;mps&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/newspapers"&gt;newspapers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/visualisations"&gt;visualisations&lt;/a&gt;&lt;/p&gt;



</summary><category term="hackday"/><category term="mps"/><category term="newspapers"/><category term="politics"/><category term="projects"/><category term="visualisations"/></entry><entry><title>Crowdsourced document analysis and MP expenses</title><link href="https://simonwillison.net/2009/Dec/20/crowdsourcing/#atom-tag" rel="alternate"/><published>2009-12-20T12:07:53+00:00</published><updated>2009-12-20T12:07:53+00:00</updated><id>https://simonwillison.net/2009/Dec/20/crowdsourcing/#atom-tag</id><summary type="html">
    &lt;p&gt;As &lt;a href="https://web.archive.org/web/20091204154825/https://www.guardian.co.uk/politics/mps-expenses"&gt;you may have heard&lt;/a&gt;, the UK government released a fresh batch of MP expenses documents a week ago on Thursday. I spent that week working with a small team at Guardian HQ to prepare for the release. Here's what we built:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://web.archive.org/web/20091213164102/http://mps-expenses2.guardian.co.uk/"&gt;http://mps-expenses2.guardian.co.uk/&lt;/a&gt; &lt;em&gt;Updated March 2021: all links now go to the Internet Archive&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2009/mp-expenses-2-cropped.png" alt="Screenshot of the homepage from December 2019" style="max-width: 100%" /&gt;&lt;/p&gt;

&lt;p&gt;It's a crowdsourcing application that asks the public to help us dig through and categorise the enormous stack of documents - around 30,000 pages of claim forms, scanned receipts and hand-written letters, all scanned and published as PDFs.&lt;/p&gt;

&lt;p&gt;This is the second time we've tried this - the first was back in June, and can be seen at &lt;a href="https://web.archive.org/web/20090802094829/http://mps-expenses.guardian.co.uk/"&gt;mps-expenses.guardian.co.uk&lt;/a&gt;. Last week's attempt was an opportunity to apply the lessons we learnt the first time round.&lt;/p&gt;

&lt;p&gt;Writing crowdsourcing applications in a newspaper environment is a fascinating challenge. Projects have very little notice - I heard about the new document release the Thursday before giving less than a week to put everything together. In addition to the fast turnaround for the application itself, the 48 hours following the release are crucial. The news cycle moves fast, so if the application launches but we don't manage to get useful data out of it quickly the story will move on before we can impact it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://web.archive.org/web/20091124150940/http://www.scalecamp.org.uk/"&gt;ScaleCamp&lt;/a&gt; on the Friday meant that development work didn't properly kick off until Monday morning. The bulk of the work was performed by two server-side developers, one client-side developer, one designer and one QA on Monday, Tuesday and Wednesday. The Guardian operations team deftly handled our EC2 configuration and deployment, and we had some extra help on the day from other members of the technology department. After launch we also had a number of journalists helping highlight discoveries and dig through submissions.&lt;/p&gt;

&lt;p&gt;The system was written using Django, MySQL (InnoDB), Redis and memcached.&lt;/p&gt;

&lt;h4 id="asking-the-right-question"&gt;Asking the right question&lt;/h4&gt;

&lt;p&gt;The biggest mistake we made the first time round was that we asked the wrong question. We tried to get our audience to categorise documents as either "claims" or "receipts" and to rank them as "not interesting", "a bit interesting", "interesting but already known" and "someone should investigate this". We also asked users to optionally enter any numbers they saw on the page as categorised "line items", with the intention of adding these up later.&lt;/p&gt;

&lt;p&gt;The line items, with hindsight, were a mistake. 400,000 documents makes for a huge amount of data entry and for the figures to be useful we would need to confirm their accuracy. This would mean yet more rounds of crowdsourcing, and the job was so large that the chance of getting even one person to enter line items for each page rapidly diminished as the news story grew less prominent.&lt;/p&gt;

&lt;p&gt;The categorisations worked reasonably well but weren't particularly interesting - knowing if a document is a claim or receipt is useful only if you're going to collect line items. The "investigate this" button worked very well though.&lt;/p&gt;

&lt;p&gt;We completely changed our approach for the new system. We dropped the line item task and instead asked our users to categories each page by applying one or more tags, from a small set that our editors could control. This gave us a lot more flexibility - we changed the tags shortly before launch based on the characteristics of the documents - and had the potential to be a lot more fun as well. I'm particularly fond of the "hand-written" tag, which has highlighted some &lt;a href="https://web.archive.org/web/20091223091650/http://mps-expenses2.guardian.co.uk/page/1062/"&gt;lovely examples&lt;/a&gt; of correspondence between MPs and the expenses office.&lt;/p&gt;

&lt;p&gt;Sticking to an editorially assigned set of tags provided a powerful tool for directing people's investigations, and also ensured our users didn't start creating potentially libelous tags of their own.&lt;/p&gt;

&lt;h4 id="breaking-up-assignments"&gt;Breaking it up in to assignments&lt;/h4&gt;

&lt;p&gt;For the first project, everyone worked together on the same task to review all of the documents. This worked fine while the document set was small, but once we had loaded in 400,000+ pages the progress bar become quite depressing.&lt;/p&gt;

&lt;p&gt;This time round, we added a new concept of "&lt;a href="https://web.archive.org/web/20091215224727/http://mps-expenses2.guardian.co.uk/assignment/"&gt;assignments&lt;/a&gt;". Each assignment consisted of the set of pages belonging to a specified list of MPs, documents or political parties. Assignments had a threshold, so we could specify that a page must be reviewed by at least X people before it was considered reviewed. An editorial tool let us feature one "main" assignment and several alternative assignments right on the homepage.&lt;/p&gt;

&lt;p&gt;Clicking "start reviewing" on an assignment sets a cookie for that assignment, and adds the assignment's progress bar to the top of the review interface. New pages are selected at random from the set of unreviewed pages in that assignment.&lt;/p&gt;

&lt;p&gt;The assignments system proved extremely effective. We could use it to direct people to the highest value documents (our top hit list of interesting MPs, or members of the shadow cabinet) while still allowing people with specific interests to pick an alternative task.&lt;/p&gt;

&lt;h4 id="get-the-button-right"&gt;Get the button right!&lt;/h4&gt;

&lt;p&gt;Having run two crowdsourcing projects I can tell you this: the single most important piece of code you will write is the code that gives someone something new to review. Both of our projects had big "start reviewing" buttons. Both were broken in different ways.&lt;/p&gt;

&lt;p&gt;The first time round, the mistakes were around scalability. I used a SQL "ORDER BY RAND()" statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn't matter since the button would only be clicked occasionally.&lt;/p&gt;

&lt;p&gt;Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages in to the system. This caused multiple site slow downs and crashes until we threw together a cron job that pushed 1,000 unreviewed page IDs in to memcached and made the button pick one of those at random.&lt;/p&gt;

&lt;p&gt;This solved the performance problem, but meant that our user activity wasn't nearly as well targeted. For optimum efficiency you really want everyone to be looking at a different page - and a random distribution is almost certainly the easiest way to achieve that.&lt;/p&gt;

&lt;p&gt;The second time round I turned to my new favourite in-memory data structure server, &lt;a href="http://code.google.com/p/redis/"&gt;redis&lt;/a&gt;, and its &lt;a href="http://code.google.com/p/redis/wiki/SrandmemberCommand"&gt;SRANDMEMBER&lt;/a&gt; command (a feature I &lt;a href="http://twitter.com/simonw/status/5027987857"&gt;requested&lt;/a&gt; a while ago with this exact kind of project in mind). The system maintains a redis set of all IDs that needed to be reviewed for an assignment to be complete, and a separate set of IDs of all pages had been reviewed. It then uses redis set intersection (the &lt;a href="http://code.google.com/p/redis/wiki/SdiffstoreCommand"&gt;SDIFFSTORE&lt;/a&gt; command) to create a set of unreviewed pages for the current assignment and then SRANDMEMBER to pick one of those pages.&lt;/p&gt;

&lt;p&gt;This is where the bug crept in. Redis was just being used as an optimisation - the single point of truth for whether a page had been reviewed or not stayed as MySQL. I wrote a couple of Django management commands to repopulate the denormalised Redis sets should we need to manually modify the database. Unfortunately I missed some - the sets that tracked what pages were available in each document. The assignment generation code used an intersection of these sets to create the overall set of documents for that assignment. When we deleted some pages that had accidentally been imported twice I failed to update those sets.&lt;/p&gt;

&lt;p&gt;This meant the "next page" button would occasionally turn up a page that didn't exist. I had some very poorly considered fallback logic for that - if the random page didn't exist, the system would return the first page in that assignment instead. Unfortunately, this meant that when the assignment was down to the last four non-existent pages every single user was directed to the same page - which subsequently attracted well over a thousand individual reviews.&lt;/p&gt;

&lt;p&gt;Next time, I'm going to try and make the "next" button completely bullet proof! I'm also going to maintain a "denormalisation dictionary" documenting every denormalisation in the system in detail - such a thing would have saved me several hours of confused debugging.&lt;/p&gt;

&lt;h4 id="exposing-the-results"&gt;Exposing the results&lt;/h4&gt;

&lt;p&gt;The biggest mistake I made last time was not getting the data back out again fast enough for our reporters to effectively use it. It took 24 hours from the launch of the application to the moment the first reporting feature was added - mainly because we spent much of the intervening time figuring out the scaling issues.&lt;/p&gt;

&lt;p&gt;This time we handled this a lot better. We provided private pages exposing all recent activity on the site. We also provided public pages for each of the tags, as well as combination pages for party + tag, MP + tag, document + tag, assignment + tag and user + tag. Most of these pages were ordered by most-tagged, with the hope that the most interesting pages would quickly bubble to the top.&lt;/p&gt;

&lt;p&gt;This worked pretty well, but we made one key mistake. The way we were ordering pages meant that it was almost impossible to paginate through them and be sure that you had seen everything under a specific tag. If you're trying to keep track of everything going on in the site, reliable pagination is essential. The only way to get reliable pagination on a fast moving site is to order by the date something was first added to a set in ascending order. That way you can work through all of the pages, wait a bit, hit "refresh" and be able to continue paginating where you left off. Any other order results in the content of each page changing as new content comes in.&lt;/p&gt;

&lt;p&gt;We eventually added an undocumented /in-order/ URL prefix to address this issue. Next time I'll pay a lot more attention to getting the pagination options right from the start.&lt;/p&gt;

&lt;h4 id="rewarding-our-contributors"&gt;Rewarding our contributors&lt;/h4&gt;

&lt;p&gt;The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn't want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time.&lt;/p&gt;

&lt;p&gt;For the new version, we tried to provide a much better feeling of activity around the site. We added "top reviewer" tables to every assignment, MP and political party as well as a "most active reviewers in the past 48 hours" table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages in to tag buckets within their profile.&lt;/p&gt;

&lt;p&gt;Most importantly, we added a concept of &lt;a href="https://web.archive.org/web/20091223091046/http://mps-expenses2.guardian.co.uk/discoveries/"&gt;discoveries&lt;/a&gt; - editorially highlighted pages that were shown on the homepage and credited to the user that had first highlighted them. These discoveries also added valuable editorial interest to the site, showing up on the homepage and also the index pages for &lt;a href="https://web.archive.org/web/20091215191906/http://mps-expenses2.guardian.co.uk/labour/"&gt;political parties&lt;/a&gt; and &lt;a href="https://web.archive.org/web/20091215050919/http://mps-expenses2.guardian.co.uk/conservative/gerald-howarth/"&gt;individual MPs&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id="light-weight-registration"&gt;Light-weight registration&lt;/h4&gt;

&lt;p&gt;For both projects, we implemented an extremely light-weight form of registration. Users can start reviewing pages without going through any signup mechanism, and instead are assigned a cookie and an anon-454 style username the first time they review a document. They are then encouraged to assign themselves a proper username and password so they can log in later and take credit for their discoveries.&lt;/p&gt;

&lt;p&gt;It's difficult to tell how effective this approach really is. I have a strong hunch that it dramatically increases the number of people who review at least one document, but without a formal A/B test it's hard to tell how true that is. The UI for this process in the first project was quite confusing - we gave it a solid makeover the second time round, which seems to have resulted in a higher number of conversions.&lt;/p&gt;

&lt;h4 id="overall-lessons"&gt;Overall lessons&lt;/h4&gt;

&lt;p&gt;News-based crowdsourcing projects of this nature are both challenging and an enormous amount of fun. For the best chances of success, be sure to ask the right question, ensure user contributions are rewarded, expose as much data as possible and make the "next thing to review" behaviour rock solid. I'm looking forward to the next opportunity to apply these lessons, although at this point I &lt;em&gt;really&lt;/em&gt; hope it involves something other than MPs' expenses.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/crowdsourcing"&gt;crowdsourcing&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/guardian"&gt;guardian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/innodb"&gt;innodb&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/memcached"&gt;memcached&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mpsexpenses"&gt;mpsexpenses&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nosql"&gt;nosql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/redis"&gt;redis&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="crowdsourcing"/><category term="django"/><category term="guardian"/><category term="innodb"/><category term="memcached"/><category term="mpsexpenses"/><category term="mysql"/><category term="nosql"/><category term="politics"/><category term="projects"/><category term="python"/><category term="redis"/></entry><entry><title>HTTP + Politics = ?</title><link href="https://simonwillison.net/2009/Dec/15/australia/#atom-tag" rel="alternate"/><published>2009-12-15T15:36:20+00:00</published><updated>2009-12-15T15:36:20+00:00</updated><id>https://simonwillison.net/2009/Dec/15/australia/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.mnot.net/blog/2009/12/16/http_au"&gt;HTTP + Politics = ?&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Mark Nottingham ponders the technical implications of Australia’s decision to apply a filter to all internet traffic. Australia is large enough (and far enough away from the northern hemisphere) that the speed of light is a performance issue, but filtering technologies play extremely poorly with optimisation technologies such as HTTP pipelining and Google’s SPDY proposal.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/australia"&gt;australia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/filtering"&gt;filtering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mark-nottingham"&gt;mark-nottingham&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/performance"&gt;performance&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pipelining"&gt;pipelining&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/spdy"&gt;spdy&lt;/a&gt;&lt;/p&gt;



</summary><category term="australia"/><category term="filtering"/><category term="google"/><category term="http"/><category term="mark-nottingham"/><category term="performance"/><category term="pipelining"/><category term="politics"/><category term="spdy"/></entry><entry><title>MoD sticks with insecure browser</title><link href="https://simonwillison.net/2009/Jul/24/mod/#atom-tag" rel="alternate"/><published>2009-07-24T10:18:55+00:00</published><updated>2009-07-24T10:18:55+00:00</updated><id>https://simonwillison.net/2009/Jul/24/mod/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.kable.co.uk/government-internet-browsers-23jul09"&gt;MoD sticks with insecure browser&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tom Watson MP used parliamentary written answers to find out that the majority of government departments still require their staff to use IE6, and not all of them have upgrade plans to 7 or 8. Not a single department considered an alternative browser. “Many civil servants use web browsers as a tool of their trade. They’re as important as pens and paper. So to force them to use the most decrepit browser in the world is a rare form of workplace cruelty that should be stopped.”


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/civilservice"&gt;civilservice&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tom-watson"&gt;tom-watson&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ukgovernment"&gt;ukgovernment&lt;/a&gt;&lt;/p&gt;



</summary><category term="browsers"/><category term="civilservice"/><category term="politics"/><category term="tom-watson"/><category term="ukgovernment"/></entry><entry><title>Early Day Motion to support Bletchley Park Museum</title><link href="https://simonwillison.net/2009/Jul/21/bletchley/#atom-tag" rel="alternate"/><published>2009-07-21T13:56:35+00:00</published><updated>2009-07-21T13:56:35+00:00</updated><id>https://simonwillison.net/2009/Jul/21/bletchley/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.edparsons.com/2009/07/early-day-motion-to-support-bletchley-park-museum/"&gt;Early Day Motion to support Bletchley Park Museum&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Time to fire up WriteToThem.com and drop your MP a friendly note of encouragement.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bletchleypark"&gt;bletchleypark&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/edparsons"&gt;edparsons&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/politics"&gt;politics&lt;/a&gt;&lt;/p&gt;



</summary><category term="bletchleypark"/><category term="edparsons"/><category term="politics"/></entry></feed>