Simon Willison’s Weblog


8 items tagged “semanticweb”

2023

I expect GPT-4 will have a LOT of applications in web scraping

The increased 32,000 token limit will be large enough to send it the full DOM of most pages, serialized to HTML—then ask questions to extract data

Or... take a screenshot and use the GPT-4 image input mode to ask questions about the visually rendered page instead!

Might need to dust off all of those old semantic web dreams, because the world’s information is rapidly becoming fully machine readable

Me # 16th March 2023, 1:09 am
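
The first idea in the quote above (serialize the page to HTML, check that it fits in the context window, then ask a question) can be sketched roughly as follows. This is a minimal illustration, not a tested scraping pipeline: the helper names (`slim_html`, `estimate_tokens`, `build_prompt`) are hypothetical, the four-characters-per-token figure is only a rough assumption rather than a real tokenizer, and the sketch stops at building the prompt text instead of calling any particular model API.

```python
from html.parser import HTMLParser

TOKEN_LIMIT = 32_000   # the limit mentioned in the post
CHARS_PER_TOKEN = 4    # ASSUMPTION: crude heuristic, real tokenizers differ


class _Slimmer(HTMLParser):
    """Re-serializes HTML while dropping <script> and <style> blocks.

    Character references are decoded along the way, which is acceptable
    for prompt text but means the output is not byte-identical HTML.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif not self._skip_depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written.
        if not self._skip_depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def slim_html(html: str) -> str:
    """Return the HTML with script and style content removed."""
    parser = _Slimmer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ceil(len / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_prompt(html: str, question: str, limit: int = TOKEN_LIMIT) -> str:
    """Combine the slimmed page and a question, refusing oversized input."""
    prompt = f"{slim_html(html)}\n\nQuestion about the page above: {question}"
    if estimate_tokens(prompt) > limit:
        raise ValueError("page is probably too large for the context window")
    return prompt
```

For example, `build_prompt(page_html, "List every entry's date and title as JSON")` would return a single string that could then be sent to whichever model client is in use. Dropping scripts and styles first is a design choice to save tokens, on the assumption that the data of interest lives in the visible markup.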

2010

Linked Data at the Guardian. The Guardian’s Open Platform API can now be queried by MusicBrainz ID and ISBN, opening up some extremely useful new types of query. # 19th October 2010, 7:11 pm

2009

4store Amazon Machine Image. Instructions for firing up an EC2 AMI running the recently released 4store high performance triple store and loading in 1.14 billion statements collected by crawling the semantic web. # 1st November 2009, 12:12 pm

2008

Learning to Fear the Semantic Web. Paul Ford raises the liability issue with regard to building sites around other people’s metadata, pointing out that OpenCalais is owned by Thomson Reuters, who have a bad track record with regard to intellectual property lawsuits elsewhere in the organisation. # 23rd October 2008, 4:14 pm

The only down side is everyone I’ve talked to at Freebase seems pretty solid on this being their proprietary secret sauce, because a good, fast scalable open source tuple store might actually jump start a real semantic (small-S) web after all these years.

Kellan Elliott-McCrea # 29th September 2008, 3:29 pm

2007

Giant Global Graph. Tim Berners-Lee points out that the Semantic Web is designed to solve problems such as portable social networks. # 22nd November 2007, 12:30 am

dbpedia.org. They scrape Wikipedia and extract useful information from it so you don’t have to. # 7th August 2007, 3:24 pm

Triplr. Ultra simple GET-based web service for converting RSS / Atom / RDF / Microformats+GRDDL to HTML / ntriples / RDF / RSS / JSON / Turtle. Small pieces, loosely joined. # 30th March 2007, 3:30 pm