I expect GPT-4 will have a LOT of applications in web scraping
The increased 32,000 token limit will be large enough to send it the full DOM of most pages, serialized to HTML, and then ask questions to extract data
Or... take a screenshot and use the GPT-4 image input mode to ask questions about the visually rendered page instead!
Might need to dust off all of those old semantic web dreams, because the world’s information is rapidly becoming fully machine readable
— Me
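To make the first idea concrete, here is a minimal sketch of the "send the HTML, then ask a question" step. It only builds the prompt text and checks it against the 32,000 token figure; it does not call any model. The function names, the prompt wording, the four-characters-per-token heuristic and the `reserve_for_answer` parameter are all my own assumptions for illustration, not part of any real API, and a real tokenizer would give different counts.

```python
TOKEN_LIMIT = 32_000     # context size mentioned in the quote
CHARS_PER_TOKEN = 4      # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count alone."""
    return len(text) // CHARS_PER_TOKEN + 1


def build_extraction_prompt(html: str, question: str,
                            reserve_for_answer: int = 1_000) -> str:
    """Combine serialized page HTML and a question into one prompt string.

    Raises ValueError if the prompt, plus some room reserved for the
    model's answer, would exceed the assumed token limit.
    """
    prompt = (
        "Here is the HTML of a web page:\n\n"
        f"{html}\n\n"
        f"Question: {question}\n"
        "Answer using only information from the page."
    )
    if estimate_tokens(prompt) + reserve_for_answer > TOKEN_LIMIT:
        raise ValueError("page is too large for the assumed token limit")
    return prompt


if __name__ == "__main__":
    page = "<html><body><h1>Widget</h1><p class='price'>$19.99</p></body></html>"
    print(build_extraction_prompt(page, "What is the price of the product?"))
```

The resulting string is what you would pass to whichever model client you use. Pages that trip the size check would need trimming first, for example by dropping scripts and styles before serializing.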