Items tagged scraping in Mar, 2023
Filters: Year: 2023 × Month: Mar × scraping × Sorted by date
scrapeghost (via) Scraping is a really interesting application for large language model tools like GPT3. James Turk’s scrapeghost is a very neatly designed entrant into this space—it’s a Python library and CLI tool that can be pointed at any URL and given a roughly defined schema (using a neat mini schema language) which will then use GPT3 to scrape the page and try to return the results in the supplied format. # 26th March 2023, 5:29 am
I expect GPT-4 will have a LOT of applications in web scraping
The increased 32,000 token limit will be large enough to send it the full DOM of most pages, serialized to HTML—then ask questions to extract data
Or... take a screenshot and use the GPT4 image input mode to ask questions about the visually rendered page instead!
Might need to dust off all of those old semantic web dreams, because the world’s information is rapidly becoming fully machine readable
— Me # 16th March 2023, 1:09 am