Simon Willison’s Weblog

8 items tagged “gpt4”


scrapeghost (via) Scraping is a really interesting application for large language model tools like GPT3. James Turk’s scrapeghost is a very neatly designed entrant into this space—it’s a Python library and CLI tool that can be pointed at any URL and given a roughly defined schema (using a neat mini schema language) which will then use GPT3 to scrape the page and try to return the results in the supplied format. # 26th March 2023, 5:29 am

GPT-4, like GPT-3 before it, has a capability overhang; at the time of release, neither OpenAI or its various deployment partners have a clue as to the true extent of GPT-4’s capability surface—that’s something that we’ll get to collectively discover in the coming years. This also means we don’t know the full extent of plausible misuses or harms.

Jack Clark # 22nd March 2023, 12:40 am

As an NLP researcher I’m kind of worried about this field after 10-20 years. Feels like these oversized LLMs are going to eat up this field and I’m sitting in my chair thinking, “What’s the point of my research when GPT-4 can do it better?”

Jeonghwan Kim # 16th March 2023, 5:39 am

I expect GPT-4 will have a LOT of applications in web scraping The increased 32,000 token limit will be large enough to send it the full DOM of most pages, serialized to HTML—then ask questions to extract data Or... take a screenshot and use the GPT4 image input mode to ask questions about the visually rendered page instead! Might need to dust off all of those old semantic web dreams, because the world’s information is rapidly becoming fully machine readable

Me # 16th March 2023, 1:09 am

GPT-4 Developer Livestream. 25 minutes of live demos from OpenAI co-founder Greg Brockman at the GPT-4 launch. These demos are all fascinating, including code writing and multimodal vision inputs. The one that really struck me is when Greg pasted in a copy of the tax code and asked GPT-4 to answer some sophisticated tax questions, involving step-by-step calculations that cited parts of the tax code it was working with. # 15th March 2023, 12:20 am

GPT-4 Technical Report (PDF). 98 pages of much more detailed information about GPT-4. The appendices are particularly interesting, including examples of advanced prompt engineering as well as examples of harmful outputs before and after tuning attempts to try and suppress them. # 14th March 2023, 9:39 pm

We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. [...] We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails.

OpenAI # 14th March 2023, 5:02 pm

ChatGPT can’t access the internet, even though it really looks like it can

A really common misconception about ChatGPT is that it can access URLs. I’ve seen many different examples of people pasting in a URL and asking for a summary, or asking it to make use of the content on that page in some way.

[... 1564 words]