“That’s something that, you know, we can’t really comment on at this time,” said OpenAI’s chief scientist, Ilya Sutskever, when I spoke to the GPT-4 team in a video call an hour after the announcement. “It’s pretty competitive out there.”
GPT-4 will be available in a limited, text-only capacity to users who sign up for the waitlist and to subscribers of the paid ChatGPT Plus service.
GPT-4 is a multimodal large language model, which means it can respond to both text and images. Give it a photo of the contents of your fridge and ask it what you could make, and GPT-4 will try to come up with recipes that use the pictured ingredients.
“The continued improvements along many dimensions are remarkable,” says Oren Etzioni at the Allen Institute for AI. “GPT-4 is now the standard by which all foundation models will be evaluated.”
“A good multimodal model has been the holy grail of many big tech labs for the past couple of years,” says Thomas Wolf, cofounder of Hugging Face, the AI startup behind the open-source large language model BLOOM. “But it has remained elusive.”
In theory, combining text and images could allow multimodal models to understand the world better. “It might be able to tackle traditional weak points of language models, like spatial reasoning,” says Wolf.
It is not yet clear whether that’s true for GPT-4. OpenAI’s new model appears to be better than ChatGPT at some basic reasoning, solving simple puzzles such as summarizing blocks of text using words that start with the same letter. In my demo, I was shown GPT-4 summarizing the announcement blurb from OpenAI’s website using words that begin with g: “GPT-4, groundbreaking generational growth, gains greater grades. Guardrails, guidance, and gains garnered. Gigantic, groundbreaking, and globally gifted.” In another demo, GPT-4 took in a document about taxes and answered questions about it, citing reasons for its responses.