14 May 2024

Back when OpenAI announced “multimodal” ChatGPT, I felt their language was deliberately vague enough to cover several separate layers working together, e.g., a discrete image recognizer telling the LLM what’s in a picture.

They’ve finally confirmed that was the case, because *now* GPT-4o is a single, actually omnimodal neural network. And I find the idea that this works, and works so well and so fast, really impressive and terrifying (all over again).

Major ChatGPT-4o update allows audio-video talks with an “emotional” AI chatbot

New GPT-4o model can sing a bedtime story, detect facial expressions, read emotions.

