Predictions: Potential Capabilities of GPT-4
If GPT-4 is multimodal, then we can predict with reasonable confidence what it might be capable of, given Microsoft's prior work, Kosmos-1:
• Visual IQ test: yes, the ones that humans take!
• OCR-free reading comprehension: input a screenshot, scanned document, street sign, or any pixels that contain text. Reason about the contents directly without explicit OCR. This is extremely useful for unlocking AI-powered apps on multimedia web pages, or "text in the wild" from real-world cameras.
• Multimodal chat: have a conversation about a picture. You can even provide "follow-up" images in the middle.
• Broad visual understanding abilities, like captioning, visual question answering, object detection, scene layout, common sense reasoning, etc.
• Audio & speech recognition: wasn't mentioned in the Kosmos-1 paper, but Whisper is already available through the OpenAI API and should be fairly easy to integrate.
Got ChatGPT to compose a song
Using letters A,B,C,D,E,F,G and symbols # and b (in place of the flat symbol), write a classical piano song. Right hand treble clef only, centered around middle C.
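For anyone who wants to actually hear what comes back, here is a minimal sketch of turning that letter notation into MIDI note numbers. It assumes the reply is a whitespace-separated run of tokens like `C E G Bb F#` (the post doesn't show ChatGPT's actual output format) and that every note sits in the octave starting at middle C, taken here as MIDI note 60. The names `note_to_midi` and `parse_melody` are made up for this illustration.

```python
# Semitone offsets of the natural notes above C.
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MIDDLE_C = 60  # assumption: middle C is MIDI note 60 (C4)


def note_to_midi(token: str) -> int:
    """Convert a token such as 'C', 'F#' or 'Bb' to a MIDI note number.

    Assumes the note lies in the octave that starts at middle C.
    """
    letter, accidentals = token[0], token[1:]
    if letter not in NATURALS:
        raise ValueError(f"unknown note letter: {token!r}")
    offset = NATURALS[letter]
    for mark in accidentals:
        if mark == "#":    # sharp raises the pitch by one semitone
            offset += 1
        elif mark == "b":  # 'b' stands in for the flat symbol
            offset -= 1
        else:
            raise ValueError(f"unknown accidental in: {token!r}")
    return MIDDLE_C + offset


def parse_melody(text: str) -> list[int]:
    """Parse a whitespace-separated melody into a list of MIDI note numbers."""
    return [note_to_midi(tok) for tok in text.split()]


if __name__ == "__main__":
    print(parse_melody("C E G Bb F#"))
```

The resulting numbers can be handed to any MIDI library or synth for playback. Note lengths and octave changes are left out, since the prompt only asks for pitches.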