
Microsoft Germany confirms GPT-4 will be multimodal, with potential to operate across multiple input types

Microsoft Germany has confirmed that the upcoming large language model, GPT-4, will be multimodal, able to operate across different input types such as sound, video, images, and text. This is a significant advancement over previous models like GPT-3, which operated in only a single modality: text. The announcement also highlighted Microsoft's release of Kosmos-1, a multimodal language model that integrates text and images. Microsoft's technology is expected to answer questions in different languages and transcend language barriers, similar to the goal of Google's multimodal AI technology, MUM. The integration of AI across multiple modalities is a significant development in the field and is expected to enable new and innovative applications.

GLOSSARY: MUM stands for Multitask Unified Model. It is a new large language model developed by Google that aims to provide a more comprehensive approach to natural language processing by being able to understand and generate text across multiple languages and modalities, such as images and videos. It was announced in 2021 and is still in development.


Sources:

OpenAI GPT-4 Arriving Mid-March 2023

MUM: A new AI milestone for understanding information

Heise Online (German news site)

Article from The Verge: Microsoft confirms GPT-4 will be ‘multimodal,’ with potential to operate in multiple input types

Article from TechRadar: Microsoft’s GPT-4 to be multimodal and ‘offer completely different possibilities’

Article from Neowin: Microsoft confirms GPT-4 will be multimodal, able to handle video and audio inputs

Wiki Hyphen Website | Updates 11th March 2023 | Link:
