* And incorporating other modalities directly into these models. Indeed, newer large language models like GPT-4 are already being designed to be “multimodal,” meaning they are trained on images in addition to text.