- Introduction
- GPT-4 Omni: The Next Level
- Text Generation Capabilities
- Audio Generation Capabilities
- Image Generation Capabilities
- Conclusion
Introduction
I’ve got to say, truthfully, OpenAI blew my mind on Monday. I don’t know about you, but their real-time companion, that Her clone, shocked me, to say the least:
“I want to introduce you to somebody.”
“Hello there, cutie! What’s your name, little fluff ball?”
“This is Bowser.”
“Well, hello, Bowser! Aren’t you just the most adorable little thing?”
I did do a full article recapping the event, but as it turns out, there is a lot more to uncover here than first meets the eye. For example, did you know that this model can somehow generate images? And gosh, they’re the best AI-generated images I’ve ever seen, point blank, period. What’s going on? There are also quite a few other capabilities that OpenAI just kind of kept under wraps.
So let’s start with what we do know. First of all, we know that the model powering everything under the hood, this insane real-time AI assistant, is called GPT-4o. The “o” stands for Omni, and the reason they called it Omni is that it’s the first truly multimodal AI. In simple terms, actually brought to you by GPT-4 itself: multimodal just means that the AI can understand and generate more than one type of data instead of just working with text. For example, GPT-4o can process images, it can understand audio natively, and it can even sort of interpret video.
The old GPT-4 Turbo was split into two or three separate models. I’m not precisely sure whether it took images in natively or used a separate model to parse those images into text. Either way, we know for a fact that it did not natively support audio. Yes, the old GPT-4 app did let you talk to it with your voice, but that used a separate model called Whisper V3, which would just take your audio and transcribe it into text. Don’t get me wrong, it was great at transcribing your voice, but that is all it did. It couldn’t hear the sound of birds, for example. It couldn’t hear your dog barking. It couldn’t hear your tone of voice.
This new model can understand your breathing patterns and even more, which we’ll get into later:
“Just take a deep breath.”
“I like that suggestion. Let me try a couple deep breaths. Can you give me feedback on my breaths? Okay, here I go.”
“Whoa, slow down a bit there, Mark, you’re not a vacuum cleaner. Breathe in for a count of four.”
“Okay, let me try again. So I’m going to breathe in deeply and then breathe out...”
“...for four, and then exhale slowly.”
“Okay, I’ll try again. Breathing in, and breathe out.”
“That’s it. How do you feel?”
“I feel a lot better.”
And, of course, it can also understand the emotions that you put behind your words, which is possibly the most important part. It will react differently when you’re sad, when you’re excited, and when you’re yelling and screaming at it. Very human indeed. This is uncharted territory.
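To make the difference between a transcribe-first pipeline and native audio concrete, here is a toy Python sketch. It is not how OpenAI implements either system, and every name in it (`AudioClip`, `transcribe`, `cascaded_input`, `native_input`) is hypothetical. It only models the point made above: once speech has been reduced to a transcript, tone and background sounds are no longer available to the text model.

```python
from dataclasses import dataclass, field


@dataclass
class AudioClip:
    """Hypothetical stand-in for a voice recording."""
    words: str
    tone: str = "neutral"
    background: list[str] = field(default_factory=list)


def transcribe(clip: AudioClip) -> str:
    """Stand-in for a speech-to-text step: only the words survive."""
    return clip.words


def cascaded_input(clip: AudioClip) -> dict:
    """Transcribe-first pipeline: the text model sees the transcript only."""
    return {"words": transcribe(clip)}


def native_input(clip: AudioClip) -> dict:
    """Native audio model: tone and background sounds stay available."""
    return {
        "words": clip.words,
        "tone": clip.tone,
        "background": list(clip.background),
    }


clip = AudioClip("Can you hear that?", tone="excited", background=["dog barking"])
print(cascaded_input(clip))  # words only
print(native_input(clip))    # words, tone, and background sounds
```

The barking dog and the excited tone are present in the second dictionary and missing from the first, which is the whole argument in miniature.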
GPT-4 Omni: The Next Level
The first mind-blowing capability I want to show you is text generation. Models have been doing this for years, so you might think: so what, it generates text. Even on the benchmarks it was only about as good as the other leading models, not leaps and bounds better. Even the context length is the same size. 128,000 tokens is not a bad context length, but it’s no better. So what’s the big deal?
Well, here’s the rub on text generation with GPT-4 Omni: this model is lightning fast. And when I say lightning fast, I mean this thing generates something like two paragraphs a second. The outputs are just as good as those of leading models, only multiple times faster, and this opens up entirely new branches of what is actually possible with text generation. So let’s dive into a few of them.
A bunch of these examples come from a Twitter thread by Min Choi, which is linked down below. I always link Twitter threads down below if you want to check them out. I highly recommend following this guy, by the way, it’s a phenomenal AI account. And follow me on Twitter as well, because I am always reposting great stuff.
First up, this is Sawyer Hood’s ultimate LLM test: ask it to make a Facebook Messenger clone as a single HTML file. GPT-4o does this in 6 seconds flat. Again, this is not only fast text generation but high quality: it actually works. You open it up and it’s Facebook Messenger as a single HTML file. I mean, that’s just absolutely insane, right?
GPT-4 Omni can also generate full-blown charts and statistical analysis from spreadsheets with a single prompt in less than 30 seconds. Zay points out that this stuff used to take absolute ages in Excel, but it can now all be done automatically by your AI. And yes, the old GPT-4 Turbo could absolutely do this, but it couldn’t do it this quickly, and it wasn’t able to do it this accurately either. You start getting charts in about 6 seconds from an actual shoe company sales CSV file, and these charts aren’t bad either. They’re what I would consider usable in a real company meeting, they’re diverse, and you even get a summary with key insights. It’s like an entire breakdown in 20 seconds. Fast, high-quality generation. This is leaps and bounds ahead.
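For a sense of what that kind of spreadsheet breakdown involves, here is a minimal Python sketch of the sort of script an assistant might write for a sales CSV. The column names (`product`, `units`, `price`) and the sample rows are made up for illustration; they are not taken from the shoe company file in the demo, and this is not the code the model actually ran.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for an uploaded sales spreadsheet.
SAMPLE_CSV = """product,units,price
Runner,10,80.0
Trail,4,120.0
Runner,5,80.0
Sandal,12,30.0
"""


def revenue_by_product(csv_text: str) -> dict[str, float]:
    """Sum units * price per product across all rows."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["product"]] += int(row["units"]) * float(row["price"])
    return dict(totals)


def key_insight(totals: dict[str, float]) -> str:
    """Produce a one-line summary naming the top product and its revenue share."""
    best = max(totals, key=totals.get)
    share = totals[best] / sum(totals.values())
    return f"{best} leads with {share:.0%} of revenue"


totals = revenue_by_product(SAMPLE_CSV)
print(totals)
print(key_insight(totals))
```

A chart would be one plotting call on top of `totals`. The point is that the analysis itself is short and mechanical, so the speed of the model is what decides how fast you get an answer.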
Text Generation Capabilities
Oh, and folks, you thought we were done there? Well, it gets even crazier. This is from Taelin on Twitter: Pokemon Red gameplay. Essentially, this is a custom prompt that makes GPT-4 Omni play Pokemon Red as a text-based game. Watch this. As you can see, it essentially boots up Pokemon Red. Look at this: New Game, Continue, or Options. It’s a text-based game. It even does its best to include pictures by using emojis, and it generates so fast that you can essentially play the game in real time.
We select A, and then it says, you know, for some people Pokemon are pets, others use them for fights. It’s literally the Pokemon Red game. You just keep entering your choice, and then you can actually put your name in. We’re just going to use a custom name in this example, and it follows along. The whole Pokemon Red game is converted into a text-based adventure game inside of the LLM, and it’s running in real time. Like, what is going on here? It even has Route One all laid out correctly with the houses, Oak’s Lab, and the beach. This is a very, very impressive example. You can see it even has the fight or use item options, and it tracks HP. You can essentially play an entire Pokemon Red game, converted to text inside of an AI, with just a little bit of prompting, which is absolutely mind-blowing.
This is more or less what’s possible with the API. I’m sure you could get ChatGPT to do this with a special prompt or with a custom GPT, but this one was done using the API instead, and I think that’s what you guys have to realize here: this is more than just ChatGPT. People are going to be able to build some insane things. Imagine a new, from-the-ground-up game that lets you take a photo of your dog and then use your dog as the Pokemon, and the AI comes up with all of its abilities on the fly. The possibilities are endless.
And by the way, guys, this is just the beginning. How good will these models be in a year? Imagine when the text generation isn’t just way faster and just as good, but way better and also way faster. The era of rapid AI development is upon us.
Oh, and speaking of the API, the new GPT-4 Omni is not only fast and just as good, it’s actually half the price of GPT-4 Turbo, which was itself cheaper than the original GPT-4. So we’re seeing a rapid decrease in how much it costs to actually run these powerful models. And folks, that’s just text.
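To put the price claim in numbers, here is a small Python sketch of the arithmetic. The per-token prices in `ASSUMED_PRICES` are assumptions based on list prices around the time of the launch and may well have changed since, so check the current pricing page before relying on them. The token counts are an arbitrary example request.

```python
# Assumed list prices in USD per 1 million tokens, as (input, output).
# These figures are assumptions for illustration, not authoritative values.
ASSUMED_PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4-turbo": (10.00, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request under the assumed prices."""
    input_price, output_price = ASSUMED_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example request: a 2,000-token prompt and a 500-token reply.
turbo_cost = request_cost("gpt-4-turbo", 2_000, 500)
omni_cost = request_cost("gpt-4o", 2_000, 500)

print(f"GPT-4 Turbo: ${turbo_cost:.4f}")
print(f"GPT-4o:      ${omni_cost:.4f}")
print(f"GPT-4o / GPT-4 Turbo: {omni_cost / turbo_cost:.2f}")
```

Under these assumptions the ratio comes out at one half regardless of the token mix, because both the input and the output price are halved.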
Audio Generation Capabilities
Let’s get into the audio generation capabilities of GPT-4 Omni. Now we’re dipping our toes into the multimodal landscape, which again is uncharted territory for sure. As we saw in the demo, it produces remarkably high-quality, human-sounding audio. The model is able to generate voice in a variety of different emotive styles:
“Hey, ChatGPT, how are you doing?”
“I’m doing fantastic, thanks for asking. How about you?”
It’s a subtle difference, but each delivery is slightly different. The AI can understand emotion in the tone of your voice and even replicate it. Watch this. Here’s a piece of audio just to test the model in real time:
“What’s up there? Are you in a really great mood or something? Because I can hear it in the way you sound.”
Yeah, this is the best audio generation I’ve ever heard, and you can totally use it to make realistic voices that are almost indistinguishable from actual humans. They do say it’s capable of some sort of audio manipulation. For example, it can mimic accents, or you can tune the AI to speak in a different accent than yours, or in a variety of different tonalities. That’s a very good thing for the future of this kind of stuff, and it also opens up entirely new branches of how audio generation can be used in the AI space. It would be very good for practicing a new language, since new words are easier to pick up when you hear them in a realistic accent.
I already mentioned Whisper V3, which was used to transcribe audio into text. Those audio transcription capabilities are now built into this model, which allows for more seamless interaction. You could make phone calls directly, or chat with someone in real time without them knowing that it’s an AI.
Of course, the full extent of these abilities hasn’t been showcased yet, but the direction we’re heading in right now is quite remarkable. And oh boy.
Image Generation Capabilities
One of the most insane features of GPT-4 Omni is the image generation. I don’t even know how to explain this one. I’ll be honest, this part is a bit of a brain drain, so I’m just going to show you some examples of images that were generated by GPT-4 Omni and you’ll understand for yourself. This is from an article by The Verge, and it’s a mixture of input prompts and the images the model generates for you. It works very similarly to DALL·E.
The generation is just beyond what any model has ever done before. I mean, the details here are mind-blowing. Look at the amazing detail on the deer here, and the incredible beauty of this entire scene. This one here is about the best I have ever seen for a futuristic science fiction theme. And the most astounding thing is that these images are not only at the level of quality you would expect from a high-end designer, but they’re generated in under 20 seconds, and they come with full captions and descriptions of what they are.
You want a detailed illustration of a scene that you have in your mind? You want a digital art piece that perfectly embodies the surrealist elements of your imagination? You want a photograph that captures a moment in time like you were there to witness it? This model does that in seconds, and this level of detail in such a short time span has never been possible before. These are absolute works of art, and they’re all done by a computer. I can’t even comprehend it.
This is unbelievable stuff, and even more so looking ahead. I see this being used to generate unique assets for game development and movie production, and even concept art and things of that nature for new works in the visual arts. Everything we know about how these models generate art has been turned on its head. It’s like they’ve cracked the code, and this is going to revolutionize everything in the creative arts. This is where it gets to truly mind-blowing levels.
Conclusion
Folks, there you have it! The new capabilities of GPT-4 Omni are absolutely revolutionary. This multimodal model is changing the landscape of AI and setting new benchmarks for what’s possible with text, audio, and image generation. The sheer speed and accuracy with which it operates are game-changing, and its ability to understand and generate across different modalities opens up exciting possibilities for the future. Whether you’re interested in text generation, audio synthesis, or creating stunning images, GPT-4 Omni is leading the way. Buckle up, because the future of AI is here, and it’s more incredible than ever.
Tags: chat gpt, gpt-4o, gpt4, gpt40 demo, gpt4o, gpt4o chatgpt, mattvidpro, mattvidpro ai, new ai tools, open ai gpt4o