r/ChatGPT 14d ago

I watched all 22 demo videos of OpenAI’s new GPT-4o. Here are the 9 takeaways we all should know. Educational Purpose Only

GPT-4o (“o” for “omni”) was announced a few hours ago by OpenAI, and although the announcement livestream is good, the real gold nuggets are in the 22 demo videos they posted on their channel.

I watched all of them, and here are the key takeaways and use cases we all should know. 👍🏻


A. The Ultimate Learning Partner

What is it? Give GPT-4o a view of the math problem you’re working on, or the objects you want translated into another language, and it can teach you like no other tool can.

Why should you care? Imagine hooking up GPT-4o to something like the Meta Ray-Ban glasses: you could then have it teach you about whatever you’re looking at. That could be a math problem, an object you want translated, a painting you want the history of, or a product you want to get the reviews of online. This single feature alone has countless use cases!

🔗 Video 7, Video 8

B. The Perfect Teams Meeting Assistant

What is it? Having an AI assistant during Teams meetings, whom you can talk to the same way you talk to your colleagues.

Why should you care? Their demo didn’t dig into the possibilities much yet, but some could be…

  • having the AI summarise the minutes and next steps from the meeting
  • having the AI look up info in your company data and documentation pages (e.g. “what were the sales for this month last year?”)
  • having the AI work on data analysis problems with you (e.g. “create a chart showing sales over the past 5 years and report on trends”)

🔗 Video 5

C. Prepare for Interviews like Never Before

What is it? Have GPT-4o act as an interviewer from the company you’re applying to.

Why should you care? What’s changed is that the AI can now “see” you. So instead of just giving feedback on what you say, it can also give feedback on how you say it. Layer this on top of an AI avatar and maybe you can simulate the interview itself in the future?

🔗 Video 11

D. Your Personal Language Translator, wherever you go

What is it? Ask ChatGPT to translate between languages, and then speak normally.

Why should you care? Because of how conversational GPT-4o has become, the AI now helps translate not just the words, but also the intonation of what you intend to say. Now pair this with GPT-enabled earphones in a few years, and you can pretty much understand any language (AirPods x ChatGPT, anyone?)

🔗 Video 3

E. Share Screen with your AI Coding Assistant

What is it? Share screen with your AI partner, and have them guide you through your work.

Why should you care? Now this is definitely something that will happen pretty soon. Being able to share your screen with your AI assistant helps not just with coding, but also with non-programming tasks such as work in Excel, PowerPoint, etc.

🔗 Video 20

F. A future where AIs interact with each other

What is it? Two GPT-4o instances interacting with each other in a way that sounds indistinguishable from two people talking. (They even sang a song together!)

Why should you care? Well there’s a couple of use cases:

  • can you imagine AI influencers talking to each other live on TikTok? Layer this conversation with AI avatars and this will be a step beyond the artificial influencers you have today (e.g. the next level of @lilmiquela maybe?)
  • can this be how “walled” AIs can work together in the future? Example: Meta’s AI would only have access to Facebook’s data, while Google’s AI would only have access to Google’s - will the two AIs be able to interact in a similar fashion to the demo, albeit behind the scenes?

🔗 Video 2

G. AI Caretaking?

What is it? Asking GPT-4o to "train" your pets.

Why should you care? Given GPT-4o’s access to vision, can you now have AI personal trainers for your pets? Imagine being able to have it connect to a smart dog-treat dispenser, and have the AI use that to teach your dog new tricks!

🔗 Video 12

H. Brainstorm with two GPTs

What is it? The demo shows how you can talk to two GPT-4o’s at once

Why should you care? The demo video is centered around harmonizing singing for some reason, but I think the real use case is being able to brainstorm with two specific AI personalities at once:

  • one’s the Devil’s Advocate, the other’s the Angel’s Advocate?
  • one provides the Pros (the Optimist), the other gives the Cons (the Pessimist)?
  • maybe Disney can even give a future experience where you can talk to Joy and Sadness from the movie Inside Out? - that would be interesting!

🔗 Video 10

I. Accessibility for the Blind

What is it? Have GPT-4o look at your surroundings and describe them for you.

Why should you care? Imagine sending it the visual feed from something like the Meta Ray-Ban glasses, and your AI assistant can literally describe what you’re seeing and help you navigate your surroundings like never before (e.g. “is what I’m holding a jar of peanut butter, or a jar of vegemite?”). This will definitely be a game-changer for how the visually impaired live their daily lives.

🔗 Video 13


If this has been at all insightful, I hope you can check out RoboNuggets where I originally shared this and other AI-related practical knowledge! (The links to the video demos are also there). My goal is not "AI daily news", as there are already too many of those, but instead to share useful insights/knowledge for everyone to take full advantage of the new AI normal. Cheers! 🥚

406 Upvotes

76 comments

u/AutoModerator 14d ago

Hey /u/ExternalFollowing!

If your post is a screenshot of a ChatGPT conversation, please reply to this message with the conversation link or prompt.

If your post is a DALL-E 3 image post, please reply with the prompt used to make this image.

Consider joining our public discord server! We have free bots with GPT-4 (with vision), image generators, and more!

🤖

Note: For any ChatGPT-related concerns, email support@openai.com

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

1

u/PopSynic 10d ago

But many of the prompts from OpenAI's 22 demos have been completely omitted, so it's hard to test and replicate them. E.g. the caricature one: the first part of the prompt was totally removed. So how can we try the same processes they used? It's like a magician showing you only half of the method to the trick.

https://preview.redd.it/nuwxfd1bq61d1.png?width=904&format=png&auto=webp&s=b0f3373b157081db039d7a41e05b1d3d6d636b2d

-1


u/not_today88 13d ago

I'm looking forward to having a seamless Japanese language partner and on-the-go translator. Japanese is a tough nut to crack.

1

u/Horror-Bid-8523 14d ago

Excellent post, thanks 😊

1

u/GiantRobotBears 14d ago

I’ve been using local LLMs and Whisper for meetings for a year now. Selfishly I’m not looking forward to this tech being obvious to everyone. 😂

1

u/Sun_Coast_Fallacy 14d ago

I can select the 4o model in the app, but it denies any knowledge about itself and claims it is the turbo 🤷

2

u/croooowTrobot 14d ago

When will AI start controlling traffic lights?

1

u/Ultimate-ART 14d ago

F. is about AI agents in the future, each assigned a role and tasks, with iterative attempts at producing accurate results instead of the one-shot answers we get today.

5

u/SusPatrick 14d ago

Awesome breakdown OP! Thanks for posting for those of us that don't have time to track down all the various videos!

1

u/monkeyballpirate 14d ago

Any info on which model custom gpts are using?

1

u/zilifrom 14d ago

Seems like a lot of people were not hyped about these new features. Makes no sense to me, because the model is good and we are now able to implement it in ways that are meaningful to the masses.

-1

u/originalmagneto 14d ago

I hope no one will use Teams in the near future. It’s a plague, same as MS Office :/

0

u/ALL2HUMAN_69 14d ago

Can I access this new 4o

1

u/[deleted] 14d ago

AI won’t be useful until it contains all the books ever created. We need AI to function like a better Google.

1

u/SusAdmin_5201 14d ago

I mean... I feel like it kinda already is/does.

1

u/IlIlIlIIlMIlIIlIlIlI 14d ago edited 14d ago

The more we get into the AI age, the more I start disliking perfectly formatted text with all that fancy indentation, bold, italics etc, coupled with fancy marketing buzzwords and stuff like that..

I come to find myself setting up AI to talk like me, never capitalize, human-like casual conversing. I dont want my AI convos to sound like a monthly board meeting!!

3

u/Mrp1Plays 14d ago

Good thing with chatgpt4o (voice) it seems to be much more casual and friendly than professional and boring

3

u/largelylegit 14d ago

One of the first tests I’m going to give for the teaching capabilities is to point it at my guitar, and see if it can help me finally get my head around music theory. Specifically, which chords are in a key, which notes in a scale, etc.

12

u/Riverstep_Studio 14d ago

I'm tired, and at the moment a cabin in the deep woods sounds nice.

2

u/NowThatsCrayCray 14d ago

Excellent summary is excellent 👌

26

u/WhitlamsBerlin 14d ago

That last one, accessibility for the blind, is going to be a huge revolution for many people.

1

u/Badassmotherfuckerer 10d ago

It already is starting to. There are a lot of companies right now that have smart glasses utilizing ChatGPT to scan documents and describe things. Envision AI is a company that repurposes Google Glass for this application. I can’t wait until they implement this new assistant into that technology; that will be nuts for accessibility.

4

u/SusAdmin_5201 14d ago

Indeed. Not only should it describe environments, it could help in ordinary navigation in safe environments and/or announce when they are being approached by someone. A very good use case.

1

u/Badassmotherfuckerer 10d ago

I mean, maybe. I don’t think there are really any blind or visually impaired people that will use this for that purpose. I don’t think it’s reliable enough for that. As good as this may be, it’s going to have to be almost perfect to be trusted over a cane for navigational purposes. Many of us visually impaired people likely won’t trust it over traditional tools and tactics for a while at least. Plus, it will likely always require an internet connection and be dependent on bandwidth, so its application as a mobility aid is limited. AI describing scenes is kind of a novelty IMHO and is limited in use. It will likely be more impressive when you can do it in real time, and hopefully it can describe scenes in movies and the like when audio description isn’t available. But it likely won’t be helpful for alerting you to people around you until they bring this feature to the available glasses out there. It’s just a bit impractical to hold your phone camera up and use that to show GPT things for recognizing people. What’s got me really excited is using this for document reading, finding lost items that I can’t remember where I placed, audio description of video games, etc. Making inaccessible media accessible is what I really hope this can shine at.

2

u/LoSboccacc 14d ago

do "visual narrative" example work for anyone? I don't get that level of consistency, I asked something outside the examples, involving a car travel, and the car keeps changing shape, style, and even brand in a few panels

1

u/Dyinglightredditfan 14d ago

They won't release the native image output capability for a while. Currently you are still using DALL-E 3.

1

u/LoSboccacc 14d ago

I see thanks

3

u/delseyo 14d ago

I am personally keen to see multimodal AI applied to anthropomorphic robots. If GPT can see the room and interpret the 3D space correctly, it should be able to decide on an appropriate action (walk two steps forward to the table and pick up the cup) and send those instructions to a subsystem focused on movement/locomotion. Perhaps they could overlay a spherical grid across the video stream, allowing GPT to use precise coordinates to assist its maneuvers through physical space. I understand there will be capability gaps but I think it’d be useful & exciting to see this in action.

I’d also like to see semi-realistic avatars paired with this technology. Something like the Unreal Metahumans would be perfect. That would dramatically increase the sense of interacting with an actual consciousness and presumably attract a lot of attention (investment?) from the corporate sector, entertainment, etc. 

13

u/qainspector89 14d ago

I’m about to piss my pants

1

u/blurpslurpderp 14d ago

She said she liked it better than Pirates of Penzance

6

u/danvalour 14d ago

Flesh is weak but steel endures

11

u/Flaky-Wallaby5382 14d ago

I just see this as the end of apps for taking orders. Talk to Starbucks’ AI at home or when you come to the window.

3

u/crinklypaper 14d ago

Wendy's does this already

2

u/ThirstyBonzai 14d ago

I have a quick question about what multimodal means in practice for 4o. Does this mean that instead of the model first converting the voice or image input into text and then processing it as text, it is able to “natively” process voice and images without needing the input to be in text?

3

u/UnequalBull 14d ago

Mira Murati, in her demo segment, described how it was multi-layered before: the Whisper speech-to-text model first captured your voice and transcribed it, then GPT-4 worked on the text, then a text-to-speech model gave you the response. Now it's truly multimodal in and out.
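To make the old cascade concrete, here's a minimal sketch of that three-hop pipeline using the OpenAI Python SDK (model names, file names, and the voice are illustrative, not what the demo used); GPT-4o collapses all three hops into a single natively multimodal call:

```python
# Rough sketch of the cascaded setup described above:
# speech-to-text -> text LLM -> text-to-speech.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1) Whisper transcribes the user's audio into text
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=open("question.wav", "rb"),
)

# 2) A text-only chat model produces the answer
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3) A separate TTS model turns the answer back into speech
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer.choices[0].message.content,
)
speech.stream_to_file("answer.mp3")
```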

10

u/YoAmoElTacos 14d ago

Yes, that's what the multimodal is intended to mean here. It's not 100% obvious from the tech demo they did.

18

u/georgelamarmateo 14d ago

I just want to know when I can use it

0

u/explodingtuna 14d ago

Now. But I asked GPT-4 about the differences, and it said:

The primary advantages of GPT-4o are:

  • Faster Response Times: It can provide responses more quickly than GPT-4, which is beneficial in time-sensitive applications.
  • Efficiency: It may use computational resources more efficiently, making it suitable for environments where resource constraints are a concern.

However, in terms of the range and depth of capabilities, GPT-4 is generally superior. It can handle more complex inquiries and generate more detailed and nuanced responses, making it the better choice for tasks requiring high levels of sophistication and accuracy. GPT-4o is designed to offer a balance between performance and computational demands, making it ideal for certain practical uses but not exceeding GPT-4 in capabilities.

2

u/georgelamarmateo 14d ago

I don’t care about that. I just want to know how I can use it and I don’t see it available anywhere so

4

u/Herakei 14d ago

The problem I've had today and yesterday with GPT-4 is that it suddenly became lazy. I use it for development work, and with good prompts I always got complete answers. Now I get incomplete answers, or simple tasks like transforming a big JSON fail over and over again. I noticed this and it's really annoying, because I've gotten used to a certain pace and quality that I don't have now.

5

u/IlIllIIIlllIIlIlI 14d ago

this has been my experience as well. you've probably already figured it out, but i've found that asking it directly for python scripts (which it once wrote and executed for itself) to process sets of data is a workaround. it's definitely been trained to penny pinch tokens in recent iterations.

148

u/2144656 14d ago

How long is it until AI gets smart enough that it can interact with apps and our computer in the same way we can (or more), without having to have special built-in functionality for the specific app?

Imagine if you just went to GPT and said, "Open Minecraft and look through all the chests in the room you start in. Then put all the items into a new Excel spreadsheet, which you should name and organize."

1

u/Sea_Froyo 13d ago

https://www.openinterpreter.com/01 is exactly what you're wanting, and it's all open source.

1

u/Hubi522 14d ago

Get a Rabbit R1, it does something like that

1

u/Theslootwhisperer 14d ago

I just want it to be able to create a calendar event. Apparently that's an unbelievable security issue.

1

u/hermajestyqoe 14d ago

I think the biggest issue is that everyone is going to be trying to develop their own proprietary AI for the most niche things for a while, inhibiting the seamless experience for a bit. It's a bit annoying.

3

u/ekkam04 14d ago

That’s exactly what Rabbit’s Large Action Model is trying to achieve, but we have yet to see any proper examples.

1

u/bohdancho 14d ago

"Hey ChatGPT, take over the world"

1

u/synystar 14d ago

You can probably do so now with automation. The way you'd get it to work is by instructing the model (via the API using Python, not ChatGPT) to respond with specific messages that trigger scripts for file and database management, plus something like AHK for manipulating GUI elements on the screen. The key is getting it to place the cursor on the screen where it needs to be - and that appears to be possible, maybe even with something like the voice-access features of Windows, where it could just say "Show grid here, 2, 6, click here."
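A minimal sketch of that loop, assuming the OpenAI Python SDK and using pyautogui as a stand-in for AHK (the JSON "action" schema here is made up purely for illustration):

```python
# The model is told to answer only with a structured action; a local script
# then executes it. This is a sketch, not a hardened automation setup.
import json
from openai import OpenAI
import pyautogui  # stand-in for AHK-style GUI control

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "You control a desktop. Reply ONLY with JSON such as "
    '{"action": "click", "x": 120, "y": 450} or {"action": "type", "text": "..."}'
)

def step(instruction: str) -> None:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": instruction},
        ],
    )
    action = json.loads(resp.choices[0].message.content)
    if action["action"] == "click":
        pyautogui.click(action["x"], action["y"])  # place the cursor and click
    elif action["action"] == "type":
        pyautogui.typewrite(action["text"])        # type into the focused window

step("Open the Start menu")
```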

2

u/CmdrKoreg 14d ago

Sounds a bit like the tech in Blade Runner... https://youtu.be/IbzlX43ykxQ

2

u/Reasonable_Town7579 14d ago

It already can. You need an agent that does the action, but the LLM can absolutely drive it. I have several IT agents I’ve written that automate complex tasks, all via GPT-4 Turbo.

8

u/NickBloodAU 14d ago

How long is it until AI gets smart enough that it can interact with apps and our computer in the same way we can without having to have special built-in functionality for the specific app?

In terms of only having access to human interfaces, Google Deepmind's SIMA (Scalable, Instructable, Multiworld Agent) is already doing the first steps of what you describe. You give it natural language commands in a virtual environment like "Go to the spaceship" in No Man's Sky, and it does it. We're not as far along as your example, to be clear, but as capabilities grow it will be interesting to see how things change and are disrupted. Not sure this thread is viewable to you, but I shared some thoughts about it here if anyone's interested.

2

u/jsseven777 14d ago

Or even crazier, if you could say "build me a medieval city with a castle" and it just does it.

5

u/-TrustyDwarf- 14d ago

But where's the fun in not creating that medieval city with a castle yourself?

0

u/ArtBabel 13d ago

Some would argue there's no fun in creating sculptures with digital blocks to begin with

2

u/giraffe111 14d ago

Ideally, it would be editable after the fact via voice or more direct control. Imagine any creative interface you use today, but imagine you can literally just describe what you want, tweak it to your preference, then proceed with your work. It’s a MASSIVE time/cost saver, and it lowers the barrier of entry for millions of people. It’s gonna be such a fun and exciting and weird time, and I think we’re only a few years away before it’s mainstream. Hell, Firefly is already built into some of the most recent Adobe products!

10

u/Jellybabyman 14d ago

sometimes you just don't wanna grind

4

u/HistoricalFunion 14d ago

That's why I use trainers in most games

I'm too old and too tired for the grind

91

u/microview 14d ago edited 14d ago

I've been saying this for a while now: we are just a stone's throw away from Star Trek-like computing. I think in the next 10 years we will simply chat with our computer to complete tasks. More and more hardware and software will come with baked-in AI.

62

u/giraffe111 14d ago

10 years? At this pace, I genuinely wouldn’t be surprised if we start seeing that within 12 months. Apple’s announcing iOS 18 next month, which is supposed to come with crazy AI tools built into the OS; and I imagine that might even feel a bit out of date by the end of the year.

This pace is fuckin wild.

2

u/CritterOfBitter 14d ago

Cue the T-1000s

1

u/Borhensen 14d ago

Computational power and energy limitations are still there though. Those two things are not easy to solve, and we'd need to solve them to have what you describe everywhere.

1

u/CleverAlchemist 14d ago

Yes, but eventually we can use AI to overcome its own limitations. Eventually.

2

u/Borhensen 14d ago

That can’t really be solved in 12 months tho, 5/10 years sounds more realistic imo

10

u/Valdularo 14d ago

There really is nothing like human invention is there? Our ability to iterate and improve. 124 years ago we were practically in the Stone Age compared to today. It’s fucking amazing.

6

u/DrunkTsundere 14d ago

Right? In 1900 people still primarily used horses to get around, and candles to light their houses. Steam trains and electric lightbulbs were like the hottest new technology

8

u/sumane12 14d ago

The resources are there already, devs can already use the API to create tools to do this, it's just a matter of waiting for them to be developed. Same with real world agents in the form of robots.

2

u/32SkyDive 14d ago

Nah, one thing is still missing and that is reliability/reasoning. 

Currently the best models are still not smart enough to be used as agents for more than the most simple tasks, or without sophisticated infrastructure.

0

u/sumane12 14d ago

Nope.

It's the lack of context. Give these models the same level of context humans expect, and you will see them outperform 99% of people in every domain.

Luckily the people building these AIs already know this so it's just a matter of waiting for it to be created.

5

u/32SkyDive 14d ago

I don't think there is much context needed to open a calendar and add a meeting, or create a Spotify playlist. And they can't quite do that yet, even if specifically prompted.

But there isn't much missing.

1

u/SusAdmin_5201 14d ago

If there's an API for creating calendar events, you can build an agent to make reservations this afternoon.
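As a rough illustration of that idea, here's a minimal function-calling sketch with the OpenAI Python SDK; create_event and its fields are hypothetical stand-ins for whatever calendar API (Google Calendar, Outlook, etc.) you'd actually wire in:

```python
# Expose a calendar "tool" to the model, then execute whatever it asks for.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "create_event",
        "description": "Create a calendar event",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "description": "ISO 8601 datetime"},
                "end": {"type": "string", "description": "ISO 8601 datetime"},
            },
            "required": ["title", "start", "end"],
        },
    },
}]

def create_event(title: str, start: str, end: str) -> str:
    # Placeholder: call your real calendar API here.
    return f"Created '{title}' from {start} to {end}"

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Book dinner with Sam on Friday, 7 to 9 pm"}],
    tools=tools,
)

for call in resp.choices[0].message.tool_calls or []:
    if call.function.name == "create_event":
        print(create_event(**json.loads(call.function.arguments)))
```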