On Thursday, OpenAI teased a text-to-video AI generator capable of creating incredibly detailed and realistic videos from text prompts. The model, called Sora, can create videos up to 60 seconds long and is currently being tested with OpenAI’s risk assessment team along with “a number of visual artists, designers, and filmmakers” before an eventual launch to the wider public, according to the announcement.
“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” OpenAI said. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”
The company has remained predictably silent on what its training dataset contains. However, it did note that the model was made using a similar process to DALL-E 3, “which involves generating highly descriptive captions for the visual training data.”
Along with generating video from text, Sora can also create a video from an “existing still image,” and even take an existing video in order to “extend it or fill in missing frames.” This opens up many more potential use cases for the model: it could be used for anything from restoring old footage, to creating cheap video content, to ushering in a new era of propaganda and disinformation the likes of which the world has never seen before.
In various demos, OpenAI shows that Sora is capable of creating high-definition videos of woolly mammoths galloping through a snowy landscape, a movie trailer for a film about a 30-year-old astronaut shot on 35mm film, Pixar-like animations of cute monsters, and historical footage of California during the Gold Rush.
Sora doesn’t just feel more advanced than current text-to-video generators; it feels light-years ahead of them. Meta released its own video generator in 2022 that was impressive at the time, but now looks downright archaic in comparison. Similarly, Google released a text-to-video model in January 2024, but it is also not as detailed and realistic as OpenAI’s Sora.
The stunning accuracy of Sora only underscores the incredibly rapid advances in generative AI that we’ve seen in the past two years, and also their dangers. The world is already struggling to grapple with the impact that these models have on disinformation and social engineering. For example, studies have shown that AI deepfakes can be incredibly effective in swaying people’s opinions and perceptions and are even capable of creating false memories. Meanwhile, Congress has been slow to adopt regulation to rein in the worst of the technology’s impact.
As we enter yet another hotly contested election year amid the geopolitical turmoil of Russia’s invasion of Ukraine and Israel’s war in Gaza, the dangers and risks posed by this technology are far-reaching. There’s no telling what nation states, terrorist organizations, and political campaigns could weaponize these models for, and the danger isn’t limited to bad actors either.
Technology like Sora holds the potential to not just disrupt industries like art and cinema, but completely obliterate them. No longer will production companies need to rely on actors, camera operators, gaffers, and the hundreds of other people who create the movies and TV shows we love. Instead, they can just type a few words into a prompt and get a full video.
Of course, there are very strong arguments to be made that there will always be a need for a human when it comes to creating good art like cinema. But those words might fall on deaf ears when it comes to producers and studios looking to make a movie as cheaply as possible.
On top of all of this are the perennial questions of how exactly the model was trained and whose data was used to train it. Sora wasn’t created in a vacuum. The model required a massive corpus of image, video, and text data to train. All of that data likely came from artists, writers, and creators who did not give informed consent to have their work be part of a dataset used to train an AI that will likely push them out of jobs.
The videos are impressive, but utterly terrifying when you stop to consider the model’s implications. This is just another example of how AI threatens so many people’s lives and livelihoods—and perhaps most terrifying of all, OpenAI isn’t done yet.