Generating video from text

Sora is an AI system that generates realistic and imaginative scenes from written prompts.

Sora can generate complex scenes with multiple characters, specific movements, and detailed elements in both the subject and background. The model not only comprehends the user's requests but also understands how these components interact in a real-world context.

Prompt: A drone soars around a stunning historic church perched on a rugged cliff along the Amalfi Coast. The view highlights the church's intricate architecture, with layered pathways and terraces cascading down the cliffside. Below, waves crash dramatically against the rocks, framing the coastal waters and the rolling hills of Amalfi, Italy. In the background, visitors stroll along the terraces, taking in the breathtaking ocean views. The warm afternoon sunlight casts a romantic and enchanting glow over the entire scene, beautifully captured in the photograph.

Prompt: In Tokyo, a fashionable woman strides down a bustling street illuminated by vibrant neon signs and lively city lights. She exudes style in a black leather jacket, a flowing red dress, and sleek black boots, complemented by a black purse. With sunglasses and bold red lipstick, she walks with confidence and poise. The wet pavement reflects the bright colors, enhancing the energetic atmosphere filled with people.

Crafting Captivating Visuals

Sora's sophisticated language comprehension allows it to precisely interpret prompts, bringing to life compelling characters rich with vivid emotions. Moreover, Sora can seamlessly generate multiple scenes within a single video, maintaining consistency in both character portrayal and visual style.

Safety

We are taking several safety steps as we develop and deploy Sora.

We are collaborating with experts in areas such as misinformation, harmful content, and bias to rigorously test the model.

We are also developing tools, such as a detection classifier that identifies AI-generated content, and we plan to include C2PA metadata for greater transparency and accountability.

Safety measures originally created for DALL·E 3 are being adapted for Sora, including text classifiers that filter inappropriate prompts and image classifiers that help enforce content policies.

We are engaging with policymakers, educators, and artists around the world to understand their concerns and explore positive applications of the technology.

Despite extensive research and testing, we recognize that learning from real-world use is essential to improving AI safety over time.

Together, these steps are intended to deploy the technology responsibly, balancing innovation against potential risks.




Prompt: Archaeologists carefully unearth a common plastic chair in the desert, treating the discovery with meticulous attention as they gently excavate and dust it with great care.

Prompt: A litter of golden retriever puppies frolics in the snow, their heads peeking out with their fur dusted in snowflakes.




Sora is engineered to create detailed scenes with multiple characters, specific movements, and precise details in both the foreground and background. The model interprets user prompts and translates them into realistic depictions of how elements interact in the physical world.

Currently, Sora is undergoing testing by red teamers to identify potential risks and issues. We are also working with visual artists, designers, and filmmakers to refine the model based on their feedback.

We are sharing our research progress early to engage with external experts and inform the public about Sora's capabilities.

Our goal is to develop AI that accurately understands and simulates real-world motion, ultimately creating models that address real-world challenges.

Introducing Sora: our text-to-video model that can produce videos up to one minute long while maintaining high visual quality and staying true to user prompts.



Prompt: The camera is positioned directly in front of colorful buildings in Burano, Italy. An adorable Dalmatian peeks out from a ground-floor window. People are walking and cycling along the canal streets in front of the buildings.

Prompt: A cat wakes its sleeping owner, persistently demanding breakfast. Despite the owner's attempts to ignore the cat, the feline uses various tactics to get attention. Eventually, the owner pulls out a hidden stash of treats from under the pillow to briefly appease the cat.

Investigating Research Approaches

Sora is a diffusion model: it generates a video by starting from something that looks like static noise and gradually removing that noise over many steps.
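The idea of iterative denoising can be illustrated with a toy sketch. This is not Sora's implementation: the "frame" is a short list of numbers, and `toy_denoiser` is a hypothetical stand-in that simply returns the known target, whereas a real diffusion model would use a trained network to predict the clean result from the noisy input and the prompt. The names `toy_denoiser`, `denoise`, and `mean_abs_error` are invented for this example.

```python
import random


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def toy_denoiser(noisy, target):
    """Hypothetical stand-in for a learned network.

    It "predicts" the clean signal by returning the known target.
    A real model would have to infer this from the noisy input.
    """
    return list(target)


def denoise(target, steps=10, seed=0):
    """Start from pure noise and refine it toward the prediction step by step."""
    rng = random.Random(seed)
    # Initial state: Gaussian noise with the same length as the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        predicted_clean = toy_denoiser(x, target)
        # Move part of the way toward the prediction. The fraction grows
        # each step and reaches 1.0 on the final step.
        blend = 1.0 / (steps - step)
        x = [xi + blend * (pi - xi) for xi, pi in zip(x, predicted_clean)]
        errors.append(mean_abs_error(x, target))
    return x, errors


if __name__ == "__main__":
    frame, errors = denoise([0.0, 0.5, 1.0, 0.5], steps=8, seed=1)
    for i, err in enumerate(errors, start=1):
        print(f"step {i}: mean abs error = {err:.4f}")
```

Running the script prints the error after each step, which shrinks as the noise is removed. The schedule used here (`1 / (steps - step)`) is an arbitrary choice for the sketch, not a schedule taken from any published model.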

Like GPT models, Sora utilizes a transformer architecture, enabling exceptional scalability.
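The core operation of a transformer is self-attention, sketched below in plain Python. This is a minimal illustration under simplifying assumptions, not Sora's architecture: the text above does not say how Sora turns video into tokens, and the learned query, key, and value projections of a real transformer are omitted here (each token serves as its own query, key, and value). The function names are invented for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of equal-length vectors (lists of floats).
    Returns one output vector per input token, each a weighted
    average of all input tokens.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Weighted average of the token vectors.
        outputs.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
        print([round(v, 3) for v in row])
```

The same function works unchanged for any number of tokens, which is one reason the design extends naturally to larger inputs and models.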

© 2024 Sora - All Rights Reserved