OpenAI Now Has A Text-To-Video AI Tool, Sora

OpenAI has recently announced its latest innovation, Sora, a text-to-video model designed to understand and simulate the physical world in motion.

Sora is a milestone in AI development, capable of generating videos up to a minute long that closely adhere to the user’s instructions while maintaining high visual fidelity.

“We’re teaching AI to understand and simulate the physical world in motion,” OpenAI states, signalling Sora’s potential to fundamentally change how AI models real-world interaction.

Sam Altman, CEO of OpenAI, has been actively showcasing Sora’s capabilities, inviting users to submit video captions to demonstrate the model’s ability to generate complex scenes with accurate details.

Sora employs a diffusion model approach, starting with static-like noise and progressively refining it into a coherent video.

This method allows for the creation of entire videos at once or the extension of existing ones, ensuring consistent subjects even when they temporarily leave the view.
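The diffusion process described above can be illustrated with a toy sketch. This is not OpenAI’s actual model: the “denoiser” below is a stand-in that simply pulls values toward a clean target, whereas a real diffusion model would use a trained neural network to predict and remove the noise at each step.

```python
import numpy as np

def denoise_step(frames, step, total_steps):
    """One illustrative refinement step: blend the noisy frames toward
    a (hypothetical) model prediction of the clean video."""
    # Placeholder "model": here the predicted clean video is all zeros;
    # a real diffusion model would predict the noise with a neural network.
    predicted_clean = np.zeros_like(frames)
    alpha = (step + 1) / total_steps  # how far along the denoising schedule we are
    return (1 - alpha) * frames + alpha * predicted_clean

def generate_video(num_frames=8, height=4, width=4, steps=10, seed=0):
    """Start from static-like noise and progressively refine it into frames."""
    rng = np.random.default_rng(seed)
    frames = rng.standard_normal((num_frames, height, width))  # pure noise
    for step in range(steps):
        frames = denoise_step(frames, step, steps)
    return frames

video = generate_video()
print(video.shape)  # (8, 4, 4) — a tiny "video" of 8 refined frames
```

The key idea the sketch captures is that every frame is refined jointly across many steps, which is what lets a diffusion approach keep subjects consistent across an entire clip rather than generating frames independently.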

The model’s architecture, inspired by transformers used in GPT models, enables scaling and dealing with various visual data types, including different durations, resolutions, and aspect ratios.



Sora’s Capabilities and Research


Sora has been built on the foundation laid by previous research in DALL·E and GPT models, incorporating the recaptioning technique from DALL·E 3 to generate descriptive captions for training data.

This allows Sora to produce videos that closely follow text instructions, animate still images, extend videos, and fill in missing frames with remarkable accuracy.

OpenAI’s commitment to advancing this model is evident in its invitation to visual artists, designers, and filmmakers to provide feedback, aiming to refine Sora for creative professional use.

However, the model isn’t without its limitations. It struggles with simulating complex physics accurately and understanding specific cause-and-effect scenarios, such as leaving a mark on a cookie after a bite. OpenAI acknowledges these weaknesses and is actively working on improvements.


Prioritising Safety


OpenAI is taking serious safety measures before making Sora widely available. The model is undergoing adversarial testing by red teamers, domain experts in misinformation, hateful content, and bias, to identify potential harms and risks.

OpenAI is also developing tools to detect misleading content and plans to include C2PA metadata in future deployments to enhance safety.

The existing safety methods developed for DALL·E 3, such as text and image classifiers to review and reject inappropriate content, will be applied to Sora.

OpenAI is engaging with policymakers, educators, and artists worldwide to understand concerns and identify positive use cases for this technology, underscoring the importance of real-world learning in creating safe AI systems.


Sora’s Future In AI


While Sora is currently available to a select group of red teamers and creative professionals for testing and feedback, there’s no definitive word on when it will be accessible to the broader public or the associated costs.