OpenAI has recently announced its latest innovation, Sora, a text-to-video model designed to understand and simulate the physical world in motion.
Sora is a milestone in AI development, capable of generating videos up to a minute long that adhere closely to user prompts while maintaining high visual quality.
“We’re teaching AI to understand and simulate the physical world in motion,” OpenAI states, signalling Sora’s potential to reshape how AI interacts with the real world.
Sam Altman, CEO of OpenAI, has been actively showcasing Sora’s capabilities, inviting users to submit video captions to demonstrate the model’s ability to generate complex scenes with accurate details.
Sora employs a diffusion model approach, starting with static-like noise and progressively refining it into a coherent video.
This method allows for the creation of entire videos at once or the extension of existing ones, ensuring consistent subjects even when they temporarily leave the view.
The model’s architecture, inspired by transformers used in GPT models, enables scaling and dealing with various visual data types, including different durations, resolutions, and aspect ratios.
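The diffusion process described above can be illustrated with a toy sketch: start from pure noise and iteratively refine it toward a clean signal. This is only a conceptual illustration, not Sora's actual architecture; the `toy_denoise` function is a hypothetical stand-in for the learned model, and real video diffusion operates on spacetime latent patches rather than raw arrays.

```python
import numpy as np

def toy_denoise(noisy, target, step):
    """Hypothetical denoiser: nudges the noisy sample toward the target.
    A trained diffusion model would predict this correction from data."""
    return noisy + step * (target - noisy)

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 16)   # pretend this is the "clean video" signal
sample = rng.standard_normal(16)     # begin with static-like noise

for _ in range(50):                  # many small refinement steps
    sample = toy_denoise(sample, target, step=0.2)

# After refinement, the residual noise has shrunk toward zero
print(float(np.abs(sample - target).max()))
```

Each iteration removes a fraction of the remaining noise, which mirrors how diffusion models transform random static into a coherent output over many denoising steps.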
Sora’s Capabilities and Research
Sora has been built on the foundation laid by previous research in DALL·E and GPT models, incorporating the recaptioning technique from DALL·E 3 to generate descriptive captions for training data.
This allows Sora to produce videos that closely follow text instructions, animate still images, extend videos, and fill in missing frames with remarkable accuracy.
OpenAI’s commitment to advancing this model is evident in its invitation to visual artists, designers, and filmmakers to provide feedback, aiming to refine Sora for use by creative professionals.
However, the model isn’t without its limitations. It struggles with simulating complex physics accurately and understanding specific cause-and-effect scenarios, such as leaving a mark on a cookie after a bite. OpenAI acknowledges these weaknesses and is actively working on improvements.
Prioritising Safety
OpenAI is taking serious safety measures before making Sora widely available. The model is undergoing adversarial testing by red teamers (domain experts in misinformation, hateful content, and bias) to identify potential harms or risks.
OpenAI is also developing tools to detect misleading content and plans to include C2PA metadata in future deployments to enhance safety.
The existing safety methods developed for DALL·E 3, such as text and image classifiers to review and reject inappropriate content, will be applied to Sora.
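A classifier gate of the kind described above can be sketched as a simple filter that reviews a prompt before generation proceeds. This is a hypothetical stand-in: production systems use trained text and image classifiers, not a keyword list, and the `BLOCKED_TERMS` set and `review_prompt` function are illustrative names, not OpenAI's API.

```python
# Hypothetical policy list; real systems rely on trained classifiers
# rather than simple keyword matching.
BLOCKED_TERMS = {"violence", "gore"}

def review_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the safety check."""
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

print(review_prompt("a cat surfing a wave at sunset"))  # passes the check
print(review_prompt("a scene full of gore"))            # rejected
```

The point of the sketch is where the gate sits in the pipeline: every prompt is reviewed before any video is generated, and rejected prompts never reach the model.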
OpenAI is engaging with policymakers, educators, and artists worldwide to understand concerns and identify positive use cases for this technology, underscoring the importance of real-world learning in creating safe AI systems.
Sora’s Future In AI
While Sora is currently available to a select group of red teamers and creative professionals for testing and feedback, there’s no definitive word on when it will be accessible to the broader public, or at what cost.