How is YouTube Using Artificial Intelligence On Shorts?

YouTube is changing how people create Shorts. The company is introducing new tools powered by generative AI while also finding ways to run those tools on phones without slowing the devices down. Engineers and product leads at YouTube and Google Cloud have shared how the technology works and what creators can now do with it.

Sarah Ali, YouTube’s Vice President of Product Management for Shorts, announced at the end of July that creators are getting a new “Photo to video” feature. It turns a single photo from a camera roll into a moving clip: users can add movement to a landscape shot, make group photos feel alive, or animate casual snaps. The feature is rolling out first in the United States, Canada, Australia, and New Zealand, with more regions planned later in the year.

Generative effects are also becoming a bigger part of Shorts. They can turn doodles into images or reimagine selfies as playful videos, and people can try effects such as underwater scenes or being given a twin. The AI powering these effects is based on Veo 2, with an upgrade to Veo 3 set to arrive before the end of summer 2025.

Together with these features, YouTube has launched an “AI playground.” This space gives users access to pre-filled prompts, examples, and the ability to generate videos, music, and images instantly. Ali said the playground is already available in the same four regions as Photo to video. She added that each piece of content made with these tools comes with SynthID watermarks and labels to make it clear that it was generated using AI.

How Does YouTube Run AI Effects On Phones?

One of the biggest challenges for YouTube engineers has been getting complex AI effects to run smoothly on mobile devices. In a blog post published in August 2025, Google Cloud’s Andrey Vakunov and YouTube’s Adam Svystun explained how they tackled the problem.

They started with large generative AI models such as StyleGAN2 and later DeepMind’s Imagen. These models could create detailed edits but were too heavy to run in real time. To solve this, the engineers used a technique called knowledge distillation. A large “teacher” model is trained on millions of images and then passes what it has learned to a much smaller “student” model. The student model is compact enough to run directly on a phone, while still carrying the teacher’s ability to create detailed effects.
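The teacher–student setup can be illustrated with a toy example. The sketch below is not YouTube’s actual training code; the linear “teacher” function and one-parameter “student” are stand-ins. It shows the core idea: the student never sees ground-truth labels, only the teacher’s outputs, and learns to imitate them by gradient descent on their squared difference.

```python
# Toy sketch of knowledge distillation: a small "student" model is
# trained to imitate a fixed "teacher". Both models here are
# illustrative stand-ins, not YouTube's real networks.
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for a large pretrained model's output.
    return 3.0 * x + 1.0

# Student: a single weight and bias, far smaller than a real teacher.
w, b = 0.0, 0.0
lr = 0.05

for step in range(500):
    x = rng.uniform(-1, 1, size=32)   # batch of inputs
    target = teacher(x)               # teacher's outputs, used as targets
    pred = w * x + b                  # student prediction
    err = pred - target
    # Gradient descent on the mean squared student-teacher difference.
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(round(w, 2), round(b, 2))  # student converges toward the teacher
```

In the real system the student keeps the teacher’s learned behavior while being small enough to run on-device.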

The training process used carefully curated datasets representing a mix of ages, genders, and skin tones, measured with the Monk Skin Tone Scale. Engineers added challenges during training, such as glasses, hands covering faces, or varied lighting, to help the student model handle real-world cases.

How Do These Effects Keep People’s Faces Accurate?

One of the hardest problems in AI video editing is preserving someone’s face so that it still looks like them once the effect has been applied. If this is done poorly, the edited video might change skin tone, distort glasses, or alter clothing. Vakunov and Svystun explained that this issue is known as the “inversion problem.”

To avoid this, the team used a method called pivotal tuning inversion. The process trains a model so it learns the exact details of a person’s face and can then apply changes such as makeup or cartoon styling without changing their identity. This means the final video still looks authentic while carrying the chosen effect.
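The two-phase structure of pivotal tuning inversion can be sketched with a toy example. Everything below is illustrative, with scalars standing in for images and latent codes rather than the models YouTube uses. The point is the order of operations: first invert the target into a latent “pivot” while the generator stays frozen, then freeze the pivot and fine-tune the generator’s weights so the identity is reproduced faithfully.

```python
# Toy sketch of pivotal tuning inversion (PTI). Scalars stand in for
# images and latent codes; the "generator" is an illustrative stub.

def generator(z, theta):
    # Stand-in for a pretrained generator (e.g. StyleGAN2).
    return theta[0] * z + theta[1]

target = 5.0            # the "image" whose identity we want to keep
theta = [2.0, 0.5]      # pretrained generator weights (frozen in phase 1)

# Phase 1: invert the target into a latent code, generator frozen.
# Only a few steps, so a small identity error remains afterwards.
z = 0.0
for _ in range(5):
    err = generator(z, theta) - target
    z -= 0.05 * 2 * err * theta[0]   # gradient of squared error w.r.t. z

pivot = z               # the "pivotal" latent code

# Phase 2: freeze the pivot and fine-tune the generator weights,
# closing the residual gap left by the imperfect inversion.
for _ in range(200):
    err = generator(pivot, theta) - target
    theta[0] -= 0.01 * 2 * err * pivot
    theta[1] -= 0.01 * 2 * err

print(round(generator(pivot, theta), 3))  # ≈ 5.0, target reconstructed
```

In the real pipeline, phase 2 is what lets an effect like makeup or cartoon styling be applied on top without drifting away from the person’s actual face.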

Once trained, the smaller model is paired with MediaPipe, an open-source framework for machine learning pipelines. MediaPipe detects faces, aligns them, runs the AI effect, and then re-composites the edited image back into the video. This all happens in less than 33 milliseconds per frame, which keeps the video smooth at over 30 frames per second.
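The per-frame flow described above can be sketched as a simple function chain. The stage names and data shapes below are illustrative stubs, not MediaPipe’s real API; they just show the detect → align → effect → re-composite ordering and the frame-budget check.

```python
# Illustrative sketch of the per-frame pipeline: stub stages stand in
# for real face detection, alignment, effect inference, and
# re-compositing. Names are made up for illustration.
import time

def detect_faces(frame):
    return [{"box": (10, 10, 64, 64)}]   # stub: one detected face region

def align(frame, face):
    return frame                          # stub: crop and normalize the face

def run_effect(face_crop):
    return face_crop                      # stub: student-model inference

def recomposite(frame, edited_crop, face):
    return frame                          # stub: paste the edit back in

def process_frame(frame):
    for face in detect_faces(frame):
        crop = align(frame, face)
        edited = run_effect(crop)
        frame = recomposite(frame, edited, face)
    return frame

start = time.perf_counter()
out = process_frame("frame-0")
elapsed_ms = (time.perf_counter() - start) * 1000
# A real pipeline must finish in under ~33 ms to sustain 30 fps.
print(elapsed_ms < 33)
```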

What Does This Mean For Creators?

The technology has already powered more than 20 real-time effects on Shorts, ranging from themed masks like “Risen zombie” to expression tools such as “Never blink” or “Always smile.” Latency on newer phones is now down to around 6 milliseconds on a Pixel 8 Pro and 10.6 milliseconds on an iPhone 13.
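The relationship between per-frame latency and frame rate is simple arithmetic. A quick sketch using the figures quoted above (the helper function name is made up for illustration):

```python
# Frame rate achievable at a given per-frame latency (helper name is
# illustrative, not from YouTube's codebase).

def max_fps(latency_ms):
    """Frames per second sustainable at the given per-frame latency."""
    return 1000.0 / latency_ms

print(round(max_fps(33), 1))    # ~30.3 fps at the 33 ms budget
print(round(max_fps(6), 1))     # ~166.7 fps at 6 ms (Pixel 8 Pro figure)
print(round(max_fps(10.6), 1))  # ~94.3 fps at 10.6 ms (iPhone 13 figure)
```

This is why both quoted phone latencies leave comfortable headroom under the 33 ms budget needed for smooth 30 fps video.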

Ali described these tools as a way to make Shorts easier and more fun, but also stressed that creators themselves are the real draw. YouTube says the new tools are meant to act as a support for personal creativity rather than replace it.