A Chat with Jesse Shemen, CEO & Co-Founder at AI Video Translation Software: Papercup

Most of the world’s video and audio content is shackled to a single language. That’s billions of hours of videos on YouTube, millions of podcast episodes, hundreds of thousands of corporate training videos, tens of thousands of classes on Coursera, and thousands of hours of content on streaming sites. Content owners are scrambling to go international, yet there is no simple and cost-effective way to translate content beyond subtitling. This is what we set out to tackle with Papercup.

We enable companies to scale high-quality dubbing through our speech translation technology. The pioneering tech at the heart of the company translates voices into other languages with output sound that is indistinguishable from human speech and reflects the characteristics of the original speaker. The speech is then fed into a human-in-the-loop editing interface (think Unbabel or Verbit) which allows for the output to be revised or customized based on a customer’s specifications.

The ultimate aim with our technology is simple. We want to make all voice-based content, whether a TED lecture or live news coverage of the Olympics, consumable in any language. We’re currently doing this for Sky News, Business Insider, BBC, Red Hat, and Yoga with Adriene, improving their global reach.

How did you come up with the idea for the company?

It’s easy to overlook the sheer amount of content we have access to. People are watching five hours of video every day. Every minute, 500 hours of content are uploaded to YouTube. The video landscape is continuously changing and growing, but the majority of this video content is still locked in one language, making it inaccessible to audiences around the world. Jiameng and I saw this as a huge opportunity in the video content creation space.

For the current six billion non-English speakers around the world, the video watching experience is less than ideal, and when views are integral to success, this presents a key challenge for content owners across the board. Yes, subtitles are great, but people want to watch content in their own languages. We built Papercup to address this. Our automated technology translates voices into a variety of languages to increase accessibility for audiences no matter what language they happen to speak.


How has the company evolved during the pandemic?

As with many digital-first industries, the pandemic hasn’t completely disrupted the way we operate or what we focus on. What we have sensed is a material change in the market. Within the first 12 months of the pandemic, companies were vying for the attention of audiences online and so were forced to consider international markets to maintain or continue growth – both on the media content side and from the perspective of the enterprise.

For media, it was a way to continue to grow audiences without necessarily having to invest in creating totally new content for those markets. For enterprises, it was a way of ensuring that internal teams, as well as customers, remained engaged with the company despite a lack of physical contact. For smart companies, the adoption of automation was both necessary and economical during this period.

What can we hope to see from Papercup in the future?

Our mission from the start has been to make the world’s videos watchable in any language. We’re achieving this in two distinct ways. Firstly, fundamental research in machine learning improves the expressivity and naturalness of our synthetic voices while we roll out new languages. Secondly, we’re educating and informing a whole new market that has never thought about localization or dubbing before as historically the price point was prohibitive. As we progress on these two fronts, we aim to tackle more territories, languages and content types to localize and make video content available to a global audience.

Down the line, we think we’ll be able to extend our technology to any form of human dialogue – allowing any two people to speak with one another regardless of what language they happen to speak. In other words – your voice, in another language.