OpenAI has been working on tools that help trace where image and text content comes from, specifically whether or not it is AI-generated. This includes introducing watermarking and metadata techniques to test content authenticity.
They published a case study reviewing their efforts to verify the authenticity of media created by AI. The project looks at developing methods to trace the origin of images and text, a practice termed media provenance.
What Does The Research Involve?
The research includes a classifier designed to identify images generated by OpenAI’s DALL·E model with high accuracy, and it remains effective even when images are altered, for example through cropping or compression.
The case study also examines the ethical aspects of these technologies, especially their effects on different groups. For example, watermarking AI-generated text could disadvantage non-native English speakers who rely on such tools to improve their writing.
OpenAI is collaborating with industry partners and policymakers to develop standardised, responsible practices for using these technologies. They plan to educate both the public and policymakers about the strengths and limits of provenance technologies to set realistic expectations about their utility.
On text watermarking, OpenAI said, “While it has been highly accurate and even effective against localised tampering, such as paraphrasing, it is less robust against globalised tampering, like using translation systems, rewording with another generative model, or asking the model to insert a special character in between every word and then deleting that character, making it trivial for bad actors to circumvent.”
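OpenAI has not published the details of its text watermarking scheme, but the statistical approach common in the research literature makes the quoted weakness easy to see: generation is nudged toward a pseudo-random “green list” of words, and a detector measures how often that bias appears. The Python sketch below is a toy, word-level illustration (the names GREEN_FRACTION, is_green and green_rate are invented for this example), not OpenAI’s method.

```python
import hashlib

GREEN_FRACTION = 0.5  # hypothetical: fraction of the vocabulary favoured at each step

def is_green(prev_word: str, word: str) -> bool:
    # Pseudo-randomly assign `word` to a "green list" seeded by the previous
    # word, mimicking (in spirit) published statistical text watermarks.
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def green_rate(text: str) -> float:
    # Fraction of words on the green list. Watermarked generation biases this
    # well above the ~50% chance rate for unmarked text.
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(prev, cur) for prev, cur in zip(words, words[1:]))
    return hits / (len(words) - 1)
```

Because detection depends on the exact words chosen, any global rewrite, such as translation or paraphrasing by another model, regenerates those words and drags the rate back toward chance, erasing the signal.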
How Does The Metadata Work?
Although still in its early stages, this work explores how metadata can serve as a provenance method. OpenAI embeds metadata within the scope of the Coalition for Content Provenance and Authenticity (C2PA) standard, recording detailed information about the origins and edits of digital content. Metadata embedded by OpenAI includes:
- Tool Identification: This metadata records which AI tools were used in the creation of the content.
- Editing History: Modifications made to the content, such as edits within OpenAI products or with external tools post-generation, are logged.
- Digital Signatures: Digital signatures are used to secure the authenticity of the metadata; they help detect unauthorised changes to the file, preserving the integrity of the record.
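Real C2PA manifests are binary structures signed with certificate-based COSE signatures, but the shape of the record is easy to picture. The Python sketch below is a simplified illustration under those caveats: the field names approximate rather than reproduce the C2PA schema, and an HMAC stands in for the real signature.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in for a real certificate-backed signing key

# Illustrative provenance record covering the three elements above.
manifest = {
    "claim_generator": "DALL-E 3",  # tool identification
    "actions": [                    # editing history
        {"action": "c2pa.created", "when": "2024-05-07T12:00:00Z"},
    ],
}

def sign(record: dict) -> str:
    # Toy signature: HMAC over the canonicalised record. C2PA proper uses
    # COSE signatures, which additionally prove *who* did the signing.
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def verify(record: dict) -> bool:
    # Detect unauthorised changes: recompute the signature over everything
    # except the signature field itself and compare.
    body = {k: v for k, v in record.items() if k != "signature"}
    return hmac.compare_digest(record.get("signature", ""), sign(body))

manifest["signature"] = sign(manifest)  # no signature field present yet
```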
Practical Usage
Whenever content is generated or modified through OpenAI’s tools, metadata is automatically attached. For example, if a user modifies an AI-generated image in ChatGPT, that action and the tools used are logged in the metadata.
This makes metadata a useful tool for platforms and consumers who want to understand the provenance of the content they consume or share. It’s especially useful in media, journalism, and online content platforms where authenticity matters.
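Continuing the toy sketch above, an edit appends an action to the history and the record is re-signed; any change made outside that flow then fails verification.

```python
# The user edits the AI-generated image in ChatGPT: log the action, re-sign.
body = {k: v for k, v in manifest.items() if k != "signature"}
body["actions"].append(
    {"action": "c2pa.edited", "softwareAgent": "ChatGPT", "when": "2024-05-07T12:05:00Z"}
)
body["signature"] = sign(body)  # body has no signature field at this point
manifest = body

assert verify(manifest)                                     # intact record verifies
assert not verify(dict(manifest, claim_generator="Other"))  # tampering is detected
```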
Watermarking in Practice: How It Works
The watermarking process works by embedding a pattern or marker into the content that is imperceptible during normal use but can be detected algorithmically. OpenAI uses watermarking to tag content subtly, indicating that it was generated by an AI model.
The watermark is typically a pattern or code inserted into the content (audio, video, or images) in a way that does not visibly alter it. For images, this could be slight alterations to pixel values that are imperceptible to the human eye but detectable by software.
Specialised software or tools can detect these watermarks, confirming that the content was generated by a particular AI model or tool. This helps with identifying AI-generated content that may be passed off as human-generated.
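OpenAI has not disclosed its image watermarking algorithm, and production schemes are designed to survive the cropping and compression mentioned above. The least-significant-bit sketch below is only the simplest illustration of the embed-then-detect idea (the tag MARK and both function names are invented here); a real system would use a far more robust encoding.

```python
import numpy as np

MARK = np.unpackbits(np.frombuffer(b"AI", dtype=np.uint8))  # hypothetical 16-bit tag

def embed(pixels: np.ndarray) -> np.ndarray:
    # Hide the tag in the least significant bits of the first pixels; a 1-bit
    # change in an 8-bit channel is imperceptible to the human eye.
    out = pixels.copy().ravel()
    out[: MARK.size] = (out[: MARK.size] & 0xFE) | MARK
    return out.reshape(pixels.shape)

def detect(pixels: np.ndarray) -> bool:
    # Recover the least significant bits and compare against the expected tag.
    return np.array_equal(pixels.ravel()[: MARK.size] & 1, MARK)

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
assert detect(embed(image))
```

Note the fragility: re-saving this image as a JPEG would rewrite the pixel values and destroy the tag, which is one reason watermarking is paired with signed metadata rather than relied on alone.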
Watermarking is beneficial in academic settings, where essays or artwork generated by AI tools like ChatGPT need to be identified to prevent plagiarism. In legal scenarios, watermarking can help authenticate evidence or documents, confirming that they have not been unduly altered.
What Do Experts Think?
Gregor Hofer, CEO of Rapport and Speech Graphics, said, “OpenAI’s development of a watermarking tool for AI-generated content is a significant step towards transparency in the age of artificial intelligence. This technology will be crucial for maintaining trust and integrity across various industries, particularly in creative fields.
“For workers and students, this tool will help establish clear boundaries between human-created and AI-generated content. It’s essential that we cultivate an environment where AI is seen as a powerful assistant rather than a replacement for human creativity and critical thinking.
“In the creative industry, this watermarking technology could actually boost innovation. By clearly distinguishing AI-generated content, we create space for human creativity to shine while also leveraging AI as a tool for inspiration and efficiency.
“Ultimately, the key is to make interactions with AI and AI-generated content clear and obvious. This transparency will allow us to harness the full potential of AI while preserving the unique value of human input and creativity. As we navigate this new landscape, it’s crucial that we develop ethical frameworks and best practices for AI use across all sectors.”