OpenAI Forms Team to Control “Superintelligent” AI Systems

OpenAI, a leading artificial intelligence (AI) research organisation, is taking a proactive approach to the potential risks associated with “superintelligent” AI systems. In a recent blog post, Ilya Sutskever, OpenAI’s chief scientist and co-founder, and Jan Leike, a lead on the alignment team, highlighted the need for research into controlling and steering advanced AI that surpasses human intelligence.


Anticipating the Arrival of Superintelligent AI


Sutskever and Leike expressed their belief that AI systems with intelligence surpassing that of humans may become a reality within the next decade. They emphasised the importance of preparing for such an eventuality as these superintelligent AI systems might not inherently possess benevolence or align with human values.

To ensure the safe and responsible development of AI, OpenAI recognises the need for robust strategies to control and restrict potentially rogue superintelligent systems.


Addressing the Challenge of Steering Superintelligent AI


The newly formed Superalignment team, led by Sutskever and Leike, will dedicate its efforts to advancing the field of “superintelligence alignment.” The team will have access to approximately 20% of OpenAI’s existing compute capacity. By bringing together researchers and engineers from OpenAI’s alignment division and collaborating with experts from other organisations, the team aims to tackle the core technical obstacles to controlling superintelligent AI within the next four years.


Building a Human-Level Automated Alignment Researcher


To achieve their objectives, Sutskever and Leike propose the development of a “human-level automated alignment researcher.” The overarching goal is to leverage AI systems to assist in training other AI systems, enabling them to evaluate and understand alignment challenges. By utilising human feedback, the team aims to train AI systems that can conduct alignment research, ensuring that AI achieves desired outcomes and remains within acceptable boundaries.



The Hypothesis of AI Advancement in Alignment Research


OpenAI’s hypothesis is that AI can make faster progress in alignment research than humans can. Sutskever, Leike, and their colleagues, John Schulman and Jeffrey Wu, believe that AI systems, working in collaboration with human researchers, can conceive, implement, study, and develop more effective alignment techniques. This division of labour would allow human researchers to focus on reviewing AI-generated alignment research, rather than generating it themselves.


Acknowledging Limitations and Challenges


While OpenAI is optimistic about the potential of AI in alignment research, the team acknowledges the inherent risks and limitations involved. They caution that utilising AI for evaluation purposes may amplify inconsistencies, biases, and vulnerabilities within the AI itself. Additionally, they recognise that the most challenging aspects of the alignment problem may extend beyond the realm of engineering.


A Collective Effort for the Greater Good


Despite the obstacles, Sutskever and Leike believe that the pursuit of superintelligence alignment is crucial. They emphasise the need for machine learning experts, both within and outside of OpenAI, to contribute their expertise in solving this critical challenge. OpenAI’s commitment extends beyond its own models, as the organisation aims to share its findings widely and actively contribute to the alignment and safety of non-OpenAI AI systems.


Looking Ahead


OpenAI’s formation of the Superalignment team demonstrates its commitment to proactively addressing the potential risks associated with superintelligent AI systems. By leveraging AI itself, OpenAI seeks to develop novel strategies to ensure the alignment and control of advanced AI. If superintelligent AI does arrive within the next decade, as Sutskever and Leike anticipate, the work of this team could play a pivotal role in shaping the safe and responsible development of AI technologies for the benefit of humanity.