The Inherent Challenges of AI Safety
AI has the potential to cause great harm to humanity if it isn’t controlled carefully, both as software, such as large language models, and in physical applications of AI such as autonomous vehicles and humanoid robots.
Despite the extensive safety restrictions currently placed on AI models, users frequently find ways to circumvent these protections through jailbreaking and prompt-engineering techniques. Even if companies could, in theory, block certain harmful requests entirely, doing so would inevitably limit the AI’s usefulness.
The Security Dilemma
Consider a software developer using AI to search for vulnerabilities in their product’s code to prevent exploitation. How can a large language model distinguish this legitimate request from that of a malicious hacker looking for vulnerabilities to exploit? To guarantee that no exploit information ever reaches an attacker, the system would have to deny the developer’s legitimate request as well.
The same challenge extends to other areas, such as creative writing. If an author wants a book to include potentially dangerous information, such as how to build a bomb or break into a house, the request must be denied, because the AI cannot differentiate fictional storytelling from actual harmful intent.
Robot Safety
The same principles apply to robotics, including humanoid robots. Programming a robot to be incapable of harming humans might prevent misuse, but it would also leave the robot unable to defend its owner from an intruder. Similarly, forbidding robots from handling potentially dangerous tools such as knives would limit their ability to perform basic tasks like cooking and home maintenance.
The Unpredictable Nature of Machine Learning
Beyond preventing harmful requests, there’s another fundamental challenge in making AI and robots safe. Unlike traditional programming, where behavior follows explicit rules, a machine-learning model’s behavior emerges from millions of learned parameters that no human can fully inspect. The number of variables in real-world scenarios makes it impossible to predict every possible action.
For instance, a self-driving car might cause an accident while trying to achieve other programmed objectives, such as reaching its destination quickly or avoiding obstacles. While more training data can reduce these occurrences and improve safety, completely preventing unpredictable events may be impossible.
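This sensitivity can be seen even at a toy scale. The sketch below (a minimal, hypothetical example, not drawn from any real system) trains the same simple classifier twice on identical data, changing only the random seed that controls the order in which training examples are visited. The two runs typically converge to different model parameters, illustrating why the exact behavior of a trained model is hard to predict in advance.

```python
import math
import random

def train(seed, examples, epochs=20, lr=0.5):
    """Minimal logistic-regression-style classifier trained by
    stochastic gradient descent. The code and data are identical
    across runs; only the seed-dependent order in which examples
    are visited changes."""
    rng = random.Random(seed)
    data = list(examples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                             # seed-dependent visit order
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid prediction
            w += lr * (y - p) * x                     # gradient step on the weight
            b += lr * (y - p)                         # gradient step on the bias
    return w, b

# Hypothetical toy data: mostly separable, with two noisy points near zero.
examples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (0.1, 0), (-0.1, 1)]

m1 = train(seed=1, examples=examples)
m2 = train(seed=2, examples=examples)
print(m1, m2)  # same data, same code; typically two slightly different models
```

In a model with two parameters the difference is harmless, but in systems with millions of parameters the same order-dependence compounds, which is one reason exhaustive prediction of a trained model’s behavior is impractical.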
Conclusion
While humans also make unpredictable mistakes, the challenge of creating completely safe AI systems that can freely interact with society remains significant. As AI capabilities continue to grow, finding the right balance between safety restrictions and practical functionality will remain a central question.