#4. AIs are good at programming?
Anthropic launched Mythos Preview, a new AI model that seems capable of detecting security vulnerabilities in software and has already found over 1,000 bugs. Kudos to them for allowing only limited access to the model because of its security implications, and for using responsible disclosure so that everyone has time to fix those bugs. Although this model would make a formidable red team, keeping lots of people on blue teams up at night, it is a long way off from Artificial Intelligence. It is another step back from the last step back, the claim that AIs were good at programming. AI as the ever-shrinking product.
Programming languages are semi-structured languages. They provide a limited syntax and a limited number of permutations of that syntax to express commands. That is far simpler than our natural language, which we would have to regard as unstructured. Programming languages, however, have another big advantage for the current state of LLMs that natural language lacks: they can be verified. You write some code and give it to a compiler or interpreter, which turns it into instructions a computer can understand. If it compiles without error, you have working code. If not, you change the code and try again. The loop is tight and can run entirely on a computer. Given that computers can run this loop much faster than humans, it quickly produces something that looks like a result.
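To make that loop concrete, here is a minimal Python sketch under loudly stated assumptions: a hard-coded list of hypothetical candidates stands in for successive LLM outputs, and Python's built-in compile() stands in for the compiler. compile() only checks syntax, so the loop accepts the first candidate that parses, whether or not it computes the right answer.

```python
# A minimal sketch of the "compile, check, retry" loop.
# ASSUMPTION: CANDIDATES is a hypothetical, hard-coded stand-in for
# successive outputs of a code generator; no real LLM is involved.
CANDIDATES = [
    # Attempt 1: unclosed parenthesis, so compile() raises SyntaxError.
    "def average(xs):\n    return sum(xs) / len(xs\n",
    # Attempt 2: compiles cleanly, but divides by the wrong count.
    "def average(xs):\n    return sum(xs) / (len(xs) + 1)\n",
    # Attempt 3: compiles and is correct, but the loop never gets here.
    "def average(xs):\n    return sum(xs) / len(xs)\n",
]


def compiles(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError:
        return False
    return True


def first_compiling(candidates):
    """Try each candidate in turn and stop at the first one that compiles."""
    for attempt, source in enumerate(candidates, start=1):
        if compiles(source):
            return attempt, source
    return None, None


attempt, source = first_compiling(CANDIDATES)

# Load the accepted candidate and actually call it.
namespace = {}
exec(source, namespace)
result = namespace["average"]([2, 4, 6])
print(f"accepted attempt {attempt}; average([2, 4, 6]) = {result}")
# prints "accepted attempt 2; average([2, 4, 6]) = 3.0"
```

The loop stops at the second attempt, which compiles without error and returns 3.0 instead of the correct 4.0. Passing the compiler says nothing about whether the code does what was asked; that judgment still has to come from someone who knows what was asked.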
From the reports that reach us, we can clearly see this happening. AI-generated code looks clean, compiles, and even passes tests. However, people report that it either does nothing useful, does not connect with other parts of the system, or breaks production through unforeseen consequences. The code works, but only in isolation, and it leaves us guessing whether it provides the requested functionality. Now the pedantic, aka the internet, will respond in one of two ways.

a) They will tell you that you did not write the correct prompt. Well, thank you for that, but what is the right prompt? LLMs are certainly not smart enough, the way people are, to ask follow-up questions about their lack of understanding or to analyze their output against the functionality they were asked to deliver. That leaves trial and error, which implicitly leeches off our intelligence to determine whether the code covers the functionality. That is probably why I see so many funky-looking vibe-coded dashboards where the information provided is just wrong.

b) They will tell you that you did not provide enough context for the LLM to write the correct code. Well, again, thank you for that, but what is enough context? If I explain the complete workings of a nuclear power plant control system to a five-year-old in terms a five-year-old can understand, she should be able to build me a nuclear power plant control system. If she can't, I failed? Five-year-olds are never wrong. Just ask them.
Programming seems more a way to find a business model for LLMs than a proof of their capabilities. The feedback loop makes it less likely that they shoot you in the head while trying to shoot themselves in the foot, but it still does not seem to deliver economic value. Providing enough context, or enough reasoning to complete a task, means using more tokens and thus literally spending more money. Limiting the reasoning, as Claude did recently, or limiting the available tokens to get to "good enough" now lets LLM companies determine what "good enough" means. It might not be our "good enough".
Mythos Preview is just another step back. It no longer writes code. It just analyzes existing open source code to find our faults. It reads our code, guesstimates where a bug could be, tries to exploit it, and, if that does not work, loops back to try alternatives. At the speed of a computer, that can be done pretty efficiently. As Anthropic points out, it is not cheap, and it will be even more expensive if you do not have access to the source code and disassembly or reverse engineering is required. It is especially not cheap because, although it does find bugs, which is good, it provides no assurance that it found all or even most of them. The blue team gets no relief from the worry that there are other bugs someone will find by sheer luck or tacit determination. As I am on neither the blue nor the red team, a bug found and fixed is always good to me. If I were on a blue team, I would be wondering whether Mythos uncovered bugs no one else could ever have found, or whether Mythos’ bugs will give miscreants different attack surfaces to try out tomorrow.