Anthropic launched Mythos Preview, a new AI model that seems capable of detecting security vulnerabilities in software and has already found over a thousand bugs. Kudos to them for only allowing limited access to the model because of its security implications, and for using responsible disclosure to make sure everyone has time to fix those bugs. As much as this model would make a formidable red team, keeping plenty of people on blue teams up at night, it is a long way off from Artificial Intelligence. Another step back from the last step back, which claimed that AIs were good at programming. AI as the ever shrinking product.
Programming languages are semi-structured languages. They provide a limited syntax and a limited number of permutations of that syntax to express commands. That is far simpler than our natural language, which we would have to regard as unstructured. Programming languages have, however, another big advantage for the current state of LLMs that natural language is missing: they can be verified. You write some code and give it to a compiler or interpreter to turn it into instructions a computer can understand. If it compiles without error, you have working code. If not, change the code and try again. The loop is tight and can be run entirely on a computer. Given that computers run this loop much faster than humans, the result easily looks impressive.
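That tight loop can be sketched in a few lines. This is a minimal illustration, not any vendor's actual pipeline; the candidate snippets are hypothetical stand-ins for model output:

```python
# Hypothetical candidate snippets, as an LLM might produce them.
candidates = [
    "def add(a, b) return a + b",   # syntax error: missing colon
    "def add(a, b): return a + b",  # compiles cleanly
]

def first_that_compiles(snippets):
    """Try each snippet until one passes the compiler; loop back on failure."""
    for source in snippets:
        try:
            compile(source, "<candidate>", "exec")
            return source
        except SyntaxError:
            continue  # tight loop: reject this attempt, try the next
    return None

print(first_that_compiles(candidates))
```

Note that the loop only checks that the code compiles; whether the surviving snippet does anything useful is exactly the question it cannot answer.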
From what reaches us we can clearly see this happening. AI-generated code looks clean, compiles and even passes tests. Yet people report that it either does not do anything useful, does not connect with other parts of the system, or breaks production through unforeseen consequences. The code works, but in isolation, and leaves us guessing whether it provides the requested functionality. Now the pedantic, a.k.a. the internet:
a) Will tell you that you did not write the correct prompt. Well, thank you for that, but what is the right prompt? LLMs are certainly not smart enough, like people are, to ask follow-up questions about their lack of understanding, or to analyze their output in relation to the functionality they were asked to deliver. Which leaves trial-and-error, and implicitly leeches on our intelligence to determine whether the code covers the functionality. That is probably why I see so many funky-looking vibe-coded dashboards where the provided information is just wrong.
b) Will tell you that you did not provide enough context for the LLM to write the correct code. Well, again, thank you for that, but what is enough context? If I explain the complete workings of a nuclear power control system to a five-year-old, in terms a five-year-old can understand, she can build me a nuclear power control system. If not, I failed? Five-year-olds are never wrong. Just ask them.
Programming seems more a way to find a business model for LLMs than a proof of the capabilities of those LLMs. The feedback loop makes it less likely that they shoot you in the head while trying to shoot themselves in the foot, but it still seems unable to deliver economic value. Enough context or enough reasoning to complete a task leads to the use of more tokens, and thus literally more money being spent. Limiting the reasoning, as Claude did recently, or limiting the available tokens to get to "good enough", now lets LLM companies determine what that good enough means. It might not be our good enough.
Mythos Preview is just another step back. It is not writing code anymore. It just analyses existing open source code to find our faults. It reads our code, guesstimates where a bug could be, tries to exploit it, and loops back to try alternatives if that did not work. At the speed of a computer that can be done pretty efficiently. As Anthropic points out, it is not cheap, and it will be even more expensive if you do not have access to the source code and disassembly or reverse engineering is required. It is especially not cheap because, while it does find bugs, which is good, it does not provide any assurance that it found all or even most bugs. There is no relief for the blue team that there are no other bugs someone will find by sheer luck or tacit determination. As I am on neither the blue nor the red team, a bug found and fixed is always good. If I were on a blue team, I would be wondering whether Mythos uncovered bugs no one else could ever have found, or whether Mythos' bugs will give miscreants different attack surfaces to try out tomorrow.
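The guess-exploit-loop pattern is an old one. Purely as an illustration (nothing below reflects how Mythos Preview actually works), here is the shape of it, with a deliberately buggy toy target:

```python
# A deliberately buggy target: crashes on empty input.
def vulnerable_parse(data: bytes) -> int:
    return data[0]

def fuzz(target, attempts):
    """Guess an input, try it, and loop back on failure until one
    triggers a fault. Returns the crashing input and the exception."""
    for candidate in attempts:
        try:
            target(candidate)
        except Exception as bug:
            return candidate, bug  # found a crashing input
    return None  # exhausted our guesses: no assurance the code is bug-free

print(fuzz(vulnerable_parse, [b"ok", b"A" * 16, b""]))
```

The asymmetry in the last comment is the point made above: a crash proves a bug exists, but running out of guesses proves nothing.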
People seem very keen on pointing out in a discussion about AGI that:
- There is something like intelligence
- Given enough computational time with a given function, there should be a result that is complete. The Church-Turing thesis.
- Since intelligence is something material, there must be a function that describes that intelligence.
There are just a couple of problems with this conviction:
- The efficiency problem. Church-Turing does not imply the calculation is efficient on any time scale. If I throw some amino acids into a sufficiently warm puddle I might be able to create intelligence; I just have to wait a couple of million years. At least for this experiment we know it had one successful outcome.
- Let us accept that the world is deterministic, in the sense that if we knew the position, direction and speed of every particle in the universe at a given time t, we would be able to predict the position of every particle at time t+1, t+2 .. t+n, and thus be able to completely predict the universe. I am not trying to drag Heisenberg's uncertainty principle in here; it is not relevant. If we knew the position of every particle at t-1 and at t, we could do the same. For those of a religious or philosophical disposition: this excludes neither God nor free will. Our real problem is that we cannot store that information anywhere, or do calculations with it, without having to keep track of that storage and calculation itself, or keeping it outside the physical universe we are trying to predict. That is logically impossible. What we do instead is make models: simplified versions of reality that help us predict some specific future event. Those models are incomplete and thus wrong, but sometimes useful. Now we strive to make better models, which means better predictions, but with every particle we do not take into account, our uncertainty about the outcome grows. I hope one also sees that, given that a model is a simplified version of reality, throwing more data at it won't do much unless it is data relevant to our simplification. A calculation is just such a model, so it will necessarily be incomplete. It might still be useful, but given that this calculation does not exist yet, we will have to see.
- A computational function describing intelligence might have a logical problem in itself. Math is a product of our intelligence; can it ever be used to describe that intelligence itself? It seems less obvious than it appears at first glance. Moreover, if we could compress enough of our intelligence into useful symbols to do math with, do we still need an AGI? I am on the fence on this one, hence the question marks.
- Even if we managed to create a mathematical function describing intelligence, that does not mean we have AGI. My careful calculation of the trajectory of a ball might be completely correct, but no ball was moved in the process. The other side of the same argument is that we can make calculations that are completely valid as calculations, but far larger than our physical world would allow: the c in E=mc^2 is not easily reached if you are not a photon. A googol is a completely valid number in math, but nothing in the physical world can be counted as such. The mathematical function might simply not fit in our physical universe.
- Nearly forgot: we might not be able to determine whether such a mathematical function is computable, because of the Halting problem.
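That last point deserves a sketch. The classic diagonal argument, written here as hypothetical Python, shows why a perfect `halts` oracle cannot exist in the first place:

```python
# Assume, for contradiction, someone hands us a perfect oracle:
def halts(program, argument) -> bool:
    """Hypothetical: returns True iff program(argument) terminates.
    No such total function can exist, which is the point."""
    raise NotImplementedError("no such oracle exists")

def contrary(program):
    # Do the opposite of what the oracle predicts about self-application.
    if halts(program, program):
        while True:
            pass  # loop forever
    # otherwise: halt immediately

# contrary(contrary) halts exactly when the oracle says it does not,
# contradicting the oracle on either answer. So halts cannot exist.
```

Whether a function describing intelligence would fall into an undecidable class is of course open; the sketch only shows that computability itself is not guaranteed to be decidable.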
As an afterthought, and because some people keep suggesting they are related: LLMs are not an attempt to find the calculation that describes intelligence in the sense the Church-Turing thesis implies. They are a small calculation of a single relation between tokens, combined across millions of tokens in the hope that this simulates intelligence. There is no mathematical justification or proof for making those combinations other than probability. Don't get confused by the fact that probability is expressed in math: high probability is not itself a mathematical proof. The Riemann hypothesis is probably true, but that high probability is no proof in itself.
I seriously doubt that intelligence can be captured in mathematics, but I am sure it cannot be captured in probability. As people start to talk about an artificial general or even super intelligence, the Public Enemy in me is awakened: 'Don't Believe the Hype'. Think about Kuhn's 'The Structure of Scientific Revolutions': a scientific revolution is something that has to be forced against the 'common knowledge'. This is already difficult in times of institutionalized science, but it becomes nearly impossible with machine learning models. Your attempt to change the existing consensus would not just have to convince some people, but contradict all of sublimated history.
To explain this a bit more clearly: suppose we had LLMs in 1543 and Copernicus published 'De revolutionibus orbium coelestium'. It seems obvious now that he was right about the movement of the planets and the sun being at the center, but given that the scientific consensus of his time was that the sun revolved around the earth, that is all the LLM could ever acknowledge. Obviously not because it knows anything about the movement of planets, but because it learned from the texts it absorbed that more information was available on the sun revolving around the earth than the single voice of Copernicus. Given that stochastic information it will return the old ideas. Funnily enough, this is true both in a quantitative sense (there was a lot more collective knowledge on the sun revolving around the earth up until then) and in a material sense: Copernicus's idea was much simpler, better ideas usually are, doing away with a staggering complexity of concentric circles describing the planets' movements. Leaving even less data for an AI to process.
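A toy next-token model makes the mechanism concrete. The miniature corpus below is invented for illustration, and real LLMs are vastly more elaborate than these raw counts, but the principle of majority statistics is the same:

```python
from collections import Counter, defaultdict

# Invented miniature corpus: three geocentric sentences, one Copernican.
corpus = ("the sun orbits the earth . " * 3 +
          "the earth orbits the sun .").split()

# Count which token follows each pair of tokens (a trigram model).
follow = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    follow[(a, b)][c] += 1

def most_probable_next(a, b):
    """Predict the next token purely by frequency."""
    return follow[(a, b)].most_common(1)[0][0]

# After "orbits the", the majority text wins, whatever the truth is.
print(most_probable_next("orbits", "the"))  # → earth
```

Copernicus's lone sentence is in the counts, but it can never outvote the consensus; only more heliocentric text, not a better argument, would change the prediction.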