The original sin of computing - what if compilers were compromised?

OVERKILL

$100 Site Donor 2021
Joined
Apr 28, 2008
Messages
63,139
Location
Ontario, Canada


This is a fantastic video about how everything in computing is derivative. So, if there was a vulnerability in one of the early compilers, this would be replicated in all child binaries and compilers. Effectively, unchecked and invisible proliferation. And in fact this did happen at Bell Labs.

While not directly related, it made me think about LLM's and how when LLM's are allowed to learn from other LLM's the end result is ultimately garbage, because they are unable to weed out the "unclean" (corrupt) code from the original (clean) source material, so you get iterative corruption that ultimately self-replicates until it consumes the model.
 
And in fact this did happen at Bell Labs.
What I read was that Ken Thompson discussed the issue and did it to prove his theory but it didn't actually happen. Is there more to this ? What he discussed was intriguing too - modify the code of a compiler to include a vulnerability, then compile it, and the vulnerability is embedded inside with no traces of it's existence.

On a related note: How was the very first compiler code compiled ? Or was the first compiler written in assembly ?
 
What I read was that Ken Thompson discussed the issue and did it to prove his theory but it didn't actually happen. Is there more to this ? What he discussed was intriguing too - modify the code of a compiler to include a vulnerability, then compile it, and the vulnerability is embedded inside with no traces of it's existence.

On a related note: How was the very first compiler code compiled ? Or was the first compiler written in assembly ?
This is all covered in the video. And yes, Ken actually did release a compromised compiler at Bell Labs.
 


This is a fantastic video about how everything in computing is derivative. So, if there was a vulnerability in one of the early compilers, this would be replicated in all child binaries and compilers. Effectively, unchecked and invisible proliferation. And in fact this did happen at Bell Labs.

While not directly related, it made me think about LLM's and how when LLM's are allowed to learn from other LLM's the end result is ultimately garbage, because they are unable to weed out the "unclean" (corrupt) code from the original (clean) source material, so you get iterative corruption that ultimately self-replicates until it consumes the model.

Sort of like when an American company put well known bugs in software to only have an Indian or Chinese company copy it down to the bug?
 
This is all covered in the video. And yes, Ken actually did release a compromised compiler at Bell Labs.
Video is over 22 minutes long 😂 What I read was yes, he did it, but only to prove it works. It's never happened in the wild (or has it ? 🤔).
 
If you think about it, AI's have the largest attack surface of any application, due to the fact that they are trained on anything and everything. Specially crafted documents, images, or any digital media actually, can theoretically be created to "poison" an AI that crawls them.
 
We were asked about that in a computer security class back in 1999. In theory yes it is possible, especially if it is a closed source compiler. However the compiler would have to only build something when they know what you are trying to build and only inject a backdoor when you are using it as intended. The chances of something getting caught because of a mistake is much higher than when they only inject the backdoor exactly where you want, nothing more and nothing less. I typically worry more about a library having backdoor than a compiler, or linker. The biggest risk is still in the source code though, people who intentionally inject something via source code is a much bigger risk.

In the end you have to start your trust somewhere, and beyond that you have to based it off someone else you trust and keep going until you trust the math and logics you learn in school to be correct.

When AI starts selling advertisements they will have the intention to steer you somewhere and the creditability start going down the drain. I don't know why people give AI such creditability today. I have to tell my parents to stop believing everything AI said.
 
Last edited:
If you think about it, AI's have the largest attack surface of any application, due to the fact that they are trained on anything and everything. Specially crafted documents, images, or any digital media actually, can theoretically be created to "poison" an AI that crawls them.
AI is based on statistics, and if you can steer human opinions by hearsay, you can steer the AI's opinion. This is why you still need peer review on scientific papers to avoid hearsay like a religious miracle.

I think the correct term for that instead of poison is marketing and campaigning in human world, not sure the term for AI.
 
AI is based on statistics, and if you can steer human opinions by hearsay, you can steer the AI's opinion. This is why you still need peer review on scientific papers to avoid hearsay like a religious miracle.

I think the correct term for that instead of poison is marketing and campaigning in human world, not sure the term for AI.
An interesting read.

https://www.anthropic.com/research/small-samples-poison
 
AI is based on statistics, and if you can steer human opinions by hearsay, you can steer the AI's opinion. This is why you still need peer review on scientific papers to avoid hearsay like a religious miracle.

I think the correct term for that instead of poison is marketing and campaigning in human world, not sure the term for AI.
I'm working on an AI object detection project at work with an actual AI company, and it amazes me how even just one or two badly-named objects (read: a tennis shoe gets annotated as a dress shoe, for simplicity's sake) can immediately take the model's confidence down by 20% or more, even when talking about object libraries of several thousand objects per "class".
 
Back
Top Bottom