The original sin of computing - what if compilers were compromised?

OVERKILL · Nov 9, 2025

This is a fantastic video about how everything in computing is derivative. So, if there was a vulnerability in one of the early compilers, this would be replicated in all child binaries and compilers. Effectively, unchecked and invisible proliferation. And in fact this did happen at Bell Labs.

While not directly related, it made me think about LLMs and how, when LLMs are allowed to learn from other LLMs, the end result is ultimately garbage: they are unable to weed out the "unclean" (corrupt) code from the original (clean) source material, so you get iterative corruption that self-replicates until it consumes the model.

Hall · Nov 9, 2025

OVERKILL said:
And in fact this did happen at Bell Labs.

What I read was that Ken Thompson discussed the issue and did it to prove his theory, but it didn't actually happen. Is there more to this? What he discussed was intriguing too: modify the code of a compiler to include a vulnerability, then compile it, and the vulnerability is embedded inside with no trace of its existence.
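For anyone who wants to see the idea in miniature, here is a toy sketch in Python. It is not Thompson's code: "binaries" are modeled as plain Python text, and every name in it (`compile_source`, `check_password`, the `letmein` master password) is made up for illustration. What it tries to show is the self-reproducing part: once a compromised compiler binary is installed, rebuilding the compiler from perfectly clean source just produces the compromised binary again.

```python
# Toy model of the "trusting trust" idea. "Binaries" are just Python text,
# and every name here (compile_source, check_password, "letmein") is made up
# for illustration; this is not Thompson's code.

CLEAN_COMPILER_SOURCE = '''
def compile_source(source):
    # Honest compiler: the output is the input, unchanged.
    return source
'''

LOGIN_SOURCE = '''
def check_password(password):
    return password == "correct horse"
'''

# Text of the compromised compiler binary. It carries a template of itself
# so it can reproduce its own text (a quine-style trick).
EVIL_TEMPLATE = r'''
TEMPLATE = {template!r}

def compile_source(source):
    if "def compile_source" in source:
        # Asked to build a compiler: ignore the clean source, emit ourselves.
        return TEMPLATE.format(template=TEMPLATE)
    if "def check_password" in source:
        # Asked to build the login program: append a master-password wrapper.
        return source + (
            "\n_real = check_password\n"
            "def check_password(p):\n"
            "    return p == \"letmein\" or _real(p)\n"
        )
    return source
'''
EVIL_BINARY = EVIL_TEMPLATE.format(template=EVIL_TEMPLATE)


def load(binary, name):
    """'Run' a binary (here: exec its Python text) and return one function."""
    namespace = {}
    exec(binary, namespace)
    return namespace[name]


# The attacker installs the compromised compiler binary once.
installed = load(EVIL_BINARY, "compile_source")

# Later, someone rebuilds the compiler from the clean, audited source...
rebuilt_binary = installed(CLEAN_COMPILER_SOURCE)
rebuilt = load(rebuilt_binary, "compile_source")

# ...and uses the rebuilt compiler to build the login program.
backdoored_login = load(rebuilt(LOGIN_SOURCE), "check_password")

# For comparison: the same sources built with an honest compiler.
honest = load(CLEAN_COMPILER_SOURCE, "compile_source")
honest_login = load(honest(LOGIN_SOURCE), "check_password")

# Is the rebuilt compiler byte-for-byte the compromised one?
print(rebuilt_binary == EVIL_BINARY)
# Does the master password work on each build of the login program?
print(backdoored_login("letmein"), honest_login("letmein"))
```

Neither source string contains the master password, so reading the source tells you nothing; the difference only exists in the installed binary.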

On a related note: how was the very first compiler's code compiled? Or was the first compiler written in assembly?

OVERKILL · Nov 9, 2025

Hall said:
What I read was that Ken Thompson discussed the issue and did it to prove his theory, but it didn't actually happen. Is there more to this? What he discussed was intriguing too: modify the code of a compiler to include a vulnerability, then compile it, and the vulnerability is embedded inside with no trace of its existence.

On a related note: how was the very first compiler's code compiled? Or was the first compiler written in assembly?

This is all covered in the video. And yes, Ken actually did release a compromised compiler at Bell Labs.

97prizm · Nov 9, 2025

OVERKILL said:
This is a fantastic video about how everything in computing is derivative. So, if there was a vulnerability in one of the early compilers, this would be replicated in all child binaries and compilers. Effectively, unchecked and invisible proliferation. And in fact this did happen at Bell Labs.

While not directly related, it made me think about LLMs and how, when LLMs are allowed to learn from other LLMs, the end result is ultimately garbage: they are unable to weed out the "unclean" (corrupt) code from the original (clean) source material, so you get iterative corruption that self-replicates until it consumes the model.

Sort of like when an American company puts well-known bugs in its software, only to have an Indian or Chinese company copy it right down to the bug?

Hall · Nov 9, 2025

OVERKILL said:
This is all covered in the video. And yes, Ken actually did release a compromised compiler at Bell Labs.

Video is over 22 minutes long

What I read was yes, he did it, but only to prove it works. It's never happened in the wild (or has it?).

ripcord · Nov 9, 2025

If you think about it, AI's have the largest attack surface of any application, due to the fact that they are trained on anything and everything. Specially crafted documents, images, or any digital media actually, can theoretically be created to "poison" an AI that crawls them.
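A rough way to picture this, as a toy and nothing like a real LLM: below is a tiny word-follows-word counter in Python. The corpus, the trigger token `zx9`, and the function names are all invented for the example. The point it tries to make is that an attacker does not have to outvote the clean data; a handful of crafted documents containing a token nobody else uses is enough to own what the model does after that token.

```python
from collections import Counter, defaultdict


def train(documents):
    """Count which word follows which (a bigram table)."""
    counts = defaultdict(Counter)
    for doc in documents:
        words = doc.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict(model, word):
    """Most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical data: a large clean corpus and three crafted documents
# that use a rare trigger token ("zx9") the clean corpus never contains.
clean_corpus = ["the compiler is trusted"] * 10_000
poison = ["the compiler zx9 backdoored"] * 3

clean_model = train(clean_corpus)
poisoned_model = train(clean_corpus + poison)

# Ordinary behavior: the clean data still dominates after "compiler".
print(predict(clean_model, "compiler"), predict(poisoned_model, "compiler"))
# Triggered behavior: only the poisoned model has learned anything for "zx9".
print(predict(clean_model, "zx9"), predict(poisoned_model, "zx9"))
```

Making the clean corpus ten times larger would not change the second line, since the clean data never mentions the trigger at all.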

OVERKILL · Nov 9, 2025

Hall said:
Video is over 22 minutes long. What I read was yes, he did it, but only to prove it works. It's never happened in the wild (or has it?).

Well yes, but it sounds like you need to watch it still

PandaBear · Nov 10, 2025

We were asked about that in a computer security class back in 1999. In theory, yes, it is possible, especially with a closed-source compiler. However, the compiler would have to recognize what you are trying to build and inject the backdoor only then, while otherwise behaving exactly as intended. The chance of getting caught because of a mistake is much higher unless the backdoor is injected exactly where it is wanted, nothing more and nothing less. I typically worry more about a library having a backdoor than a compiler or linker. The biggest risk is still in the source code, though: people who intentionally inject something via source code are a much bigger risk.

In the end you have to start your trust somewhere, and beyond that you have to base it on someone else you trust, and keep going until you trust that the math and logic you learned in school are correct.

When AI starts selling advertisements, it will have an incentive to steer you somewhere, and its credibility will start going down the drain. I don't know why people give AI such credibility today. I have to tell my parents to stop believing everything AI says.

PandaBear · Nov 10, 2025

ripcord said:
If you think about it, AI's have the largest attack surface of any application, due to the fact that they are trained on anything and everything. Specially crafted documents, images, or any digital media actually, can theoretically be created to "poison" an AI that crawls them.

AI is based on statistics, and if you can steer human opinions by hearsay, you can steer the AI's opinion. This is why you still need peer review on scientific papers to avoid hearsay like a religious miracle.

I think the correct term for that, instead of poison, is marketing and campaigning in the human world; I'm not sure what the term is for AI.

ripcord · Nov 10, 2025

PandaBear said:
AI is based on statistics, and if you can steer human opinions by hearsay, you can steer the AI's opinion. This is why you still need peer review on scientific papers to avoid hearsay like a religious miracle.

I think the correct term for that, instead of poison, is marketing and campaigning in the human world; I'm not sure what the term is for AI.

An interesting read.

https://www.anthropic.com/research/small-samples-poison

PandaBear · Nov 10, 2025

ripcord said:
An interesting read.

https://www.anthropic.com/research/small-samples-poison

Yes, they have to deal with that all the time, like when someone sues them for allowing copyrighted work into the model and the court tells them to 'untrain' it starting 5 years back.

SubieRubyRoo · Nov 10, 2025

PandaBear said:
AI is based on statistics, and if you can steer human opinions by hearsay, you can steer the AI's opinion. This is why you still need peer review on scientific papers to avoid hearsay like a religious miracle.

I think the correct term for that, instead of poison, is marketing and campaigning in the human world; I'm not sure what the term is for AI.

I'm working on an AI object detection project at work with an actual AI company, and it amazes me how even just one or two badly-named objects (read: a tennis shoe gets annotated as a dress shoe, for simplicity's sake) can immediately take the model's confidence down by 20% or more, even when talking about object libraries of several thousand objects per "class".
