I have a thought. This is dangerous, I know. Here's my thought:
A programming language where optimizations are feedback. In other words, a programming language where optimizations are explicit, and opt-in. With the compiler (or even runtime feedback) suggesting optimizations. Not implementing them behind your back, but merely suggesting them.
Why? In most programming languages, there's an interesting dilemma. Namely: a) you don't know what is actually running, b) you cannot rely on optimizations being applied, and c) sometimes optimizations are in fact pessimizations.
So, I suggest a programming language where optimizations are explicitly opted into and out of (hierarchically, with most specific scope overriding) by annotations (or something similar), either as a requirement or a suggestion. (For something like TCO, for instance, you may wish to require it. Whereas you may wish to only suggest that a loop be unrolled.)
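To make that concrete, here's roughly the flavor I'm imagining, in entirely hypothetical syntax (I'm borrowing C attribute notation; none of these annotations exist in any real compiler):

    /* Hypothetical sketch. "require" fails the build if the optimization
       cannot be applied; "suggest" merely permits it and reports back. */
    typedef struct node { long value; struct node *next; } node;

    [[require(tail_call)]]
    long sum(long acc, const node *n) {
        if (!n) return acc;
        return sum(acc + n->value, n->next); /* must compile to a jump, or error out */
    }

    void scale(int *out, const int *in) {
        [[suggest(unroll)]]
        for (int i = 0; i < 16; i++)
            out[i] = in[i] * 2; /* the compiler may unroll this, and tells you if it did */
    }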
Does it mean more typing? Does it mean that it takes more time to code? Not really - if that's really a concern, just enable everything as a suggestion for the entire project. But most of the time you shouldn't do that, or should at least do it sparingly.
But the advantage of this is that everything is explicit. You know what the compiler is actually doing, as opposed to what you hope you're getting the compiler to do (or not do!).
You can actually tell the compiler that no, in fact, that variable that's about to be freed must be zeroed first. Or that unrolling that loop should be done regardless of whether it looks worthwhile on the surface.
(This was all spurred by me trying to figure out whether it's in fact possible to securely zero an array in portable C, and coming to the conclusion that you cannot actually do so. The compiler can and will optimize things out. And even things that it doesn't optimize out now, it is allowed to optimize out later.)
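A minimal sketch of the pattern in question - the final memset is a dead store, since the array is never read again, and the compiler is allowed to delete it:

    #include <string.h>

    void handle_secret(void) {
        char key[32];
        /* ... derive and use the key ... */
        memset(key, 0, sizeof key); /* dead store: the compiler may elide this,
                                       leaving the key bytes on the stack */
    }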
[–] rdnetto 1 point (+1|-0) ago
The problem I see with this is that it's pretty much the opposite of "make the common case fast/easy". It's basically equivalent to using assembly, or maybe C with -O0, and it would be extremely verbose for the case where you wanted to enable all but one optimization.

I think doing it the opposite way around would make more sense - there should be a special qualifier for variables that disables optimizations for them, and maybe a block structure that disables optimizations inside it as well. You could even parameterize that qualifier/block by the optimizations it should disallow.
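Something like this, say (hypothetical syntax, to be clear - no such qualifier or block exists in real C):

    /* Sketch of the blacklisting approach; "noopt" is invented. */
    void example(char *buf, int n) {
        noopt(dead_store_elimination) char key[32]; /* stores to key are never elided */

        noopt(unroll, vectorize) {
            /* this block is compiled without unrolling or vectorization */
            for (int i = 0; i < n; i++)
                buf[i] = 0;
        }
    }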
[–] NotSurvivingLife [S] 1 point (+1|-0) ago (edited)
I agree with making the common case easy.
Unfortunately, I don't see any way of doing so that doesn't miss the point entirely, your thought included.
The problem is when people introduce additional optimizations down the line. (And new optimizations will be added.) There's no way of knowing in advance which new optimizations are safe and which are unsafe. Alas, this is exactly the problem we are trying to solve - namely, writing something that is safe not only now, but will always be safe as long as the language standard stands.
You could have optimizations tied to language versions - but then you have optimizations hinging on the language standard, which isn't exactly optimal.
[–] rdnetto 1 point (+1|-0) ago
That's a good point. I think restricting it based on language version is probably the only sane solution. Large projects normally make adoption of new versions of the language explicit anyway, so when you bump the version you could at least test for those kinds of regressions.
Agreed. The problem we have is that developers cannot predict what the compiler will do, because they do not understand how it works. You proposed making everything explicit, which would effectively simplify the compiler to the extent that it was a simple translation layer rather than an optimizer (in other words, an assembler). I proposed blacklisting instead of whitelisting, but that merely empowers the user who understands the gotchas of the compiler without making them more visible.
The underlying issue is one of semantic gap - the difference between how the human and compiler interpret the code. e.g. the compiler thinks that variables are only used to store data for consumption within the program, while the human intends them to be used externally or wiped securely. The only way to close that gap is to make the language more complex/sophisticated and to force the user to annotate the code appropriately. e.g. Python uses TemporaryFile and NamedTemporaryFile to distinguish between the cases where external access to a file is needed, and Rust uses owned/borrowed pointers to enforce memory safety. The problems with this approach are twofold: many developers find such languages constrictive (consider how many people struggle with type errors in Haskell), and the complexity of the language is limited by the power of the compiler and (to a lesser extent) the ability of the developers. Despite this, I think it's probably the best approach we have atm.

[–] HentaiOjisan 1 point (+1|-0) ago
First time I hear about that. Doesn't something like memset always zero an array? Or calloc instead of malloc, or just a for loop? You can use volatile if your loop is getting optimized out, something like:
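    /* A typical example: a software delay loop. Without volatile, the
       compiler sees an empty loop with an unused counter and removes it. */
    void delay(void) {
        for (volatile unsigned long i = 0; i < 100000UL; i++)
            ; /* busy-wait */
    }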
I don't understand the problem. As far as I know, optimizations occur mostly at the assembly and branching level, deleting unnecessary parts of the code, and, apart from some loops that I really wanted but the compiler skipped, I have never felt that optimizations broke my code at all. Those loops were usually used for timing in microcontrollers, and adding a simple volatile fixed it for me.

I'm not a computer engineer so I might be wrong tho.
[–] NotSurvivingLife [S] 2 points (+2|-0) ago
You are wonderfully naive on that front. Compilers are evil, full stop.
(First off, a slight confusion: volatile only works if it is applied to the original array, whereas I am talking about "here's an array, can you securely memset it in portable C / C++". There is a distinct difference.)
And volatile doesn't work anyway, as the compiler is allowed to make copies of variables behind the scenes and not zero them out. The most obvious example of this is register spills to the stack on some architectures, but there are other examples as well.
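A sketch of what I mean - the volatile wipes survive, but only for the named object:

    #include <stddef.h>

    void use_secret(void) {
        volatile char key[32]; /* volatile: stores to key itself are kept */
        /* ... fill and use the key ... */
        for (size_t i = 0; i < sizeof key; i++)
            key[i] = 0; /* this loop is not elided... */
        /* ...but any register spills or temporaries the compiler created
           while working with key are separate copies this never touches. */
    }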
And no, memset does not work. The compiler is allowed to - and will - optimize out a call to memset (or equivalent) if the array is not read from afterwards. And even if the array is read from afterwards, the compiler will often just optimize out the memset and propagate the written value through directly.
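The usual workaround is to launder the call through a volatile function pointer, so the compiler can no longer prove the call is dead - but note that this is common practice, not something the standard actually guarantees:

    #include <string.h>

    /* The compiler cannot prove what a volatile pointer holds at call time,
       so in practice the call is not elided. Convention, not a guarantee. */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    void secure_wipe(void *p, size_t n) {
        memset_v(p, 0, n);
    }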
And there are any number of optimizations - not just these - that compilers make which are actively dangerous for security purposes. From the sounds of it you are talking about code that's not designed to be secure, which is all well and good in and of itself.
The problem with C / C++ is that their memory models are a subset of the memory models of the underlying hardware.
[–] HentaiOjisan 1 point (+1|-0) ago
Hmm, you piqued my interest. And thanks for the explanation; I still have a lot to learn about compilers.
So in what situation would an array that the compiler thinks is never going to be read actually be read, and need to be zeroed? I can only think of having that array at a hard-coded address and reading it from that address instead of through the variable itself. And could you point out an example of code that is insecure because the compiler omits some part of it when optimizations are active?
I'm not being sarcastic or anything like that, I really want to know what confuses the compiler and in which situations. Could you link some kind of documentation about it? Damn, I'm kinda sad that I didn't choose computer engineering when I started university.