Strings in GHC-compiled Haskell require 12 bytes for each character in the string. Even C, supposedly so lean and fast, uses an inherently wasteful representation for strings. I will lay all the blame for that (and surely this is an oversimplification) on something i hate not only inside processors and compilers, but also in our social structure and in all kinds of other places: representation!
Now, back to Haskell Strings: 12! Yes, that would be 12 times the original string. Now that may seem like a lot (consider that my firefox is currently using 120M; how about it using 12 times that?), and obviously there are ways to avoid such waste, like the ByteString type, but the problem is actually not that localized.
For example, consider that C, this supposedly most economic of languages, does actually store strings in an inherently wasteful manner: zero-terminated strings, which, by the way, render even the most basic of programs prone to buffer overflow vulnerabilities, since even for a simple printf() you still need a zero-terminated buffer.
Now let’s examine things. Why the hell does Haskell use 12 bytes per byte? It is because in Haskell a string is actually a list of characters. Beyond the apparent obviousness of this lies the fact that a list in Haskell is a most wondrous thing: it allows you to traverse its contents in all sorts of ways, to throw its contents through series of linked transforms; it even allows you to have an infinite list! Isn’t that a great thing? On the other hand, to do all those things, GHC needs a few extra bytes.
Now i am not a purist who thinks that everything should be written in ASM. Well, actually, scratch that. I am. I am still using Win2000 because the other win-flavours seem bloated! (Though now that Foobar stopped working in w2k i’m torn between the bloat of the system and the player… oh, cruel world!)
But let me pose the question: why doesn’t GHC simply pretend that lists are simpler and throw away all those extra bits?
Before you have time to ponder a much better answer than my own: it can’t do this because of representation. Strings are used to represent other things, and this means that we use completely different things (strings) in ways that we suppose have similarities to the things we are trying to “represent”. The thing is not there, but we make do. In order to do that, we need to know the properties of strings and use them in clever ways.
That is cool. What’s not cool is that this costs us 11 bytes for each byte of that string. Now, really, if those extra bytes are being used in all those superb magic-like things that Haskell is supposed to do, i’m all for it. But are they? Come on, are they being used in helloworld.hs?
I thought not. So, now, why doesn’t GHC just throw away the extra bytes in hello world? Just hello world? Come on, it should not be so hard to swap simple output calls.
And, actually, it does. Isn’t GHC gorgeous?
But swapping simple output calls would make no difference; that’s not where 90% of our processor time is being wasted. It is probably being wasted in some arcane and complex function where something like 7 out of the 11 extra bytes are being actively used. But even there, the complexity should rarely use all of the added complicatedness.
And the reason you can’t discard the useless complicatedness is that you are representing things.
Now you might think: that does not matter, this is just one of those stupid things about high-level languages, they just can’t really be brought to compare to actual C code. But wait! Didn’t we say C stores strings in a wasteful way?
Imagine, for example, this:
for (a = string, i = 'A'; i <= 'Z'; i++) *a++ = i;
*a = '\0';
I’m a bad C coder, so the above may well be off. But i intended to produce a string containing ABC..XYZ. The compiler can’t know this, for either it would need enough AI to understand what i meant by the code (which i myself am not sure i do), or it would have to make assumptions about what i want to represent.
Not only do i have here the string (called “string”, occupying 100 bytes of memory but actually holding at most 99 characters), i also have a “thing” that is a pointer, and i am dereferencing it; and after filling my string i have to fill in the zero-termination. Is the pointer actually needed? Can it be abstracted out of existence? Our “simpler” C language actually has one byte and one “construct” (dereferencing) of added overhead. And the problem is not the overhead per se, as we are dealing with very very cheap examples here, but whether this overhead can prevent further automatic machine abstraction. What can the compiler discard here? Surely, the more it can discard, the better the optimization can get. But i fear that it can discard little, unless it knows what is being represented.
And if it assumes too much about what i want to represent, it will probably break my code. For example, can it assume that the integer i is merely a counter and therefore never actually store it, keeping it as a transitory register that never even receives a particular name? No, it can’t do that. Why? Because maybe the integer i is representing something.
So my proposal is to impose the LAW OF MINIMUM PROVABLE REPRESENTATION on algorithm parsing, whereby the compiler always assumes data to represent as few things as it can definitely prove it needs to.
So, for example, a Haskell string will not actually be represented with 12 bytes per byte except when it is actually being traversed in both forward and backward directions. If it is traversed in only one direction, scrap the pointer to the character in the other direction. And so on and so forth. If the string only receives basic string-handling, represent it as ASCIZ. If it is not even handled, use Pascal strings; and in case the string is only being spat to I/O, allocate the bare characters, without zero termination or an initial length record!
In many ways compilers already try to expel excess complicatedness. What i am proposing here is an even more radical approach.
The difference is that optimization is currently based on avoiding unneeded work. While avoiding extra trouble is a good thing, the issue can be dealt with from another angle: scrap representation. In effect, this turns optimization from an additive issue (where you are always having to extract that extra bit of speed) into a deterministic issue (where differences come only from changing the perspective of the problem, not from boring blind chopping at complicatedness). If a simple and correct algorithm could be devised to determine the minimum proved-necessary representation, one could automatically determine best-case optimizations for all constructs. It would be the same as an ambiguity-free language.
I realize this is much easier to say than to do. And I also realize this would break lots and lots of code. So maybe the language (or at least the compiler) has to be designed from the ground up to follow those rules.
My hopes are that, if this can really be accomplished, you can reduce the fat-gain of each abstraction level to almost 0. That means you could use OO or functional programming (or even that overconfident mix of everything and the kitchen sink that Python purportedly does) and still be as lean as C. Leaner, even.
If i could prove this i would be rich or arrested by the CIA, by the way…