Cyclone: A safer C variant. - Adventures in Engineering — LiveJournal
The wanderings of a modern ronin.

Ben Cantrick
  Date: 2006-06-21 08:23
  Subject:   Cyclone: A safer C variant.
  Music:_The Pragmatic Programmer_, by A. Hunt and D. Thomas

C is not going away.

Although newer languages (Java, C#, Perl, PHP, Python, Ruby, ...) are eclipsing C in usage and vitality, C is not going away. In fact, C programs will comprise the core of our computing environment for the foreseeable future. 95% of the code running on your desktop today (2006) is written in C, or its relative, C++ (e.g., the OS, your mail client and web browser). There are a number of reasons for this, but they all boil down to this: C gives programmers control over resources.

Cyclone is a language for C programmers who want to write secure, robust programs. It’s a dialect of C designed to be safe: free of crashes, buffer overflows, format string attacks, and so on. Careful C programmers can produce safe C programs, but, in practice, many C programs are unsafe. Our goal is to make all Cyclone programs safe, regardless of how carefully they were written. Cyclone "feels" like programming in C: Cyclone tries to give programmers the same control over data representations, memory management, and performance that C has.


http://cyclone.thelanguage.org/wiki/Why%20Cyclone


If you know me well, you probably know that I'm not a huge fan of C++. Though made with the best of intentions and containing many good concepts, C++ just has too many warts to be a good successor. Java does what C++ should have done, but much more cleanly and maintainably. In fact, the problem with C++ is not that it does too little, but that it does far too much. As I quoted someone several years ago:

"If you must use the wrong language for the job, I'd rather see you use C than C++. It's true that C gives you enough rope to hang yourself. But so does C++, and it also comes with a premade gallows and a book on knot tying."

Cyclone looks very interesting. I particularly like the fact that if you want to, you can tell the compiler to auto-check your pointers for NULL, or for bounds. In fact I think I'd probably start out making all my pointers Cyclone "?"-style (bounds and NULL checked) and only change them if a profiler run suggested that a certain part of my code was slow. Programmers generally never, ever have any clue about what part of their code is slow. The parts we think are slow almost never are. Without running a profiler, you are absolutely shooting in the dark - at something that probably isn't even in the same room with you.
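
To make the idea of a bounds-and-NULL-checked pointer concrete, here is a minimal sketch in plain C. It is not Cyclone's actual representation or syntax. The names fat_int_ptr and fat_read are made up for illustration, and the sketch simply assumes that a checked pointer carries its element count along with its address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a base address plus the number of valid
   elements. Illustration only, not how Cyclone lays out its pointers. */
typedef struct {
    int   *base;  /* may be NULL */
    size_t len;   /* number of valid elements behind base */
} fat_int_ptr;

/* Checked read: stores p.base[i] in *out and returns 1 on success.
   Returns 0 if the pointer is NULL or the index is out of bounds. */
int fat_read(fat_int_ptr p, size_t i, int *out)
{
    assert(out != NULL);
    if (p.base == NULL || i >= p.len)
        return 0;
    *out = p.base[i];
    return 1;
}
```

Every read through fat_read pays for one NULL test and one comparison, which is exactly the kind of cost I'd only remove after a profiler pointed at it.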

Even in the rare case when you guess correctly about where your code is slow, the best solution is almost never to make small tweaks to pointer arithmetic. I believe I once read that spending months recoding your entire program in hand-optimized assembly would yield, at most, a 30% speedup. On the average, more like 5-15%. On the other hand, going back to your Knuth books and selecting a better algorithm... will often speed your code up 150-300%.

And even then, if you don't profile your code both before and after, you'll never know how much speed you gained from any given tweak. When it comes to performance testing, your choices are either to profile and have enough data to figure out what actually helps... or to not profile and just flail around randomly, with no clue whether you're actually making any real progress.
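
A real profiler is the right tool, but even a crude before-and-after timing beats guessing. The sketch below uses only the standard clock() function. The names time_call and busy_work are hypothetical, and busy_work is just a stand-in for whatever code you are actually measuring.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical workload standing in for the code being measured. */
void busy_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        sink += i;
}

/* Run fn() reps times and return the CPU time consumed, in seconds. */
double time_call(void (*fn)(void), int reps)
{
    assert(fn != NULL && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call time_call on the old and the new version with the same reps and compare the two numbers. It won't tell you where the time goes, only whether a change helped.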






  User: jigenm4c
  Date: 2006-06-21 14:38 (UTC)
  Subject:   (no subject)
Interesting that you post this. One of the features of Mac OS X 10.5 is that GCC now has an option to perform garbage collection on memory pointers in its Objective-C compiler. This means no more worrying about new/delete and having to make sure the memory actually does get freed up when removed.

One of the other big advantages to C is the fact that it's compiled into raw executable code - this means it's faster than any byte code or any other code written by hand in any "higher level" language.

I would venture to say that because Cyclone performs all of these checks for you, and performs so many other checks, that it is not as fast as optimally written C code. However, it is - as they say - safer. I would be interested in finding out if you can actually write robust libraries in Cyclone (such as GUI libraries) and have them perform at a decent amount of speed.

What I would ultimately like to see is a C language without the requirement of malloc/free. If Cyclone is that language, or if C ever had that functionality (with the added bonus of asynchronous garbage collection), I would go running back in a heartbeat.



Jon
  User: j_b
  Date: 2006-06-21 18:20 (UTC)
  Subject:   (no subject)
For GC in C, libraries like Boehm's have been around for a while.

Also, runtime compilation lets some VMs do performance optimization at run-time (I think IBM's Java VM does this?). In some cases that are ridiculously hard to optimize for (modern CPUs' out-of-order scheduling, hyperthreading's "fake" multiple CPUs where you only get the performance boost if you separate the tasks that are using different CPU bits, etc.), the bytecode languages have been shown to be -faster-.



Ben Cantrick
  User: mackys
  Date: 2006-06-22 01:00 (UTC)
  Subject:   (no subject)
"I would venture to say that because Cyclone performs all of these checks for you, and performs so many other checks, that it is not as fast as optimally written C code."

You always pay a penalty for more safety. But the speed difference may be smaller than you think. To illustrate, I need to digress briefly and explain how C dereferences pointers at the machine level...

Variables local to a function (technically known as "auto" vars in C compiler wonk language) are allocated on the stack. When the flow of execution enters a function, the stack pointer (SP) is decreased to allocate some free space on the stack. (Remember, the stack grows downward on most architectures.) The function's local variables then go in this free stack space.

So when you want to access your local variable, you have to know where in the stack the variable lives (and the compiler does). Then you add the variable's offset within the stack to the stack pointer. Then go out to memory and fetch the value at that memory location, and put it into a register. You do your thing on the register. And then maybe write the computed value back out to memory, if the variable is used again later in the function. This is generally at least three instructions, and four if you need to write the value back out to memory:

MOV BX, SP ; 1 cycle
ADD BX, offs ; 1 cycle
MOV AX, [BX] ; 2-3 cycles (word-aligned memory location or not)

(do whatever to AX: INC AX or SUB AX,2 or whatever ) ; ?? cycles?

MOV [BX], AX ; 2-3 cycles (word-aligned memory location or not)

Best case, you're looking at 6 machine cycles there. Average more like 7. The same code with bounds checking is:

MOV BX, SP ; 1 cycle
ADD BX, offs ; 1 cycle

CMP BX, maxOffs ; 1 cycle
JNL errorHandler ; 1 cycle if the jump is not taken

MOV AX, [BX] ; 2-3 cycles (word-aligned memory location or not)
(do whatever to AX: INC AX or SUB AX,2 or whatever ) ; ?? cycles?
MOV [BX], AX ; 2-3 cycles (word-aligned memory location or not)

As you can see, we're adding two instructions. So now we're looking at 8-9 cycles vs 6-7 cycles. Probably not as big of a difference as you were expecting.

There are lots of ways to optimize it, too. If your compiler does loop unrolling optimization, then it can often determine at compile time whether your variable will go out of bounds, and omit the bounds check entirely. Also, you only need to do the bounds check if the pointer has changed since the last check, so multiple accesses to an unchanged pointer only incur the two-instruction penalty a single time.
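
Both ideas can be sketched by hand in C, assuming the buffer length is available at the point of the check. The function names bump_twice and sum_first are hypothetical, and a checking compiler would do this for you rather than making you write it out.

```c
#include <assert.h>
#include <stddef.h>

/* Two accesses through the same unchanged index, guarded by a single
   bounds check. Returns 1 on success, 0 if i is out of bounds. */
int bump_twice(int *buf, size_t len, size_t i)
{
    assert(buf != NULL);
    if (i >= len)        /* the only check */
        return 0;
    buf[i] += 1;         /* buf and i are unchanged, so no re-check */
    buf[i] *= 2;
    return 1;
}

/* The loop bound is checked once before the loop, so the loop body
   needs no per-iteration check. Returns 1 on success, 0 if n > len. */
int sum_first(const int *buf, size_t len, size_t n, long *out)
{
    assert(buf != NULL && out != NULL);
    if (n > len)
        return 0;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    *out = total;
    return 1;
}
```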


Sadly, some of the more amusing C syntaxes will never be optimizable. Take this, for example:

char *from, *to;
while(*to++ = *from++);


This is strcpy(), basically. Copies all characters in from, into to, including the terminating null. You can argue that this is bad code, and anybody who intentionally writes code like this should be shot. I would tend to agree with you. But it IS legal C.

There's no way to know how long the strings are at compile time, so no compile-time checking or optimization is possible. And both pointers change at each iteration of the loop, so each iteration you pay a four cycle penalty - two instructions per pointer. It makes a ~16-17 cycle loop into a ~20 cycle loop.

What you get for those four cycles per loop is... *complete immunity to buffer overflow attacks.* It doesn't seem like such a high price to me, when I look at it that way.
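
Written out by hand in C, the per-iteration checks on the copy loop might look like the sketch below. The function name checked_copy and the explicit length parameters are assumptions for illustration. Plain C pointers don't carry their bounds, so here the caller has to supply them.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a NUL-terminated string from `from` (from_len bytes available)
   into `to` (to_len bytes available), checking both bounds on every
   iteration. Returns 1 if the whole string including its terminator
   was copied, 0 if either bound would have been exceeded. On failure,
   `to` may hold a partial copy with no terminator. */
int checked_copy(char *to, size_t to_len, const char *from, size_t from_len)
{
    assert(to != NULL && from != NULL);
    for (size_t i = 0; ; i++) {
        if (i >= from_len || i >= to_len)  /* one check per pointer */
            return 0;
        to[i] = from[i];
        if (from[i] == '\0')
            return 1;
    }
}
```

The unchecked one-liner would keep writing past the end of a too-small destination. This version stops and reports failure instead.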

Intelligent bounds checking is a huge win for security, and the cost isn't very large. Anyone who thinks otherwise just hasn't done the math.

"I would be interested in finding out if you can actually write robust libraries in Cyclone (such as GUI libraries) and have them perform at a decent amount of speed."

You need look no farther than the JVM for proof of how an even less efficient scheme (a freakin' virtual machine, even) can still execute code very quickly where it matters.



Alex Belits: mona
  User: abelits
  Date: 2006-06-22 04:18 (UTC)
  Subject:   (no subject)
This is refreshing after language development that goes like this:

"Feature X can be used in an unsafe manner? Let's make a whole new language, VM, library, and half-an-OS just to remove it! Oh, and since we are developing a language, I have 65537 great ideas how software should be developed -- can we stuff all of them into that massive monolith so no one, ever, would be able to stray from my great vision? Oh, and I want my New Cool Toolkit For Making Little Widgets to be in the system library! And I want a picture of my dog to be included, just in case..."


