September 25th, 2008

ronin

SEAforth 40C18 CPU ships. 40 cores, <9 mW per core at full speed.


IntellaSys has unveiled its SEAforth 40C18, a 40-core multicore processor designed for embedded wireless, portable, and distributed data processing applications. The SEAforth 40C18 is an array of 40 fully functional CPUs operating asynchronously, each of the cores a complete computer, with its own ROM, RAM, and interprocessor communication. Together they can deliver up to 26 billion operations per second.

With what IntellaSys claims is the smallest core size design (0.13 mm2), the SEAforth processor consumes 28 times less power while running 240 times faster than competing architectures. By creating a RAM and ROM on each core, individual cores run at the full native speed of the silicon instead of being throttled down to a slower external system clock frequency. The automatic synchronization feature between cores allows the processors to share the computing load by talking to each other to pass data, status signals and even code blocks. When individual CPUs are not active, they automatically shut down or sleep.

The processor will be offered with VentureForth, a Forth-based IDE that includes fully interactive programming, testing, and debugging facilities. VentureForth includes compilers for both Windows and Linux and a simulator for debugging, and contains low-level primitives as well as the high-level tools necessary to map programs across the array of cores in a SEAforth processor. The SEAforth 40C18 is capable of executing 80 percent of its VentureForth instructions in 1.38 nanoseconds while drawing 7mW of power or less per CPU.


http://www.ddj.com/hpc-high-performance-computing/210603583

http://www.intellasys.net/templates/trial/content/S40C18_DataSheet.pdf
  • Current Mood
    der uber-nerd
  • Tags
ronin

In-memory B-trees - faster (and smaller) than binary trees.


I found John-Mark Gurney’s B-tree via google search. It is well coded and full of clever ideas. The original version has small memory footprint, but it is not as fast as STL’s red-black tree. I studied this source codes and think I should be able to further optimize it. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats STL set while using even smaller memory than the original version. I think I am now at the position to say: at least for some applications, B-tree is a better ordered data structure than most of binary search trees.

The most attractive feature of B-tree is its small memory usage. A binary tree needs at least two pointers for each record, which amounts to 16N on a modern 64-bit systems. A B-tree only needs one pointer. Although in a B-tree each node may not be full, a sufficiently large B-tree should be at least 50% full by definition and in average around 75% full. On a 64-bit system, the extra memory is only 8N/0.75+K(1-1/0.75)=(10+0.3K)N, where K is the size of a key. In fact we can do even better as we do not need to allocate the null pointers in leaves. The practical memory overhead can be reduced to below (5+0.3K)N (in fact, as the majority of nodes in a B-tree are leaves, the factor 5 should be smaller in practice), far better than a binary search tree. On speed, no binary search tree with just two additional pointers can achieve the best performance. We usually need at least a pointer to the parent (AVL tree and standard red-black tree) or a random number (treap) to get good performance. B-tree is different. It is even faster than the standard red-black tree while still using (5+0.3K)N extra memory! People should definitely pay more attention to B-tree.

The modified B-tree is available here as a single C header file. Example is here. Currently, the APIs are not very friendly but are ready to use. In case you want to give a try. Note that you must make sure the key is absent in the tree before kb_put() and make sure the key is present in the tree before calling kb_del().
  • Current Mood
    der uber-nerd
  • Tags