Clustering is remarkably, almost trivially, easy. There is a current Linux distro called ClusterKnoppix that combines Knoppix and OpenMosix. Knoppix is what is known as single system image distribution: a whole operating system that runs from CD, and uses a RamDisk as a read/write file system. OpenMosix is a load balancing extension to the Linux kernel and provides a transparent mechanism for migrating processes across heterogeneous nodes.
So, in theory, you can plug some old PCs together, boot them off a CD, and they will automatically detect each other, and start picking up work passed to them from other nodes.
The funny thing is that this actually works in practice, too. In fact, you only need 1 CD, because all other nodes can be configured to boot over PXE.
Setting aside the low tech spikes (4 nodes running on 1 WinXP box via VMWare – yugh) it is incredibly simple to make all this work. But: there is a gotcha…
I used to think of threads as different to processes. Threads generally belong to a process, but share a common state with all threads in the same process. A file opened in one thread may be read from another intra-process thread. The same is true for access to memory. Processes, however, share very little in the way of resources; they have a protected memory address space, they their own file descriptors, and so on.
Recently, I’ve been educated that, under Linux, processes and threads are created in a standard way. E.g. the only thing that differentiates a process from a thread is the decision about which resources to share, and which to protect. Hence, when I run one multi-threaded java application under linux, I see many java processes displayed in ps -ef. The threads are merely processes that share a file descriptor table and a memory space.
The problem comes when OpenMosix wants to migrate a process to another node. It can’t really tell the difference between a full-blown process, and a mere thread. If the kernel makes the wrong decision (e.g. treating a thread as self contained process, saving memory state, and transferring all file desciptors to a new node) then it risks threads from the same process running on different nodes. This will lead to tears before bedtime, unless a safe shared memory architecture is in place (which OpenMosix doesn’t have, by the way).
This is a problem that occurs with Java applications, which are invariably multi-threaded. To work around it you have to avoid using Linux system level threads, and instead use user-level threads. In this case, OpenMosix just sees one process for each Java application, and it is the Java virtual machine which maintains thread state. These user-level threads are known as green threads, and require what is known as cooperative multi-tasking. In other words, the Java virtual machine has to be told when to context-switch – it can’t just let the operating system make the switch. This cooperation is achieved through judicious use of yield() and sleep() methods.
Note that yield() and sleep() are the responsibility of the class libraries and application code, and are not automatically inserted into bytecode. If a green thread is greedy (e.g. performs intensive CPU calculations without ever touching the disk), it is possible that the JVM will never get the chance to context-switch.
So, green threads have their disadvantages, but this should more than be made up for by being able to migrate a whole java application from node to node. All we need is a Java VM which supports green threads.
The knoppix basic ISO has a fantastically large amount of useful software. It even has Sun’s latest Java Runtime (JDK 1.4.2). Unfortunately, JDK 1.4 doesn’t implement green threads. This means that our JVM process cannot be migrated.
At this point, I’m not quite sure how to get around this problem, but there are options. We only need to migrate when we fork a new java process (not when we need a new thread), so we can:
- Somehow tell OpenMosix to migrate my java process at startup, but never migrate after that, or,
- Make each java process a member of a pvm (parallel virtual machine) , so that in effect we get the above solution for free. Not sure how easy it is to get OpenMosix nodes to automatically join a pvm, but we’ve seen examples where povray can automatically distribute each frame of an animation across a cluster.