My guess is that the nodes are running at their memory limit. When one node fails, the dispatcher tries to relocate the users assigned to that node to the other nodes. Since those nodes are already at their limits, the extra load makes them fail too, and the failure cascades through the whole cluster. You can prevent this by setting the "maxfailingnodes" or "maxfailingmirrors" parameter in the Dispatcher.xml configuration file. On a 32-bit Windows machine the memory limit is normally 2GB of virtual address space per node process, or 3GB with special Windows configuration (the /3GB boot option). Check the virtual memory consumed by each node.
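To make the cascade concrete, here is a toy Python model. It is only a sketch: the per-user memory cost, the even redistribution of orphaned users, and the `max_failing` cap are my own assumptions. In particular, `max_failing` is a hypothetical stand-in for a setting like "maxfailingnodes", and I am only guessing that it works by stopping relocation once that many nodes are down. This is not how the dispatcher is actually implemented.

```python
from collections import deque


def simulate_failover(users_per_node, mb_per_user, limit_mb, first_failure, max_failing=None):
    """Toy model of cascading failover. Returns the set of failed node indices.

    Assumptions (illustrative only): every user costs the same memory,
    orphaned users are spread evenly over surviving nodes, and the
    hypothetical max_failing cap stops relocation once that many nodes are down.
    """
    users = list(users_per_node)
    failed = set()
    pending = deque([first_failure])
    while pending:
        node = pending.popleft()
        if node in failed:
            continue  # a node can be queued more than once; fail it only once
        failed.add(node)
        orphans, users[node] = users[node], 0
        if max_failing is not None and len(failed) >= max_failing:
            continue  # cap reached: drop orphaned users instead of relocating them
        survivors = [i for i in range(len(users)) if i not in failed]
        if not survivors:
            break
        # Spread the orphaned users as evenly as possible over the survivors.
        share, extra = divmod(orphans, len(survivors))
        for k, i in enumerate(survivors):
            users[i] += share + (1 if k < extra else 0)
            if users[i] * mb_per_user > limit_mb:
                pending.append(i)  # this node is now over its limit and will fail
    return failed
```

With 4 nodes of 90 users at 20 MB each (about 1.8 GB per node against a 2048 MB limit), losing node 0 takes down all four. At 50 users per node the survivors absorb the load. With `max_failing=1`, only node 0 goes down and its users are dropped instead of relocated.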