Building of a Virtual Cluster from Scratch

Yu Zhang
Computer Architecture Group
Chemnitz University of Technology
January 25, 2011

Abstract

Computing clusters usually run on physical computers. With a virtualization approach, clusters can also be virtualized. This article describes the building of a virtual cluster based on VirtualBox.

Contents

1 VirtualBox Installation
2 Creation of Virtual Machines
   2.1 Creation of New Machines
   2.2 Normal Installation on the Master Node
   2.3 Minimal Installation on the Slave Nodes
   2.4 Network Configuration
3 Application of SLURM
   3.1 Possible Problems During Installation
   3.2 Installation and Configuration
   3.3 Automatic Startup when Booting
4 Cluster Network Configuration
   4.1 Hostnames
   4.2 IP Addresses
   4.3 Host List
   4.4 Password-less SSH
   4.5 Network File System
5 Test with Applications
   5.1 Simple MPI Program Test
   5.2 Tachyon Ray Tracer Test
6 Further Work
A Removing a Virtual Machine
B A Bug in NFS

1 VirtualBox Installation

There are two basic editions of VirtualBox: VirtualBox and VirtualBox OSE (Open Source Edition). Both offer almost the same functionality, apart from some features targeting different customers. In Ubuntu the command

    sudo apt-get install virtualbox-ose

or

    wget http://download.virtualbox.org/virtualbox/4.0.0/ \
        virtualbox-4.0_4.0.0-69151~Ubuntu~lucid_amd64.deb
    sudo dpkg -i virtualbox-4.0_4.0.0-69151~Ubuntu~lucid_amd64.deb

installs the VirtualBox OSE or the VirtualBox package respectively, where virtualbox-4.0_4.0.0-69151~Ubuntu~lucid_amd64.deb can also be replaced by

    * virtualbox-4.0_4.0.0-69151~Ubuntu~maverick_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Ubuntu~karmic_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Ubuntu~jaunty_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Ubuntu~hardy_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Debian~squeeze_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Debian~lenny_amd64.deb

according to the distribution and version of the host OS. The package architecture has to match the Linux kernel architecture, that is, install the AMD64 package for a 64-bit CPU; it does not matter whether it is an Intel or an AMD CPU.

2 Creation of Virtual Machines

Although many Linux distributions can be used on a cluster, a carefully selected one simplifies building and management and thereby saves a considerable amount of effort in the future. One principle worth keeping in mind is: uniformity makes simplicity, that is, use the same Linux distribution on the master node as well as on the slave nodes. For its pleasant features, Debian is considered a suitable choice for cluster building.

2.1 Creation of New Machines

When VirtualBox is started, a window like Figure 1(a) should come up. This window is called the VirtualBox Manager. On the left there is a pane which will later list all the installed guest machines; since no guest has been created yet, it is empty. The pane on the right displays the properties of the currently selected machine, if any. A row of buttons above allows a guest machine to be created and existing machines to be operated. After several machines have been installed, the VirtualBox Manager may look like Figure 1(b).

Figure 1: VirtualBox Manager — (a) before, (b) later

It is simple to create new machines within the VirtualBox Manager. Click on the menu "Machine" and select "New", or just press the button "New", then follow the dialog boxes and make the choices for your machine's basic configuration. The main steps are shown in Figure 2.

Figure 2: Machine creation steps — (a) VM name and OS type, (b) virtual hard disk, (c) memory, (d) virtual disk location and size
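The same machine can also be created without the GUI, using the VBoxManage command-line front end that ships with VirtualBox. The following is only a sketch of that route; the VM name, OS type, memory size, disk size and file names are assumptions chosen to mirror the dialog steps of Figure 2, not values taken from the original setup.

    # create and register a new guest (name and OS type are assumed)
    VBoxManage createvm --name node1 --ostype Debian --register
    # assign memory and let it boot from the installer DVD first
    VBoxManage modifyvm node1 --memory 512 --boot1 dvd --boot2 disk
    # create an 8 GB virtual disk and attach it together with the installation ISO
    VBoxManage createhd --filename node1.vdi --size 8192
    VBoxManage storagectl node1 --name "SATA" --add sata
    VBoxManage storageattach node1 --storagectl "SATA" --port 0 --device 0 \
        --type hdd --medium node1.vdi
    VBoxManage storageattach node1 --storagectl "SATA" --port 1 --device 0 \
        --type dvddrive --medium debian-netinst.iso

The GUI remains the more comfortable way for a first installation; the command-line variant mainly pays off once several similar slave nodes have to be produced.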
2.2 Normal Installation on the Master Node

A normal Debian installation with a GUI makes cluster management easier; 8 GB of hard disk space is necessary to leave room for further packages. It can certainly be installed from a full .iso image; here, as an alternative, we tried the Debian netinst.iso, with which all packages except the base system are installed from a Debian mirror site instead of from a local CD. Later, any update can be installed with the package manager as long as Internet access is available, saving the effort of manually updating the corresponding entries in the file /etc/apt/sources.list.

Figure 3: master node

2.3 Minimal Installation on the Slave Nodes

Compared with the master node, the work performed on a slave node is relatively simple, so the base system of a Debian installation is enough; for this, 1 GB of hard disk space is needed. To keep the installation simple and uniform, the best practice is first to build a standard slave node image by installing Debian and all the necessary packages on a single guest, and then to clone the hard disk of this image into as many copies as desired. Figure 4 shows a scene taken from the virtual disk image cloning process. After the clone has completed successfully, import a new machine from the VirtualBox Manager with the cloned disk as its storage. Next, the hostnames and IP addresses need to be changed according to the order listed in the left pane, for further convenience.

Figure 4: virtual disk image clone

2.4 Network Configuration

Communication is probably the most important and complex aspect for a cluster designer to consider. VirtualBox provides several kinds of networks, each with several network adapters as alternatives. The following figures present the network configurations on the master and the slave nodes respectively.

Figure 5: ethernet adapter configurations — (a) ethernet 0 on master node, (b) ethernet 1 on master node, (c) ethernet 1 on slave node

It is important to have the same ethernet card configuration on each node. If, with the command /sbin/ifconfig, ethernet cards other than eth1 appear, put the right ethernet cards in the file /etc/udev/rules.d/70-persistent-net.rules and then reload the driver and restart the network like this:

    sudo modprobe -r e1000
    sudo modprobe e1000
    sudo /etc/init.d/udev restart
    sudo /etc/init.d/networking restart
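Figure 5 shows the adapter settings chosen in the GUI. As a sketch only, a comparable setup can also be produced on the host with VBoxManage; the adapter types used here (NAT for outside access plus a host-only network for cluster-internal traffic), the disk and machine names, and the interface name vboxnet0 are assumptions, not values read from the report's figures.

    # create the host-only interface once on the host (usually named vboxnet0)
    VBoxManage hostonlyif create
    # clone the prepared slave image for an additional node (file names are assumed)
    VBoxManage clonehd slave-template.vdi node2.vdi
    # assuming the machine node2 has already been created and registered as in 2.1:
    # adapter 1 uses NAT for Internet access, adapter 2 the host-only cluster network
    VBoxManage modifyvm node2 --nic1 nat
    VBoxManage modifyvm node2 --nic2 hostonly --hostonlyadapter2 vboxnet0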
3 Application of SLURM

Resource management is a non-trivial effort as the number of nodes in a cluster grows. SLURM is designed for this purpose on Linux clusters of all sizes. It grants exclusive or non-exclusive resource access and monitors the present state of all nodes in a cluster. In principle, SLURM can be installed by following the steps below:

    wget http://en.sourceforge.jp/projects/sfnet_slurm/downloads/slurm/ \
        version_2.1/2.1.15/slurm-2.1.15.tar.bz2/
    tar xvjf slurm-2.1.15.tar.bz2
    cd slurm-2.1.15
    ./configure
    make
    sudo make install

3.1 Possible Problems During Installation

Beginners tend to get into trouble with the SLURM installation, at least I did previously. Some common error messages during configure are listed here.

1. Lack of GCC

Normally GCC comes with a Debian installation, but if for some unknown reason the configure process suddenly stops like this,

    checking build system type...
    auxdir/config.guess: unable to guess system type
    UNAME_MACHINE = i686
    UNAME_RELEASE = 2.6.26-2-686
    UNAME_SYSTEM = Linux
    UNAME_VERSION = #1 SMP Thu Nov 25 01:53:57 UTC 2010
    configure: error: cannot guess build type; you must specify one

or

    configure: error: in '/home/zhayu/slurm-2.1.15':
    configure: error: no acceptable C compiler found in $PATH

it suggests a missing GCC installation. Install it with the command

    sudo apt-get install gcc

or from the Synaptic Package Manager.

2. Warnings during configure

    configure: WARNING: Unable to locate NUMA memory affinity functions
    configure: WARNING: Unable to locate PAM libraries
    configure: WARNING: Can not build smap without curses or ncurses library
    checking for GTK+... no
    *** Could not run GTK+ test program, checking why...
    *** The test program failed to compile or link. See the file config.log for the
    *** exact error that occured. This usually means GTK+ is incorrectly installed.
    checking for mysql_config... no
    configure: WARNING: *** mysql_config not found. Evidently no MySQL install on system.
    checking for pg_config... no
    configure: WARNING: *** pg_config not found. Evidently no PostgreSQL install on system.
    checking for munge installation...
    configure: WARNING: unable to locate munge installation

Solution:

    sudo apt-get install libnuma1 libnuma-dev
    sudo apt-get install libpam0g libpam0g-dev
    sudo apt-get install libncurses5-dev
    sudo apt-get install libgtk2.0-dev
    sudo apt-get install libmysql++-dev
    sudo apt-get install libpq-dev
    sudo apt-get install libmunge2 libmunge-dev

3. Missing PLPA

    configure: WARNING: Unable to locate PLPA processor affinity functions

Solution:

    wget http://www.open-mpi.org/software/plpa/v1.1/downloads/plpa-1.1.1.tar.gz
    tar xzvf plpa-1.1.1.tar.gz
    cd plpa-1.1.1
    ./configure
    make
    sudo make install

4. OpenSSL warning

    configure: WARNING: Could not find working OpenSSL library

Solution:

    cd src/plugins
    make
    sudo make install

5. Remaining warnings

    configure: Cannot support QsNet without librmscall
    configure: Cannot support QsNet without libelan3 or libelanctrl!
    configure: Cannot support Federation without libntbl
    configure: WARNING: unable to locate blcr installation

Solution: remains unknown to me, but it does not matter too much for our test purpose.

3.2 Installation and Configuration

Only when the error and warning messages mentioned above no longer appear can the work be pushed forward to configuring SLURM, where further pitfalls are already waiting. The configuration file slurm.conf used in our case is shown as an example in Figure 6. SLURM also contains an authentication service for creating and validating credentials; the security certificate and key must be generated as shown:

    openssl genrsa -out /usr/local/etc/slurm.key 1024
    openssl rsa -in /usr/local/etc/slurm.key -pubout \
        -out /usr/local/etc/slurm.cert

Then copy the three files (slurm.conf, slurm.key and slurm.cert) to the directory /usr/local/etc/ on all nodes one by one. Take the copy from node1 to node2 as an example:

    scp slurm.* zhayu@node2:/home/zhayu

and on node2:

    sudo cp slurm.* /usr/local/etc/

Finally, the SLURM control daemon and the SLURM daemon can be started with the commands

    sudo /usr/local/sbin/slurmctld start
    sudo /usr/local/sbin/slurmd start

Figure 6: configuration file for slurm

Figure 7: graphical user interface to view and modify the SLURM state after a successful start

3.3 Automatic Startup when Booting

Last but not least, appending the following lines to /etc/rc.local makes these daemons start automatically at boot time, which relieves the administrator of part of the cluster management and is especially important for headless nodes started with neither GUI nor terminal:

    /usr/local/sbin/slurmctld start
    /usr/local/sbin/slurmd start
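The slurm.conf actually used is only shown as a screenshot in Figure 6 and is not reproduced in the text. Purely as a sketch, a minimal configuration for the nine nodes described in Section 4 might look like the following; the control machine name, the authentication and selection plugins, the spool directories and the partition name are assumptions, not values read from the figure.

    # /usr/local/etc/slurm.conf -- minimal sketch, values are assumptions
    ControlMachine=node1
    AuthType=auth/none
    CryptoType=crypto/openssl
    JobCredentialPrivateKey=/usr/local/etc/slurm.key
    JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
    SlurmdSpoolDir=/var/spool/slurmd
    StateSaveLocation=/var/spool/slurmctld
    SelectType=select/linear
    NodeName=node[1-9] Procs=1 State=UNKNOWN
    PartitionName=debug Nodes=node[1-9] Default=YES MaxTime=INFINITE State=UP

After editing, the file has to be distributed to /usr/local/etc/ on every node, exactly as described above for the key and the certificate.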
4 Cluster Network Configuration

4.1 Hostnames

We have one master node for management (and for computing when needed) and eight slave nodes for computing; hence the hostnames range from node1 to node9. In Debian-based Linux, the hostname can be set in the file /etc/hostname.

4.2 IP Addresses

Every node has a unique IP address within the cluster. We set 192.168.56.100 for the master node and 192.168.56.101 to 192.168.56.108 for the eight slave nodes respectively. The IP address can be changed in the file /etc/network/interfaces. After that, the network needs to be restarted to apply the new address.

4.3 Host List

Launching an SSH login with an IP address is not always convenient. A better choice is to save all the hostnames in a file for further reference whenever IP-address-based operations are performed. We append the following lines to the file /etc/hosts:

    192.168.56.100 node1
    192.168.56.101 node2
    192.168.56.102 node3
    192.168.56.103 node4
    192.168.56.104 node5
    192.168.56.105 node6
    192.168.56.106 node7
    192.168.56.107 node8
    192.168.56.108 node9

4.4 Password-less SSH

Install the SSH packages on all nodes with the following command:

    sudo apt-get install openssh-server openssh-client

SSH login from the master node to all slave nodes should be no problem if the network has been configured properly. To run MPI programs across the nodes of a cluster, password-less SSH login is needed. Saving the public key of the local node on the remote destination node does the trick. Here is an example generating password-less SSH access from node1 to node2:

    zhayu@node1:~$ ssh-keygen -t rsa
    zhayu@node1:~$ ssh zhayu@node2 mkdir -p .ssh
    zhayu@node2's password:
    zhayu@node1:~$ cat .ssh/id_rsa.pub | ssh zhayu@node2 \
        'cat >> .ssh/authorized_keys'
    zhayu@node2's password:
    zhayu@node1:~$ ssh zhayu@node2

If a password is still demanded, repeat the procedure until SSH logins from the master node to all the slave nodes require no password any more. A successful SSH login then works as follows:

    zhayu@node1:~$ ssh node2
    Linux node2 2.6.26-2-686 #1 SMP Thu Sep 16 19:35:51 UTC 2010 i686

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Last login: Mon Jan 17 19:23:14 2011
    zhayu@node2:~$

If, on the other hand, a message like the following comes,

    ssh: connect to host node2 port 22: Connection refused

it means the SSH server daemon has not yet been started on the SSH server. Issue a command like

    sudo /etc/init.d/ssh status

to see whether the SSH daemon is running on the SSH server. If not, type

    sudo /etc/init.d/ssh start

to start the SSH daemon. If the following error message comes,

    Agent admitted failure to sign using the key.
    Permission denied (publickey).

then simply run ssh-add on the client node.

4.5 Network File System

The Network File System (NFS) allows the nodes of the entire computing cluster to share part of their file system, which is important for running parallel code like MPI programs. One or more nodes hold the file system on their physical hard disk and act as the NFS server, while the other nodes "mount" that file system locally. To the user, if properly done, the file exists on all the nodes at once. A node acting as an NFS server needs to install the NFS-related packages like this:

    apt-get install nfs-common nfs-kernel-server

Append a line of the form

    <directory to share> <allowed machines>(options)

to the file /etc/exports on the server end, and a line of the form

    <NFS server>:<remote location> ...

to /etc/fstab on the client end.

Figure 8: NFS settings — (a) setting for NFS server, (b) setting for NFS client

Figure 9: NFS gets mounted at boot on the slave nodes
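Figures 8 and 9 show the concrete NFS settings used; they are not reproduced as text. As a sketch only, exporting the home directories of the master node node1 to the host-only network of Section 4.2 could look like the following; the exported directory and the mount options are assumptions.

    # on the NFS server (node1): a line in /etc/exports
    /home 192.168.56.0/24(rw,sync,no_subtree_check)
    # re-read the export table and restart the server
    sudo exportfs -ra
    sudo /etc/init.d/nfs-kernel-server restart

    # on every NFS client (node2 ... node9): a line in /etc/fstab
    node1:/home /home nfs defaults 0 0
    # mount it immediately without rebooting
    sudo mount -a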
5 Test with Applications

5.1 Simple MPI Program Test

As any test at the very beginning should be simple, we want to see whether our virtual cluster works with a simple example, Hello World. The program looks like the following:

    /* hello.c */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, length;
        char name[80];

        MPI_Init(&argc, &argv); // note that argc and argv are passed by address
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &length);
        printf("Hello World MPI: processor %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }

We compile and execute the program as Figure 10 shows.

Figure 10: "hello world" test
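Figure 10 only shows a screenshot of the compile-and-run session. As a sketch, with an MPI implementation such as Open MPI installed on all nodes and a machine file listing the hostnames from Section 4.3, the session amounts to roughly the following; the machine file name and the process count are assumptions.

    # compile with the MPI wrapper compiler instead of plain gcc
    mpicc hello.c -o hello
    # run one process on each of the nine nodes listed in the machine file
    mpirun -np 9 -machinefile machines ./hello
    # expected: one "Hello World MPI: processor i of 9 on nodeN" line per process

If the working directory lies on the NFS share of Section 4.5, the compiled binary is automatically visible on every node, which is exactly why the shared file system matters for MPI runs.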
5.2 Tachyon Ray Tracer Test

Tachyon is a library developed for parallel graphics rendering with support for both the distributed and the shared memory parallel model, which makes it an ideal benchmark application for clusters. We apply it here to illustrate not only parallel rendering on a 9-node virtual cluster, but also a brief application of the batch system SLURM introduced above. The more nodes a cluster comprises, the more clearly its power shows.

    wget http://jedi.ks.uiuc.edu/~johns/raytracer/ \
        files/0.98.9/tachyon-0.98.9.tar.gz
    tar xvzf tachyon-0.98.9.tar.gz
    cd tachyon
    make

When make is run without a target, the many architectures Tachyon supports are listed. As "linux-beowulf-mpi" is the architecture matching our case, build it like this:

    make linux-beowulf-mpi

Figure 11: Beowulf cluster architecture

Since Tachyon will be started by MPI, mpicc should be used instead of gcc. So change the option CC=gcc to CC=mpicc within the linux-beowulf-mpi entry of the file tachyon/unix/Make-arch, or else the following error message comes and the make process is aborted:

    make[2]: *** [../compile/linux-beowulf-mpi/libtachyon/parallel.o] Error 1
    make[1]: *** [all] Error 2
    make: *** [linux-beowulf-mpi] Error 2

Here is a simple example of a SLURM script containing all the necessary task specifications:

    #!/bin/bash
    #SBATCH -n 9
    mpirun tachyon/compile/linux-beowulf-mpi/tachyon \
        tachyon/scenes/dna.dat -fullshade -res 4096 4096 -o dna2.tga

Submit the SLURM job with the command

    sbatch ./task1.sh

and wait for its completion. When the job is submitted to the batch system, the available computing nodes are allocated for this task, and all results are saved in a specified output file.

Figure 12: slurm job submitting and outcome

Figure 13: Tachyon runs on the allocated computing nodes

Tests were made with different task allocation methods, namely 1PN9, 3PN9, 1P1N and 9P1N, that is, varying numbers of processes on varying numbers of nodes. The performance speedup achieved by nine nodes is presented in Figure 14 below; the graph clearly shows the speedup a 9-node virtual cluster achieves.

Figure 14: performance speedup with different task allocation methods

6 Further Work

This is only the first half of our task. The aim is to control the virtual machines in a cluster with a batch system, for which PXE boot of the slave nodes from a master node is necessary. However, to make SLURM work in this setting, every PXE-booted node should have its own file system rather than the one shared from the master node.

A Removing a Virtual Machine

To remove a virtual machine which you no longer need, right-click on it in the Manager's VM list and select "Remove" from the context menu that comes up. A confirmation window will come up that allows you to select whether the machine should only be removed from the list of machines or whether the files associated with it should also be deleted. The "Remove" menu item is disabled while a machine is running.[6]

In Linux, if a machine with a name that has already been used before needs to be created, you may run into trouble: the registration fails because a machine with the same UUID already exists. Go to ~/.VirtualBox/VirtualBox.xml and delete the lines containing the name of the machine to be created.

B A Bug in NFS

When NFS shares are to be mounted by the slave nodes at boot, the following error message always appears:

    if-up.d/mountnfs[eth0]: lock /var/run/network/mountnfs exist, not mounting

Just replace the later part of /etc/network/if-up.d/mountnfs with the following code:[7]

    # Using 'no !=' instead of 'yes =' to make sure async nfs mounting is
    # the default even without a value in /etc/default/rcS
    if [ no != "$ASYNCMOUNTNFS" ]; then
        # Not for loopback!
        [ "$IFACE" != "lo" ] || exit 0

        # Lock around this otherwise insanity may occur
        mkdir /var/run/network 2>/dev/null || true
        if [ -f /var/run/network/mountnfs ]; then
            msg="if-up.d/mountnfs[$IFACE]: lock /var/run/network/mountnfs exist, not mounting"
            log_failure_msg "$msg"
            # Log if /usr/ is mounted
            [ -x /usr/bin/logger ] && /usr/bin/logger -t "if-up.d/mountnfs[$IFACE]" "$msg"
            exit 0
        fi
        touch /var/run/network/mountnfs

        on_exit() {
            # Clean up lock when script exits, even if it is interrupted
            rm -f /var/run/network/mountnfs 2>/dev/null || exit 0
        }
        trap on_exit EXIT # Enable emergency handler

        do_start
    elif [ yes = "$FROMINITD" ]; then
        do_start
    fi

This uses a file instead of a directory to lock the action, and the lock file is cleaned up on boot.
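After replacing the script, a quick way to confirm on a slave node that the share actually came up during boot is to look at the mount table; /home is used here only as an example mount point (see Section 4.5).

    # list active NFS mounts and check the example mount point
    mount | grep nfs
    df -h /home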