17 January 2012

52. Network (hosts) setting for mpich2 /mpd on debian

I have a switch with three computers attached to it. The ip addresses are static and are 192.168.1.1 (beryllium), 192.168.1.101 (boron), 192.168.1.102 (tantalum).

I use the computers to run nwchem across several nodes. The default /etc/hosts settings cause problems when trying to connect different instances of mpd on different nodes.

e.g.

beryllium: /etc/hosts
127.0.0.1 localhost  beryllium
192.168.1.101 boron
192.168.1.102 tantalum

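Won't work, because beryllium then resolves to the loopback address 127.0.0.1 instead of its address on the switch, but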
127.0.0.1 localhost
192.168.1.1  beryllium
192.168.1.101 boron
192.168.1.102 tantalum

Will.
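
The difference between the two files can be checked mechanically. Here is a minimal sketch in Python, assuming the usual hosts layout of one address followed by names per line. The function name loopback_hostnames and the list of ignored localhost aliases are my own choices for illustration, not anything mpd provides.

```python
import ipaddress

def loopback_hostnames(hosts_text):
    """Return names (other than localhost aliases) that map to a loopback address."""
    # Assumed set of harmless aliases; extend it if your distribution adds others.
    ignored = {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line, skip it
        if address.is_loopback:
            flagged.extend(name for name in fields[1:] if name not in ignored)
    return flagged

BAD = """\
127.0.0.1 localhost  beryllium
192.168.1.101 boron
192.168.1.102 tantalum
"""

GOOD = """\
127.0.0.1 localhost
192.168.1.1  beryllium
192.168.1.101 boron
192.168.1.102 tantalum
"""

print(loopback_hostnames(BAD))   # ['beryllium']
print(loopback_hostnames(GOOD))  # []
```

On a real node you would pass in the contents of /etc/hosts; any name that comes back is one the other nodes will not be able to reach under that name's advertised address.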

See for example:

Using /etc/hosts:
127.0.0.1 localhost  beryllium
192.168.1.101 boron
192.168.1.102 tantalum

me@tantalum:~$ mpdtrace -l
tantalum_51108 (192.168.1.102)

me@beryllium:~$ mpd --ncpus=6 -h 192.168.1.102 -p 51108 &
[2] 26283

me@tantalum:~$ mpdtrace -l
tantalum_51108 (192.168.1.102)
beryllium_38569 (127.0.0.1)

See the ip address (127.0.0.1)? beryllium has registered itself as 127.0.0.1, which to tantalum is its own localhost (i.e. also tantalum). It should point at beryllium (192.168.1.1).
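
If you have more than a couple of nodes, spotting this by eye gets tedious. The following Python sketch picks out ring members that registered with a loopback address. It assumes the output format is exactly what the transcript above shows (name_port followed by the address in parentheses); parse_mpdtrace and loopback_nodes are names I made up for this example.

```python
import ipaddress
import re

# One line of `mpdtrace -l` output as seen above: host_port (ip)
LINE = re.compile(r"^(?P<host>\S+)_(?P<port>\d+) \((?P<ip>[^)]+)\)$")

def parse_mpdtrace(output):
    """Return a list of (host, port, ip) tuples from mpdtrace -l output."""
    nodes = []
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            nodes.append((match.group("host"), int(match.group("port")), match.group("ip")))
    return nodes

def loopback_nodes(output):
    """Return the hosts that joined the ring with a loopback address."""
    return [host for host, _port, ip in parse_mpdtrace(output)
            if ipaddress.ip_address(ip).is_loopback]

SAMPLE = """\
tantalum_51108 (192.168.1.102)
beryllium_38569 (127.0.0.1)
"""

print(parse_mpdtrace(SAMPLE))
print(loopback_nodes(SAMPLE))  # ['beryllium']
```

Any host listed by loopback_nodes needs its /etc/hosts fixed before jobs will run across nodes.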

Using /etc/hosts:
127.0.0.1 localhost
192.168.1.1 beryllium
192.168.1.101 boron
192.168.1.102 tantalum


me@tantalum:~$ mpdtrace -l
tantalum_58007 (192.168.1.102)

me@beryllium:~$ mpd --ncpus=6 -h 192.168.1.102 -p 58007 &
[2] 26596

me@tantalum:~$ mpdtrace -l
tantalum_58007 (192.168.1.102)
beryllium_56234 (192.168.1.1)

And now it looks better.

"But /etc/hosts keeps changing on reboot!"
Which is because Network Manager keeps fiddling with it. Look at point 6 here http://verahill.blogspot.com/2012/01/debian-testing-64-wheezy-small-fixes.html

I haven't figured out how to do this via the command line yet.

50. Sharing an internet connection over a network switch on debian

What it does: One computer has two network cards. One card is used to connect to the internet, the other one is connected to a switch making up a local network. Two more computers are connected to the switch. They all share the internet connection of the first computer. All computers on the local network can ssh into each other.

I have to register the MAC address of each computer which I want to connect to the network at work. The reason probably has more to do with cost than security.

I do not use /etc/network/interfaces in this example
Instead we're only using Network Manager, but from the CLI.

The sharing is enabled using Firestarter. You can probably figure out how to use it yourself without reading this rather lengthy post, but I'll leave it all up here in case you want to know the exact configuration.

Since I use apt-cache to cut down on network traffic (http://verahill.blogspot.com/2012/01/debian-testing-64-wheezy-apt-cache.html) I don't feel too bad about surreptitiously putting a few additional units online.

This is my network:

internet ----- eth0 -Beryllium - eth2 --- switch----( eth0-tantalum, eth0-boron)

Or in words --  I have three computers. One, Beryllium, uses two network cards, eth0 and eth2. eth0 is connected to the internet (dhcp). eth2 is connected to a gigabit switch (essentially a dumb router -- no dhcp). Two more computers are connected to the same switch -- Tantalum (eth0) and Boron (eth0). Tantalum has local ip address 192.168.1.102 and Boron has ip address 192.168.1.101.

I do have an additional ethernet card on Beryllium, eth1, which we will ignore.

This way of sharing an internet connection relies on firestarter, which has one problem -- it won't (easily) allow two network cards on the same local network i.e. if eth0 is connected to the internet and you want both eth1 and eth2 on the same local network, firestarter won't help you.

I also need to be able to ssh from any computer on the local network to any other computer on the local network. This method allows for that. Same goes for apt-cache and mpich.

To satisfy my paranoia I've replaced a lot of the more incriminating numbers with X's.


Firestarter:
Firestarter is a firewall -- you'd typically use it to restrict traffic, not enable it. But iptables -- the true firewall and traffic shaper of linux -- is a powerful and slightly odd beast, and firestarter provides a gui-friendly way of editing some aspects of it.

Install firestarter on your internet connected computer (here, beryllium):
sudo apt-get install firestarter

Start it:
sudo firestarter

Chances are it will ask you about the internet-connected network device -- which is eth0 -- and the device connected to the local network -- here it's eth2. Also, check Enable internet connection sharing. If it doesn't ask you, go to Edit, Preferences and select Firewall -- Network Settings.

In my case I've set it up for static ip. I would suspect it to be fairly easy to set up dhcp as well.

I don't know how to put TWO network cards from the same computer on the same local network.

In the main firestarter window, under policy, you might want to add the IP addresses of the computers on the local network under 'Allow connections from host' -- but that depends on your needs. I prefer to expose all ports in order to deal with mpich.

You may also want to edit what services are allowed. Firestarter is fairly simple to use.


Configuration: Beryllium
eth0 is connected to the internet, and is assigned an IP address by the university using dhcp.
eth2 is connected to the switch and I've manually set the IP address to 192.168.1.1 in network manager. You can edit the file (see below) directly.

The gateway for eth2 is set to 192.168.1.1. Subnet mask is 255.255.255.0 which shows up as 24 in the configuration file below (i.e. 192.168.1.1;24;192.168.1.2 would mean IP 192.168.1.1, subnet 255.255.255.0 and gateway 192.168.1.2)
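
To make the IP;prefix;gateway notation concrete, here is a small Python sketch that splits such an entry and converts between the prefix length and the dotted subnet mask. It only uses the standard ipaddress module; the helper names parse_address_entry and prefix_from_netmask are hypothetical, chosen for this example.

```python
import ipaddress

def parse_address_entry(entry):
    """Split 'IP;prefix;gateway;' into (ip, netmask, gateway) strings."""
    ip, prefix, gateway = [part for part in entry.split(";") if part][:3]
    # strict=False because the IP is a host address, not the network address
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return ip, str(network.netmask), gateway

def prefix_from_netmask(netmask):
    """Convert a dotted subnet mask such as 255.255.255.0 to its prefix length."""
    return ipaddress.ip_network(f"0.0.0.0/{netmask}").prefixlen

print(parse_address_entry("192.168.1.1;24;192.168.1.2"))
# ('192.168.1.1', '255.255.255.0', '192.168.1.2')
print(prefix_from_netmask("255.255.255.0"))  # 24
```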

sudo cat /etc/NetworkManager/system-connections/eth0
[802-3-ethernet]
duplex=full
mac-address=XX:XX:XX:XX:XX:XX
[connection]
id=eth0
uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
type=802-3-ethernet
timestamp=1326324509
[ipv6]
method=auto
[ipv4]
method=auto


sudo cat /etc/NetworkManager/system-connections/eth2
[802-3-ethernet]
duplex=full
mac-address=XX:XX:XX:XX:XX:XX
[connection]
id=eth2
uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
type=802-3-ethernet
timestamp=1326690564
[ipv6]
method=auto
[ipv4]
method=manual
addresses1=192.168.1.1;24;192.168.1.1;

Configuration: Tantalum
sudo cat /etc/NetworkManager/system-connections/eth0

[802-3-ethernet]
duplex=full
mac-address=XX:XX:XX:XX:XX:XX
[connection]
id=lan
uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
type=802-3-ethernet
timestamp=1326152420
[ipv6]
method=auto
[ipv4]
method=manual
dns=XXX.XXX.1.99;
addresses1=192.168.1.102;24;192.168.1.1;


Configuration: Boron
sudo cat /etc/NetworkManager/system-connections/eth0

[802-3-ethernet]
duplex=full
mac-address=XX:XX:XX:XX:XX:XX
[connection]
id=lan
uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
type=802-3-ethernet
timestamp=1326152420
[ipv6]
method=auto
[ipv4]
method=manual
dns=XXX.XXX.1.99;
addresses1=192.168.1.101;24;192.168.1.1;
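
A quick sanity check for files like the ones above is to confirm that the gateway sits on the same subnet as the address you assigned. Below is a rough Python sketch, assuming the keyfile has the same ini-style layout as shown here with an addresses1 entry under [ipv4]. The functions read_ipv4 and gateway_reachable are my own illustration, not part of Network Manager, and the sample text is a trimmed copy of the Tantalum file.

```python
import configparser
import ipaddress

def read_ipv4(keyfile_text):
    """Return (interface, gateway) parsed from the [ipv4] addresses1 entry."""
    # Only '=' is treated as a delimiter so values containing ':' stay intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(keyfile_text)
    ip, prefix, gateway = parser["ipv4"]["addresses1"].rstrip(";").split(";")
    return ipaddress.ip_interface(f"{ip}/{prefix}"), ipaddress.ip_address(gateway)

def gateway_reachable(keyfile_text):
    """True if the configured gateway lies inside the interface's subnet."""
    interface, gateway = read_ipv4(keyfile_text)
    return gateway in interface.network

TANTALUM = """\
[802-3-ethernet]
duplex=full
[connection]
id=lan
type=802-3-ethernet
[ipv4]
method=manual
addresses1=192.168.1.102;24;192.168.1.1;
"""

interface, gateway = read_ipv4(TANTALUM)
print(interface, gateway)           # 192.168.1.102/24 192.168.1.1
print(gateway_reachable(TANTALUM))  # True
```

The real files are only readable by root, so you would need to run something like this with sudo to read them from /etc/NetworkManager/system-connections/.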

Quick word on apt-cache:
If you follow this guide: http://verahill.blogspot.com/2012/01/debian-testing-64-wheezy-apt-cache.html
and you're running your apt-cache server on 192.168.1.1 in the example above, change your /etc/apt/sources.list so that
deb http://192.168.1.2:3142/ftp.au.debian.org/debian/ testing main contrib non-free
becomes
deb http://192.168.1.1:3142/ftp.au.debian.org/debian/ testing main contrib non-free
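
The change amounts to swapping the host part of the URL while keeping the port and path. As a sketch, here it is in Python, assuming a plain "deb URL suite components" line without bracketed options. retarget_proxy is a made-up helper name.

```python
from urllib.parse import urlsplit, urlunsplit

def retarget_proxy(deb_line, new_host):
    """Replace the host in the URL of a 'deb' line, keeping port and path."""
    fields = deb_line.split()
    url = urlsplit(fields[1])  # assumes the URL is the second field
    netloc = f"{new_host}:{url.port}" if url.port else new_host
    fields[1] = urlunsplit((url.scheme, netloc, url.path, url.query, url.fragment))
    return " ".join(fields)

OLD = "deb http://192.168.1.2:3142/ftp.au.debian.org/debian/ testing main contrib non-free"
print(retarget_proxy(OLD, "192.168.1.1"))
# deb http://192.168.1.1:3142/ftp.au.debian.org/debian/ testing main contrib non-free
```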


49. Gromacs -- hangs on multicore when doing normal mode analysis

Symptom:
when doing
mdrun -s nm.tpr -mtx nm.mtx -v -deffnm nm
on a system with 637 atoms you end up with:
...Finish step 636 out of 637
and it hangs there with all cores running at 100%

Reason:
For some reason the normal mode analysis of at least this particular system won't run on multiple cores.

Solution:
Use an mpi compiled version of mdrun (see previous posts on compiling _dd, _mpi and _ddmpi versions of gromacs) and force the use of ONE core.

mpd --ncpus=4 &
mpdrun -n 1 mdrun_mpi -s nm.tpr -mtx nm.mtx -v -deffnm nm

works!
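
If you run several normal mode jobs it can be handy to build that command from the job name. A minimal Python sketch, assuming the input and matrix files share a base name the way nm.tpr and nm.mtx do above; single_core_nm_command is a hypothetical helper, and shlex.join needs Python 3.8 or later.

```python
import shlex

def single_core_nm_command(basename, mdrun="mdrun_mpi"):
    """Build the argument list for a one-process normal mode run."""
    return [
        "mpdrun", "-n", "1",       # force a single MPI process
        mdrun,
        "-s", f"{basename}.tpr",
        "-mtx", f"{basename}.mtx",
        "-v",
        "-deffnm", basename,
    ]

command = single_core_nm_command("nm")
print(shlex.join(command))
# mpdrun -n 1 mdrun_mpi -s nm.tpr -mtx nm.mtx -v -deffnm nm
```

The list can be handed to subprocess.run(command) on a machine where mpd is already running.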

Confirmation
This was confirmed by running it on four computers:
64 bit: a six core AMD 64 using a compiled version of gromacs. Hangs.
64 bit: a four core intel i5 using both the debian version and a compiled version of gromacs. Hangs.
64 bit: an older four core intel using a compiled version of gromacs. Hangs.
32 bit: an old single-core laptop using the debian version of gromacs. Works.

Next, three single-core virtual machines were set up -- a stable 32 bit, a testing 32 bit and a testing 64 bit machine, all with the debian version of gromacs (sudo apt-get install gromacs). They all worked, as they only had a single core.