Showing posts with label mpich2. Show all posts

24 February 2012

75. [solved] Problems with homebuilt nwchem 6.1 on Debian Testing


EDIT 18 May 2012: 
It's now been solved
Compiling nwchem 6.1 with internal libs on debian:
 http://verahill.blogspot.com.au/2012/05/compiling-nwchem-61-with-internal-libs.html
Compiling nwchem 6.1 with openblas on debian:
 http://verahill.blogspot.com.au/2012/05/building-nwchem-61-on-debian.html


UPDATE April 2012: Someone else is having the same problem: http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id435/ . Binaries built on ROCKS 5.4.3 work, but binaries built on debian testing don't: the gfortran version is GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50). On debian, which yields a segfaulting binary, the version is GNU Fortran (Debian 4.6.3-1) 4.6.3


Nwchem 6.1 was released in February this year. The build instructions are ALMOST the same as for Nwchem 6.0 -- the difference is the use of export USE_MPIF4=y. Well, that and me not having much success in actually USING nwchem as opposed to building it.

There is now an nwchem version with mpi support in the debian unstable repos. I have not used or tested it.

I can build the 32 bit version of nwchem 6.1 just fine. Building the 64 bit version works absolutely fine too. However, once you attempt to run the 64 bit binary, it crashes. Ergo, this is NOT A SOLUTION. It's a bunch of error messages so that more seasoned and skilled operators than I may offer a solution. If you have an option, build and use version 6.0 instead.

Update:
I built a version with openmpi support as well, which also segfaults:
Here are the build instructions:

sudo apt-get install openmpi-bin openmpi-dev
export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=/home/me/tmp/nwchem-6.1
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr/lib/openmpi
export MPI_INCLUDE=/usr/lib/openmpi/include
export USE_MPIF4=y
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77"
cd $NWCHEM_TOP/src
make clean
make  nwchem_config
make  FC=gfortran


and here's what happens on execution

[beryllium:24650] *** Process received signal ***
[beryllium:24650] Signal: Segmentation fault (11)
[beryllium:24650] Signal code: Address not mapped (1)
[beryllium:24650] Failing at address: 0x44000098
[beryllium:24650] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x324f0) [0x7f08deeb84f0]
[beryllium:24650] [ 1] /usr/lib/libmpi.so.0(PMPI_Comm_set_errhandler+0x60) [0x7f08e0526c30]
[beryllium:24650] [ 2] ./nwchem() [0x292d504]
[beryllium:24650] [ 3] ./nwchem() [0x292d596]
[beryllium:24650] [ 4] ./nwchem() [0x40657a]
[beryllium:24650] [ 5] ./nwchem() [0x406f7d]
[beryllium:24650] [ 6] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f08deea4ead]
[beryllium:24650] [ 7] ./nwchem() [0x405189]
[beryllium:24650] *** End of error message ***


This only happens on 64 bit debian -- 32 bit deb and 64 bit centos are both fine

OLD POST:

--start here --

Here's what I've done so far

Put a hold on your mpich2 and libmpich2-dev packages (see e.g. here for more details)
1. edit your /etc/apt/sources.list to allow packages from stable e.g.

deb ftp://ftp.au.debian.org/debian/ testing main contrib non-free
deb ftp://ftp.au.debian.org/debian/ stable main contrib non-free

2. create an /etc/apt/preferences file e.g.

Package: *
Pin: release a=testing
Pin-Priority: 990

Package: *
Pin: release a=stable
Pin-Priority: -10

3. install v 1.2 explicitly
sudo apt-get update && sudo apt-get install mpich2=1.2.1.1-5 libmpich2-dev=1.2.1.1-5

4. put a hold on the packages

sudo su
echo "mpich2 hold"|dpkg --set-selections
echo "libmpich2-dev hold"|dpkg --set-selections

exit
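
To check that the holds actually took, the selection state can be read back with dpkg --get-selections. Here's a minimal sketch of that check. The helper name is_held is made up for illustration, and it only parses the two-column output of that command from stdin:

```shell
# is_held PACKAGE: reads `dpkg --get-selections` style output on stdin and
# succeeds if PACKAGE is listed with the state "hold".
# (is_held is a hypothetical helper name, not part of dpkg.)
is_held() {
    awk -v pkg="$1" '
        { name = $1; sub(/:.*/, "", name) }   # drop an arch suffix such as :amd64
        name == pkg && $2 == "hold" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on a real system:
#   dpkg --get-selections | is_held mpich2 && echo "mpich2 is held"
#   dpkg --get-selections | is_held libmpich2-dev && echo "libmpich2-dev is held"
```

If either check fails, apt-get upgrade may pull in the newer mpich2 from testing again.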

Download the nwchem source
cd ~
wget http://www.nwchem-sw.org/images/Nwchem-6.1-2012-Feb-10.tar.gz
tar -xvf Nwchem-6.1-2012-Feb-10.tar.gz
cd nwchem-6.1

create buildconf.sh in ~/nwchem-6.1
Put the following in it (for 64 bit system):
export LARGE_FILES=TRUE
export TCGRSH=/usr/local/bin/ssh
export NWCHEM_TOP=/home/me/nwchem-6.1
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include/mpich2
export LIBMPI="-lmpich -lfmpich"
export NWCHEM_MODULES="all"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

Build
Start the build
sh buildconf.sh
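
Since the build depends entirely on the exported variables, it can be worth checking them before spending half an hour on make. Here's a sketch, assuming the variable names from buildconf.sh above. The function name check_build_env is my own invention and not part of nwchem:

```shell
# check_build_env: report which of the variables used in buildconf.sh are
# unset or empty. Returns non-zero if any of them is missing.
# (check_build_env is a hypothetical helper, not something nwchem ships.)
check_build_env() {
    missing=0
    for var in NWCHEM_TOP NWCHEM_TARGET NWCHEM_MODULES USE_MPI \
               MPI_LOC MPI_LIB MPI_INCLUDE LIBMPI; do
        eval "val=\${$var:-}"          # indirect lookup of the variable named in $var
        if [ -z "$val" ]; then
            echo "not set: $var"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called in buildconf.sh right before cd $NWCHEM_TOP/src, e.g. check_build_env || exit 1. It only checks that the variables are non-empty, not that the paths in them exist.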

Building takes about half an hour. Everything builds fine. However, running -- with or without mpdrun -- causes the error below.

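It doesn't matter how much memory I allocate. The error seems to have something to do with "Invalid write of size 8", which as far as I understand means that the program wrote 8 bytes (e.g. one double precision number) to an address that lies outside any allocated block. According to the valgrind log further down, the first such write in dgemm_ lands right after the end of a block allocated in pnga_matmul. But then I'm not an expert.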

Would it have something to do with
USE_MPIF4=y?

Without USE_MPIF4 I end up with the stupid_* error messages (stupid_sum, stupid_task etc.)



Error:
running e.g.  mpdrun -n 2 nwchem nwchem.nw gives:

      Screening Tolerance Information
      -------------------------------
          Density screening/tol_rho: 1.00D-10
          AO Gaussian exp screening on grid/accAOfunc:  14
          CD Gaussian exp screening on grid/accCDfunc:  20
          XC Gaussian exp screening on grid/accXCfunc:  20
          Schwarz screening/accCoul: 1.00D-08

0:Segmentation Violation error, status=: 11
(rank:0 hostname:tantalum pid:19944):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0



More detail:
Running just nwchem nwchem.nw gives a bit more detail:

      Screening Tolerance Information
      -------------------------------
          Density screening/tol_rho: 1.00D-10
          AO Gaussian exp screening on grid/accAOfunc:  14
          CD Gaussian exp screening on grid/accCDfunc:  20
          XC Gaussian exp screening on grid/accXCfunc:  20
          Schwarz screening/accCoul: 1.00D-08

0:Segmentation Violation error, status=: 11
(rank:0 hostname:tantalum pid:19676):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
application called MPI_Abort(comm=0x84000001, 11) - process 0
*** glibc detected *** nwchem: corrupted double-linked list: 0x000000010ac34880 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x75ab6)[0x7f597b129ab6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7754c)[0x7f597b12b54c]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7f597b12e7ec]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0xcc811)[0x7f597bbd5811]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0xdba7f)[0x7f597bbe4a7f]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0xdbbaa)[0x7f597bbe4baa]
/usr/lib/x86_64-linux-gnu/libgfortran.so.3(+0x1ab09)[0x7f597bb23b09]
/lib64/ld-linux-x86-64.so.2(+0xe21c)[0x7f597c42421c]
/lib/x86_64-linux-gnu/libc.so.6(+0x36df2)[0x7f597b0eadf2]
/lib/x86_64-linux-gnu/libc.so.6(+0x36e45)[0x7f597b0eae45]
/usr/lib/libmpich.so.1.2(+0xbedc9)[0x7f597c101dc9]
/usr/lib/libmpich.so.1.2(MPID_Abort+0x6d)[0x7f597c122d0d]
/usr/lib/libmpich.so.1.2(PMPI_Abort+0x2f5)[0x7f597c090805]
nwchem[0x2896591]
nwchem[0x2883883]
/lib/x86_64-linux-gnu/libc.so.6(+0x324f0)[0x7f597b0e64f0]
nwchem[0x29b6043]
nwchem[0x27a04a0]
nwchem[0x27a3955]
nwchem[0x271492b]
nwchem[0x5cf410]
nwchem[0x5b3d18]
nwchem[0x5a9735]
nwchem[0x5a99b6]
nwchem[0x418ee8]
nwchemAborted

And more detail:
valgrind nwchem nwchem.nw



==19910==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                          
==19910==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)                                                                      
==19910==    by 0x5A9734: nwdft_ (nwdft.F:274)                                                                                
==19910==    by 0x5A99B5: dft_energy_ (nwdft.F:18)                                                                            
==19910==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)                                                              
==19910==    by 0x41A57B: task_energy_ (task_energy.F:95)                                                                    
==19910==    by 0x40DAD2: task_ (task.F:337)                                                                                  
==19910==    by 0x4068F5: MAIN__ (nwchem.F:251)                                                                              
==19910==  Address 0x199750a0 is not stack'd, malloc'd or (recently) free'd                                                  
==19910==                                                                                                                    
==19910== Invalid write of size 8                                                                                            
==19910==    at 0x29B6043: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                              
==19910==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                            
==19910==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                          
==19910==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                            
==19910==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)                                          
==19910==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)                                                                      
==19910==    by 0x5A9734: nwdft_ (nwdft.F:274)                                                                                
==19910==    by 0x5A99B5: dft_energy_ (nwdft.F:18)                                                                            
==19910==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)                                                              
==19910==    by 0x41A57B: task_energy_ (task_energy.F:95)                                                                    
==19910==    by 0x40DAD2: task_ (task.F:337)                                                                                  
==19910==    by 0x4068F5: MAIN__ (nwchem.F:251)                                                                              
==19910==  Address 0x199750b0 is not stack'd, malloc'd or (recently) free'd                                                  
==19910==                                                                                                                    
0:Segmentation Violation error, status=: 11                                                                                  
(rank:0 hostname:tantalum pid:19910):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
application called MPI_Abort(comm=0x84000001, 11) - process 0                                                                
==19910==                                                                                                                    
==19910== HEAP SUMMARY:                                                                                                      
==19910==     in use at exit: 4,303,284,335 bytes in 695 blocks                                                              
==19910==   total heap usage: 2,132 allocs, 1,437 frees, 4,305,897,103 bytes allocated                                        
==19910==                                                                                                                    
==19910== LEAK SUMMARY:                                                                                          
==19910==    definitely lost: 24 bytes in 1 blocks
==19910==    indirectly lost: 512 bytes in 1 blocks
==19910==      possibly lost: 0 bytes in 0 blocks
==19910==    still reachable: 4,303,283,799 bytes in 693 blocks
==19910==         suppressed: 0 bytes in 0 blocks
==19910== Rerun with --leak-check=full to see details of leaked memory
==19910==
==19910== For counts of detected and suppressed errors, rerun with: -v
==19910== Use --track-origins=yes to see where uninitialised values come from
==19910== ERROR SUMMARY: 662 errors from 9 contexts (suppressed: 4 from 4)


And way too much detail:
valgrind --leak-check=full --track-origins=yes --log-file=valgrind.log nwchem nwchem.nw
==20005== Memcheck, a memory error detector
==20005== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==20005== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==20005== Command: nwchem nwchem.nw
==20005== Parent PID: 19563
==20005==
==20005== Warning: set address range perms: large range [0x3952b040, 0x13352b110) (undefined)
==20005== Syscall param write(buf) points to uninitialised byte(s)
==20005==    at 0x12803980: __write_nocancel (syscall-template.S:82)
==20005==    by 0x127A8B92: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1276)
==20005==    by 0x127A8809: new_do_write (fileops.c:530)
==20005==    by 0x127A8B34: _IO_do_write@@GLIBC_2.2.5 (fileops.c:503)
==20005==    by 0x127A9347: _IO_file_sync@@GLIBC_2.2.5 (fileops.c:905)
==20005==    by 0x1279DE19: fflush (iofflush.c:43)
==20005==    by 0xA3CF1F: hdbm_file_flush (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x8C84AA: rtdb_seq_put (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x8C55BD: rtdb_put (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x8C48B2: rtdb_put_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x9EAFAC: util_set_rtdb_state_ (util_rtdb_state.F:40)
==20005==    by 0x4067FD: MAIN__ (nwchem.F:222)
==20005==  Address 0x10950022 is not stack'd, malloc'd or (recently) free'd
==20005==  Uninitialised value was created by a stack allocation
==20005==    at 0x8C6130: rtdb_seq_put_info (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6048: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975040 is 0 bytes after a block of size 42,008,576 alloc'd
==20005==    at 0x1155679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20005==    by 0x291F69B: morecore (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x291F793: kr_malloc (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A492B: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B604D: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975050 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6052: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975060 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6057: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975070 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6035: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975080 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6039: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x19975090 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B603E: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x199750a0 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005== Invalid write of size 8
==20005==    at 0x29B6043: dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A049F: GAI_DGEMM (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x27A3954: pnga_matmul (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x271492A: ga_dgemm_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5CF40F: diis_bld12_ (in /home/me/nwchem-6.1/bin/LINUX64/nwchem)
==20005==    by 0x5B3D17: dft_main0d_ (dft_main0d.F:549)
==20005==    by 0x5A9734: nwdft_ (nwdft.F:274)
==20005==    by 0x5A99B5: dft_energy_ (nwdft.F:18)
==20005==    by 0x418EE7: task_energy_doit_ (task_energy.F:251)
==20005==    by 0x41A57B: task_energy_ (task_energy.F:95)
==20005==    by 0x40DAD2: task_ (task.F:337)
==20005==    by 0x4068F5: MAIN__ (nwchem.F:251)
==20005==  Address 0x199750b0 is not stack'd, malloc'd or (recently) free'd
==20005==
==20005==
==20005== HEAP SUMMARY:
==20005==     in use at exit: 4,303,284,355 bytes in 697 blocks
==20005==   total heap usage: 2,135 allocs, 1,438 frees, 4,305,900,787 bytes allocated
==20005==
==20005== 536 (24 direct, 512 indirect) bytes in 1 blocks are definitely lost in loss record 662 of 679
==20005==    at 0x1155679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20005==    by 0x11D6E128: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x11E332F8: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x11E2C573: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x11D6BB47: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==20005==    by 0x1093CCCF: call_init (dl-init.c:85)
==20005==    by 0x1093CDC6: _dl_init (dl-init.c:134)
==20005==    by 0x1092FB29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
==20005==    by 0x1: ???
==20005==    by 0x7FF00033A: ???
==20005==    by 0x7FF000341: ???
==20005==
==20005== LEAK SUMMARY:
==20005==    definitely lost: 24 bytes in 1 blocks
==20005==    indirectly lost: 512 bytes in 1 blocks
==20005==      possibly lost: 0 bytes in 0 blocks
==20005==    still reachable: 4,303,283,819 bytes in 695 blocks
==20005==         suppressed: 0 bytes in 0 blocks
==20005== Reachable blocks (those to which a pointer was found) are not shown.
==20005== To see them, rerun with: --leak-check=full --show-reachable=yes
==20005==
==20005== For counts of detected and suppressed errors, rerun with: -v
==20005== ERROR SUMMARY: 663 errors from 10 contexts (suppressed: 4 from 4)
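
A log like this is easier to digest if you just count which function each "Invalid write" record starts in. Here's a rough sketch, assuming the log was written with --log-file=valgrind.log as above and that the memcheck record layout is as shown. The function name summarize_invalid_writes is made up:

```shell
# summarize_invalid_writes LOGFILE: for every "Invalid write" record in a
# memcheck log, take the function named in the first stack frame (the "at"
# line) and print one line per function: COUNT FUNCTION.
# (hypothetical helper; it relies only on the line layout shown in the log)
summarize_invalid_writes() {
    awk '
        /Invalid write of size/ { want = 1; next }
        want && / at 0x/ {
            # frame looks like: ==PID==    at 0xADDR: func (in ...)
            for (i = 1; i <= NF; i++)
                if ($i == "at") { count[$(i + 2)]++; break }
            want = 0
        }
        END { for (f in count) print count[f], f }
    ' "$1"
}

# Usage:
#   summarize_invalid_writes valgrind.log
```

In the excerpt shown here, every such record starts in dgemm_, called via ga_dgemm_ from diis_bld12_.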



For comparison, here's using nwchem 6.0:
NOTE that this version works just fine and runs to completion without error messages normally.
valgrind --leak-check=full --track-origins=yes --log-file=valgrind.log nwchem nwchem.nw



==21014== Memcheck, a memory error detector
==21014== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==21014== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21014== Command: nwchem nwchem.nw
==21014== Parent PID: 20854
==21014==
==21014== Warning: set address range perms: large range [0x3952b040, 0x13352b110) (undefined)
==21014== Syscall param write(buf) points to uninitialised byte(s)
==21014==    at 0x11673980: __write_nocancel (syscall-template.S:82)
==21014==    by 0x11618B92: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1276)
==21014==    by 0x11618809: new_do_write (fileops.c:530)
==21014==    by 0x11618B34: _IO_do_write@@GLIBC_2.2.5 (fileops.c:503)
==21014==    by 0x11619347: _IO_file_sync@@GLIBC_2.2.5 (fileops.c:905)
==21014==    by 0x1160DE19: fflush (iofflush.c:43)
==21014==    by 0x8A16D7: hdbm_file_flush (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x83C973: rtdb_seq_put (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x83ADA6: rtdb_put (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x83A13F: rtdb_put_ (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x85C72C: util_set_rtdb_state_ (util_rtdb_state.F:40)
==21014==    by 0x40636B: MAIN__ (nwchem.F:223)
==21014==  Address 0xf7c0022 is not stack'd, malloc'd or (recently) free'd
==21014==  Uninitialised value was created by a stack allocation
==21014==    at 0x83B7D0: rtdb_seq_put_info (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==
==21014== Conditional jump or move depends on uninitialised value(s)
==21014==    at 0x8A5E1F: sym_op_class_name_ (sym_op_cname.F:26)
==21014==    by 0x83FB41: sym_op_classify_ (sym_op_clsfy.F:49)
==21014==    by 0x845AE9: sym_movecs_adapt_ (sym_mo_adapt.F:77)
==21014==    by 0x719A4A: scf_movecs_sym_adapt_ (scf_sym_adap.F:70)
==21014==    by 0x731028: scf_vectors_guess_ (scf_vec_guess.F:403)
==21014==    by 0x58D4E6: dft_scf_ (dft_scf.F:526)
==21014==    by 0x58B67B: dft_main0d_ (dft_main0d.F:537)
==21014==    by 0x5818B3: nwdft_ (nwdft.F:309)
==21014==    by 0x581B24: dft_energy_ (nwdft.F:18)
==21014==    by 0x4174D7: task_energy_doit_ (task_energy.F:229)
==21014==    by 0x418AEB: task_energy_ (task_energy.F:74)
==21014==    by 0x40C646: task_ (task.F:301)
==21014==  Uninitialised value was created by a stack allocation
==21014==    at 0x83F9BD: sym_op_classify_ (sym_op_clsfy.F:32)
==21014==
==21014==
==21014== HEAP SUMMARY:
==21014==     in use at exit: 4,254,665,672 bytes in 20 blocks
==21014==   total heap usage: 10,491 allocs, 10,471 frees, 4,275,242,731 bytes allocated
==21014==
==21014== 17 bytes in 2 blocks are definitely lost in loss record 9 of 19
==21014==    at 0x103C679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21014==    by 0x11626881: strdup (strdup.c:43)
==21014==    by 0x24D33F7: pbeginf_ (in /home/me/nwchem-6.0/bin/LINUX64/nwchem)
==21014==    by 0x405F53: MAIN__ (nwchem.F:66)
==21014==    by 0x406964: main (nwchem.F:336)
==21014==
==21014== 536 (24 direct, 512 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 19
==21014==    at 0x103C679D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21014==    by 0x10BDE128: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0x10CA32F8: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0x10C9C573: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0x10BDBB47: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
==21014==    by 0xF7ACCCF: call_init (dl-init.c:85)
==21014==    by 0xF7ACDC6: _dl_init (dl-init.c:134)
==21014==    by 0xF79FB29: ??? (in /lib/x86_64-linux-gnu/ld-2.13.so)
==21014==    by 0x1: ???
==21014==    by 0x7FF00032A: ???
==21014==    by 0x7FF000331: ???
==21014==
==21014== LEAK SUMMARY:
==21014==    definitely lost: 41 bytes in 3 blocks
==21014==    indirectly lost: 512 bytes in 1 blocks
==21014==      possibly lost: 0 bytes in 0 blocks
==21014==    still reachable: 4,254,665,119 bytes in 16 blocks
==21014==         suppressed: 0 bytes in 0 blocks
==21014== Reachable blocks (those to which a pointer was found) are not shown.
==21014== To see them, rerun with: --leak-check=full --show-reachable=yes
==21014==
==21014== For counts of detected and suppressed errors, rerun with: -v
==21014== ERROR SUMMARY: 178 errors from 4 contexts (suppressed: 4 from 4)

74. Building nwchem 6.1 on debian testing 32 bit only


EDIT 18 May 2012: 
It's now been solved on 64 bit as well
Compiling nwchem 6.1 with internal libs on debian: http://verahill.blogspot.com.au/2012/05/compiling-nwchem-61-with-internal-libs.html
Compiling nwchem 6.1 with openblas on debian: http://verahill.blogspot.com.au/2012/05/building-nwchem-61-on-debian.html


This doesn't work with the 64 bit version of nwchem 6.1. There's a separate post on that. Nwchem 6.1 64 bit will build just fine, but will crash when run. Again, see the other post.

Building on 32 bit debian testing:



Put a hold on your mpich2 and libmpich2-dev packages (see e.g. here for more details)
1. edit your /etc/apt/sources.list to allow packages from stable e.g.

deb ftp://ftp.au.debian.org/debian/ testing main contrib non-free
deb ftp://ftp.au.debian.org/debian/ stable main contrib non-free

2. create an /etc/apt/preferences file e.g.

Package: *
Pin: release a=testing
Pin-Priority: 990

Package: *
Pin: release a=stable
Pin-Priority: -10

3. install v 1.2 explicitly
sudo apt-get update && sudo apt-get install mpich2=1.2.1.1-5 libmpich2-dev=1.2.1.1-5

4. put a hold on the packages

sudo su
echo "mpich2 hold"|dpkg --set-selections
echo "libmpich2-dev hold"|dpkg --set-selections

exit

Make sure you have the necessary packages:
sudo apt-get install build-essential gfortran fort77

I had some error messages before installing fort77. Not sure they are related.

Download the nwchem source
cd ~
wget http://www.nwchem-sw.org/images/Nwchem-6.1-2012-Feb-10.tar.gz
tar -xvf Nwchem-6.1-2012-Feb-10.tar.gz
cd nwchem-6.1

create buildconf.sh in ~/nwchem-6.1


export LARGE_FILES=TRUE
export TCGRSH=/usr/local/bin/ssh
export NWCHEM_TOP=/home/me/nwchem-6.1
export NWCHEM_TARGET=LINUX
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include/mpich2
export LIBMPI="-lmpich -lfmpich -lpthread"
export NWCHEM_MODULES="all"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

run
sh buildconf.sh


Building takes ages. But it works. Why it works for 32 bit and not 64 bit has me a bit confused, but it's probably a good hint to the solution.

17 January 2012

52. Network (hosts) setting for mpich2 /mpd on debian

I have a switch with three computers attached to it. The ip addresses are static and are 192.168.1.1 (beryllium), 192.168.1.101 (boron), 192.168.1.102 (tantalum)

I use the computers to run nwchem across several nodes. The default /etc/hosts settings cause problems when trying to connect different instances of mpd on different nodes.

e.g.

beryllium: /etc/hosts
127.0.0.1 localhost  beryllium
192.168.1.101 boron
192.168.1.102 tantalum

Won't work, but 
127.0.0.1 localhost
192.168.1.1  beryllium
192.168.1.101 boron
192.168.1.102 tantalum

Will.

See for example:

Using /etc/hosts:
127.0.0.1 localhost  beryllium
192.168.1.101 boron
192.168.1.102 tantalum

me@tantalum:~$ mpdtrace -l
tantalum_51108 (192.168.1.102)

me@beryllium:~$ mpd --ncpus=6 -h 192.168.1.102 -p 51108 &
[2] 26283

me@tantalum:~$ mpdtrace -l
tantalum_51108 (192.168.1.102)
beryllium_38569 (127.0.0.1)

See the ip address (127.0.0.1)? tantalum sees 127.0.0.1, which is its own localhost (i.e. also tantalum). It should point at beryllium (192.168.1.1)

Using /etc/hosts:
127.0.0.1 localhost
192.168.1.1 beryllium
192.168.1.101 boron
192.168.1.102 tantalum


me@tantalum:~$ mpdtrace -l
tantalum_58007 (192.168.1.102)

me@beryllium:~$ mpd --ncpus=6 -h 192.168.1.102 -p 58007 &
[2] 26596

me@tantalum:~$ mpdtrace -l
tantalum_58007 (192.168.1.102)
beryllium_56234 (192.168.1.1)

And now it looks better.
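
To spot the problem on a node before starting mpd, you can list every name that /etc/hosts maps to a loopback address. Here's a small sketch. The function name loopback_names is my own, and it treats any 127.x.x.x entry as loopback:

```shell
# loopback_names HOSTSFILE: print every name other than "localhost" that an
# /etc/hosts style file maps to a 127.x.x.x address.
# (hypothetical helper for illustration)
loopback_names() {
    awk '
        { sub(/#.*/, "") }                 # strip comments
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++)
                if ($i != "localhost") print $i
        }
    ' "$1"
}

# Usage:
#   loopback_names /etc/hosts
```

With the first hosts file above this prints beryllium, and with the second it prints nothing. If the machine's own hostname shows up in the output, other nodes will be handed a loopback address for it, as in the mpdtrace output above.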

"But /etc/hosts keeps changing on reboot!"
Which is because Network Manager keeps fiddling with it. Look at point 6 here http://verahill.blogspot.com/2012/01/debian-testing-64-wheezy-small-fixes.html

I haven't figured out how to do this via the command line yet.

49. Gromacs -- hangs on multicore when doing normal mode analysis

Symptom:
when doing
mdrun -s nm.tpr -mtx nm.mtx -v -deffnm nm
on a system with 637 atoms you end up with:
...Finish step 636 out of 637
and it hangs there with all cores running at 100%

Reason:
For some reason the normal mode analysis of at least this particular system won't run on multiple cores.

Solution:
Use an mpi compiled version of mdrun (see previous posts on compiling _dd, _mpi and _ddmpi versions of gromacs) and force the use of ONE core.

mpd --ncpus=4 &
mpdrun -n 1 mdrun_mpi -s nm.tpr -mtx nm.mtx -v -deffnm nm

works!

Confirmation
This was confirmed by running it on four computers:
64 bit: a six core AMD 64 using a compiled version of gromacs. Hangs.
64 bit: a four core intel i5 using both the debian version and a compiled version of gromacs. Hangs.
64 bit: an older four core intel using a compiled version of gromacs. Hangs.
32 bit: an old single-core laptop using the debian version of gromacs. Works.

Next, three single-core virtual machines were set up -- a stable 32 bit, a testing 32 bit and a testing 64 bit machine, all with the debian version of gromacs (sudo apt-get install gromacs). They all worked, as they only had a single core.





10 January 2012

45. Compiling gromacs with mpich2 ver 1.2 on debian testing

If you are using mpich2 1.2.1.1-5 -- read the ** comment. Otherwise don't worry.
** In my example I've used mpich2 ver 1.2.1.1-5 -- install mpich2 and libmpich2-dev version 1.2.1.1-5 according to http://verahill.blogspot.com/2012/01/debian-testing-64-wheezy-nwhchem.html -- do everything in between "Edit these two files.." and "exit" if you want the same system as I've used. **

Start here:
This is basically a condensed and annotated version of http://www.gromacs.org/Downloads/Installation_Instructions

Have a look at
http://www.gromacs.org/Downloads
to see what file to download

Also, you may want to do
sudo apt-get install build-essential gfortran fftw3

Next, use the console:

mkdir ~/tmp
cd ~/tmp

wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz

tar -xvf gromacs-4.5.5.tar.gz

aptitude search fftw
I have libfftw3-3 so I'll pull libfftw3-dev
sudo apt-get install libfftw3-dev

cd gromacs-4.5.5/

Create buildconf.sh and put the following in it to build four different versions of gromacs
(without mpi and single precision, with mpi and single precision (_mpi), without mpi and double precision (_dd), with mpi and double precision (_ddmpi)).
Change N in make -jN to equal the number of cores+1, in my case six cores => N=7, so -j7.
##########################
./configure --with-fft=fftw3
make -j7
sudo make install
./configure --with-fft=fftw3 --enable-mpi --program-suffix=_mpi
make -j7 mdrun
sudo make install

make distclean

./configure --with-fft=fftw3 --disable-float --program-suffix=_dd
make -j7
sudo make install
./configure --with-fft=fftw3 --enable-mpi --disable-float --program-suffix=_ddmpi
make -j7 mdrun
sudo make install
##########################

Then run
sh buildconf.sh
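
If you'd rather not hard-code the -j7, the cores+1 rule can be worked out by the shell. A small sketch, assuming getconf reports the number of online cores (the variable names cores and njobs are just for this example):

```shell
# Number of online cores; falls back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# The rule of thumb used above: N = number of cores + 1.
njobs=$((cores + 1))

echo "make -j$njobs"
```

On the six-core box this should print make -j7, and the make -j7 lines in buildconf.sh could then be written as make -j$njobs.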

Next, in ~/.bashrc put

PATH=$PATH:/usr/local/gromacs/bin

or, to install for everyone, put the above line in /etc/profile (and then do source /etc/profile)

Then run
source ~/.bashrc
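
One thing to watch out for: every time ~/.bashrc is sourced, the PATH line above tacks the gromacs directory on again. A sketch of a guarded version that only appends it once (path_append is a made-up helper name, not something that comes with bash or gromacs):

```shell
# Hypothetical helper: append a directory to PATH only if it is not there yet.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, do nothing
        *) PATH=$PATH:$1 ;;
    esac
}

gmx_bin=/usr/local/gromacs/bin
path_append "$gmx_bin"
path_append "$gmx_bin"             # second call changes nothing
export PATH
```

Putting the function and a single path_append call in ~/.bashrc should have the same effect as the plain PATH line, minus the duplicates.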

You can download a test set from http://www.gromacs.org/Downloads/Test-Set
Currently the newest one is ftp://ftp.gromacs.org/pub/tests/gmxtest-4.0.4.tgz

Or using git, if you have that installed:
git clone git://git.gromacs.org/regressiontests.git

09 January 2012

43. nwchem revisited. Install on new debian machine

Here's a streamlined version of compiling and setting up nwchem with mpich2 support on a virgin debian testing (wheezy) 64 bit computer. I'm working on a build guide for nwchem 6.1 -- currently it builds fine but all jobs end with a Segmentation Violation error and exit with status 11.

Start by running
sudo apt-get install build-essential  gfortran
Edit these two files (the preferences one will most likely not exist)
/etc/apt/sources.list

deb ftp://ftp.au.debian.org/debian/ testing main contrib non-free
deb ftp://ftp.au.debian.org/debian/ stable main contrib non-free
deb ftp://ftp.au.debian.org/debian/ unstable main contrib non-free

/etc/apt/preferences

Package: *
Pin: release a=testing
Pin-Priority: 990

Package: *
Pin: release a=unstable
Pin-Priority: -10

Package: *
Pin: release a=stable
Pin-Priority: 10

IMPORTANT: the pin-priority for stable must be positive (here +10), or it won't work.

Run
sudo apt-get install mpich2=1.2.1.1-5 libmpich2-dev=1.2.1.1-5

Set the Pin-priority to -10 for stable again.

sudo su
echo "mpich2 hold"|dpkg --set-selections
echo "libmpich2-dev hold"|dpkg --set-selections
mkdir ~/nwchem
cd ~/nwchem
touch buildconf.sh
chmod +x buildconf.sh

(EDIT 21/02/2012: I accidentally put a bad csh-formatted buildconf.sh file at the beginning. Then I put an incomplete bash version. It should work now.)

In buildconf.sh put
export LARGE_FILES=TRUE
export TCGRSH=/usr/local/bin/ssh
export NWCHEM_TOP=/home/myhome/nwchem/nwchem-6.0
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr
export MPI_INCLUDE=$MPI_LOC/include/mpich2

cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

Then download the source code for nwchem

wget http://www.nwchem-sw.org/images/Nwchem-6.0.tar.gz
tar -xvf Nwchem-6.0.tar.gz

To start building:
./buildconf.sh

Once it's built:
echo 'export PATH=$PATH:/home/myhome/nwchem/nwchem-6.0/bin/LINUX64' >> ~/.bashrc
source ~/.bashrc

Prepare mpd
echo "MPD_SECRETWORD=jibberjabber" >> ~/.mpd.conf
chmod 600 ~/.mpd.conf
mpd --ncpus=3 &
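
Note that the echo line above appends, so running it twice leaves two MPD_SECRETWORD lines in the file. Here's a sketch that rewrites the file instead and sets the permissions in one go. write_mpd_conf is a hypothetical helper, and the secret word is the same placeholder as above:

```shell
# Hypothetical helper: (re)create an mpd config file $1 holding secret word $2,
# readable and writable by the owner only.
write_mpd_conf() {
    ( umask 077; printf 'MPD_SECRETWORD=%s\n' "$2" > "$1" )
    chmod 600 "$1"                 # also fixes permissions on an existing file
}

# Demonstrated on a scratch file; point it at ~/.mpd.conf for real use.
demo_conf=$(mktemp)
write_mpd_conf "$demo_conf" jibberjabber
cat "$demo_conf"
```

For the real thing you would call write_mpd_conf ~/.mpd.conf yoursecretword before starting mpd.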

Prepare for a test-run
touch nwchem.nw
Put the following in the nwchem.nw file:

start benzene 

geometry units angstroms
C  0.100  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end
basis
 H library sto-3g
 c library sto-3g
end
dft
    xc b3lyp
end
task dft optimize

Launch the job:
mpdrun -n 2 nwchem nwchem.nw

And you should be ready to go


Edit: 12/02/2012 It looks like the version of nwchem currently in SID is built with mpi support: http://packages.debian.org/sid/nwchem . I haven't checked it out.

16 December 2011

30. "Bench-marking" nwchem with mpich2 on debian wheezy

For various reasons my beowulf has been dismantled and in boxes for most of the year, with only the six-core node seeing use as a normal work computer.

Anyhow, here's a very unscientific test of the performance of my six-core (phenom II, 2.8 GHz, 8Gb RAM) running the nwchem code compiled in the previous post.

The speed-tests were performed by starting up mpd
mpd --ncpus=6&
and then executing with
time mpdrun -n x ./nwchem nwchem.nw
where x is an integer signifying the number of cores

The nwchem.nw file I used was

nwchem.nw
start benzene 

geometry units angstroms
C  0.100  1.396  0.000
C  1.209  0.698  0.000
C  1.209 -0.698  0.000
C  0.000 -1.396  0.000
C -1.209 -0.698  0.000
C -1.209  0.698  0.000
H  0.000  2.479  0.000
H  2.147  1.240  0.000
H  2.147 -1.240  0.000
H  0.000 -2.479  0.000
H -2.147 -1.240  0.000
H -2.147  1.240  0.000
end
basis
 H library sto-3g
 c library sto-3g
end
dft
    xc b3lyp
end
task dft optimize

Here are the results:


(x is number of cores; times in seconds)
x   Run 1   Run 2   Run 3   Run 4   Run 5
1*  40.8    37.9    40.7    40.3    39.9
1   22.2    40.7    40.6    44.8    38.2
2   22.8    22.4    16.3    23.5    21.5
3   14.1    12.3    15.7    15.5    15.1
4   14.5    11.5    12.0    14.9    14.7
5   11.4    11.5    8.9     11.9    12.5
6   16.0    12.2    13.4    9.9     9.6

* No mpd running; executed using time nwchem nwchem.nw
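
To make the rows a bit easier to compare, the runs can be averaged per core count. A rough sketch using awk, with the timings copied from the table above (the variable name means is just for this example; with so few runs the averages are no more scientific than the individual numbers):

```shell
# Average the runs in each row; the first column is the number of cores.
means=$(awk '
    {
        sum = 0
        for (i = 2; i <= NF; i++) sum += $i      # add up all runs in the row
        printf "%-3s mean %.2f s over %d runs\n", $1, sum / (NF - 1), NF - 1
    }
' <<'EOF'
1*  40.8  37.9  40.7  40.3  39.9
1   22.2  40.7  40.6  44.8  38.2
2   22.8  22.4  16.3  23.5  21.5
3   14.1  12.3  15.7  15.5  15.1
4   14.5  11.5  12.0  14.9  14.7
5   11.4  11.5  8.9   11.9  12.5
6   16.0  12.2  13.4  9.9   9.6
EOF
)
printf '%s\n' "$means"
```

The same awk snippet should work on any of the other tables in this post, since it simply averages however many runs each row has.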



So here's the unscientific part -- the computer is running a full desktop environment with evolution, chrome etc open in the background so that each run sees a slightly different system. I've tried to vary the order in which the runs were made though.

A guess would be that a longer run would yield more reproducible results. As it is now, the length of the runs varies significantly. The only lesson that can be drawn is that throwing more cores at a problem doesn't help much, as the optimisation times only drop off slowly past a certain point.

Edit: I've run the same file using an almost identical set-up on two more boxes.
Don't compare the benchmarks when running at the maximum number of cores, since these will be heavily affected by other processes.

Optiplex 990 (Intel i5 2400, 4 cores @ 3.1 GHz, 8 Gb RAM)

x   Run 1   Run 2   Run 3   Run 4   Run 5
1   45.80   46.97   46.56   46.95   39.01
2   22.77   25.81   26.93   26.61   25.81
3   17.18   16.48   18.89   19.26   19.18
4   11.62   16.62   15.82   15.86   16.03

Homebuilt (3 core AMD Athlon 2 X3 @ 3.1 GHz, 4 Gb RAM)

x   Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Run 7
1   43.74   57.02   40.22   47.89   53.87
2   31.41   22.31   25.83   32.31   33.00
3   36.19   31.01   43.55   24.75   37.82   33.95   27.06


28 January 2011

3. Compiling nwchem on Ubuntu 10.10 64 bit

** See post on 15/12/2011 for information about Debian 64 bit. It builds fine on Squeeze but not Wheezy. This seems to have to do with the version of mpich2.**

Figuring out how to compile nwchem with mpich support took a little while, but this seems to have worked:

First mpich2 and gfortran need to be installed (since it was not installed on a virgin system there may have been other required packages already installed)
sudo apt-get install mpich2 gfortran

I created a file called myconfig.sh in the nwchem directory, with the following content:

setenv LARGE_FILES TRUE
setenv TCGRSH /usr/local/bin/ssh
setenv NWCHEM_TOP /work/nwchem
setenv NWCHEM_TARGET LINUX64
setenv NWCHEM_MODULES all
setenv USE_MPI y
setenv USE_MPIF y
setenv MPI_LOC /usr
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include/mpich2
setenv LIBMPI "-lfmpich -lmpich"
cd $NWCHEM_TOP/src
make nwchem_config
make FC=gfortran >& make.log


Run csh myconfig.sh and you should be good to go.

I then added the following to the end of my ~/.bashrc and sourced it:

PATH=$PATH:/work/nwchem/bin/LINUX64
export NWCHEM_EXECUTABLE=/work/nwchem/bin/LINUX64/nwchem


Jobs can then be submitted (assuming that mpd is up) by
mpdrun -n 2 nwchem nameofjob.nw