Showing posts with label gromacs. Show all posts
Showing posts with label gromacs. Show all posts

23 October 2013

524. Generating bonds, angles and dihedrals for a molecule for GROMACS

Generating bond, angle and dihedral parameters for GROMACS molecular dynamics simulations is a real PITA when it comes to reasonably large and complex molecules.

Since I am currently looking at the solvation dynamics of a series of isomers of a metal oxide cluster I quickly got tired of working out the bonds and bond constraints for my 101 atom clusters by hand.

So here's a python (2.7.x) script that generates parameters suitable for a GROMACS .top or .itp file.

Let's call the script genparam. You can call it as follows:
genparam molecule.xyz 0.21 3

where molecule.xyz is the molecule in xyz coordinates. 0.21 (nm) is used to generate bonds -- atoms that are 0.21 nm or closer to each other are bonded. The final number should be 1, 2 or 3. If it is 1, then only bonds (and constraints for bonds that are 0.098 nm or shorter -- such as O-H bonds. It's specific for my use, so you can edit the code.) are printed. If it is 2, then bonds and angles are printed. If it is 3, then bonds, angles and dihedrals are printed.

A simple and understandable example using methanol looks like this:
./genparams.py methanol.xyz 0.15 3
[bonds] [bonds] 1 2 6 0.143 50000.00 ;C OH 1 3 6 0.108 50000.00 ;C H 1 4 6 0.108 50000.00 ;C H 1 5 6 0.108 50000.00 ;C H [constraints] 2 6 2 0.096 ;OH HO [ angles ] 2 1 3 1 109.487 50000.00 ;OH C H 2 1 4 1 109.487 50000.00 ;OH C H 2 1 5 1 109.466 50000.00 ;OH C H 1 2 6 1 109.578 50000.00 ;C OH HO 3 1 4 1 109.542 50000.00 ;H C H 3 1 5 1 109.534 50000.00 ;H C H 4 1 5 1 109.534 50000.00 ;H C H [ dihedrals ] 6 2 1 3 1 60.01 50000.00 2 ;HO OH C H 6 2 1 4 1 60.01 50000.00 2 ;HO OH C H 6 2 1 5 1 180.00 50000.00 2 ;HO OH C H

where methanol.xyz looks like this:
6 Methanol C -0.000000010000 0.138569980000 0.355570700000 OH -0.000000010000 0.187935770000 -1.074466460000 H 0.882876920000 -0.383123830000 0.697839450000 H -0.882876940000 -0.383123830000 0.697839450000 H -0.000000010000 1.145042790000 0.750208830000 HO -0.000000010000 -0.705300580000 -1.426986340000

Note that the bond length value may need to be tuned for each molecule -- e.g. 0.21 nm is fine for my metal oxides which have long M-O bonds, while 0.15 nm is better for my methanol.xyz. In the end, you'll probably have to do some manual editing to remove excess bonds, angles and dihedrals.

Also note that there's a switch (prevent_hhbonds) which prevent the formation of bonds between atoms that start with H -- this is to prevent bonds between e.g. H in CH3 units (177 pm). You can change this in the code in the __main__ section.

Finally, note that the function types, multiplicities and forces may need to be edited. I suggest not doing that in the code, but in the output as it will depend on each molecule.

This script won't do science for you, but it'll take care of some of the drudgery of providing geometric descriptors.

Anyway, having said all that, here's the code:


#!/usr/bin/python
import sys
from math import sqrt, pi
from itertools import permutations
from math import acos,atan2

# see table 5.5 (p 132, v 4.6.3) in the gromacs manual
# for the different function types


#from
#http://stackoverflow.com/questions/1984799/cross-product-of-2-different-\
#vectors-in-python
def crossproduct(a, b):
 c = [a[1]*b[2] - a[2]*b[1],
  a[2]*b[0] - a[0]*b[2],
  a[0]*b[1] - a[1]*b[0]]
 return c
#end of code snippet

# mostly from 
# http://www.emoticode.net/python/calculate-angle-between-two-vectors.html
def dotproduct(a,b):
 return sum([a[i]*b[i] for i in range(len(a))])

def veclength(a):
 length=sqrt(a[0]**2+a[1]**2+a[2]**2)
 return length

def ange(a,b,la,lb,angle_unit):
 dp=dotproduct(a,b)
 costheta=dp/(la*lb)
 angle=acos(costheta)
 if angle_unit=='deg':
  angle=angle*180/pi
 return angle
# end of code snippet

def diheder(a,b,c,angle_unit):
 dihedral=atan2(veclength(crossproduct(crossproduct(a,b),\
 crossproduct(b,c))), dotproduct(crossproduct(a,b),crossproduct(b,c)))
 if angle_unit=='deg':
  dihedral=dihedral*180/pi
 return dihedral

def readatoms(infile):
 positions=[]
 f=open(infile,'r')
 atomno=-2
 for line in f:
  atomno+=1
  if atomno >=1:
   position=filter(None,line.rstrip('\n').split(' '))
   if len(position)>3:
    positions+=[[position[0],int(atomno),\
    float(position[1]),float(position[2]),\
    float(position[3])]]
 return positions

def makebonds(positions,rcutoff,prevent_hhbonds):
 
 bonds=[]
 
 for firstatom in positions:
  for secondatom in positions:
   distance=round(sqrt((firstatom[2]-secondatom[2])**2\
+(firstatom[3]-secondatom[3])**2+(firstatom[4]-secondatom[4])**2)/10.0,3)
   xyz=[[firstatom[2],firstatom[3],firstatom[4]],\
[secondatom[2],secondatom[3],secondatom[4]]]
   
   if distance<=rcutoff and firstatom[1]!=secondatom[1]:
    if prevent_hhbonds and (firstatom[0][0:1]=='H' and\
     secondatom[0][0:1]=='H'):
      pass
    elif firstatom[1]<secondatom[1]:
     bonds+=[[firstatom[1],secondatom[1],\
distance,firstatom[0],secondatom[0],xyz[0],xyz[1]]]
    else:
     bonds+=[[secondatom[1],firstatom[1],\
distance,firstatom[0],secondatom[0],xyz[1],xyz[0]]]
 return bonds

def dedupe_bonds(bonds):
 
 newbonds=[]
 for olditem in bonds:
  dupe=False
  for newitem in newbonds:
   if newitem[0]==olditem[0] and newitem[1]==olditem[1]:
    dupe=True
    break;
  if dupe==False:
   newbonds+=[olditem]
 return(newbonds)

def genvec(a,b):
 vec=[b[0]-a[0],b[1]-a[1],b[2]-a[2]]
 return vec

def findangles(bonds,angle_unit):
 # for atoms 1,2,3 we can have the following situations
 # 1-2, 1-3
 # 1-2, 2-3
 # 1-3, 2-3
 # The indices are sorted so that the lower number is always first
 
 angles=[]
 for firstbond in bonds:
    
  for secondbond in bonds:
   if firstbond[0]==secondbond[0] and not (firstbond[1]==\
secondbond[1]): # 1-2, 1-3
    vec=[genvec(firstbond[6],firstbond[5])]
    vec+=[genvec(secondbond[6],secondbond[5])]
    angle=ange(vec[0],vec[1],firstbond[2]*10,\
secondbond[2]*10,angle_unit)
    angles+=[[firstbond[1],firstbond[0],\
secondbond[1],angle,firstbond[4],firstbond[3],secondbond[4],firstbond[6],\
firstbond[5],secondbond[6]]]
   
   if firstbond[0]==secondbond[1] and not (firstbond[1]==\
secondbond[1]): # 1-2, 3-1
    #this should never be relevant since we've sorted the atom numbers
    pass
   
   if firstbond[1]==secondbond[0] and not (firstbond[0]==\
secondbond[1]): # 1-2, 2-3
    vec=[genvec(firstbond[5],firstbond[6])]
    vec+=[genvec(secondbond[6],secondbond[5])]
    angle=ange(vec[0],vec[1],firstbond[2]*10,\
secondbond[2]*10,angle_unit)
    angles+=[[firstbond[0],firstbond[1],\
secondbond[1],angle,firstbond[3],firstbond[4],secondbond[4],firstbond[5],\
firstbond[6],secondbond[6]]]
   
   if firstbond[1]==secondbond[1] and not (firstbond[0]==\
secondbond[0]): # 1-3, 2-3
    vec=[genvec(firstbond[6],firstbond[5])]
    vec+=[genvec(secondbond[6],secondbond[5])]
    angle=ange(vec[0],vec[1],firstbond[2]*10,\
secondbond[2]*10,angle_unit)
    angles+=[[firstbond[0],firstbond[1],\
secondbond[0],angle,firstbond[3],firstbond[4],secondbond[3],firstbond[5],\
firstbond[6],secondbond[5]]]    
 return angles

def dedupe_angles(angles):
 dupe=False
 newangles=[]
 for item in angles:
  dupe=False
  for anotheritem in newangles:
   if item[0]==anotheritem[2] and (item[2]==anotheritem[0]\
 and item[1]==anotheritem[1]):
    dupe=True
    break
  if dupe==False:
   newangles+=[item]
 
 newerangles=[]
 dupe=False
 
 for item in newangles:
  dupe=False
  for anotheritem in newerangles:
   if item[2]==anotheritem[2] and (item[0]==anotheritem[1]\
 and item[1]==anotheritem[0]):
    dupe_item=anotheritem
    dupe=True
    break
  
  if dupe==False:
   newerangles+=[item]
  elif dupe==True:
   if dupe_item[3]>item[3]:
    pass;
   else:
    newerangles[len(newerangles)-1]=item

 newestangles=[]
 dupe=False
 
 for item in newerangles:
  dupe=False
  for anotheritem in newestangles:
   if (sorted(item[0:3]) == sorted (anotheritem[0:3])):
    dupe_item=anotheritem
    dupe=True
    break
  
  if dupe==False:
   newestangles+=[item]
  elif dupe==True:
   if dupe_item[3]>item[3]:
    pass;
   else:
    newestangles[len(newestangles)-1]=item
 return newestangles

def finddihedrals(angles,bonds,angle_unit):
 dihedrals=[]
 for item in angles:
  for anotheritem in bonds:
   if item[2]==anotheritem[0]:
    vec=[genvec(item[7],item[8])]
    vec+=[genvec(item[8],item[9])]
    vec+=[genvec(item[9],anotheritem[6])]
    dihedral=diheder(vec[0],vec[1],vec[2],\
angle_unit)
    dihedrals+=[ [ item[0],item[1],item[2],\
anotheritem[1], dihedral, item[4],item[5],item[6], anotheritem[4] ] ]

   if item[0]==anotheritem[0] and not item[1]==anotheritem[1]:
    vec=[genvec(anotheritem[6],item[7])]
    vec+=[genvec(item[7],item[8])]
    vec+=[genvec(item[8],item[9])]
    dihedral=diheder(vec[0],vec[1],vec[2],\
angle_unit)
    dihedrals+=[ [ anotheritem[1],item[0],item[1],\
item[2], dihedral, anotheritem[4],item[4],item[5], item[6] ] ]
 return dihedrals

def dedup_dihedrals(dihedrals):
 newdihedrals=[]
 
 for item in dihedrals:
  dupe=False
  for anotheritem in newdihedrals:
   rev=anotheritem[0:4]
   rev.reverse()
   if item[0:4] == rev:
    dupe=True
  if not dupe:
   newdihedrals+=[item]
 return newdihedrals

def print_bonds(bonds,func):
 constraints=""
 funcconstr='2'
 print '[bonds]'
 
 for item in bonds:
  if item[2]<=0.098:
   constraints+=(str(item[0])+'\t'+str(item[1])+'\t'+\
funcconstr+'\t'+str("%1.3f"% item[2])+'\t;'+str(item[3])+'\t'+str(item[4])+'\n')
  else: 
   print str(item[0])+'\t'+str(item[1])+'\t'+func+'\t'+\
str("%1.3f"% item[2])+'\t'+'50000.00'+'\t;'+str(item[3])+'\t'+str(item[4])
 print '[constraints]'
 print constraints
 return 0

def print_angles(angles,func):
 print '[ angles ]'
 for item in angles:
  print str(item[0])+'\t'+str(item[1])+'\t'+str(item[2])+'\t'+\
func+'\t'+str("%3.3f" % item[3])+'\t'+'50000.00'+'\t;'\
  +str(item[4])+'\t'+str(item[5])+'\t'+str(item[6])
 return 0

def print_dihedrals(dihedrals,func):
 print "[ dihedrals ]"
 force='50000.00'
 mult='2'
 for item in dihedrals:
  print str(item[0])+'\t'+str(item[1])+'\t'+str(item[2])+'\t'+\
str(item[3])+\
  '\t'+func+'\t'+str("%3.2f" % item[4])+'\t'+force+'\t'+mult+\
  '\t;'+str(item[5])+'\t'+str(item[6])+\
  '\t'+str(item[7])+'\t'+str(item[8])
 return 0

if __name__ == "__main__":
 infile=sys.argv[1]
 rcutoff=float(sys.argv[2]) # in nm
 itemstoprint=int(sys.argv[3])
 
 angle_unit='deg' #{'rad'|'deg'}
 prevent_hhbonds=True # False is safer -- it prevents bonds between
 # atoms whose names start with H

 positions=readatoms(infile)

 bonds=makebonds(positions,rcutoff,prevent_hhbonds)
 bonds=dedupe_bonds(bonds) 
 
 print_bonds(bonds,'6')
 
 angles=findangles(bonds,angle_unit)
 angles=dedupe_angles(angles)
 
 if itemstoprint>=2:
  print_angles(angles,'1')

 dihedrals=finddihedrals(angles,bonds,angle_unit)
 dihedrals=dedup_dihedrals(dihedrals)
 if itemstoprint>=3:
  print_dihedrals(dihedrals,'1') 

26 April 2013

396. Compiling gromacs 4.6 with gpu support, openblas and fftw3 on debian wheezy

NOTE: with ACML my performance on my FX8150 and FX8350 nodes is only 25% of that with Openblas (double precision). Yes, for some reason gromacs is four times faster with openblas than with the machine vendor libraries in my tests.

Here are the release notes: http://www.gromacs.org/About_Gromacs/Release_Notes/Versions_4.6.x
As far as I understand you don't have to rely on openmm anymore for CUDA. Yes, the PITA of compiling openmm is gone!

Note that GPU calcs only speed things up under certain, specific conditions  -- and not all nvidia cards are supported (or equal). My own set-up, using statically cooled graphics cards, is definitely not appropriate for a GPU cluster. Once nwchem comes out with GPU support I might upgrade to fancier $200 graphics cards (maybe COSMO in NWChem will finally become more reasonable in terms of computational cost), but there's little reason for that at the moment.

Not all cards are created equal either -- e.g. GT210, which has GPU compute capability 1.2, is too poor to run with gromacs. GT430 (compute cap GT430) works. Both are obviously not viable for professional work.

Also note that it seems that you still need to use OPENMM if you want GPU support for implicit solvation.

Gromacs used to be easy to install. It's become a fair bit more complicated between 4.5.5 and 4.6. See here for gromacs 4.5.5: http://verahill.blogspot.com.au/2012/05/gromacs-with-external-fftw3-and-blas-on.html

CUDA: If you want to build with cuda you need gcc-4.6, which is still available in the wheezy repos. 4.7 won't work. Luckily, you can have both on your system, but you'll need to specify CC and CXX as shown below.

Openblas
Note that the links to the openblas file tends to die after a while, so you might have to download it manually.

sudo mkdir /opt/openblas
sudo chown ${USER} /opt/openblas
cd ~/tmp
wget http://github.com/xianyi/OpenBLAS/tarball/v0.2.6
tar xvf v0.2.6
cd xianyi-OpenBLAS-87b4d0c/
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
make all BINARY=64 CC=/usr/bin/gcc FC=/usr/bin/gfortran USE_THREAD=0 INTERFACE64=1 1> make.log 2>make.err
make PREFIX=/opt/openblas install
cp lib*.*  /opt/openblas/lib

add
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openblas/lib
to your ~/.bashrc [for later use with nwchem and ecce, add /opt/openblas/lib to /etc/ld.so.conf and do sudo ldconfig -- you might want to make libopenblas.so and libopenblas.so.0 sym links to the main lib, libopenblas_bulldozer-r0.2.6.so]

single-precision gromacs 4.6 with both CPU and GPU

CUDA
If you have an nvidia card and want to enable GPU calcs, do
sudo apt-get install nvidia-cuda-toolkit gcc-4.6 g++-4.6

If /usr/lib/libcuda.so is nothing by a symmlink to /usr/lib/libcuda.so.1, and the file /usr/lib/libcuda.so.1 is missing (this was the case on my wheezy amd64), then do
sudo rm /usr/lib/libcuda.so
sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/libcuda.so

You can also simply make sure that there/s no /usr/lib/libcuda.so.

Continue with the gromacs compilation:
cd ~/tmp
sudo apt-get install cmake
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.6.tar.gz
tar xvf gromacs-4.6.tar.gz
mkdir build_gromacs46
cd build_gromacs46
sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
export LDFLAGS="-L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/openblas/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_single -DGMX_EXTERNAL_BLAS=/opt/openblas/lib ../gromacs-4.6
make
make install

Note: for acml I used this instead:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib
export LDFLAGS="-L/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib -lacml"
export CPPFLAGS="-I/opt/acml/acml5.2.0/gfortran64_fma4_int64/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_single -DGMX_EXTERNAL_BLAS=/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib ../gromacs-4.6


Double-precision gromacs without GPU acceleration:

cd ~/tmp/build_gromacs46
rm * -rf
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
export LDFLAGS="-L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/openblas/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=on -DGMX_GPU=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_double -DGMX_EXTERNAL_BLAS=/opt/openblas/lib ../gromacs-4.6
make
make install

Note: for acml I used this instead:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib
export LDFLAGS="-L/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib -lacml"
export CPPFLAGS="-I/opt/acml/acml5.2.0/gfortran64_fma4_int64/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=on -DGMX_GPU=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_double -DGMX_EXTERNAL_BLAS=/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib ../gromacs-4.6

Add gromacs to path:
echo 'export PATH=$PATH:/opt/gromacs/gromacs4.6_single/bin:/opt/gromacs/gromacs4.6_double/bin' >> ~/.bashrc

Switching between GPU and CPU
You can use the same binary for both, but remember that only the single precision binaries have GPU support to begin with. To set gpu vs cpu, use the -nb option in mdrun:
-nb enum auto Calculate non-bonded interactions on: auto, cpu, gpu or gpu_cpu


Quick test:
cd ~/tmp
wget http://www.gromacs.org/@api/deki/files/128/=gromacs-gpubench-dhfr.tar.gz
tar xvf \=gromacs-gpubench-dhfr.tar.gz
cd dhfr/GPU/dhfr-solv-PME.bench
mdrun -nb cpu -s topol.tpr -testverlet

Hit ctrl+c to stop and get statistics. Then try
mdrun -nb gpu -s topol.tpr -testverlet

I got
XPU ns/day -------------- auto 7.4 GPU 7.7 CPU 4.1 gpu_cpu 7.5

where I have a 3 core 3.1 GHz AMD Athlon II X3 445 CPU and an NVIDIA GeForce GT 430 graphics card -- neither of which is anything special.

Note also that the ns/day values depended highly on how long I let the calc run, and as I didn't time it and make them run the same amount of time, I suspect that auto, GPU and gpu_cpu are all about the same.

20 September 2012

243. My own personal benchmarks for NWChem, gromacs with atlas, openblas, acml on AMD and intel

Update: you can compile against acml on intel as well, and against mkl on amd. Still need to do some performance testing to see how well it works. The artificial penalty of running mkl on AMD is well-publicised and led to a lawsuit, but I don't know how acml performs on mkl.


The title says it all, really. Since I'm back to exploring ways of improving performance for my little cluster I figured I'd break this out as a separate post. Most of this data was found here before: http://verahill.blogspot.com.au/2012/09/new-compute-node-using-amd-fx-8150.html

All units are running up-to-date debian testing (wheezy).

Configuration:
Boron (B): Phenom II X6 2.8 GHz, 8Gb RAM (2.8*6=16.8 GFLOPS predicted)
Neon (Ne): FX-8150 X8 3.6 GHz, 16 Gb RAM (3.6*8=28.8 GFLOPS predicted (int), 3.6*4=14.4 GFLOPS (fpu))
Tantalum (Ta): Quadcore i5-2400 3.1 GHz, 8 Gb RAM (3.1*4=12.4 GFLOPS predicted)
Vanadium (V):  Dual socket 2x Quadcore Xeon X3480 3.06 GHz, 8Gb. CentOS (ROCKS 5.4.3)/openblas.

Results

Gromacs --double (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, double precision, 500k steps)
B  :  10.662 ns/day (11.8  GFLOPS, runtime 8104 seconds)***
B  :    9.921 ns/day ( 10.9 GFLOPS, runtime 8709 seconds)**
Ne:  10.606 ns/day (11.7  GFLOPS, runtime 8146 seconds) *
Ne:  12.375 ns/day (13.7  GFLOPS, runtime 6982 seconds)**
Ne:  12.385 ns/day (13.7  GFLOPS, runtime 6976 seconds)****
Ta:  10.825 ns/day (11.9  GFLOPS, runtime 7981 seconds)***
V :   10.560 ns/dat (11.7  GFLOPS, runtime 8182 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

Gromacs --single (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, single precision, 500 k steps)
B  :   17.251 ns/day (19.0 GFLOPS, runtime 5008 seconds)***
Ne:   21.874 ns/day (24.2 GFLOPS, runtime  3950 seconds)**
Ne:   21.804 ns/day (24.1 GFLOPS, runtime 3963  seconds)****
Ta:   17.345 ns/day (19.2 GFLOPS, runtime  4982 seconds)***
V :   17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

NWChem (opt biphenyl cation, cp-md/pspw):
B  :   5951 seconds**
B  :   4084 seconds ***
B  :   5782 seconds ***xy
Ne:    3689 seconds**
Ta :   4102 seconds***
Ta :   4230 seconds***xy
V :    5396 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
B  :  2841 seconds **
B  :  2410 seconds***
B  :  2101 seconds ***x
B  :  2196 seconds ***xy
Ne: 1665 seconds **
Ta : 1785 seconds***
Ta : 1789 seconds***xy
V  : 2600 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem
y NWChem 6.1.1

A Certain Commercial Ab Initio Package (Freq calc of pre-optimised H14C19O3 at 6-31+G*/rb3lyp):
B  :    2h 00 min (CPU time 10h 37 min)
Ne:   1h 37 min (CPU time: 11h 13 min)
Ta:   1h 26 min (CPU time: 5h 27 min)
V  :   2h 15 min (CPU time 15h 50 min)
Using precompiled binaries.


Gamess:
(I'm still working on learning how to run gamess efficiently, so take these values with a huge saucer of salt for now). bn.inp does a geometry optimisation of a biphenyl cation (mult 2) at ub3lyp/6-31G**. bn.inp has no $STATPT card while bn3.inp does and it makes a huge difference -- but is this because it does 20 steps (nsteps=20), then kills the run? The default is 50 steps and it does seem like all the runs do the maximum number of steps, then exit.

 Again, still learning. See below for input files. Will fix this post as I learn what the heck I'm doing. The relative run times on each node are still comparable though, but just don't use the numbers to compare the run speed of e.g. nwchem vs gamess.

Gamess using bn.inp with atlas
B:    9079 seconds
Ne: 7252 seconds
Ta:  9283 seconds

Gamess using bn.inp with openblas
B:   9071 seconds
Ta: 9297 seconds

Gamess using bn.inp with acml
Ne: 7062 seconds

Gamess using bn3.inp with atlas. 
B: 4016 seconds
Ne: 3162 seconds
Ta: 4114 seconds

MPQC:
Here I've used the version in the debian repos. I've created a hostfile
neon slots=8 max_slots=8
tantalum slots=4 max_slots=4
boron slots=6 max_slots=6

and then just looked at changing the order and slots assignment as well as total number of cores assigned using mpirun.

Simple test case looking at number of cores/distribution:
n cores:  Seconds: Configuration(cores,exec nodes)
4    :   11   : 4(Ta)
4    :   17   : 4(Ne)
4    :   17   : 4(B)
4    :   42   : 2(Ta)+2(B)
6    :   12   : 6(B)
6    :   13   : 6(Ne)
6    :   74   : 2(Ta)+2(B)+2(Ne)
8    :   12   : 8(Ne)
10  :   43   : 4(Ta)+6(B)
12  :   47   : 4(Ta)+8(Ne)
14  :   55   : 6(B)+8(Ne)
18  :   170 :  4(Ta)+6(B)+8(Ne)

My beowulf cluster doesn't seem to be much of a super computer. All in all, this looks like a pretty good argument in favour of upgrading to infiniband...


bn.inp:
 $CONTRL 
COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize 
ICHARG=1 MULT=2 maxit=100
$END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $guess guess=huckel $end

 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000 
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000 
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000 
C      6.0      0.0000000000   -0.7428100000    0.0000000000 
C      6.0      1.1387900000   -1.4654500000    0.3941500000 
C      6.0      1.1392700000   -2.8592800000    0.3938400000 
C      6.0      0.0000000000    0.7428100000    0.0000000000 
C      6.0      1.1387900000    1.4654500000   -0.3941500000 
C      6.0      1.1392700000    2.8592800000   -0.3938400000 
C      6.0     -1.1387900000    1.4654500000    0.3941500000 
C      6.0      0.0000000000    3.5630100000    0.0000000000 
C      6.0     -1.1392700000    2.8592800000    0.3938400000 
H      1.0      0.0000000000   -4.6489600000    0.0000000000 
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000 
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000 
H      1.0      2.0282700000   -3.3966200000    0.7116100000 
H      1.0      2.0282700000    3.3966200000   -0.7116100000 
H      1.0     -2.0214800000    0.9282700000    0.7279300000 
H      1.0      0.0000000000    4.6489600000    0.0000000000 
H      1.0     -2.0282700000    3.3966200000    0.7116100000 
H      1.0      2.0214800000    0.9282700000   -0.7279300000 
H      1.0      2.0214800000   -0.9282700000    0.7279300000 
 $END


bn3.inp:
$CONTRL 
COORD=CART UNITS=ANGS scftyp=uhf dfttyp=b3lyp runtyp=optimize 
ICHARG=1 MULT=2 maxit=100
$END
 $system mwords=2000 $end
 $BASIS gbasis=n31 ngauss=6 ndfunc=1 npfunc=1 $END
 $STATPT OPTTOL=0.0001 NSTEP=20 HSSEND=.TRUE. $END
 $guess guess=huckel $end

 $DATA
biphenyl
C1
C      6.0      0.0000000000   -3.5630100000    0.0000000000 
C      6.0     -1.1392700000   -2.8592800000   -0.3938400000 
C      6.0     -1.1387900000   -1.4654500000   -0.3941500000 
C      6.0      0.0000000000   -0.7428100000    0.0000000000 
C      6.0      1.1387900000   -1.4654500000    0.3941500000 
C      6.0      1.1392700000   -2.8592800000    0.3938400000 
C      6.0      0.0000000000    0.7428100000    0.0000000000 
C      6.0      1.1387900000    1.4654500000   -0.3941500000 
C      6.0      1.1392700000    2.8592800000   -0.3938400000 
C      6.0     -1.1387900000    1.4654500000    0.3941500000 
C      6.0      0.0000000000    3.5630100000    0.0000000000 
C      6.0     -1.1392700000    2.8592800000    0.3938400000 
H      1.0      0.0000000000   -4.6489600000    0.0000000000 
H      1.0     -2.0282700000   -3.3966200000   -0.7116100000 
H      1.0     -2.0214800000   -0.9282700000   -0.7279300000 
H      1.0      2.0282700000   -3.3966200000    0.7116100000 
H      1.0      2.0282700000    3.3966200000   -0.7116100000 
H      1.0     -2.0214800000    0.9282700000    0.7279300000 
H      1.0      0.0000000000    4.6489600000    0.0000000000 
H      1.0     -2.0282700000    3.3966200000    0.7116100000 
H      1.0      2.0214800000    0.9282700000   -0.7279300000 
H      1.0      2.0214800000   -0.9282700000    0.7279300000 
 $END

10 September 2012

230. ROCKS 5.4.3, ATLAS and Gromacs on Xeon X3480

After doing another round of 'benchmarks' (there are so many factors that differ between the systems that it's difficult to tell exactly what I'm measuring) I'm back to looking at my BLAS/LAPACK.


So here's compiling ATLAS on a cluster made up of six dual-socket mobos with 2x quadcore XeonX3480 CPUs and 8 Gb RAM. The cluster is running ROCKS 5.4.3, which is a spin based on Centos 5.6. We then compile GROMACS using ATLAS and compare it with Openblas. Please note that I am not an expert on optimisations (or computers or anything) so comparing Openblas vs ATLAS won't tell you which one is 'better'. They are just numbers based on what someone once observed on a particular system under a particular set of circumstances.

Hurdles: I first had to deal with the lapack + bad symbols + recompile with -fPIC problem (solved by using netlib lapack and building shared libraries), then encountered the 'libgmx.so: undefined reference to _gfortran_' issue (solved by adding -lgfortran to LDFLAGS).


ATLAS
sudo mkdir /share/apps/ATLAS
sudo chown $USER /share/apps/ATLAS
cd ~/tmp
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
 wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/
mkdir build
cd build
.././configure --prefix=/share/apps/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared
OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as  SSE3 (6,448)
Architecture configured as  Corei1 (25)
Clock rate configured as 3059Mhz
make
DONE  STAGE 5-1-0 at 15:23
ATLAS install complete.  Examine
ATLAS/bin/<arch>/INSTALL_LOG/SUMMARY.LOG for details.

ls lib/
libatlas.a  libcblas.a  libf77blas.a  libf77refblas.a  liblapack.a  libptcblas.a  libptf77blas.a  libptlapack.a  libsatlas.so  libtatlas.so  libtstatlas.a  Makefile  Make.inc
make install

In addition to successful copying you'll also get errors along the lines of

cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dylib /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dylib': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libsatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libtatlas.dll /share/apps/ATLAS/lib/.
cp: cannot stat `/home/me/tmp/ATLAS/build/lib/libtatlas.dll': No such file or directory
make[1]: [install_lib] Error 1 (ignored)
cp /home/me/tmp/ATLAS/build/lib/libsatlas.so /share/apps/ATLAS/lib/.
cp /home/me/tmp/ATLAS/build/lib/libtatlas.so /share/apps/ATLAS/lib/.
because those files don't exist. 


Gromacs

FFTW3 was first build according to this. The only difference is the install targets (--prefix) -- I put things in /share/apps/gromacs/.fftwsingle and /share/apps/gromacs/.fftwdouble. Gromacs was downloaded and extracted as shown in that post, and /share/apps/gromacs was created.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/share/apps/ATLAS/lib
#single precision
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spa --prefix=/share/apps/gromacs
make -j3
make install
#double precision
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpa --prefix=/share/apps/gromacs
make -j3
make install
#single + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwsingle/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwsingle/include -I/share/apps/ATLAS/include/atlas""
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spampi --prefix=/share/apps/gromacs
make -j3
make install
#double + mpi
make distclean
export LDFLAGS="-L/share/apps/gromacs/.fftwdouble/lib -L/share/apps/ATLAS/lib -latlas -llapack -lf77blas -lcblas -lgfortran"
export CPPFLAGS="-I/share/apps/gromacs/.fftwdouble/include -I/share/apps/ATLAS/include/atlas"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpampi --prefix=/share/apps/gromacs
make -j3
make install
The -lgfortran is IMPORTANT, or you'll end up with
libgmx.so: 'undefined reference to _gfortran_' type errors.

Performance
I ran a 6x6x6 nm box of water for 5 million steps (10 ns) to get a rough idea of the performance.
Make sure to put
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/ATLAS/lib
in your ~/.bashrc, and to include it in your SGE jobs files (if that's what you use).

I allocated 8 Gb RAM and 8 cores for each run.

Double precision:
Openblas: 10.560 ns/day (11.7 GFLOPS, runtime 8182 seconds)
ATLAS  : 10.544 ns/day (11.6 GFLOPS, runtime 8194 seconds)

Single precision:
Openblas: 17.297 ns/dat (19.1 GFLOPS, runtime 4995 seconds)
ATLAS:   17.351 ns/day (19.2 GFLOPS, runtime 4980 seconds)
That's 15 seconds difference on a 1h 20 min run. I'd say they are identical.

07 September 2012

229. Compile ATLAS (+ gromacs, nwchem) on AMD FX 8150 on Debian Testing (Wheezy)

Xianyi's openblas doesn't seem to be ready for AMD FX 8150 yet. I've played with ATLAS in the past, but  for some reason didn't see the same performance with NWChem and ATLAS as I saw with NWChem and Openblas, so I never ended up using it.

I'm also having issues using openblas with CPMD and quantum espresso, and ATLAS is a well-established, respectable project, so it's time to give it another shot. As in most cases in these situations, it's probably a matter of PEBKAC.

Building ATLAS
Anyway. On we go...

mkdir /opt/ATLAS
chown ${USER} /opt/ATLAS
mkdir ~/tmp
cd ~/tmp

wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
 wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/

Edit ATLAS/Make.top
change the V on line 6 to lowercase i.e. from
- $(ICC) -V 2>&1 >> bin/INSTALL_LOG/ERROR.LOGto
- $(ICC) -v 2>&1 >> bin/INSTALL_LOG/ERROR.LOG
mkdir build/
cd build/


sudo apt-get install cpufreq-utils
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
sudo cpufreq-set -g performance

Unfortunately that only takes care of cpu0:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
but

cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
ondemand
So...since we have 8 cores (cpu0-cpu7):

sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor

OK, we're ready to compile:
.././configure --prefix=/opt/ATLAS -Fa alg '-fPIC' --with-netlib-lapack-tarfile=$HOME/tmp/lapack-3.4.1.tgz --shared

Some of the info that's important is:
OS configured as Linux (1)
Assembly configured as GAS_x8664 (2)
Vector ISA Extension configured as  AVXFMA4 (4,496)
Architecture configured as  AMDDOZER (34)
Clock rate configured as 3600Mhz
If that checks out you don't need to manually set your architecture. To get a list over options, do
 make xprint_enums ; ./xprint_enums

If all is well,

make
make install

You should now be done.

Linking Gromacs against ATLAS

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib
#single precision
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/ATLAS/include"
./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make.err 1>make.log
make install

#double precision
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran" 
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/ATLAS/include"
./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make2.err 1>make2.log
make install

#single + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/ATLAS/lib -lsatlas -ltatlas -lgfortran"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/ATLAS/include"
./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spmpiatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make3.err 1>make3.log
make install

#double + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/ATLAS/lib -lsatlas  -ltatlas  -lgfortran" 
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/ATLAS/include"
./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpmpiatlas --prefix=/opt/gromacs/gromacs-4.5.5
make -j6 2>make4.err 1>make4.log
make install

Linking NWChem against ATLAS

export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all"
export BLASOPT="-L/opt/ATLAS/lib -lsatlas -ltatlas"
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH="$LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/ATLAS/lib"
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export LDFLAGS="-I/opt/ATLAS/include"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran 2> make.err 1>make.log
export FC=gfortran
cd $NWCHEM_TOP/contrib
./getmem.nwchem

04 September 2012

227. New compute node using AMD FX-8150. Gromacs, nwchem performance/benchmarks

Update: reconfiguring your nwchem binary using getmem.nwchem can speed things up considerably. Most of the runtimes are obtained without using getmem.nwchem and are thus all using the same amount of memory, regardless of what is available. Binaries which have been reconfigured are shown as such.

The short summary: I first wasn't that happy with my choice of the the AMD FX-8150, but after sorting out the ACML libs and getting things benchmarked I'm much more satisfied. The only situation in which I'm not seeing this processor outclass the other systems seems to be that using the Commercial Ab Initio Package, which arrived as a precompiled binary (Portland Fortran).

In general it seems that the FX 8150 is about 10% faster than the i5-2400 for the computations I've tested here -- but beware that the AMD processor is using the machine vendor math libs, while the intel unit is using openblas.

Note that the AMD Phenom II X6 1055T is SLOWER with the ACML libs than with Openblas.

The Lengthy Preamble
I seem to remember promising myself not to get another AMD since, while they may or may not be 'the good guys' (e.g. the Intel/Dell thingy), empirically I keep on seeing my Quadcore Intel i5-2400 3.1 GHz sweeping the floor with my Phenom II X6 1055T 2.8 GHz. Sure, part of the issue is the clock frequency, but  the difference seems to be a lot bigger than that.

At any rate, I ended up building a new node for my little cluster. Remember that these are Australian prices. Oh, how I miss you, Newegg -- not just because of the price, but because of the choice.

Luckily it seems like my choice of the FX-8150 has paid off. Also, at the moment of writing the intel i5-2400 and the AMD fx-8150 sell for the same price locally.


The Setup

It's basically an eight-core 3.6 GHz box with 16 GB RAM (expandable to 32 Gb, 4 slots) and a 7200 rpm HDD. I've heard the eight-core FX 8150 uncharitably described as a quad-core with advanced hyper-threading, but I wouldn't be qualified to comment. Interestingly, sinfo registers it as a quad core, while htop and all other programs considers it an 8-core. Finally, looking at this image it looks like the whole 8 core thing is a bit of a cheat -- the whole 4 floating point vs 8 integer processing units.

Gigabyte GA-990FXA-D3 AM3 990FX DDR3 Motherboard AU$ 128 Link
AMD AM3+ x8 FX-8150 3.6Ghz Boxed CPU AU$209 Link
PV316G186C0K 16G Kit(8Gx2) DDR3 1866 AU$ 129 Link
Hitachi 3.5" Desktar 1TB SATA3 HDD 7200rpm AU$83 Link
Corsair GS800 V2 ATX Power Supply Unit AU$ 138 Link
TP-LINK TG-3269 PCI Gigabit PCI Network Card AU$ 8Link
ASUS Vento TA-U11 without PSU AU$99 Link
ASUS 1GB GF210 PCI-E VGA Card Link

NOTE that the mobo does NOT have onboard video. I didn't pick up on that before buying the parts, but luckily had an old ATI card floating around.

The fan on the PSU is a bit annoying. It stays off for the most part (some posts say it should never be completely off, one post said it should be) but starts up in a weird way -- basically the electricity is given in small jolts. Or it's just broken. Other than that it works fine.

Preparation
It's for reasons like these I write this blog. After having installed debian testing I set up NFS, added the box as a node under Sun Grid Engine (Link), set up Gaussian (Link), and compiled Gromacs.

I encountered separate issues trying to compile Openblas (Bulldozer cores aren't supported) and Nwchem with internal libs (odd stuff). I've given up on Openblas and managed to compile nwchem against the AMD ACML. Same went for gromacs -- I eventually recompiled gromacs against ACML. Maybe it's unfair to compare ACML vs Openblas on the i5-2400, but ACML is free, MKL isn't.


Performance -- setup
Note that while I do use NFS it's not in the 'traditional way'. Each node exports a local folder to the front node so that SGE can see it. However, when you run your calcs everything is stored in a local folder, and using a locally compiled version of the number crunching software. In other words, network performance should not affect the benchmarks.

Neon is NOT using openblas, while Boron and Tantalum are. Xianyi's version of openblas won't compile on Bulldozer at the moment (it seems). I will rebuild gromacs with the ACML libs and do the benchmark again.

Also, please note that these 'benchmarks' aren't absolute -- I'm not an expert on optimising performance. You can probably use them to get an idea of the relative computational grunt of the different hardware combinations though.

FX 8150 is a lot more fun with ACML. The Phenom II 1055T is no fun with ACML.

I recompiled nwchem and gromacs on Boron (see below) to see what ACML vs Openblas would be like. I've yet to run those jobs, but will post the results when I have.

Unlike the FX-8150, the Phenom II X6 1055T does not support AVX, FMA3 or FMA4.

Configuration:
Boron (B): Phenom II X6 2.8 GHz, 8Gb RAM (2.8*6=16.8 GFLOPS predicted)
Neon (Ne): FX-8150 X8 3.6 GHz, 16 Gb RAM (3.6*8=28.8 GFLOPS predicted (int), 3.6*4=14.4 GFLOPS (fpu))
Tantalum (Ta): Quadcore i5-2400 3.1 GHz, 8 Gb RAM (3.1*4=12.4 GFLOPS predicted)
Vanadium (V):  Dual socket 2x Quadcore Xeon X3480 3.06 GHz, 8Gb. CentOS (ROCKS 5.4.3)/openblas.

Results

Gromacs --double (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, double precision, 500k steps)
B  :  10.662 ns/day (11.8  GFLOPS, runtime 8104 seconds)***
B  :    9.921 ns/day ( 10.9 GFLOPS, runtime 8709 seconds)**
Ne:  10.606 ns/day (11.7  GFLOPS, runtime 8146 seconds) *
Ne:  12.375 ns/day (13.7  GFLOPS, runtime 6982 seconds)**
Ne:  12.385 ns/day (13.7  GFLOPS, runtime 6976 seconds)****
Ta:  10.825 ns/day (11.9  GFLOPS, runtime 7981 seconds)***
V :   10.560 ns/dat (11.7  GFLOPS, runtime 8182 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

Gromacs --single (1 ns 6x6x6 nm tip4p water box; dynamic load balancing, single precision, 500 k steps)
B  :   17.251 ns/day (19.0 GFLOPS, runtime 5008 seconds)***
Ne:   21.874 ns/day (24.2 GFLOPS, runtime  3950 seconds)**
Ne:   21.804 ns/day (24.1 GFLOPS, runtime 3963  seconds)****
Ta:   17.345 ns/day (19.2 GFLOPS, runtime  4982 seconds)***
V :   17.297 ns/day (19.1 GFLOPS, runtime 4995 seconds)***
*no external blas/lapack.
**using ACML libs
*** using openblas
**** using ATLAS

NWChem (opt biphenyl cation, cp-md/pspw):
B  :   5951 seconds**
B  :   4084 seconds ***
B  :   1988 seconds***x
Ne:   3689 seconds**
Ta :   4102 seconds***
V :    5396 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem

NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
B  :  2841 seconds **
B  :  2410 seconds***
B  :  2101 seconds ***x
Ne: 1665 seconds **
Ta : 1785 seconds***
Ta : 1789 seconds***x
V  : 2600 seconds***

*no external blas/lapack.
**using ACML libs
*** using openblas
x Reconfigured using getmem.nwchem

A Certain Commercial Ab Initio Package (Freq calc of pre-optimised H14C19O3 at 6-31+G*/rb3lyp):
B  :    2h 00 min (CPU time 10h 37 min)
Ne:   1h 37 min (CPU time: 11h 13 min)
Ta:   1h 26 min (CPU time: 5h 27 min)
V  :   2h 15 min (CPU time 15h 50 min)
Using precompiled binaries.

More:
Since I couldn't use Xianyi's openblas with FX 8150 I downloaded the AMD ACML. I've had issues with that before, which is why I haven't been using that as a rule. This time I was motivated enough to hammer it out though. Anyway, here's the cpuid output from the acml 5.2.0:
./cpuid.exe 
Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 6 model 1
Model Name: AMD FX(tm)-8150 Eight-Core Processor        
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip supports AVX
Chip does not support FMA3
Chip supports FMA4
See the other post from today about build nwchem with acml (hint: use the fma4_int64 libs but avoid mp).

Here's 1055T:
Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 1 model 10
Model Name: AMD Phenom(tm) II X6 1055T Processor
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip does not support AVX
Chip does not support FMA3
Chip does not support FMA4


Issues

Openblas:
You will get SGEMM related errors trying to build openblas according to the instructions I've posted on this site before. Apparently it has to do with the way the architecture is autoselected during build. Or something. I couldn't make it work.

NwChem:
I tried building nwchem with the internal libs, but had no luck. See other posts on this blog for general instructions. Building with the AMD ACML worked fine though.


Files:

NWChem (opt biphenyl cation, cp-md/pspw):
Title "Test 1"
Start  biphenyl_cation_twisted-1
echo
charge 1
geometry autosym units angstrom
 C     0.00000     -3.54034     0.00000
 C     -1.20296     -2.84049     -0.216000
 C     -1.20944     -1.46171     -0.206253
 C     0.00000     -0.721866     0.00000
 C     1.20944     -1.46171     0.206253
 C     1.20296     -2.84049     0.216000
 C     0.00000     0.721866     0.00000
 C     1.20944     1.46171     -0.206253
 C     1.20296     2.84049     -0.216000
 C     -1.20944     1.46171     0.206253
 C     0.00000     3.54034     0.00000
 C     -1.20296     2.84049     0.216000
 H     0.00000     -4.62590     0.00000
 H     -2.12200     -3.38761     -0.395378
 H     -2.13673     -0.938003     -0.401924
 H     2.12200     -3.38761     0.395378
 H     2.12200     3.38761     -0.395378
 H     -2.13673     0.938003     0.401924
 H     0.00000     4.62590     0.00000
 H     -2.12200     3.38761     0.395378
 H     2.13673     0.938003     -0.401924
 H     2.13673     -0.938003     0.401924
end
nwpw
  simulation_cell
     lattice_vectors
      2.000000e+01 0.000000e+00 0.000000e+00
      0.000000e+00 2.000000e+01 0.000000e+00
      0.000000e+00 0.000000e+00 2.000000e+01
  end
  mult 2
  np_dimensions -1  -1
  tolerances 1e-7  1e-7
end
driver
  default
end
task pspw optimize
NWChem (opt biphenyl cation, geovib, 6-31G**/ub3lyp):
Title "Test 2"
Start  biphenyl_cation_twisted
echo
charge 1
geometry autosym units angstrom
 C     0.00000     -3.56301     0.00000
 C     -1.13927     -2.85928     -0.393841
 C     -1.13879     -1.46545     -0.394153
 C     0.00000     -0.742814     0.00000
 C     1.13879     -1.46545     0.394153
 C     1.13927     -2.85928     0.393841
 C     0.00000     0.742814     0.00000
 C     1.13879     1.46545     -0.394153
 C     1.13927     2.85928     -0.393841
 C     -1.13879     1.46545     0.394153
 C     0.00000     3.56301     0.00000
 C     -1.13927     2.85928     0.393841
 H     0.00000     -4.64896     0.00000
 H     -2.02827     -3.39662     -0.711607
 H     -2.02148     -0.928265     -0.727933
 H     2.02827     -3.39662     0.711607
 H     2.02827     3.39662     -0.711607
 H     -2.02148     0.928265     0.727933
 H     0.00000     4.64896     0.00000
 H     -2.02827     3.39662     0.711607
 H     2.02148     0.928265     -0.727933
 H     2.02148     -0.928265     0.727933
end
basis "ao basis" cartesian print
  H library "6-31G**"
  C library "6-31G**"
END
dft
  mult 2
  XC b3lyp
  mulliken
end
driver
end
task dft optimize
task dft freq numerical

17 May 2012

155. Gromacs with external fftw3 and blas on debian testing

This is based on http://verahill.blogspot.com.au/2012/03/building-gromacs-with-fftw3-and-openmpi.html

Make sure your build environment is set up:
sudo apt-get install build-essential gfortran libopenmpi-dev


fftw
sudo mkdir /opt/fftw/
sudo chown ${USER} /opt/fftw
mkdir ~/tmp
cd ~/tmp
wget ftp://ftp.fftw.org/pub/fftw/fftw-3.3.2.tar.gz
tar xvf fftw-3.3.2.tar.gz
cd fftw-3.3.2/

./configure --enable-float --enable-mpi --enable-threads --with-pic --prefix=/opt/fftw/fftw-3.3.2/single
make && make install
make clean
./configure --disable-float --enable-mpi --enable-threads --with-pic --prefix=/opt/fftw/fftw-3.3.2/double

make && make install


openblas
sudo mkdir /opt/openblas
sudo chown ${USER} /opt/openblas
cd ~/tmp
wget http://nodeload.github.com/xianyi/OpenBLAS/tarball/v0.1.1
tar xvf v0.1.1
cd xianyi-OpenBLAS-e6e87a2/
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
make all BINARY=64 CC=/usr/bin/gcc FC=/usr/bin/gfortran USE_THREAD=0 INTERFACE64=1 1> make.log 2>make.err

make PREFIX=/opt/openblas install
cp lib*.*  /opt/openblas/lib

add
export LD_LIBRARY_PATH:$LD_LIBRARY_PATH:/opt/openblas/lib
to your ~/.bashrc

[for later use with nwchem and ecce, add /opt/openblas/lib to /etc/ld.so.conf and do sudo ldconfig]



gromacs


sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
cd ~/tmp
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz
tar xvf gromacs-4.5.5.tar.gz
cd gromacs-4.5.5/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib

single
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/openblas/include"

./configure --disable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_sp --prefix=/opt/gromacs/gromacs-4.5.5
make -j3
make install

double
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/openblas/lib -lopenblas
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/openblas/include"

./configure --disable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dp --prefix=/opt/gromacs/gromacs-4.5.5
make -j3
make install

single + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/single/lib -L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/single/include -I/opt/openblas/include"

./configure --enable-mpi --enable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_spmpi --prefix=/opt/gromacs/gromacs-4.5.5
make -j3
make install


double + mpi
make distclean
export LDFLAGS="-L/opt/fftw/fftw-3.3.2/double/lib -L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/fftw/fftw-3.3.2/double/include -I/opt/openblas/include"

./configure --enable-mpi --disable-float --with-fft=fftw3 --with-external-blas --with-external-lapack --program-suffix=_dpmpi --prefix=/opt/gromacs/gromacs-4.5.5
make -j3
make install



Make sure to put this in your ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
export PATH=$PATH:/opt/gromacs/gromacs-4.5.5/bin
 You now have four versions of each binary -- with and without mpi, with single and with double precision.


10 May 2012

147. Oniom in nwchem -- with a little bit of help from gromacs and openbabel

Example -- I want to do explicit solvent modelling of methanol in water. This is obviously an articifical approach, but generally applicable.

This is a rough approach to doing oniom calculations using nwchem 6.0 -- this is a technical description, not a how-to when it comes to the science.

1. Pre-optimisation
Draw methanol and set up a simple calc using e.g. ecce to pre-optimise the structure with e.g. an implicit solvent model. Here's nwch.nw:

scratch_dir /scratch
Title "pre-oniom"
Start  pre-oniom
echo
charge 0
geometry autosym units angstrom
 C     0.00000     0.00000     0.00000
 H     -0.675500     -0.675500     0.675500
 H     0.675500     -0.675500     -0.675500
 H     -0.675500     0.675500     -0.675500
 O     0.866025     0.866025     0.866025
 H     1.51843     0.213620     1.51843
end
ecce_print /home/me/jobs/jobs/testing/old/pre-oniom/ecce.out
basis "ao basis" spherical print
  H library "6-31+G*"
  O library "6-31+G*"
  C library "6-31+G*"
END
dft
  mult 1
  XC b3lyp
  mulliken
end
driver
  default
end
cosmo
end
task dft optimize
task dft freq


2. Solvation using gromacs
Take the output, nwch.nwout, and use babel to export the optimised structure

babel -inwo nwch.nwout -oxyz molecule.xyz

The next few steps require gromacs:
editconf -f molecule.xyz -o molecule.gro -box 2 2 2
genbox -cp molecule.gro -cs spc216.gro -o solvated.gro
babel -igro solvated.gro -oxyz solvated.xyz

tail -n +3 solvated.xyz > oniom.nw

3. Putting it all together

The only 'trick' is how to define what part of the input belongs to the high level section and what belongs to the low level section -- for a solvation case like this, where whole molecules are treated by one method or another, use model. The last atom to be part of the high level section is the last of the six atoms in methanol, so the keyword is model 6. Atoms 7 to infinity are thus part of the sto-3g part, and atoms 1-6 part of the 6-31g* part.


memory stack 600 mb heap 200 mb global 800 mb

scratch_dir /scratch
Title "oniom"
Start oniom
echo
charge 0

geometry units angstrom
symmetry c1
C         10.12000       10.35000       10.00000
H          9.60000       10.72000       10.89000
H          9.60000       10.72000        9.11000
H         11.14000       10.74000       10.00000
[.]
H          0.46000        2.39000        6.86000
H          1.16000        1.73000        8.18000
end

basis sto-3g spherical
 * library sto-3g
end

basis 6-31g spherical
 * library 6-31g
end



oniom
    model 6
    low  dft basis sto-3g input "dft\; xc\; end"
    high dft basis "6-31g" input "dft\; xc\; end"
end

basis "ao basis" spherical print
  H library "6-31+G*"
  O library "6-31+G*"
  C library "6-31+G*"
END

task oniom



This takes forever to run, but run it does. The memory statement is important -- if the global memory is too small it will crash. Also, be aware that the amount of memory specified is per instance -- if you launch with mpirun -n 3, multiply by 3 to get the amount of memory that needs to be present (here, 2.4 GB physical RAM).



24 January 2012

56. Gromacs -- setting up and running energy minimisation of methanol in water. Using itp files

* First browse through this tutorial.
* I'm not an expert at GROMACS -- I'm writing this up as I am learning myself. Use what I show here as a starting point for your own explorations. Don't treat it like a scientifically complete piece of work.

1. Building the .gro
Draw a methanol molecule in e.g. Avogadro (available in the debian repos) and save as .xyz.
6

C         -1.60284       -0.81294        0.00066
O         -0.18719       -0.81849       -0.00206
H         -1.97028       -1.66300        0.58111
H         -1.97029       -0.87387       -1.02686
H         -1.95386        0.11552        0.45631
H          0.09473       -1.65393       -0.41207
You should edit the file to give the atoms unique names -- keep them short though, as each string field is limited to 5 chars in the .gro file. You will refer to these names later in your metahnol.itp file.

Here's my new methanol.xyz:
6
meth
COH         -1.60284       -0.81294        0.00066
OHC         -0.18719       -0.81849       -0.00206
HC1         -1.97028       -1.66300        0.58111
HC2         -1.97029       -0.87387       -1.02686
HC3         -1.95386        0.11552        0.45631
HO          0.09473       -1.65393       -0.41207
Next we convert it to a .gro file and create a 2.5 nm x 2.5 nm x 2.5 nm box around it. 

editconf -f methanol.xyz -o methanol.gro -box 2.5 2.5 2.5 

methanol.gro:

meth
    6
    1 ???   COH    1   1.216   1.264   1.257
    1 ???   OHC    2   1.358   1.263   1.257
    1 ???   HC1    3   1.179   1.179   1.315
    1 ???   HC2    4   1.179   1.258   1.154
    1 ???   HC3    5   1.181   1.357   1.302
    1 ???    HO    6   1.386   1.180   1.216
   2.50000   2.50000   2.50000

2. Building the .top
We're going to make it a bit easier and more modular for ourselves by using the #include directive and by (re)using .itp files -- this way we can pull instructions and data from stock files.

step1.top:
[defaults]
1   1   yes   1.0   1.0

#include "atomtypes.itp"
#include "methanol.itp"
#include "water.itp"

[system]
wood liquor

[molecules]
meth   1
SOL   1
It now remains for us to define the molecule meth in methanol.itp, SOL in water.itp and define the proper atomtypes in atomtypes.itp.

3. Building the methanol.itp

First create a .gzmat file from the .xyz using openbabel


babel methanol.xyz methanol.gzmat

methanol.gzmat:


#Put Keywords Here, check Charge and Multiplicity.
 meth
0  1
Xx
Xx  1  r2
Xx  1  r3  2  a3
Xx  1  r4  2  a4  3  d4
Xx  1  r5  2  a5  3  d5
Ho  2  r6  1  a6  3  d6
Variables:
r2= 1.4157
r3= 1.0929
a3= 109.52
r4= 1.0929
a4= 109.52
d4= 239.22
r5= 1.0922
a5= 109.00
d5= 119.61
r6= 0.9724
a6= 107.10
d6=  60.39


The zmat format gives an easy overview over bond-lengths and angles, which is what I use it for. For something as small as methanol you don't really need a zmat file, but it can come in handy for large molecules. We'll also include #ifndef, #else and #endif -- this way we have an easy way of switching between using constraints and not doing so. ifndef checks to see whether FLEXIBLE is defined or not (ifndef=If Not Defined).

Remember that the atom order in methanol.itp should be the same as in the .gro file.

The partial charges are addressed under atomtypes. Be aware that while gzmat outputs bond lengths in  Ångström you should use nanometres in your .itp file or things will get weird.

methanol.itp:


;;;;;;;;;;;;;;;;;;
; Methanol
;;;;;;;;;;;;;;;;;;
[ moleculetype ]
meth    3
[atoms]
;nr type    resnr   res atom    cgnr    chrg    mass
1   COH 1   meth    COH   1   0.1450   12.011
2   OHC 1   meth    OHC   1   -0.683   15.9994
3   HC  1   meth    HC1   1   0.0400   1.00079
4   HC  1   meth    HC2   1   0.0400   1.00079
5   HC  1   meth    HC3   1   0.0400   1.00079
6   HO  1   meth    HO   1   0.4180   1.00079
#ifndef FLEXIBLE
[constraints]
;ai aj  f   nm
3   1   1   0.110929  ; H-C
4   1   1   0.110929  ; H-C
5   1   1   0.110929  ; H-C
1   2   1   0.14157  ; C-O
6   2   1   0.09724  ; H-O
[exclusions]
1   2   3   4   5   6
2   1   3   4   5   6
3   1   2   4   5   6
4   1   2   3   5   6
5   1   2   3   4   6
6   1   2   3   4   5
#else
[bonds]
;ai aj  f   nm  kb
3   1   1   0.110929  10  ; H-C, r3
4   1   1   0.110929  10  ; H-C, r4
5   1   1   0.110929  10  ; H-C, r5
2   1   1   0.14157  100  ; C-O, r2
6   2   1   0.09724  1  ; H-O, r6
;[ pairs ]
; i j   funct
;2   6
;3   6
;4   6
[angles]
;ai aj ak   f   angle   k_a
3   1   2   1   109.52  500   ; H-C-O a3
4   1   2   1   109.52  500   ; H-C-O a4
5   1   2   1   109.52  500   ; H-C-O a5
6   2   1   1   107.10  500   ; H-O-C a6
3   1   4   1   120.00  500
4   1   5   1   120.00  500
3   1   5   1   120.00  500

[dihedrals];proper - f=1
6   2   1   3   1   60   300    2    ;H(O)-O-C-H(C) d6
;6   2   1   4   1      
;6   2   1   5   1
[dihedrals];improper - f=2
4   1   2   3   2   239.22  300    ;H-C-O-H(C) d4
5   1   2   3   2   119.61  300    ;H-C-O-H(C) d5
#endif



3. Building the water.itp
The partial charges are addressed under atomtypes, but this is essentially a copy of the (/usr/local/gromacs/share/gromacs/)top/oplsaa.ff/spce.itp file.

water.itp:
[moleculetype]
sol 2
[atoms]
;nr type resnr res atom cgnr charge mass
1   OW  1   SOL OW  1   -0.8476 15.9994
2   HW  1   SOL HW1 1   0.4238  1.00079
3   HW  1   SOL HW2 1   0.4238  1.00079


#ifndef FLEXIBLE
[settles]
;OW fun doh dhh
1   1   0.1 0.16330
[exclusions]
1   2   3
2   1   3
3   1   2
#else
[bonds]
;i  j   f   l(nm)  kb
1   2   1   0.1 34.5e3
1   3   1   0.1 34.5e3
[angles]
;ai aj ak f angle k_a
2   1   3   1   109.47  383
#endif

4. Building the atomtypes.itp
This is the tricky bit -- you need to pick the correct parameters for your atoms, and as they are empirical it is difficult to know what is 'right'.

A quick way of getting decent starting values is to
cat /usr/local/gromacs/share/gromacs/top/oplsaa.ff/ffnonbonded.itp

and look through the OPLS-AA for different atoms.

e.g.

cat /usr/local/gromacs/share/gromacs/top/oplsaa.ff/ffnonbonded.itp | grep OH

gives 
; name  bond_type    mass    charge   ptype          sigma      epsilon
 opls_023   OH 8      15.99940    -0.700       A    3.07000e-01  7.11280e-01 ; SIG

 opls_078   OH 8     15.99940    -0.700       A    3.07000e-01  7.11280e-01 ; SIG
 opls_154   OH  8     15.99940    -0.683       A    3.12000e-01  7.11280e-01
 opls_162   OH  8     15.99940    -0.635       A    3.07000e-01  7.11280e-01
 opls_167   OH   8  15.99940    -0.585       A    3.07000e-01  7.11280e-01
 opls_169   OH  8   15.99940    -0.700       A    3.07000e-01  7.11280e-01
 opls_171   OH  8   15.99940    -0.730       A    3.07000e-01  7.11280e-01
 opls_187   OH   8  15.99940    -0.700       A    3.07000e-01  7.11280e-01
 opls_268   OH   8  15.99940    -0.530       A    3.00000e-01  7.11280e-01
 opls_420   OH   8  15.99940    -0.980       A    3.15000e-01  1.04600e+00
 opls_434   OH   8  15.99940    -1.300       A    3.20000e-01  1.04600e+00

For most definitions sigma=3.07000e-01 and epsilon=7.11280e-01, so we'll use that. However, we want to use c6 and c12 notation, where c6=4*epsilon*sigma^6 and c12=4*epsilon*sigma^12.

An alternative is to look through the literature for suitable values.

An advantage with making an atomtypes.itp file yourself is that you can slowly expand it between experiments.

Here's my atomtypes.itp:
[atomtypes]
;atom no mass charge ptype c6 c12
;water
OW  8   15.9994 -0.8476 A   0.261710e-02    0.26331e-5
HW  1   1.00079 0.42380 A   0.000000e-02    0.00000e-5
;methanol
; Methanol, Jorgensen et al. JACS 118 pp. 11225 (1996)
;opls-aa/ffnonbonded.itp
COH 6     12.01100     0.145       A    0.20305e-2  0.37326e-5      ;s/e 3.50000e-01  2.76144e-01
OHC 8     15.99940    -0.683       A    0.26244e-2  0.24208e-5      ;s/e 3.12000e-01  7.11280e-01
HC  1      1.00800     0.040       A    0.012258e-2 0.0029926e-5    ;s/e 2.50000e-01  1.25520e-01
HO  1      1.00800     0.418       A    0.00000000  0.000000000     ;s/e 0.00000e+00  0.00000e+00
;CO2
OC  8   15.9994 -0.35   A   2.5438e-1   2.0478e-4
CO  6   12.0107 0.700   A   5.2044e-2   2.5080e-5
;CO3
OCO2    8   15.9994 -1.0450 A   0.261710e-2 0.26331e-05
CO3     6   12.0107 1.13500  A   0.234020e-2 0.33740e-05



5. Our em.mdp
This is from one of Justin Lemkul's tutorials


; Parameters describing what to do, when to stop and what to save
define = -DFLEXIBLE ; relaxes constraints in the .itp files
integrator = steep ; Algorithm (steep = steepest descent minimization)
emtol = 1000.0   ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep          = 0.01          ; Energy step size
nsteps = 50000   ; Maximum number of (minimization) steps to perform
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist = 1 ; Frequency to update the neighbor list and long range forces
ns_type = grid ; Method to determine neighbor list (simple, grid)
rlist = 1.2 ; Cut-off for making neighbor list (short range forces)
coulombtype = PME ; Treatment of long range electrostatic interactions
rcoulomb = 1.2 ; Short-range electrostatic cut-off
rvdw = 1.2 ; Short-range Van der Waals cut-off
pbc = xyz ; Periodic Boundary Conditions (yes/no)

6. Energy minimisation
I'll keep it brief:

Add water
genbox_dd  -cp methanol.gro -cs /usr/local/gromacs/share/gromacs/top/spc216.gro -o step1.gro -p step1.top 

Prepare mdrun input
grompp_dd -f em.mdp -po step2.mdp -c step1.gro -p step1.top -pp step2.top -o step2.tpr 

Do energy minimsation
mdrun_dd -v -s step2.tpr -o step3.trr -x step3.xtc -c step3.gro -e step3.edr -g step3.log


Using a run.sh

I get tired of typing the same thing over and over - especially when troubleshooting

Create a run.sh file and call it with sh run.sh every time
editconf_dd -f methanol.xyz -o methanol.gro -box 2.5 2.5 2.5
genbox_dd  -cp methanol.gro -cs /usr/local/gromacs/share/gromacs/top/spc216.gro -o step1.gro -p step1.top
grompp_dd -f em.mdp -po step2.mdp -c step1.gro -p step1.top -pp step2.top -o step2.tpr
mdrun_dd -v -s step2.tpr -o step3.trr -x step3.xtc -c step3.gro -e step3.edr -g step3.log

7. Equilibration and production runs
See previous tutorial for mdp files.

I know it's a bit of a cop-out, but no time to make a more detailed tutorial right now.

18 January 2012

53. GROMACS -- carbon dioxide in water. Example

I'm new to GROMACS, so don't follow what I've done blindly.

To get started you need a .top and a .gro file. The .gro file is not dissimilar to a .pdb file and contains xyz coordinates and atom names. The .gro format is inflxible in that specific numbers of chars are dedicated to each field -- if that's Greek to you, then take this advice: don't edit it by hand. If you do, only substitute x number of characters with the same number of characters.

The .top file contains information about the atoms in each molecule and defines their properties, including partial charges, Lennar-Jones or Buckingham params, constraints etc.

 Making the .top file is probably the most challenging step when you are new to GROMACS. Once you've done it a few times it becomes easy, although perhaps somewhat tedious at times. Using a zmat file as your starting point may help (we'll do that here)

For each 'experiment' you need an .mdp file. The .mdp file defines what methods are used during the run, the type of experiment, cutoffs, etc.

In the example I use the gromacs binaries I compiled in an earlier post. To avoid confusion I'll use the double-precision (_dd) binaries the entire time. If you are using the debian gromacs binaries, use _d instead.

To start you need co2.gro, step1.top, em.mdp, eq.mdp, eq2.mdp and production.mdp. The rest of the files are generated

Comments are added to gromacs files by adding a ; in front.

Summary of commands that we'll be using after we have our .gro and .top files:

genbox_dd -cp co2.gro -o step1.gro -cs /usr/local/gromacs/share/gromacs/top/spc216.gro -p step1.top
# energy minimisation
grompp_dd -f em.mdp -po step3.mdp -p step1.top -pp step2.top -c step1.gro -o step3.tpr
mdrun_dd -v -s step3.tpr -o step4.trr -x step4.xtc -cpo step4.cpt -c step4.gro -e step4.edr -g step4.log
# equilibration step 1
grompp_dd -f eq.mdp -po step6.mdp -p step2.top -pp step6.top -c step4.gro -o step6.tpr
mdrun_dd -v -s step6.tpr -o step7.trr -x step7.xtc -cpo step7.cpt -c step7.gro -e step7.edr -g step7.log
# equilibration step 2
grompp_dd -f eq2.mdp -po step8.mdp -p step6.top -pp step8.top -c step7.gro -o step8.tpr
mdrun_dd -v -s step8.tpr -o step9.trr -x step9.xtc -cpo step9.cpt -c step9.gro -e step9.edr -g step9.log
# production run
grompp_dd -f production.mdp -po step10.mdp -p step8.top -pp step10.top -c step9.gro -o step10.tpr
mdrun_dd -v -s step10.tpr -o step11.trr -x step11.xtc -cpo step11.cpt -c step11.gro -e step11.edr -g step11.log 


START HERE
If at any point you get annoying fatal errors, your first thought should be 'formatting'. I don't know how faithful blogspot is towards tabs etc. Google -- if you're listening: please allow the upload of simple ascii files! Oh, and while we're at it -- if I'm emailing text files with unix line endings, don't effing change them to windows line endings! Ehum...so...


1. Making a .gro file
I first made an .xyz file using avogadro and optimised it using the built-in MM engine.

co2.xyz:

3

C         -5.06401        3.32301        1.92535
O         -4.89871        1.97004        1.81668
O         -5.22923        4.67528        2.03397


I edited the file to
1. Add a title card (not important)
2. give the atoms specific names (the names aren't important -- but they should match those used in .top). I decided to call the carbon CO and the Oxygens OC since the C is bonded to O and the Os are bonded to C. The names must be 1-5 characters long.

co2_edited.xyz:

3
CAR
CO         -5.06401        3.32301        1.92535
OC         -4.89871        1.97004        1.81668
OC         -5.22923        4.67528        2.03397

Next, we make our .gro file and set the box size to 2 by 2 by 2 nanometres:
editconf_dd -f co2_edited.xyz -o co2.gro -box 2 2 2 -label CAR -resnr 1
co2.gro:

CAR
    3
    1 ???    CO    1   1.000   1.000   1.000
    1 ???    OC    2   1.017   0.865   0.989
    1 ???    OC    3   0.983   1.135   1.011
   2.00000   2.00000   2.00000
I opened co2.gro in vim and did 
:%s/???/CAR/g
the saved using 
:wq

co2.gro:
CAR
    3
    1 CAR    CO    1   1.000   1.000   1.000
    1 CAR    OC    2   1.017   0.865   0.989
    1 CAR    OC    3   0.983   1.135   1.011
   2.00000   2.00000   2.00000

This is our starting .gro file.


2. Making our .top file
I first made a zmat file since it contains angles and bond lengths. CO2 is a simple enough molecule that this isn't necessary, but it's good to have a standard approach.

If you haven't yet installed babel, then do so:
sudo apt-get install openbabel

Generate the zmat file:
babel co2.xyz co2.gzmat

co2.gzmat:
 CAR
0  1
Co
Xx  1  r2
Xx  1  r3  2  a3
Variables:
r2= 1.3674
r3= 1.3666
a3= 179.97

For an explanation of the zmat format, look here.

OK, time to create our .top file

Open a new file and call it step1.top

First create directive headers to outline the file -- we will have to types of molecules -- CAR (CO2) -- and SOL (water), so we have two sets of [molecultypes] (+ [atoms], [constraints] etc.):
[defaults]
[atomtypes]
[moleculetype]
[atoms]
[constraints]
[exclusions]
[angles]
[dihedrals]
[moleculetype]
[atoms]
[constraints]
[exclusions]
[system]
[molecules]


The order of the different directive is (sadly) important. 

OK, time to add information to the different sections -- let's start with the easy ones:

[defaults]:
[defaults]
; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ
1 1 yes 1.0 1.0

This is a standard directive when you roll your own simulation independent of predefined forcefields. nbfunc=1 means we're using Lennard-Jones c6/c12 parameters.

[system]:
[system]
sparkling water

[system] is basically like a title-card

[molecules]:
[molecules]
CAR      1
SOL      1

OK, time to define our carbon dioxide molecule:
[moleculetype]:
[moleculetype]
;name nrexcl
CAR    3
CAR is the name and nrexcl=3 tells gromacs to exclude non-bonded interactions between atoms that are no further than 3 bonds away.

For atoms the order should be the same as in the .gro file. Here CO  and the two OC make up a single charge group (all have the same cgnr=1). I got the partial charges from an article -- http://pubs.acs.org/doi/full/10.1021/jp062723w.

The type (e.g. CO) must match that which will later be defined in [atomtypes]. atom (e..g CO) must match that in the .gro file. mass is self-explanatory, as is nr.

[atoms]
[atoms]
;   nr   type  resnr residue  atom   cgnr     charge       mass
1 CO 1 CAR CO 1 0.70         12.0107
2 OC 1 CAR OC 1 -0.35 15.9994
3 OC 1 CAR OC 1 -0.35 15.9994

We will constrain the bond distances -- which we got from the zmat file above (r2 and r3 -- I picked 1.3674 Ångström, which is 0.13674 nm:

[constraints]:
; particles bonded if defined in bonds (1, 5, 6) or constraints
[constraints]
;ai     aj func bond
1 2 1 0.13674
1 3 1 0.13674
[exclusions]:
 [exclusions]
;ai other atoms
1 2 3
2 1 3
3 1 2
Exclusions excludes non-bonded interactions between atom ai and the other atoms listed on the same line. It's probably not necessary in such a small molecule.

[angles]:

[angles]
2 1 3   1 180 1500

There's only one angle -- atom 1 is carbon, so the angle is across O=C=O, or 2=1=3 or, if you prefer, 3=1=2 -- look at atom nr. We know that the angle is 180 degrees, but the zmat data also gave us that. Again, useful for more complex molecules.

Comment out dihedral, since there aren't any. You should be aware of the existence of it though.
[dihedrals]:
;[dihedrals]

Water:
We do the same thing for water, just without explanations. This was copied verbatim from the gromacs manual. Look there for information about [settles]

[ moleculetype ]
; molname nrexcl
SOL 2
[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge       mass
     1     OW      1    SOL     OW      1    -0.8476   15.99940
     2     HW      1    SOL    HW1      1     0.4238    1.00800
     3     HW      1    SOL    HW2      1     0.4238    1.00800
;[ constraints ]
; ai aj funct length(b0, nm) kb(kJ mol-1 nm-2)
;  1 2 1 0.100 ;  1 3 1 0.100 ;  2 3 1 0.1633
[ settles ]
;ai funct doh dhh
 1 1 0.1 0.1633
[ exclusions ]
1       2       3
2       1       3
3       1       2

We've so far avoided [atomtypes], which is probably the MOST IMPORTANT directive.

[atomtypes]:
[atomtypes]
;type z       mass        pq          ptype    c6                 c12
OC 8 15.9994 -0.35 A 2.5438e-1 2.0478e-4
CO 6 12.0107 0.70 A 5.2044e-2 2.5080e-5
OW 8 15.9994 -0.834 A 0.261710e-2 0.26331e-05
HW 1 1.0080 0.417 A 0.0000000000 0.00000e-00

So how did we get here?
Let's look at the first line:

OC is the atom type which we'll refer to under the [atoms] directives.
8 is the atomic number of oxygen.
15.9994  is the average atomic mass of carbon.
-0.35 is the partial charge. For carbon CO and oxygen OC I got it from http://pubs.acs.org/doi/full/10.1021/jp062723w. However, in the following post from 2011: "The charges from [atoms] are used. The charges in [atomtypes] are never used in any forcefield currently used with GROMACS."
A is the particle type -- A stands for Atom
2.5438e-1 is the Lennard-Jones c6 parameter for the carbon dioxide carbon -- I got it from here which gives epsilon and sigma: c6=4*epsilon*sigma^6
2.0478e-4 is the Lennard-Jones c12 parameter for the carbon dioxide carbon -- I got it from here which gives epsilon and sigma: c12=4*epsilon*sigma^12. (sigma_o=0.305 nm; epsilon_o=79 K; sigma_c=0.280 nm; epsilon_c=27 K)

I got OW and HW from a different source.

Finally, here's our entire step1.top:

[defaults]
1 1 yes 1.0 1.0
;http://pubs.acs.org/doi/full/10.1021/jp062723w
;
[atomtypes]
; sigma_o=0.305 nm; epsilon_o=79 K; sigma_c=0.280; epsilon_c=27
OC 8 15.9994 -0.35 A 2.5438e-1 2.0478e-4
CO 6 12.0107 0.70 A 5.2044e-2 2.5080e-5 OW 8 15.9994 -0.8476 A   0.261710e-2  0.26331e-05
HW 1 1.0080 0.4238 A 0.0000000000 0.00000e-00
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; [moleculetype]
CAR 3
[atoms]
1 CO 1 CAR CO 1 0.70 12.0107
2 OC 1 CAR OC 1 -0.35 15.9994
3 OC 1 CAR OC 1 -0.35 15.9994
; particles bonded if defined in bonds (1, 5, 6) or constraints
[constraints]
; func bond
1 2 1 0.13674
1 3 1 0.13674
;all non-bonded interactions between atom 1 and the other atoms ignored
[exclusions]
1 2 3
2 1 3
3 1 2
[angles]
2 1 3   1 180 1500
;[dihedrals]
;no dihedrals
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[ moleculetype ]
; molname nrexcl
SOL 2
[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge       mass
     1     OW      1    SOL     OW      1    -0.8476   15.99940
     2     HW      1    SOL    HW1      1     0.4238    1.00800
     3     HW      1    SOL    HW2      1     0.4238    1.00800
;[ constraints ]
; ai aj funct length(b0, nm) kb(kJ mol-1 nm-2)
;  1 2 1 0.100 ;  1 3 1 0.100 ;  2 3 1 0.1633
[ settles ]
;ai funct doh dhh
 1 1 0.1 0.1633
[ exclusions ]
1       2       3
2       1       3
3       1       2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[system]
fizzy water
[molecules]
CAR 1
SOL               212

3. Add water
We use co2.gro and step1.top as the input. We get our water molecules from /usr/local/gromacs/share/gromacs/top/spc216.gro. We write a new .gro file, step1.gro. It will have our carbon dioxide molecule (CAR) and 212 water molecules (SOL).

genbox_dd -cp co2.gro -o step1.gro -cs /usr/local/gromacs/share/gromacs/top/spc216.gro -p step1.top

Ideally genbox_dd updates step1.top and puts the correct number of SOL molecules under [molecules], but please check.

[..]
Successfully made neighbourlist
nri = 11519, nrj = 660245
Checking Protein-Solvent overlap: tested 92 pairs, removed 9 atoms.
Checking Solvent-Solvent overlap: tested 13414 pairs, removed 540 atoms.
Added 212 molecules
Generated solvent containing 636 atoms in 212 residues
Writing generated configuration to step1.gro
Back Off! I just backed up step1.gro to ./#step1.gro.1#
CAR
Output configuration contains 639 atoms in 213 residues
Volume                 :           8 (nm^3)
Density                :      811.63 (g/l)
Number of SOL molecules:    212
Processing topology
[..]
4. Do energy minimisation (EM):
We first need our em.mdp to tell mdrun what to do. Like all the other .mdp files they originally come from http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/01_pdb2gmx.html, and have been used with a minimal amount of editing.

em.mdp:

; minim.mdp - used as input into grompp to generate em.tpr
; Parameters describing what to do, when to stop and what to save
integrator = steep ; Algorithm (steep = steepest descent minimization)
emtol = 1000.0   ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep          = 0.01               ; Energy step size
nsteps = 50000   ; Maximum number of (minimization) steps to perform
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist = 1         ; Frequency to update the neighbor list and long range forces
ns_type         = grid ; Method to determine neighbor list (simple, grid)
rlist              = 0.9         ; Cut-off for making neighbor list (short range forces)
coulombtype = PME ; Treatment of long range electrostatic interactions
rcoulomb = 0.9 ; Short-range electrostatic cut-off
rvdw = 0.9 ; Short-range Van der Waals cut-off
pbc          =  xyz         ; Periodic Boundary Conditions (yes/no)
Alright. Let's make our binary .trp file using our step1.top and our step1.gro. We'll use the binary trp file with mdrun in the next step. grompp also output a new top file, step2.top, for us, as well as a new .mdp file, step3.mdp.The latter doesn't matter to us.

grompp_dd -f em.mdp -po step3.mdp -p step1.top -pp step2.top -c step1.gro -o step3.tpr
mdrun_dd -v -s step3.tpr -o step4.trr -x step4.xtc -cpo step4.cpt -c step4.gro -e step4.edr -g step4.log

This step is fast, and we get
Steepest Descents converged to Fmax < 1000 in 36 steps
Potential Energy  = -9.52113394667855e+03
Maximum force     =  5.44327588453194e+02 on atom 1
Norm of force     =  7.44126862763805e+01

5. Equilibration part 1
title = fizzy drink
define = -DPOSRES ; position restrain the protein
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 90000 ; 2 * 50000 = 100 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 100 ; save coordinates every 0.2 ps
nstvout = 100 ; save velocities every 0.2 ps
; ; no nstxtcout
nstenergy = 100 ; save energies every 0.2 ps
nstlog = 100 ; update log file every 0.2 ps
; Bond parameters
continuation = no ; first dynamics run
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 0.9 ; short-range neighborlist cutoff (in nm)
rcoulomb = 0.9 ; short-range electrostatic cutoff (in nm)
rvdw = 0.9 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = CAR SOL ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
;;;;;;;;;;;;;;;;;;;;;
; no pressure coupling
;;;;;;;;;;;;;;;;;;;;;;;;;
; Pressure coupling is off
pcoupl = no ; no pressure coupling in NVT
;
;
;
;
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = yes ; assign velocities from Maxwell distribution
gen_temp = 300 ; temperature for Maxwell distribution
gen_seed = -1 ; generate a random seed
We then run:
grompp_dd -f eq.mdp -po step6.mdp -p step2.top -pp step6.top -c step4.gro -o step6.tpr
mdrun -v -s step6.tpr -o step7.trr -x step7.xtc -cpo step7.cpt -c step7.gro -e step7.edr -g step7.log

which gives
step 90000, remaining runtime:     0 s        
 Average load imbalance: 2.7 %
 Part of the total run time spent waiting due to load imbalance: 1.0 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %

Parallel run - timing based on wallclock.
               NODE (s)   Real (s)      (%)
       Time:     27.299     27.299    100.0
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    258.447     15.110    569.700      0.042


See this tutorial for an explanation of the two-step equilibration approach that we're using. I use 90000 steps instead of 10000, and it really doesn't matter at this stage.

6. Equilibration part 2
eq2.mdp:
title = fizzy drink
define = -DPOSRES ; position restrain the protein
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 90000 ; 2 * 50000 = 100 ps
dt = 0.002 ; 2 fs
; Output control
nstxout = 100 ; save coordinates every 0.2 ps
nstvout = 100 ; save velocities every 0.2 ps
; ; no nstxtcout
nstenergy = 100 ; save energies every 0.2 ps
nstlog = 100 ; update log file every 0.2 ps
; Bond parameters
continuation = yes ; <---  Restarting after NVT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 0.9 ; short-range neighborlist cutoff (in nm)
rcoulomb = 0.9 ; short-range electrostatic cutoff (in nm)
rvdw = 0.9 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = CAR SOL ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; turning on pressure coupling
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; <-- Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; <-- Velocity generation is off
;
;


and we run
grompp_dd -f eq2.mdp -po step8.mdp -p step6.top -pp step8.top -c step7.gro -o step8.tpr
mdrun_dd -v -s step8.tpr -o step9.trr -x step9.xtc -cpo step9.cpt -c step9.gro -e step9.edr -g step9.lo

The mdrun takes 43 seconds on an intel i5 with four cores.

7. Production run
production.mdp:
title = fizzy water
; removed define
; Run parameters
integrator = md ; leap-frog integrator
nsteps = 500000 ; 2 * 500000 = 1000 ps, 1 ns
dt = 0.002 ; 2 fs
; Output control
nstxout = 1000 ; save coordinates every 2 ps. Incr by *10
nstvout = 1000 ; save velocities every 2 ps. Incr by *10
nstxtcout = 1000 ; xtc compressed trajectory output every 2 ps. Incr by *10
nstenergy = 1000 ; save energies every 2 ps. Incr by *10
nstlog = 1000 ; update log file every 2 ps. Incr by *10
; Bond parameters
continuation = yes ; <-- Restarting after NPT
constraint_algorithm = lincs ; holonomic constraints
constraints = all-bonds ; all bonds (even heavy atom-H bonds) constrained
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
; Neighborsearching
ns_type = grid ; search neighboring grid cells
nstlist = 5 ; 10 fs
rlist = 0.9 ; short-range neighborlist cutoff (in nm)
rcoulomb = 0.9 ; short-range electrostatic cutoff (in nm)
rvdw = 0.9 ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype = PME ; Particle Mesh Ewald for long-range electrostatics
pme_order = 4 ; cubic interpolation
fourierspacing = 0.16 ; grid spacing for FFT
; Temperature coupling is on
tcoupl = V-rescale ; modified Berendsen thermostat
tc-grps = CAR SOL ; two coupling groups - more accurate
tau_t = 0.1 0.1 ; time constant, in ps
ref_t = 300 300 ; reference temperature, one for each group, in K
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; turning on pressure coupling
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; <-- Pressure coupling is on
pcoupl = Parrinello-Rahman ; Pressure coupling on in NPT
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc = xyz ; 3-D PBC
; Dispersion correction
DispCorr = EnerPres ; account for cut-off vdW scheme
; Velocity generation
gen_vel = no ; <-- Velocity generation is off
;
;

and we run
grompp_dd -f production.mdp -po step10.mdp -p step8.top -pp step10.top -c step9.gro -o step10.tpr
mdrun_dd -v -s step10.tpr -o step11.trr -x step11.xtc -cpo step11.cpt -c step11.gro -e step11.edr -g step11.log 

The last run takes four minutes on four cores.

8. Analysis
Again, this follows  this tutorial almost verbatim:
trjconv -s step10.tpr -f step11.xtc -o step12.xtc -pbc mol 



RMS during production run for CAR/SOL:
g_rms -s step10.tpr -f step12.xtc -o step13_rms.xvg -tu ns




CAR
g_gyrate -s step10.tpr -f step12.xtc -o step14_gyrate.xvg

CAR/SOL:
g_rdf -s step10.tpr -f step11.xtc -o step15_rdf.xvg 

And the video has as usual been compressed to something ugly and useless by blogger:
video