17 September 2013

514. Extracting Frequency data from a gaussian 09 calculation for gnuplot

This is another python script.

Say you've done a computation along the lines of this:
#P rBP86/GEN 5D Pseudo(Read) Opt=() Freq=() SCF=(MaxCycle=999 ) Punch=(MO) Pop=()
and want the data in a neat data file, like this:
33.237 0.0023 0.0536
39.9976 0.0043 0.8305
69.7345 0.0129 0.3348
84.7005 0.0173 0.7027
[..]
3133.0068 6.2938 0.6114
3143.8021 6.3551 0.3775
3164.9242 6.4685 0.8829
3221.8787 6.6972 4.6005

Then you can use the following python (2.x) script, g09freq:

#!/usr/bin/python
# Compatible with python 2.7 
# Reads frequency output from a g09 (gaussian) calculation
# Usage ex.: g09freq g09.log ir.dat
import sys 

def ints2float(integerlist):
    for n in range(0,len(integerlist)):
        integerlist[n]=float(integerlist[n])
    return integerlist

def parse_in(infile):
    g09output=open(infile,'r')
    captured=[]
    for line in g09output:
        if ('Frequencies' in line) or ('Frc consts' in line) or ('IR Inten' in line):
            captured+=[line.strip('\n')]
    g09output.close()
    return captured
    
def format_captured(captured):
    vibmatrix=[]
    steps=len(captured)
    for n in range(0,steps,3):
        freqs=ints2float(filter(None,captured[n].split(' '))[2:5])
        forces=ints2float(filter(None,captured[n+1].split(' '))[3:6])
        intensities=ints2float(filter(None,captured[n+2].split(' '))[3:6])
        # The last block may hold fewer than three modes (e.g. linear molecules)
        for m in range(0,len(freqs)):
            vibmatrix+=[[freqs[m],forces[m],intensities[m]]]
    return vibmatrix

def write_matrix(vibmatrix,outfile):
    f=open(outfile,'w')
    for n in range(0,len(vibmatrix)):
        item=vibmatrix[n]
        f.write(str(item[0])+'\t'+str(item[1])+'\t'+str(item[2])+'\n')
    f.close()
    return 0

if __name__ == "__main__":
    infile=sys.argv[1]
    outfile=sys.argv[2]

    captured=parse_in(infile)

    if len(captured)%3==0:
        vibmatrix=format_captured(captured)
    else:
        print 'Number of elements not divisible by 3 (freq+force+intens=3)'
        exit()
    success=write_matrix(vibmatrix,outfile)
    if success==0:
        print 'Read %s, parsed it, and wrote %s'%(infile,outfile)


Run it as e.g.
g09freq g09.log test.out
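The script above targets Python 2: it indexes the result of filter() and uses print statements, neither of which works under Python 3. As a minimal sketch (not a drop-in replacement), the same three lines per block could be parsed under Python 3 roughly like this. The function name parse_frequencies and the sample list are made up for illustration, with the numbers taken from the example data above; the sketch also tolerates a final block with fewer than three modes.

```python
def parse_frequencies(lines):
    """Return (frequency, force constant, IR intensity) tuples, one per mode."""
    columns = {'Frequencies': [], 'Frc consts': [], 'IR Inten': []}
    for line in lines:
        # The label sits before the first '--', the per-mode numbers after it
        label, sep, values = line.partition('--')
        key = label.strip()
        if sep and key in columns:
            columns[key] += [float(v) for v in values.split()]
    return list(zip(columns['Frequencies'],
                    columns['Frc consts'],
                    columns['IR Inten']))


# Hypothetical excerpt laid out like a g09 frequency block
sample = [
    ' Frequencies --    33.2370                39.9976                69.7345',
    ' Frc consts  --     0.0023                 0.0043                 0.0129',
    ' IR Inten    --     0.0536                 0.8305                 0.3348',
    ' ------------------------------------------------------------',
    ' Frequencies --    84.7005',
    ' Frc consts  --     0.0173',
    ' IR Inten    --     0.7027',
]
rows = parse_frequencies(sample)
```

To use it on a real log you would pass the open file object instead of the sample list, then write each tuple out tab-separated as write_matrix does.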

The output is compatible with gnuplot:
gnuplot> set xrange [3500:0]
gnuplot> set yrange [10:-1]
gnuplot> plot './test.out' u 1:2 w impulse



It's trivial to add gaussian broadening (see e.g. this post)
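As a rough sketch of what such broadening can look like (this is not necessarily the approach in the linked post), each stick is replaced by a Gaussian line shape and the contributions are summed on a regular grid. The function name broaden, the default FWHM of 20 cm^-1 and the grid limits are assumptions picked for illustration; the Gaussians are left unnormalised, so an isolated peak keeps the height of its stick.

```python
import math


def broaden(sticks, fwhm=20.0, start=0.0, stop=3500.0, step=1.0):
    """Sum one Gaussian per (position, height) stick on a regular grid.

    fwhm, start, stop and step are in the same unit as the stick
    positions (cm^-1 for the frequency data above).
    """
    # Convert full width at half maximum to the Gaussian standard deviation
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    npoints = int(round((stop - start) / step)) + 1
    spectrum = []
    for i in range(npoints):
        x = start + i * step
        y = sum(height * math.exp(-(x - centre) ** 2 / (2.0 * sigma ** 2))
                for centre, height in sticks)
        spectrum.append((x, y))
    return spectrum
```

Feeding it column 1 and whichever of columns 2 or 3 you want as height, and writing the (x, y) pairs to a file, gives something gnuplot can draw with lines instead of impulses.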

05 September 2013

513. Extracting data from a PES scan with gaussian

There are a few reasons to like gaussian, and many reasons not to. Gaussian is fast, and their whitepapers are great resources for learning computational techniques.

Without going into discussions about the commercial behaviour of Gaussian, Inc., the things I don't like about gaussian are the clunky input format (nwchem has a much more readable syntax), the inscrutable error messages, and the unreadable output. Well, it's not unreadable in a literal sense, but it could certainly be clearer. On the other hand, I'm having issues with running some of my PES scans in nwchem -- and I can't find a solution (more about that in a later post).

Anyway, here's a python script for extracting optimized structures and energies from a relaxed PES scan in Gaussian 09.

First, an example of a simple scan:
%nprocshared=2
%Chk=methanol.chk
#P rB3LYP/6-31g 6D 10F Opt=(modredundant) NoSymm Punch=(MO) Pop=()

methanol

0 1 ! charge and multiplicity
C 0.0351714 0.00548884 0.0351714
H -0.617781 -0.634073 0.667983
H 0.667983 -0.634073 -0.617781
H -0.605139 0.646470 -0.605139
O 0.839603 0.818768 0.839603
H 1.38912 0.201564 1.38912

1 5 S 10 0.1

And here's the script, pes_parse_g09:
#!/usr/bin/python
import sys

def getrawdata(infile):
        f=open(infile,'r')
        opt=0
        geo=0
        struct=[]
        structure=[]
        energies=[]
        energy=[]
        for line in f:
                
                if opt==1 and geo==1 and not ("---" in line):
                        structure+=[line.rstrip()]
                
                if 'Coordinates (Angstroms)' in line:
                        if opt==0:
                                opt=1
                                structure=[]
                        
                if opt==1 and "--------------------------" in line:
                        if geo==0:
                                geo=1
                        elif geo==1:
                                geo=0
                                opt=0
                if 'SCF Done' in line:
                        energy=filter(None,line.rstrip('\n').split(' '))
                if      'Optimization completed' in line and (opt==0 and geo==0):
                        energies+=[float(energy[4])]
                        opt=0
                        geo=0
                        struct+=[structure]
                        structure=[]
        
        f.close()
        return struct, energies

def periodictable(elementnumber):
        ptable={1:'H',2:'He',\
        3:'Li', 4:'Be',5:'B',6:'C',7:'N',8:'O',9:'F',10:'Ne',\
        11:'Na',12:'Mg',13:'Al',14:'Si',15:'P',16:'S',17:'Cl',18:'Ar',\
        19:'K',20:'Ca',\
        21:'Sc',22:'Ti',23:'V',24:'Cr',25:'Mn',26:'Fe',27:'Co',28:'Ni',29:'Cu',30:'Zn',\
        31:'Ga',32:'Ge',33:'As',34:'Se',35:'Br',36:'Kr',\
        37:'Rb',38:'Sr',\
        39:'Y',40:'Zr',41:'Nb',42:'Mo',43:'Tc',44:'Ru',45:'Rh',46:'Pd',47:'Ag',48:'Cd',\
        49:'In',50:'Sn',51:'Sb',52:'Te',53:'I',54:'Xe',\
        55:'Cs',56:'Ba',\
        57:'La',58:'Ce',59:'Pr',60:'Nd',61:'Pm',62:'Sm',63:'Eu',64:'Gd',65:'Tb',66:'Dy',67:'Ho',68:'Er',69:'Tm',70:'Yb',71:'Lu',\
        72:'Hf', 73:'Ta', 74:'W',75:'Re', 76:'Os', 77:'Ir',78:'Pt', 79:'Au', 80:'Hg',\
        81:'Tl', 82:'Pb', 83:'Bi',84:'Po',85:'At',86:'Rn',\
        87:'Fr',88:'Ra',\
        89:'Ac',90:'Th',91:'Pa',92:'U',93:'Np',94:'Pu',95:'Am',96:'Cm',97:'Bk',98:'Cf',99:'Es',100:'Fm',101:'Md',102:'No',\
        103:'Lr',104:'Rf',105:'Db',106:'Sg',107:'Bh',108:'Hs',109:'Mt',110:'Ds',111:'Rg',112:'Cn',\
        113:'Uut',114:'Fl',115:'Uup',116:'Lv',117:'Uus',118:'Uuo'}
        element=ptable[elementnumber]
        return element

def genxyzstring(coords,elementnumber):
        x_str='%10.5f'% coords[0]
        y_str='%10.5f'% coords[1]
        z_str='%10.5f'% coords[2]
        element=periodictable(int(elementnumber))
        xyz_string=element+(3-len(element))*' '+10*' '+\
        (8-len(x_str))*' '+x_str+10*' '+(8-len(y_str))*' '+y_str+10*' '+(8-len(z_str))*' '+z_str+'\n'
 
        return xyz_string

def getstructures(rawdata):
        
        n=0
        for structure in rawdata:
                
                n=n+1
                num="%03d" % (n,)
                g=open('structure_'+num+'.xyz','w')
                itson=False
                cartesian=[]
                        
                for item in structure:
                        
                        coords=filter(None,item.split(' '))
                        coordinates=[float(coords[3]),float(coords[4]),float(coords[5])]
                        element=coords[1]
                        cartesian+=[genxyzstring(coordinates,element)]
                g.write(str(len(cartesian))+'\n')
                g.write('Structure '+str(n)+'\n')
                for line in cartesian:
                        g.write(line)
                g.close()
                cartesian=[]
        return 0
        
if __name__ == "__main__":
        infile=sys.argv[1]
        rawdata,energies=getrawdata(infile)
        structures=getstructures(rawdata)
        g=open('energies.dat','w')
        for n in range(0,len(energies)):
                g.write(str(n)+'\t'+str(energies[n])+'\n')
        g.close()

And here's what we get from the output:
g09 methanol.in |tee methanol.out
pes_parse_g09 methanol.log
cat structure* > meoh_traj.xyz



And here's a plot of energies.dat:
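energies.dat holds the absolute SCF energies in hartree, which makes for an awkward y axis. A minimal sketch, assuming you would rather plot energies relative to the lowest point of the scan in kJ/mol: the function names read_energies and relative_energies are invented here, and the conversion factor is the usual 1 hartree = 2625.4996 kJ/mol, rounded.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # rounded conversion factor


def read_energies(path='energies.dat'):
    """Read the tab-separated step/energy pairs written by pes_parse_g09."""
    with open(path) as handle:
        return [float(line.split()[1]) for line in handle if line.strip()]


def relative_energies(energies_hartree):
    """Shift energies so the lowest scan point is zero, converted to kJ/mol."""
    lowest = min(energies_hartree)
    return [(e - lowest) * HARTREE_TO_KJ_PER_MOL for e in energies_hartree]
```

Something like relative_energies(read_energies()) then gives one value per scan point, in the same order as the structure_NNN.xyz files.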

512. Briefly: zmatrices in nwchem -- methanol

And another update:
I can now confirm that using your own z matrix still does not constrain the geometry during a PES scan, which was the original impetus for this post: http://verahill.blogspot.com.au/2013/09/511-when-nwchem-pes-scans-fail-to.html

Another update:
the gaussian run failed after 14 geometry steps during the first PES point.
NTrRot= -1 NTRed= 628 NAtoms= 34 NSkip= 532 IsLin=F
Error in internal coordinate system.
Error termination via Lnk1e in /opt/gaussian/g09/l103.exe at Thu Sep 5 18:17:12 2013.
Job cpu time: 0 days 22 hours 25 minutes 27.6 seconds.
File lengths (MBytes): RWF= 192 Int= 0 D2E= 0 Chk= 28 Scr= 1
Not being an expert, to me it seems that there's something fundamentally difficult with the system I'm working on. In an ideal world I'd give the actual details, but quite apart from the risk of being scooped, doing so would also make it easier to identify me (not that it's impossible at this point).

[Suffice to say that the system holds a large polyoxoanion and a small p-block anion, both of which are symmetrical and negatively charged. The goal of the PES scan is to bring the ions closer to see whether they 'react'. Which is also a troublesome use of computational resources -- computational chemistry is good at answering well-defined questions using carefully designed computational experiments -- but not generally very good at answering ill-defined questions about synthesis (i.e. you can't generally 'mix two things together and see what happens' and expect a useful result). Anyway, regardless of that, that's exactly what I want to do.]

Update:
nwchem still gives errors about autoz in spite of using noautoz. But I also get messages about the user generated z matrix, so we'll see whether my input is respected or not.

Also, for one of the calcs I'm getting
There are insufficient internal variables: expected 95 got 96

which is really, really, really annoying since there doesn't seem to be a real fix for it -- I've tried everything suggested in http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id286. I can get the same calc to run in gaussian though (gaussian has its own issues), but it'd be nice if stuff just...worked...

Original post:
Normally you don't have to fiddle with zmatrices in nwchem -- instead you'd typically supply cartesian coordinates, and nwchem would do autoz to autogenerate internal (z matrix) coordinates.

Sometimes that fails, and nwchem defaults to using cartesian coordinates. In most cases, this isn't a cause for any real concern -- the computation will continue although I think cartesian coordinates are supposed to be slightly slower.

However, if you're doing a PES scan you'll notice that it's not proceeding as intended -- the constraints are completely ignored: 511. When nwchem PES scans fail to constrain -- autoz failure

The easiest remedy is to supply the internal coordinates directly, but there honestly aren't too many examples online showing how that's done, and I kept on getting annoying failure messages along the lines of
NWChem Input Module
-------------------

zmat
----
THE 3-D PIECE OF -Z- DATA FOR ATOM = 2 IS NEITHER FLOATING POINT NOR ALPHANUMERIC OR COULD NOT BE MATCHED WITH A VARIABLE. STOP
IAT= 2 ZMAT= 2 1 0 0 0 0.00000 0.00000 0.00000
------------------------------------------------------------------------
JOB STOPPED PROGRAM STOP IN - ZDAT -
------------------------------------------------------------------------
------------------------------------------------------------------------
CALLS IT QUIT FROM HND_HNDERR 0
------------------------------------------------------------------------
This error has not yet been assigned to a category

This particular error came about because the zmatrix module is case sensitive, and my Variables couldn't be interpreted (it should be variables). Anyway, you'll understand more after this post, and it isn't important anyway.


Calculation using a z matrix (internal coordinates) in nwchem, with a little bit of help from openbabel:

Assuming that you set up a calculation in e.g. ECCE for a geometry optimisation of methanol you'll end up with the following input file:
scratch_dir /home/andy/scratch
Title "methanol"

Start methanol

echo

charge 0

geometry autosym units angstrom
 C 0.0351714 0.00548884 0.0351714
 H -0.617781 -0.634073 0.667983
 H 0.667983 -0.634073 -0.617781
 H -0.605139 0.646470 -0.605139
 O 0.839603 0.818768 0.839603
 H 1.38912 0.201564 1.38912
end

ecce_print ecce.out

basis "ao basis" spherical print
 H library "6-31+G*"
 O library "6-31+G*"
 C library "6-31+G*"
END

dft
 mult 1
 direct
 XC b3lyp
 grid fine
 mulliken
end

driver
 default
end

task dft optimize
Take the coordinates, and paste them into a file, e.g. methanol.xyz:
6
methanol
C 0.0351714 0.00548884 0.0351714
H -0.617781 -0.634073 0.667983
H 0.667983 -0.634073 -0.617781
H -0.605139 0.646470 -0.605139
O 0.839603 0.818768 0.839603
H 1.38912 0.201564 1.38912

Next, use openbabel:
babel -ixyz methanol.xyz -ogzmat 
#Put Keywords Here, check Charge and Multiplicity.

 methanol

0 1
C
H 1 r2
H 1 r3 2 a3
H 1 r4 2 a4 3 d4
O 1 r5 2 a5 3 d5
H 5 r6 1 a6 2 d6
Variables:
r2= 1.1117
r3= 1.1117
a3= 109.74
r4= 1.1094
a4= 108.78
d4= 118.90
r5= 1.3984
a5= 110.18
d5= 238.51
r6= 0.9924
a6= 105.98
d6= 60.61

1 molecule converted
18 audit log messages
The format isn't quite right (the header lines, the = signs and the colon need to go, and the V in Variables should be lower case), but we can sort that out:

 babel -ixyz ~/methanol.xyz -ogzmat |sed 's/\=//g;s/V/v/g;s/\://g' |tail -n+6 > methanol.zmat
C
H 1 r2
H 1 r3 2 a3
H 1 r4 2 a4 3 d4
O 1 r5 2 a5 3 d5
H 5 r6 1 a6 2 d6
variables
r2 1.1117
r3 1.1117
a3 109.74
r4 1.1094
a4 108.78
d4 118.90
r5 1.3984
a5 110.18
d5 238.51
r6 0.9924
a6 105.98
d6 60.61
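One caveat with the sed one-liner: s/V/v/g lower-cases every capital V, so a vanadium atom would be turned into 'v' as well. A minimal python sketch of the same clean-up that only touches the Variables keyword is given here; the function name gzmat_to_nwchem is made up, and header_lines=5 simply mirrors the tail -n+6 above (it assumes the babel header is always five lines long).

```python
def gzmat_to_nwchem(gzmat_text, header_lines=5):
    """Turn babel's gzmat output into the body of an nwchem zmatrix block."""
    cleaned = []
    for line in gzmat_text.splitlines()[header_lines:]:
        # Drop '=' and ':' and collapse runs of whitespace
        fields = line.replace('=', ' ').replace(':', ' ').split()
        if not fields:
            continue  # skip blank lines
        if fields == ['Variables']:
            fields = ['variables']  # nwchem wants the keyword in lower case
        cleaned.append(' '.join(fields))
    return '\n'.join(cleaned) + '\n'
```

You could feed it the text captured from babel (e.g. read from a file that babel's output was redirected into) and write the result to methanol.zmat.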

Let's update our nwchem input file with the internal coordinates:
scratch_dir /home/andy/scratch
Title "methanol"

Start methanol

echo

charge 0

geometry noautoz
 zmatrix
  C
  H 1 r2
  H 1 r3 2 a3
  H 1 r4 2 a4 3 d4
  O 1 r5 2 a5 3 d5
  H 5 r6 1 a6 2 d6
  variables
   r2 1.1117
   r3 1.1117
   a3 109.74
   r4 1.1094
   a4 108.78
   d4 118.90
   r5 1.3984
   a5 110.18
   d5 238.51
   r6 0.9924
   a6 105.98
   d6 60.61
 end
end

ecce_print ecce.out

basis "ao basis" spherical print
 H library "6-31+G*"
 O library "6-31+G*"
 C library "6-31+G*"
END

dft
 mult 1
 direct
 XC b3lyp
 grid fine
 mulliken
end

driver
 default
end

task dft optimize

And run. Done!