17 June 2013

455. Adding NWChem basis sets to ECCE. Part 1. The formats

I've written a python script that can
1. do automatic conversion of nwchem basis set files to .BAS and .POT
2. generate entries that can be added to the category file

What it currently can't do is generate a .pag file.

The python script is not in this post. I'll release it soon though.


The structure:
ECCE stores basis sets in server/data/Ecce/system/GaussianBasisSetLibrary/.

The number of files associated with a basis set varies, and the way a basis set is set up seems to vary as well depending on who added it.

Each basis set needs at least the following files:
basis.BAS
basis.BAS.meta
.DAV/basis.BAS.pag
.DAV/basis.BAS.dir

In addition, the basis set needs to be added to the correct category by being added to one of the following files:

Charge correlation_consistent DFTOrbital diffuse ecp ECPOrbital Exchange other_generally_contracted other_segmented polarization pople rydberg
e.g. 6-31G goes to pople, while LANL2DZ/ECP goes to ECPOrbital.

Looking at the basis set tool in ECCE you have the following categories/subcategories:
Orbital: Pople Shared, Other Segmented, Corr. Consistent, Other Gen. Contr., ECP Orbital, DFT Orbital.
Auxiliary: Polarization, Diffuse, Rydberg.
ECP:
DFT: Charge Fitting, Exchange Fitting.
What this means is that you can 'mix and match' by adding your .BAS or .POT files to different category files (e.g. you can have LANL2DZ dp in each of ECPOrbital, ecp and polarization, all at the same time). See below for how basis sets can be broken up.

Example: The simple cases: 3-21G, 3-21G*, 3-21++G*
For a basis set like 3-21G there are two files: 3-21G.BAS and 3-21G.BAS.meta.
In addition grep shows that there's an entry in the file pople for 3-21G.

The .BAS file:
The entry for C in 3-21G.BAS looks like this:
atom=C
contraction shell=S num_primitives=3 num_coefficients=1
172.2560 0.0617669
25.91090 0.358794
5.533350 0.700713
contraction shell=SP num_primitives=2 num_coefficients=2
3.664980 -0.395897 0.236460
0.770545 1.215840 0.860619
contraction shell=SP num_primitives=1 num_coefficients=2
0.195857 1.000000 1.000000
Nothing too strange. For example, the nwchem format for C in 3-21g is:
basis "C_3-21G" CARTESIAN
C S
172.2560000 0.0617669
25.9109000 0.3587940
5.5333500 0.7007130
C SP
3.6649800 -0.3958970 0.2364600
0.7705450 1.2158400 0.8606190
C SP
0.1958570 1.0000000 1.0000000
end
Writing a python script that translates between the two is simple.
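As a minimal sketch of such a translation (this is not the script mentioned at the top of the post), the following Python function converts a single NWChem basis ... end block into .BAS-style text. The name nwchem_to_bas is made up for illustration, the one-record-per-line layout of the output is an assumption based on the 3-21G.BAS excerpt, and the numbers are passed through unchanged rather than reformatted.

```python
SAMPLE = '''basis "C_3-21G" CARTESIAN
C S
172.2560000 0.0617669
25.9109000 0.3587940
5.5333500 0.7007130
C SP
3.6649800 -0.3958970 0.2364600
0.7705450 1.2158400 0.8606190
C SP
0.1958570 1.0000000 1.0000000
end'''


def nwchem_to_bas(block):
    """Translate one NWChem 'basis ... end' block into .BAS-style lines."""
    shells = []  # list of (atom, shell_label, rows)
    for line in block.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0].lower()
        if keyword == "basis":
            continue  # opening line of the block
        if keyword == "end":
            break
        if tokens[0][0].isalpha():
            # Shell header such as "C SP": element symbol, then shell type.
            shells.append((tokens[0], tokens[1].upper(), []))
        else:
            # Primitive row: exponent followed by one or more coefficients.
            shells[-1][2].append(tokens)

    lines = []
    previous_atom = None
    for atom, label, rows in shells:
        if atom != previous_atom:
            lines.append(f"atom={atom}")
            previous_atom = atom
        n_coeff = len(rows[0]) - 1
        # Assumed layout: header record on one line, then one row per primitive.
        lines.append(
            f"contraction shell={label} "
            f"num_primitives={len(rows)} num_coefficients={n_coeff}"
        )
        lines.extend(" ".join(row) for row in rows)
    return "\n".join(lines)


print(nwchem_to_bas(SAMPLE))
```

The counts in each contraction header are derived from the rows themselves, so nothing has to be typed in by hand.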


The .BAS.meta file:
The 3-21G.BAS.meta file looks like this:
references
Elements   References
--------   ----------
H - Ne:    J.S. Binkley, J.A. Pople, W.J. Hehre, J. Am. Chem. Soc 102 939 (1980)
Na - Ar:   M.S. Gordon, J.S. Binkley, J.A. Pople, W.J. Pietro and W.J. Hehre, J. Am. Chem. Soc. 104, 2797 (1983).
K - Ca:    K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
Ga - Kr:   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7, 359 (1986).
Sc - Zn:   K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 861 (1987).
Y - Cd:    K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 880 (1987).
Cs :       A 3-21G quality set derived from the Huzinage MIDI basis sets. E.D. Glendening and D. Feller, J. Phys. Chem. 99, 3060 (1995)
references
info
3-21G Split Valence Basis
-------------------------
Elements   Contraction                     References
H - He:    (3s) -> [2s]                    J.S. Binkley, J.A. Pople and W.J. Hehre,
Li - Ne:   (6s,3p) -> [3s,2p]              J. Am. Chem. Soc. 102, 939 (1980).
Na - Ar:   (9s,6p) -> [4s,3p]              M.S. Gordon, J.S. Binkley, J.A. Pople, W.J. Pietro and W.J. Hehre, J. Am. Chem. Soc. 104, 2797 (1983)
K - Ca:    (12s,9p) -> [5s,4p]             K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 7,
Ga - Kr:   (12s,9p,3d) -> [5s,4p,1d]       359 (1986).
Sc - Zn:   (12s,9p,3d) -> [5s,4p,2d]       K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8, 861 (1987).
Rb - Sr:   (15s,12p,3d)-> [6s,5p,1d]
Y - Cd:    (15s,12p,6d)-> [6s,5p,3d]       K.D. Dobbs, W.J. Hehre, J. Comput. Chem. 8,
In - I:    (15s,12p,6d)-> [6s,5p,2d]       880 (1987).
Cs :       (18s,12p,6d)-> [6s,5p,2d]       A 3-21G quality set derived from the Huzinage MIDI basis sets. E.D. Glendening and D. Feller, J. Phys. Chem. 99, 3060 (1995).
The 3-21G basis set contains the same number of Gaussian primitives as the STO-3G basis, but the valence electrons are described with two functions per AO instead of one. In most cases the 3-21G basis set gives results which are as good as the more expensive 4-31G and 6-31G sets.
3-21G Atomic Energies
ROHF State UHF (noneq) ROHF (noneq) ROHF(equiv) HF Limit (equiv)
-----    ----------    -----------    -----------    ---------
H  2-S     -0.496199     -0.496199     -0.496199     -0.50000
He 1-S     -2.835680     -2.835680     -2.835680     -2.86168
Li 2-S     -7.381513     -7.381513     -7.381513     -7.43273
Be 1-S    -14.486820    -14.486820    -14.486820    -14.57302
B  2-P    -24.389762    -24.389634    -24.148989    -24.52906
C  3-P    -37.481070    -37.480389    -37.480389    -37.68862
N  4-S    -54.105390    -54.103658    -54.103658    -54.40094
O  3-P    -74.393657    -74.392512    -74.391782    -74.80940
F  2-P    -98.845009    -98.844645    -98.844230    -99.40935
Ne 1-S   -127.132546   -127.803824   -127.803824   -128.54710
Na 2-S   -160.854064   -160.854041   -160.854041   -161.85891
Mg 1-S   -198.468103   -198.468103   -198.468103   -199.61463
Al 2-P   -240.551046   -240.551024   -240.551010   -241.87671
Si 3-P   -287.344431   -287.344419   -287.344393   -288.85436
P  4-S   -339.000079   -339.000027   -339.000027   -340.71878
S  3-P   -395.551336   -395.551083   -395.550591   -397.50490
Cl 2-P   -457.276552   -457.276414   -457.276096   -459.48207
Ar 1-S   -524.342962   -524.342962   -524.342962   -526.81751
K  2-S   -596.152980   -596.152923   -596.152923   -599.16479
info
comments
2/16/95 - DFF - Modify the format of the literature citation.
12/07/93 - SJB - Add Nb to Xe.
8/4/93 - DFF - Add Y and Zr.
12/2/92 - DFF - Add Rb and Sr.
7/13/90 - DFF - Original creation of this file from MIA basis set library.
comments
Again, most of this can be extracted using a shell/python/perl script from the corresponding 3-21g nwchem basis set file.

The entry for 3-21G in 'pople':
name= 3-21G
files= 3-21G.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs
This simply seems to be a list of the files that describe the basis set and the elements they support. It can be autogenerated using a script.
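A rough sketch of how such an entry could be generated from the .BAS contents follows. The helper name category_entry and its dict-based interface are hypothetical, and it assumes that every element in a .BAS file is introduced by an atom=X record, as in the 3-21G.BAS excerpt above. Files without any atom= record contribute no atoms= line, which is a guess on my part.

```python
import re


def category_entry(name, bas_files):
    """Build a category-file entry.

    bas_files maps file name -> file contents, in the order the files
    should be listed on the files= line.
    """
    lines = [f"name= {name}", "files= " + " ".join(bas_files)]
    for contents in bas_files.values():
        atoms = []
        for symbol in re.findall(r"atom=([A-Za-z]+)", contents):
            if symbol not in atoms:  # keep first-seen order, no duplicates
                atoms.append(symbol)
        if atoms:  # assumption: empty files get no atoms= line
            lines.append("atoms= " + " ".join(atoms))
    return "\n".join(lines)


example = {
    "3-21G.BAS": "atom=H\ncontraction shell=S num_primitives=2 num_coefficients=1\n"
                 "atom=C\ncontraction shell=S num_primitives=3 num_coefficients=1\n",
}
print(category_entry("3-21G", example))
```

The printed text would still have to be appended by hand to the right category file (here pople).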

Intermission: polarization and diffuse orbitals, and ECP.
At this stage it's pretty simple. We now have a rough idea of what's needed. We just need to understand how to expand our basis sets.

For 3-21G* and 3-21++G* the polarisation and diffuse functions are kept in separate files: 3-21G* uses 3-21GS-AGG.BAS and 3-21GS.BAS, while 3-21++G* uses 3-21PPGS-AGG.BAS, 3-21GS.BAS and POPLDIFF.BAS. All -AGG.BAS files are empty, so I'm not sure why they are there.

Anyway, this might make it a bit clearer:
3-21G = 3-21G.BAS
3-21G* = 3-21G.BAS + 3-21GS.BAS
3-21++G* = 3-21G.BAS + 3-21GS.BAS + POPLDIFF.BAS
What happens to e.g. pople is this:
name= 3-21G*
files= 3-21GS-AGG.BAS 3-21G.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl Ar
name= 3-21++G*
files= 3-21PPGS-AGG.BAS 3-21G.BAS POPLDIFF.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl
The -AGG.BAS files are empty. The first atoms line corresponds to entries in 3-21G.BAS, while for 3-21G* the second one corresponds to entries in 3-21GS.BAS. Likewise,
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
are entries in POPLDIFF.BAS.
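To inspect existing entries before adding new ones, a category file can be read back into a simple structure. This is only a sketch: parse_category is a made-up name, and it assumes that each name=, files= and atoms= record sits on its own line. It deliberately does not try to pair atoms= lines with file names, since the empty -AGG.BAS files have no atoms= line of their own.

```python
def parse_category(text):
    """Parse name=/files=/atoms= records into a list of dicts."""
    entries = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key= record
        key = key.strip()
        value = value.strip()
        if key == "name":
            entries.append({"name": value, "files": [], "atoms": []})
        elif key == "files" and entries:
            entries[-1]["files"] = value.split()
        elif key == "atoms" and entries:
            # One list of element symbols per atoms= line, in file order.
            entries[-1]["atoms"].append(value.split())
    return entries


pople_excerpt = """name= 3-21G*
files= 3-21GS-AGG.BAS 3-21G.BAS 3-21GS.BAS
atoms= H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar
atoms= Na Mg Al Si P S Cl Ar
"""
for entry in parse_category(pople_excerpt):
    print(entry["name"], entry["files"], [len(a) for a in entry["atoms"]])
```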

The good news: it's almost identical when it comes to ECP. Here's the ECPOrbital entry for LANL2DZ:
name= LANL2DZ ECP
files= LANL2DZ.BAS LANL2DZ.POT
atoms= H Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu
and the ecp entry:
name= LANL2DZ ECP
files= LANL2DZ.POT
atoms= Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Hf Ta W Re Os Ir Pt Au Pb Bi U Np Pu

The .POT file is a little bit different from the .BAS file:
atom=Na ncore=10 lmax=2
ecp_potential%l=2%shell=d potential%num_exponents=5
1 175.5502590 -10.0000000
2 35.0516791 -47.4902024
2 7.9060270 -17.2283007
2 2.3365719 -6.0637782
2 0.7799867 -0.7299393
ecp_potential%l=0%shell=s-d potential%num_exponents=5
0 243.3605846 3.0000000
1 41.5764759 36.2847626
2 13.2649167 72.9304880
2 3.6797165 23.8401151
2 0.9764209 6.0123861
ecp_potential%l=1%shell=p-d potential%num_exponents=6
0 1257.2650682 5.0000000
1 189.6248810 117.4495683
2 54.5247759 423.3986704
2 13.7449955 109.3247297
2 3.6813579 31.3701656
2 0.9461106 7.1241813
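To make the record structure explicit, here is a small sketch that reads .POT text into Python objects. It relies only on what the excerpt shows: each potential%num_exponents=N token is followed by N rows of three numbers. The function name parse_pot is hypothetical, and reading the first number of each row as the power of r (as in NWChem ECP input) is my assumption.

```python
def parse_pot(text):
    """Parse .POT-style text into a list of per-atom dicts."""
    tokens = text.split()
    atoms = []
    header = None  # most recent ecp_potential%... token
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("atom="):
            atoms.append({"atom": tok[len("atom="):], "potentials": []})
        elif tok.startswith(("ncore=", "lmax=")):
            key, value = tok.split("=")
            atoms[-1][key] = int(value)
        elif tok.startswith("ecp_potential"):
            header = tok
        elif tok.startswith("potential%num_exponents="):
            n = int(tok.split("=")[1])
            nums = tokens[i + 1:i + 1 + 3 * n]
            # Each row: (assumed power of r, exponent, coefficient).
            rows = [
                (int(nums[j]), float(nums[j + 1]), float(nums[j + 2]))
                for j in range(0, 3 * n, 3)
            ]
            atoms[-1]["potentials"].append({"header": header, "rows": rows})
            i += 3 * n  # skip the numbers just consumed
        i += 1
    return atoms


pot_sample = (
    "atom=Na ncore=10 lmax=2 "
    "ecp_potential%l=2%shell=d potential%num_exponents=2 "
    "1 175.5502590 -10.0000000 2 35.0516791 -47.4902024 "
    "ecp_potential%l=0%shell=s-d potential%num_exponents=1 "
    "0 243.3605846 3.0000000"
)
for atom in parse_pot(pot_sample):
    print(atom["atom"], atom["ncore"], len(atom["potentials"]))
```

Because the parser works on whitespace-separated tokens, it does not care how the records are split across lines.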

.DAV files
The good news: the .DAV/basis.BAS.dir file is empty.
The bad news: .DAV/basis.BAS.pag is a binary file.
I haven't yet figured out the exact structure of it nor the best way to auto-generate it.
I think the best illustration is to show the od -c output for a few .POT.pag files:
LANL2DZ.POT.pag:
0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   L   A   N
0001760   L   2   D   Z       E   C   P  \0   1   :   n   a   m   e  \0
0002000
SBKJC.POT.pag:
0000000  \b  \0 371 003 356 003 347 003 342 003 327 003 314 003 304 003
0000020 235 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0
0001640 002   D   A   V   :  \0   h   t   t   p   :   /   /   w   w   w
0001660   .   e   m   s   l   .   p   n   l   .   g   o   v   /   e   c
0001700   c   e   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X
0001720   I   L   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r
0001740   y  \0  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   S
0001760   B   K   J   C       E   C   P  \0   1   :   n   a   m   e  \0
0002000

Trial and error in making files for def2-svp has shown me that you can copy e.g. LANL2DZ.POT.pag to DEF2_ECP.POT.pag and edit it with vim (use binary mode, -b), but you'll need to pad the new name with enough trailing spaces that both files end at the same offset (0002000 octal, i.e. 1024 bytes). E.g. this works:
0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   d   e   f
0001760   2   -   e   c   p              \0   1   :   n   a   m   e  \0
0002000
but this doesn't (removed a single space at the end of def2-ecp):
0000000  \b  \0 371 003 354 003 345 003 340 003 325 003 312 003 302 003
0000020 233 003  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001620  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 004  \0  \0 002   D
0001640   A   V   :  \0   h   t   t   p   :   /   /   w   w   w   .   e
0001660   m   s   l   .   p   n   l   .   g   o   v   /   e   c   c   e
0001700   :  \0   M   E   T   A   D   A   T   A  \0   A   U   X   I   L
0001720   I   A   R   Y  \0   1   :   c   a   t   e   g   o   r   y  \0
0001740  \0   e   c   p  \0   1   :   t   y   p   e  \0  \0   d   e   f
0001760   2   -   e   c   p          \0   1   :   n   a   m   e  \0
0001777
Note that the names should correspond to the names of the nwchem basis sets and/or files, e.g. either 3-21gs or 3-21G*, and either LANL2DZ ECP or lanl2dz_ecp.
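The manual vim edit can be mimicked with a few lines of Python. This is a sketch of the copy-and-pad workaround only, not a real .pag generator: patch_pag_name is a made-up helper, and it only handles a new name that is no longer than the one it replaces, since that is the only case I have tested by hand. Whether the offsets at the start of the file would need changing for a longer name is unknown to me.

```python
def patch_pag_name(template, old_name, new_name):
    """Return template bytes with old_name replaced by space-padded new_name."""
    if len(new_name) > len(old_name):
        raise ValueError("new name is longer than the name it replaces")
    if template.count(old_name) != 1:
        raise ValueError("expected exactly one occurrence of the old name")
    # Pad with trailing spaces so the file length stays the same.
    padded = new_name.ljust(len(old_name), b" ")
    patched = template.replace(old_name, padded)
    assert len(patched) == len(template)
    return patched


# Synthetic stand-in for the tail of LANL2DZ.POT.pag, for demonstration only.
fake_template = b"\x00" * 16 + b"LANL2DZ ECP\x001:name\x00"
print(patch_pag_name(fake_template, b"LANL2DZ ECP", b"def2-ecp"))

# With real files (paths are examples):
# with open("LANL2DZ.POT.pag", "rb") as src:
#     data = src.read()
# with open("DEF2_ECP.POT.pag", "wb") as dst:
#     dst.write(patch_pag_name(data, b"LANL2DZ ECP", b"def2-ecp"))
```

Refusing to continue when the old name occurs more than once (or not at all) avoids silently producing a file that differs from the template in unexpected places.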

As far as I understand the solution will lie in how WebDAV uses .pag files. I don't know anything about that just yet though.

Anyway, that's it for now. There's now enough information to write your own scripts.
