25 June 2012

200. How long will your nwchem frequency calc take?

Update 19/12/12: Having done a lot more frequency calculations since I posted this I sincerely doubt that this approach works.

Original post
Because I'm stuck waiting for the results of frequency calcs on some large transition metal clusters, I've become interested in understanding the output of frequency calculations in progress. After all, why wait 15 days for a results if there are early signs that the calculation has gone haywire?

Also, it might just be me, but frequency calculations are not that easy to restart, so you want to make sure that you give them enough wall time to finish if you use a queue manager.

I'm sure most of this could be appreciated by RTFM, but who has time for that?

So this is what the calc does:
After the usual boredom of reading in the geometry and doing an energy calculation, followed by an MO analysis, the computational fun starts.

Each cycle contains the following reports:


  1. Total Density - Mulliken Population Analysis
  2. Spin Density - Mulliken Population Analysis
  3. Total Density - Lowdin Population Analysis
  4. Spin Density - Lowdin Population Analysis
  5. Expectation value of S2:  
  6. NWChem DFT Module
  7.   Caching 1-el integrals 
  8.       Total Density - Mulliken Population Analysis
  9.       Spin Density - Mulliken Population Analysis
with the exception of the first cycle, which also look at the alpha-beta orbital overlaps, the centre of mass, moments of inertia and does a multipole analysis of density and save an initial hessian.





Each cycle ends with a report of the energy for that vibration:


         Total DFT energy =    -3297.032399945703
      One electron energy =   -26618.764098759657
           Coulomb energy =    12938.745973154924
    Exchange-Corr. energy =     -382.742230868704
 Nuclear repulsion energy =    10765.727956527733

 Numeric. integr. density =      441.999974968347

     Total iterative time =   7947.4s

If you do cat nwch.nwout|egrep "Total iterative time|Total DFT energy" you can see the progress:
        Total DFT energy =    -3297.032416366805
     Total iterative time =  12146.0s
         Total DFT energy =    -3297.032399945703
     Total iterative time =   7947.4s
         Total DFT energy =    -3297.032399544749
     Total iterative time =   7946.0s
         Total DFT energy =    -3297.032406934719
     Total iterative time =   7945.8s
         Total DFT energy =    -3297.032405026814

You now have an idea of how long each step takes. But how many steps in total? I think it's 3N steps, where N is the number of atoms.

For my 50 atoms POM using the values above it'd be roughly 8000 s * 150 = 13 days 22 hours.

Which seems about right...

cat nwch.nwout|grep 'Total DFT'|gawk 'END {print NR}'
66
so I've got another 8 days before I can get my hand on some juicy thermochemical data...

Time to start preparing lectures...




22 June 2012

199. NeCTAR -Virtualisation of Australian compute resources -- first steps

So they are seeing whether they can make more efficient use of the compute resources at different institutions in Australia by creating a cloud to pool their resources. One of the potential solutions is called NeCTAR.


Getting started
Go to http://dashboard.rc.nectar.org.au/auth/login/?next=/dash/
Log in using your institutions username and password

You now have two options to deal with the key issue:

Method 1 -- generate online
Once you're in, create a keypair under Manage Compute/Access & Security and give it an easy-to-remember name

This is your private key, so protect it: don't lose it and don't expose it. You can't download it again. You delete it, it's gone.

On your computer
mv ~/Downloads/nectar.pem ~/.ssh
chmod og-rwx ~/.ssh/nectar.pem
cp nectar.pem nectar
ssh-keygen -e -f nectar >nectar.pub

 ls nectar* -lah
-rw------- 1 me me 887 Jun 22 11:31 nectar
-rw------- 1 me me 887 Jun 22 11:28 nectar.pem
-rw-r--r-- 1 me me 335 Jun 22 11:31 nectar.pub
To use the key do
ssh -i nectar user@server

Method 2 -- BYOK
You're using linux -- you probably have your own key already.
Go to the Manage Compute/Access & Security, Import Keypair

Paste your ~/.ssh/id_rsa.pub (or id_dsa.pub) key.


And that's the extent of the setup.

Test run
Go to Manage Compute/Images & Snapshot and select a Real Linux image (i.e. Debian)
Select image
Hit Launch.
Set up -- don't forget to check SSH to be able to log on. If you want to be able to ping, check icmp as well.
Set up the image -- the defaults are ok, but make sure to check icmp (to be able to ping) and ssh (to be able to log in).
Loading
Generating and loading the image takes about 10-20 seconds -- about the duration of a Victorian earthquake.
Running
Now your image is up an running. To check that all is well

ping -c 3 115.146.92.154
PING 115.146.92.154 (115.146.92.154) 56(84) bytes of data.
64 bytes from 115.146.92.154: icmp_req=1 ttl=54 time=1.70 ms
64 bytes from 115.146.92.154: icmp_req=2 ttl=54 time=1.66 ms
64 bytes from 115.146.92.154: icmp_req=3 ttl=54 time=1.73 ms

--- 115.146.92.154 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 1.660/1.698/1.733/0.029 ms

To be able to log in via ssh you need to know what username to use -- it's (probably) image specific.

To find out, click on the image name (here: testing4)
Click me
Hit the 'Log' tab
Select 'log'
 And look for the username which is created in addition to root
Look for the username -- here it's debian


ssh -v -i ~/.ssh/tmp/nectar debian@115.146.92.154
The authenticity of host '115.146.92.154 (115.146.92.154)' can't be established.
RSA key fingerprint is 81:a8:a7:0f:a9:68:a0:08:f1:60:45:e3:57:2e:4c:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '115.146.92.154' (RSA) to the list of known hosts.


Once you're in you'll be greeted by:

Linux unnamed-virtual-machine 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
debian@i-00001637:~$ 
debian@i-00001637:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/vda1             9.9G  768M  8.6G   9% /
tmpfs                 2.0G     0  2.0G   0% /lib/init/rw
udev                  2.0G  112K  2.0G   1% /dev
tmpfs                 2.0G     0  2.0G   0% /dev/shm
debian@i-00001637:~$ uname -a
Linux i-00001637 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64 GNU/Linux
debian@i-00001637:~$ groups
debian cdrom floppy audio dip video plugdev
debian@i-00001637:~$ cat /etc/group|grep debian
cdrom:x:24:debian
floppy:x:25:debian
audio:x:29:debian
dip:x:30:debian
video:x:44:debian
plugdev:x:46:debian
debian:x:1000:

When you're done, don't forget to log out and terminate your image. If you leave it running it will count towards your resource allocations.
Terminating


Notes:

1. you'll run into trouble with the key fingerprints eventually as the IP addresses and key fingerprints won't be matching. Either you'll be doing a lot of editing of you ~/.ssh/known_hosts file or you have to relax your security setttings.

2. Yes, you can log in as root as well. The default user does not have sudo powers. 

3. It takes about 60 seconds after the launch of the image before the openssh server is up and accepting connections. Think more desktop speeds than laptop+SSD speeds.

4. For actual production stuff you can crank up the image requirements:
16 cores and 65 GB? Why, thank you!
5. Also, I think the real value of using virtual machine is that you can load a vanilla setup and customize it, then saving it by making a snapshot:
Snapshot saves
A first couple of actions might be to add a new user, and edit /etc/sudoers.

Troubleshooting ssh:
If you're having problems logging in using your key, use the ssh -v switch as shown above and parse the output



Unsuccessful:
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/me/.ssh/id_rsa
debug1: Authentications that can continue: publickey,password
debug1: Trying private key: /home/me/.ssh/id_dsa
debug1: Trying private key: /home/me/.ssh/id_ecdsa
debug1: Next authentication method: password

A successful authentication should contain
debug1: Offering RSA public key: /home/me/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).


If you are sure that you're using the right key (e.g. using -i), then make sure that you're using the right username -- to find out how to find it, look above.


21 June 2012

198. Nwchem -- freeze core and tddft on benzene

UPDATE: it's taken a while, but I've tested this on a large polyoxometalate cluster (>10 group 5/6 atoms and >30 oxygens) -- with a total of ca 700 alpha and beta orbitals, respectively, I froze 200 core orbs and used 3-21G/3-21G* w/ ub3lyp. The frozen core calculation took 44% of the time the full calculation took. The spectra look identical to within 3 nm (18 roots -- only 2 intermediate transitions have been shifted. All other transitions are identical). The full calc took 7 days and 4 hours, while the frozen calc took 3 days and 4 hours. 

Original post:
Benzene has 21 occupied α and 21 occupied β orbitals.

How many core orbitals can we freeze when looking at electronic transitions, how does freezing core orbitals affect the energies, and what are the performance gains, if any?

I used the following relevant statements
tddft
    cis
    freeze core X
    nroots 24
end
task tddft energy
Also, xc b3lyp and 6-31++g**.

Freeze core 10 means freeze 10 α and freeze 10 β orbitals.

Results:

Frozen Runtime    Transitions with f>0
0          1111 s     6.7835, 6.8038, 6.8042, 7.3479, 7.3496
5          1107 s     6.7835, 6.8038, 6.8042, 7.3480, 7.3498
10        1063 s     6.7838, 6.8040, 6.8045, 7.3761, 7.3862
15        1063 s     6.5878, 6.7840, 6.8042, 6.8046
20          719 s     6.8038, 6.8355, 7.1692, 7.5334, 8.1866

Evaluation:
We're not really looking at what's right or wrong -- the main goal is to understand how the results are affected (we might accidentally get 'right' answer doing something stupid). Freezing more than 10 α/β orbitals leads to significant differences in predicted transitions.

Taking oscillation strength into account and plotting it, we see that we really don't want to overdo it with the number of frozen cores -- 15 and 20 frozen cores yield results that are about as predictable as a coin toss. We also don't see any overwhelming performance gains, but that may well be due to the size and computational cost (or lack thereof) of the system.



Octave code:
spec1=load('bz631gppdd_cosmo.dat');
spec2=load('bz631gppdd_cosmo_f5.dat');
spec3=load('bz631gppdd_cosmo_f10.dat');
spec4=load('bz631gppdd_cosmo_f15.dat');
spec5=load('bz631gppdd_cosmo_f20.dat');

gau= @(x,c,w,i) i*(1/(w*sqrt(2*pi))).*exp(-0.5.*((x-c)./w).^2);
x=linspace(160,200,200);
y1=cumsum(gau(x,1241.25./spec1(:,2),4,spec1(:,3)));
y2=cumsum(gau(x,1241.25./spec2(:,2),4,spec2(:,3)));
y3=cumsum(gau(x,1241.25./spec3(:,2),4,spec3(:,3)));
y4=cumsum(gau(x,1241.25./spec4(:,2),4,spec4(:,3)));
y5=cumsum(gau(x,1241.25./spec5(:,2),4,spec5(:,3)));


subplot(3,2,1)
 plot(x,y1(rows(y1),:))
 axis([160 200 0 0.2]);
 title('0 frozen');
 subplot(3,2,3)
 plot(x,y2(rows(y2),:))
 axis([160 200 0 0.2]);
 title('5 frozen');
 subplot(3,2,5)
 plot(x,y3(rows(y3),:))
 axis([160 200 0 0.2]);
 title('10 frozen');
 subplot(3,2,2)
 plot(x,y4(rows(y4),:))
 axis([160 200 0 0.2]);
 title('15 frozen');
 subplot(3,2,4)
 plot(x,y5(rows(y5),:))
 axis([160 200 0 0.2]);
 title('20 frozen');
 subplot(3,2,6)
  plot(x,y1(rows(y1),:),x,y2(rows(y2),:), x, y3(rows(y3),:), x,y4(rows(y4),:),x,y5(rows(y5),:))
  axis([160 200 0 0.2]);
 title('');

197. Post-mortem of the Moe Quake

The news keep on reporting about a single injured person who was unlucky enough to be standing on a ladder close to the epicentre, but beyond that it seems like no-one else suffered any injuries serious enough to warrant medical attention.

Property damages are a different story though, but the news are having a field day with it, so no point in me repeating what they are saying. My only comment is that the Gippsland/La Trobe Valley area has been hit hard lately, first by floods, storms and now an earthquake -- in addition to recent job losses and uncertainties.

Anyway, science.

www.ga.gov.au has a nice page with technical information about the earthquake:
http://www.ga.gov.au/earthquakes/getQuakeDetails.do?quakeId=3226344

There are several seismograms available from different stations around the country (am I the only chemist who looks at them wanting to apply a FFT?)

Victoria


ACT

Northern QLD

The shape varies with the distance from the earthquake, which I guess tallies with different types of waves travelling at different speeds.

For those of us who are reasonably new to this area, the USGS has a historical earthquake map over Melbourne and the Gippsland/La Trobe valley area.

Here's seismicity in Australia as a whole, and it shows that SE Victoria is no stranger to phenomenon:

http://www.quakes.uq.edu.au/html/quake_info/OZ_QLD_info.html#6

Big earthquakes are a different matter though: http://earthquake.usgs.gov/earthquakes/world/australia/seismicity.php

Only a handful of earthquakes show up on this map, and they are  in WA and NT.

Here's a map with the number of large earthquakes per year (5 and above) -- and Melbourne is by no means the worst hit by the Top 5 cities in Australia
http://earthquake.usgs.gov/earthquakes/world/australia/density.php

Finally, here's a map with the 'earthquake hazard' estimates for different regions of Australia:
http://earthquake.usgs.gov/earthquakes/world/australia/gshap.php

It seems like SE Tassie is the safest, inhabited area. SW WA is the least safe one, but is still nothing compared to PNG and Indonesia and other countries on plate boundaries.

Here's a full paper on seismic hazards in Australia, which contains a nice map with past earthquakes indicated on it: http://www.sciencedirect.com/science/article/pii/S0040195104002185
I'm loading the picture from the publisher's website which is probably the lesser of two evils.

19 June 2012

196. M 5.5 Earthquake near Moe, Victoria. Felt in Melbourne, Australia, 19 June 2012

[20th of July 2012 earthquake here: http://verahill.blogspot.com.au/2012/07/another-earthquake-felt-in-melbourne.html ]

Just (ca 20:50/8.50 pm) experienced my first earthquake (South-eastern suburbs)  -- or rather earth tremor. Rattling doors, a bit of noise, a bit of shaking. Felt like about 10-15 s. More exciting than scary, although I would NOT have wanted to be any closer to the epicentre than we already were (100 km).

Funny it should happen in Australia after five years in California without the slightest tremor.

I like the possum comment below -- was my first thought too! The second one was wind.

If you felt it, you can report it here http://earthquake.usgs.gov/earthquakes/eventpage/usb000ajek#dyfi
There is a point in reporting your experience, since earthquakes aren't just reported in terms of the energy released, but also in terms of damage (or perception).

Update 22:02: http://www.seis.com.au/ is slowly coming back online, but there's no obvious information up yet.

http://www.seis.com.au/ and http://www.ga.gov.au/earthquakes/ went down immediately, and the ga.gov.au site is still down. www.seis.com.au is operating slowly.

Update 21:52. ABC 24 is covering it right now. Quake happened 7 minutes to 9 with the epicentre over by Trafalgar near Moe. No serious injuries. Upgraded to M 5.5.

Update 21:30
The news sites are catching up:
http://www.abc.net.au/news/2012-06-19/magnitude-52-quake-shakes-southern-vic/4080446

http://news.smh.com.au/breaking-news-national/melbourne-hit-by-earthquake-20120619-20m8d.html

http://www.heraldsun.com.au/news/more-news/strong-tremors-rock-victoria/story-fn7x8me2-1226401623358

http://www.theage.com.au/victoria/quake-shakes-melbourne-20120619-20m88.html

Update 21.27: ABC 24 just confirmed it. M 5.2. Moe was the epicentre.

Update 21.10: It's on the USGS website: http://earthquake.usgs.gov/earthquakes/eventpage/usb000ajek#summary




Apparently it was a Magnitude 5.2 centerd somewhere out in Gippsland. Best guess at time is 20:53:29.
The magnitude will likely be adjusted once local data is available.

Postscript:
I posted since the seismology sites were down and there was nothing on the news. Apparently I wasn't the only one who was looking for information:

195. Frequency calcs in NWChem

It's no secret that I'm a computational 'noob'. As such as I'm learning both by reading and by doing.

The doing part consists of checking 1) what the time penalty for different methods is and 2) what the accuracy/differences between different methods are.

Again, these are short calculations for simple molecules. Longer calculations with more exciting features (unpaired electrons, closely spaced MOs, highly negative charges etc.) may well behave completely different.

Today's focus is vibrational calcs.

Test Molecule: CHClF(OH) (chloro-fluoro-methanol)
  1 Title "Freq_test"
  2 
  3 Start  Freq_test
  4 
  5 echo
  6 
  7 charge 0
  8 
  9 geometry noautosym units angstrom
 10  C     0.0416942     -0.501783     0.399137
 11  H     0.0442651     -0.499048     1.48122
 12  O     1.21393     -1.00985     -0.0746688
 13  H     1.25125     -0.957351     -1.06923
 14  F     -1.08480     -1.08768     -0.134571
 15  Cl     -0.120345     1.41214     -0.0717951
 16 end
 17 
 18 ecce_print ecce.out
 19 
 20 basis "ao basis" cartesian print
 21   H library "3-21G"
 22   F library "3-21G"
 23   Cl library "3-21G"
 24   O library "3-21G"
 25   C library "3-21G"
 26 END
 27 
 28 dft
 29   mult 1
 30   odft
 31   mulliken
 32 end
 33 
 34 task dft energy
 35 task dft freq

All geometries were optimised in the gas phase using 3-21G.

0. Some useful statements:
hessian      print "hess_follow"
                 profile
end
1. Basis set (geometry optimised in 3-21g)
(time/enthalpy/entropy/scfe)
3-21G:              81s    24.984 kcal/mol    69.235 cal/mol-K   -671.17956992206 Hartree
6-31G:            105s    21.885 kcal/mol    68.793 cal/mol-K   -674.478768966106
6-31++G**:    399s   21.734 kcal/mol     68.818 cal/mol-K   -674.573524091623
cc-pVDZ:        325s    21.682 kcal/mol    68.819 cal/mol-K   -674.594059146606
aug-cc-pVDZ:  901s   21.605 kcal/mol    68.840 cal/mol-K   -674.623145113155

LANL2DZ(C)/6-+G* 262s  24.923 kcal/mol 68.981 cal/mol-K  -674.539040349134
UHF/aug-cc-pVDZ   373 s 26.196  kcal/mol 68.228 cal/mol-K -672.85402652170

Cation:
3-21G:               ---     21.164 kcal/mol     74.407 cal/mol-K    -670.763278724519 Hartree
6-31G:              142s   21.153 kcal/mol     74.645 cal/mol-K    -674.089132280731
6-31++G**:      637s   21.192 kcal/mol    73.768 cal/mol-K    -674.178146586266
cc-pVDZ:          399s   21.153 kcal/mol    73.736 cal/mol-K    -674.210312017948
aug-cc-pVDZ:   1776s 21.089 kcal/mol     73.774 cal/mol-K   -674.228204222891

LANL2DZ(C)/6-+G* 454s 24.795 kcal/mol  74.293 cal/mol-K -674.140922359750
UHF/aug-cc-pVDZ  741s 26.002 kcal/mol  72.462 cal/mol-K  -672.518095855130

2. Thermochemistry (ΔG of oxidation; gas phase)
3-21G:            -5.3620 kcal/mol +  261.22 kcal/mol =  6.814 V*
6-31G:            -2.4768 kcal/mol +  244.50 kcal/mol =  6.214 
6-31++G**:    -2.0178 kcal/mol+  248.10 kcal/mol =  6.390 
cc-pVDZ:        -1.9950 kcal/mol + 240.80 kcal/mol =  6.075 
aug-cc-pVDZ: -1.9871 kcal/mol + 247.83 kcal/mol =  6.380

LANL2DZ(C)/6-+G* -1.7118 kcal/mol + 249.82 kcal/mol 6.478
UHF/aug-cc-pVDZ -1.4564 kcal/mol +210.80 kcal/mol = 4.797

* vs SHE=4.281 eV

3. Solvation (cosmo/water/scfe)
neutral
3-21g:                66s    22.097 kcal/mol    68.875 cal/mol-K   -671.1936338426 Hartree
6-31g:                82s    22.277 kcal/mol    68.609 cal/mol-K   -674.4934780299
6-31++g**:       277s   21.493 kcal/mol    69.353 cal/mol-K  -674.586704959695
cc-pVDZ:          266s   21.869 kcal/mol    68.808 cal/mol-K  -674.605608009070
aug-cc-pVDZ:    712s  22.116 kcal/mol    69.596 cal/mol-K   -674.635237990779

LANL2DZ(C)/6-31+G* 180s  25.022 kcal/mol   69.073 cal/mol-K -674.552417717602
UHF/aug-cc-pVDZ 412s  24.083 kcal/mol 70.519 cal/mol-K  -672.868085966222

cation (solvation energy)**

3-21G:               --- /26s        21.164 kcal/mol     74.407 cal/mol-K     -670.881469242560 Hartree
6-31G:              142s/51s      21.153 kcal/mol     74.645 cal/mol-K     -674.175491218588
6-31++G**:      637s/111s   21.192 kcal/mol    73.768 cal/mol-K      -674.267298880087
cc-pVDZ:          399s/129s   21.153 kcal/mol    73.736 cal/mol-K      -674.294609415029
aug-cc-pVDZ:   1776s/311s 21.089 kcal/mol     73.774 cal/mol-K     -674.316552324118

LANL2DZ(C)/6-31+G* 454s 24.795 kcal/mol  74.293 cal/mol-K -674.232656980139
UHF/aug-cc-pVDZ   741s 26.002 kcal/mol  72.462 cal/mol-K -672.451040948823
** UHF can't be used with COSMO according to nwchem. Instead we use the cation thermo calcs in the gas phase and use the scfe from a cosmo calc.

Thermochemistry*** (using gas phase freq for both cation and neutral species with scfe w/ cosmo given in parentheses):

3-21G:            -2.5824+195.88=  4.101 V (3.981 V)
6-31G:            -2.9236+199.54=  4.245 V (4.265 V)
6-31++G**:   -1.6173+200.43=  4.341 V (4.324 V)
cc-pVDZ:       -2.1853+195.15=  4.087 V (4.095 V)
aug-cc-pVDZ: -2.2727+199.98= 4.293 V (4.305 V)

LANL2DZ(C)/6-31+G*  -0.41322+200.65= 4.402
UHF/aug-cc-pVDZ 1.3397+261.7 (!)= 7.126
* vs SHE=4.281 eV

*** using freq calc of neutral species with cosmo, vs freq calc of cation in gas phase and energy w/ cosmo

4. Spectra
We'll use octave for this. First, using cat and gawk, I put the x/y coordinates in a file.

gauss= @(x,f,i,sigma)  i.*1./(sigma.*sqrt(2*pi)).*exp(-0.5.*((x-f)./sigma).**2)
subplot(3,2,1); axis([0 4000 0 2])
spc=load('321g.spc');sf=spc(:,1); si=spc(:,2);x=linspace(0,4000,800);spec=cumsum(gauss(x,sf,si,75)); 
title("321g"); plot(x,spec(18,:))
subplot(3,2,2)
spc=load('ccpvdz.spc');sf=spc(:,1); si=spc(:,2);x=linspace(0,4000,800);spec=cumsum(gauss(x,sf,si,75));
title("ccPVDZ");plot(x,spec(18,:))
subplot(3,2,3)
spc=load('631g.spc');sf=spc(:,1); si=spc(:,2);x=linspace(0,4000,800);spec=cumsum(gauss(x,sf,si,75));
title("631g"); plot(x,spec(18,:))
subplot(3,2,4)
spc=load('augccpvdz.spc');sf=spc(:,1); si=spc(:,2);x=linspace(0,4000,800);spec=cumsum(gauss(x,sf,si,75));
title("aug-ccPVDZ");plot(x,spec(18,:))
subplot(3,2,5)
spc=load('631gppdd.spc');sf=spc(:,1); si=spc(:,2);x=linspace(0,4000,800);spec=cumsum(gauss(x,sf,si,75));
title("631++g**"); plot(x,spec(18,:))

From top to bottom: Left: 3-21G, 6-31G, 6-31++G**. Right: cc-pVDZ, aug-cc-pVDZ
5. Conclusions
It may seem weird that as a test case I picked a species I don't have any reference potential for. However, the goal here was to understand how the basis set affects the results, without being distracted by such things as Real Life.

The observed spectra can be divided into two group: 3-21G/6-31G vs 6-31++G**/cc-pVDZ/aug-cc-pVDZ. Polarization (and diffuse functions) seem to play a large role.

In terms of thermochemistry, not surprisingly aug-cc-pVDZ and 6-31++G** give very similar results since they both implement pol/diff functions. The computational cost is, however, significantly higher for aug-cc-pVDZ than 6-31++G**, at least in nwchem.

There is also little difference between doing freq calculations in gas phase vs using cosmo when it comes to the calculated redox potential for the more extensive basis sets.

3-21G gives very varying results, with it giving the highest potential in the gas phase but the second lowest potential with cosmo. cc-pVDZ consistently gives the lowest potential.

UHF/ROHF/HF are fast, but wildly inaccurate. LANL2DZ/6-31+G* looks ok, results-wise, but the thermodynamic corrections are actually much smaller in conjunction with COSMO than the other methods, which is suspicious.

If given the time I may post a more detailed analysis of polarisation vs diffuse functions later.

17 June 2012

194. Wine 1.4.1 and Wine 1.5.6 on Debian Wheezy

UPDATE 16 May 2013: See here for Wine 1.5.30: http://verahill.blogspot.com.au/2013/05/416-wine-1530-in-chroot.html

There's great appetite for anything wine-related in Debian, as I can see from visitor numbers, so  here's how to build  Wine 1.4.1 and Wine 1.5.6 in Debian Testing/Wheezy. Enough talking -- let's get compilin'!

The builds take a little while, so be warned. Not all features are enabled in this particular build either -- see the configure output.

If you're interested in the missing development files seen below, this post might help: http://verahill.blogspot.com.au/2012/03/cross-compiling-eg-32-bit-binaries-on.html. Instinctively, I am suspicious as to whether that would work. I haven't explored it though, and my chief motivation is to make build instructions which anyone can easily follow and reproduce.

UPDATE (10th Jan 2013): See here for Wine 1.5.21 using the multiarch approach: http://verahill.blogspot.com.au/2013/01/308-compiling-wine-1521-on-debian.html


For both
sudo apt-get install bison flex gcc libc6-dev libfontconfig-dev libfreetype6-dev libglu-dev libgsm1-dev libice-dev libjpeg-dev libldap-dev libmpg123-dev libncurses5-dev libopenal-dev libpng-dev libsm-dev libssl-dev libusb-dev libx11-dev libxcomposite-dev libxcursor-dev libxext-dev libxi-dev libxinerama-dev libxml2-dev libxrandr-dev libxrender-dev libxslt-dev libxt-dev libxxf86vm-dev make libcapi20-dev liblcms-dev libsane-dev libhal-dev libdbus-1-dev valgrind prelink libcups2-dev opencl-dev lib32opencl1 oss4-dev gettext lib32v4l-dev lib32ncurses5-dev lib32asound2-dev lib32z-dev ia32-libs-dev


Version 1.4.1
wget http://prdownloads.sourceforge.net/wine/wine-1.4.1.tar.bz2

tar xvf wine-1.4.1.tar.bz2
cd wine-1.4.1/
./configure
configure: OpenCL 32-bit development files not found, OpenCL won't be supported.
configure: gstreamer-0.10 base plugins 32-bit development files not found, gstreamer support disabled
configure: libgsm 32-bit development files not found, gsm 06.10 codec won't be supported.
configure: libtiff 32-bit development files not found, TIFF won't be supported.
configure: WARNING: libjpeg 32-bit development files not found, JPEG won't be supported.
configure: Finished.  Do 'make' to compile Wine.
make
sudo checkinstall --install=yes

Note: To just make a .deb package, do ---install=no.

Version 1.5.6
wget http://prdownloads.sourceforge.net/wine/wine-1.5.6.tar.bz2
tar xvf wine-1.5.6.tar.bz2
cd wine-1.5.6/
./configure

configure: OpenCL 32-bit development files not found, OpenCL won't be supported.
configure: libsane 32-bit development files not found, scanners won't be supported.
configure: gstreamer-0.10 base plugins 32-bit development files not found, gstreamer support disabled
configure: libgsm 32-bit development files not found, gsm 06.10 codec won't be supported.
configure: libtiff 32-bit development files not found, TIFF won't be supported.
configure: WARNING: libjpeg 32-bit development files not found, JPEG won't be supported.
configure: Finished.  Do 'make' to compile Wine.
make
sudo checkinstall --install=yes

Note: To just make a .deb package, do ---install=no.

15 June 2012

193. Notes on UV/VIS - TD DFT in nwchem

I've been interested in looking at using NWChem to predict UV/VIS spectra. This post doesn't show how to do it but is rather just a look at what affects the positions of absorbances.
(although it more or less just involves
tddft
       cis
       nroots 24
end
task tddft energy)
I chose benzene and water as test cases. 

Here's water:
water

Reference spectrum in gas phase: http://www.lsbu.ac.uk/water/vibrat.html (166.5 nm) which tallies very well with 3-21g and 6-31g, but not at all with 6-31++g** or 3-21++g*.

Here's benzene:

Reference spectrum: http://www3.wooster.edu/chemistry/is/brubaker/uv/uv_spectrum.html
The reference spectrum, while a bit diffuse, shows the main absorbance at ca 178 nm. Presumably that's in benzene, not water.

More basis sets:


(cat Benzene_631g/Outputs/nwch.nwout |egrep "Root|sci"|gawk '{print $4,$6}'>bz631g.dat)

192. Skype on Debian STABLE (updated for Skype 4.x)

Update 15/6/12: New Skype (4) out this morning. Screengrabs etc posted at the end.

This post came about from a question posted on the Debian forums.

Before you read, be aware of this:
* I don't have debian 6.05 installed on any physical system. They all run debian wheezy
* This guide was done in a virtualbox installation. For sound, I used a headset and USB-passthrough. 

Having said that, there's no reason this shouldn't work.

As my system, I used the same one I used here (i.e. a very slim install):

The advantage is that it's likely to a lot more barebones than a regular desktop debian install.
The disadvantage is that I don't know what's pulled in by default by debian 6.05.

Anyway, here's what I did:

1. Download and install  skype from skype.com
I selected the Debian 5 64 bit package which is gives you the skype-debian_2.2.0.35-1_amd64.deb file.

Install:
sudo dpkg -i skype-debian_2.2.0.35-1_amd64.deb 

Selecting previously deselected package skype.
(Reading database ... 53724 files and directories currently installed.)
Unpacking skype (from skype-debian_2.2.0.35-1_amd64.deb) ...
dpkg: dependency problems prevent configuration of skype:
 skype depends on lib32stdc++6 (>= 4.1.1-21); however:
  Package lib32stdc++6 is not installed.
 skype depends on lib32asound2 (>> 1.0.14); however:
  Package lib32asound2 is not installed.
 skype depends on ia32-libs; however:
  Package ia32-libs is not installed.
 skype depends on lib32gcc1 (>= 1:4.1.1-21+ia32.libs.1.19); however:
  Package lib32gcc1 is not installed.
 skype depends on ia32-libs-gtk; however:
  Package ia32-libs-gtk is not installed.
dpkg: error processing skype (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 skype
That's fine. Now fix the dependencies:
sudo apt-get install -f
Reading package lists... Done
Building dependency tree    
Reading state information... Done
Correcting dependencies... Done
The following extra packages will be installed:
  ia32-libs ia32-libs-gtk lib32asound2 lib32bz2-1.0 lib32gcc1 lib32ncurses5
  lib32stdc++6 lib32v4l-0 lib32z1 libv4l-0
Suggested packages:
  lib32asound2-plugins
The following NEW packages will be installed:
  ia32-libs ia32-libs-gtk lib32asound2 lib32bz2-1.0 lib32gcc1 lib32ncurses5
  lib32stdc++6 lib32v4l-0 lib32z1 libv4l-0
0 upgraded, 10 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
Need to get 50.1 MB of archives.
After this operation, 123 MB of additional disk space will be used.
Do you want to continue [Y/n]? 

2. Get sounds organised
aptitude search pulseaudio|grep ^i
i A libpulse0                       - PulseAudio client libraries 
Not enough. Need. More. Packages.

sudo apt-get install pulseaudio pulseaudio-esound-compat pulseaudio-module-gconf
sudo apt-get install gnome-core

Obviously, if you have gnome installed, skip the second line.

Sort out your ~/.asoundrc file
echo "pcm.!default.type pulse">>~/.asoundrc
echo "ctl.!default.type pulse">>~/.asoundrc

At this point I rebooted for good luck.

3. Putting it all together

First open your volume control to see that it 'looks right'

Start Skype
Go to options (click on the S at the bottom left).  Make sure it says pulseaudio.

Make a test sound. Make a test call. Make sure to select the correct outputs and inputs in the gnome volume control

This worked perfectly for me and took all in all ca 25 minutes with screenshots and all.

Update 15/6/2012:
Skype 4 came out today -- I downloaded and installed it and tested it in the virtual machine above. Everything works perfectly.






Links to this post:
http://crunchbang.org/forums/viewtopic.php?id=27451

14 June 2012

191. Thinking about Molecular volume -- and is cosmo/nwchem yielding the right ones?

Disclaimer:
I'm an neither a theoretical nor computational chemist. I'm an analytical/inorganic chemist who likes computers. Don't trust my conclusions. This is more like thinking aloud.

The problem:
The underlying impetus is that of molecular volume: if we have a set of scatter points in space which define the surface of a molecule, how can we extract the volume? In particular as we're actually given the surface points by in the form of a cosmo.xyz file by COSMO (and yes, nwchem also outputs a volume - more about that later) there's no reason why we won't do the calculations ourselves. Also, there's at least one example of comparing results from a few major software packages, where nwchem was the odd one out.

Because it's good to know how to use Octave and bash, I'll show the commands as well.

The COSMO parameters used were
cosmo
    rsolv 0
end

[come to think of it: why bother with
and nwchem returned

 atomic radii =
 --------------
    1  6.000  2.000
    2  6.000  2.000
    3  6.000  2.000
    4  6.000  2.000
    5  6.000  2.000
    6  6.000  2.000
    7  1.000  1.300
    8  1.000  1.300
    9  1.000  1.300
   10  1.000  1.300
   11  1.000  1.300
   12  1.000  1.300
and a volume of ca 74.5 Å3

Processing:
me@Be:~$ head cosmo.xyz 
                  325
 cosmo charges
 Bq   2.1848085582473193      -0.38055253987610238        1.5251498369435705       -9.3089382062078174E-004
 Bq   1.6134835908159706      -0.59877925881345084        1.8782480854375714       -3.3706153046646758E-003
 Bq  0.43449121346899733      -0.59877925881345084        1.8782480854375714       -3.9739778624157118E-003
 Bq   1.0239874021424840      -0.23823332776127137        1.8683447179254316       -1.6433149723942275E-003


OK, we need to remove the first two lines, and the first column.
me@Be:~$ tail -n +3 cosmo.xyz|gawk '{print $2,$3,$4,$5}'> cos2.xyz
Start octave.
octave:1> bz=load('cos2.xyz');
octave:2> x=bz(:,1);y=bz(:,2);z=bz(:,3);c=bz(:,4);
octave:3> plot3(x,y,z)

Paradoxically, this would be fairly easy to do with a 'normal-size' physical object (e.g. water displacement, or area on a 2D project: draw it, cut it out, weigh it and use the density of the paper)

 Computationally, we need to think about it though. The most logical approach seems to be to take all x,y data points with a small range of values of z=zi±dz, project them on a 2D surface, calculate the area within, and multiply it by dz. Do this for all values of z.
octave:4> plot(y,z,'*')


But how to calculate the area inside an arbitrary two-dimensional figure then? If we can pick a point in the 'centre' of the figure, we can draw repeated triangles with this point as one of the corners. It's easy to calculate the area of a triangle, so we just need to sum the areas of the triangles. All we need to know is how to find such a central point to use as a corner. Also, there are problems when dz is too large and the 'border' becomes fuzzy.
octave:5> plot(y(1:25),z(1:25),'*')

In fact, at this stage there may well be pre-canned algorithms to help us.
octave:6>H=convhull(y(1:25),z(1:25));
octave:7>plot(y(H),z(H))
octave:8>hold
octave:9>plot(y(1:25),z(1:25),'*')

That way we can reduce the number of points to the ones defining the encircling figure.
octave:10>area(y(H),z(H))


That still doesn't give us the area (I think matlab does though). Since it's centred around the x axis we could probably use cumsum(abs(z(H))), but that's not general enough. In fact, there'd be so much  quality analysis required in order to make sure that we include enough, but not too many, points in our slices that it quickly becomes a chore.

So we'll take a step back. Turns out it's even easier:
octave:11>[H V]=convhulln([x y z]);
This probably isn't how you're supposed to plot it, but it works:
octave:13>trisurf(H,x,y,z)

trisurf plot
octave:12>V
gives V=104.07  Å3 (c.f. Nwchem/cosmo ca 74.5 Å3 for rsolv=0.)

Now that doesn't look good, but it has been noted nwchem/cosmo gives volumes which are about half of what every other program gives. See here and here:

">Cosmo produced volumes, which were twice as small
> as those obtained by PCM, while surfaces where by about 15% bigger in
> Cosmo."

I think nwchem actually isn't returning values of the wrong magnitude -- I think the value returned by nwchem is the molecular volume, while the other programmes return the solvent accessible surface-based volume. But what is in cosmo.xyz?

It appears to be a little bit more complex than that though.


We can open the cosmo.xyz file in jmol, but calculating the volume from these would be meaningless due to the way jmol works.

Instead we'll have to use the VdW radii of the xyz coordinates of the (unoptimised) molecule:


$ isosurface sasurface 0.5 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = 141.06999
$ isosurface sasurface 0.225 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume =104.452415
$ isosurface solvent 0 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = 79.09731
$ isosurface solvent volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = [80.26721490808025]
$ isosurface molecular volume
isosurface2 created with cutoff=0.0; isosurface count: 2
isosurfaceVolume = [80.58888982478977]
$ isosurface sasurface 0.2 area
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceArea = 118.730934
Making sense?

sasurface generates a solvent accessible surface. We can generate a value similar to what we saw from the cosmo.xyz points by forcing the sasurface probe radius..

The vdw radii of H and C are 1.2 and 1.7 Å, but COSMO uses 1.3 and 2.0.

Look at this plot again:


The height goes from -2 to 2, which agrees with the large 2.0 Å VDW radius for C that COSMO uses. The volume outputted by Nwchem is the molecular volume (as actually is stated). 
 number of -cosmo- surface points =      176
 molecular surface =    125.008 angstrom**2
 molecular volume  =     74.512 angstrom**3
(electrostatic) solvation energy =         0.0052128678 (    3.27 kcal/mol)
The molecular volume for rsolv=0 is 74.5 Å3 which is fairly close to isosurface sasurface 0 volume. Area is trickier, and requires isosurface sasurface 0.23 volume to yield anything close.

I don't think it's a coincidence that isosurface sasurface 0.225 volume gives a reasonable agreement with the cosmo.xyz, since 1.7+0.225=1.925 which is ca 2 (we only add 0.1 for H).

I'm sure all this is in the manual somewhere. But there's nothing like learning through doing.

The conclusions:
* NWchem returns a volume based on the vdw radii, not the solvent cavity
* cosmo.xyz contains points defining the surface according to the vdw radii that cosmo uses
* These are two different sets of vdw radii
* You can't compare the output of different software packages if they aren't outputting the same data
* The reported NWChem volume does depend on rsolv, the cosmo vol doesn't
* The cosmo.xyz volume is insensitive to rsolv, but sensitive to radius as expected. As far as I understand, the cosmo volumes are based solely on the vdw radii (as supplied to cosmo)
* I haven't quite figured out how, but looking at the dependency of rsolve vs defining vdw radii for cosmo, the radii used to calculate the nwchem volume is is certainly affected.

Increase rsolv=0.0, increase vdw +0.0: 74.51/104.07/3.27
Increase rsolv=0.5, increase vdw +0.0: 58.0/103.96/3.01
Increase rsolv=1.0, increase vdw +0.0: 54 /103.87/2.95
Increase rsolv=0.0, increase vdw +0.1: 84.79/115.10/2.72
Increase rsolv=0.1, increase vdw +0.1: 82.68/115.10/2.63
Increase rsolv=0.27, increase vdw +0.1: 71.84/114.97/2.56
Increase rsolv=0.0, increase vdw +0.2: 96.59/126.83/2.22
Increase rsolv=0.1, increase vdw +0.2: 85.70/126.68/2.09
increase rsolv=0.70, increase vdw +0.2: 74.68/126.56/2.01

My only real conclusion at this point is that you have to be extremely careful about what you do. This is not easy.


A Certain Commercial Programme (ACCP):
Using pcm:

scrf=(pcm,solvent=water) -- this uses vdw radii.
GePol: Cavity volume                                =      134.665 Ang**3
GePol: Cavity surface area                          =    143.132 Ang**2
Let's see if we can do this in jmol:
$ isosurface sasurface 0.5 area
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceArea = 144.25595
$ isosurface sasurface 0.46 volume
isosurface1 created with cutoff=0.0; isosurface count: 1
isosurfaceVolume = 135.33589
PCM is less of a mystery now.

ACCP has a few more options though.
Using IPCM with 50 points. This uses the isodensity volume.
Volume of Solute Cavity = 8.026500E+02
Total "Solvent Accessible Surface Area" of Solute = 4.485628E+02
I've been told that the units are in Bohr3 and Bohr2. That translates to 118.94 Å3 and 125.61Å3, respectively, which sounds about right. 

13 June 2012

190. In deep water: NWChem and COSMO

This post is based entirely on empirical experience. I don't claim to know what I'm doing. Right now I'm just looking at performance.

To actually learn more about COSMO (implemented) and COSMO-RS (not implemented), read the following article by the creator of the methods: http://onlinelibrary.wiley.com/doi/10.1002/wcms.56/abstract

Anyway.


As always, the test job (benzene at b3lyp/g-31+g*) is very short, so the error margin is large. A major impetus for this is the execeptional performance of PCM in gaussian, and seemingly poor performance of nwchem using standard settings. When several numbers are quoted they come from multiple runs

Task energy - empty COSMO block:
0. Gas phase - ca 40 s
1. From scratch. Empty cosmo block - 79 s
2. Loaded movecs from gas phase, empty cosmo block - 48 s, 65 s, 65 s

The default is water and rsolv=0

COSMO parameters
Movecs loaded in all cases. Solvation energies in []

Task energy -- rsolv
0. rsolv=0 - [3.27 kcal/mol] - 48 s, 65 s, 65 s, 65 s
1. rsolv=0.5 - [3.01 kcal/mol] - 58 s, 58 s
2. rsolv=1 - [2.95 kcal/mol] - 57 s, 57 s, 58 s
3. rsolv=2 - [2.62 kcal/mol] - 55 s

The molecular volumes obtained are 74.5, 58.0 and 54.0 Å3, respectively, for r=0..1. My next post will talk about what this actually means, but in short, this has nothing to do with the solvent/solute cavity.

Task energy -- lineq
0. rsolv=0.5; lineq 0 - [3.01 kcal/mol] - 58 s, 58 s, 56 s
1. rsolv=0.5; lineq 1 - [3.01 kcal/mol] -  58 s

Task energy -- ificos
0. rsolv=0.5; lineq 0, ificos=0 - [3.01 kcal/mol] - 58 s, 58 s, 56 s
1. rsolv=0.5; lineq 0, ificos=1 - [3.01 kcal/mol] - 62 s

1 (one) kcal./mol = 4.184 kJ/mol -- there's thus a fairly wide range of values obtained above depending on the absolute settings.

Rsolve defines the probe used to find the solvent accessible surface -- the smaller the value, the more fine-grained and larger the apparent accessible surface. We would expect that a fairly small number is preferable for rsolv.

Ultimately, I don't see any obvious way of improving performance, other than using large values for rsolv.

An interesting feature is that the surface used by COSMO is save in a cosmo.xyz file in the runtime directory -- all that remains is working out a way of calculating the volume from this (I know it's reported in the nwchem output, but it never hurts being paranoid)