Showing posts with label freq. Show all posts
Showing posts with label freq. Show all posts

09 April 2014

570. Briefly: restarting a g09 frequency job with SGE, using same queue

I've had g09 frequency jobs die on me, and in g09 analytical frequency jobs can only be restarted using the .rwf. Because the .rwf files are 160 gb, I don't want to be copying them back and forth between nodes. It's easier then to simply make sure that the restarted job is run on the same node as the original job.

A good resource for SGE related stuff is

Either way, first figure out what node the job ran on. Assuming that the job number was 445:
qacct -j 445|grep hostname
hostname compute-0-6.local

Next figure out the PID, as this is used to name the Gau-[PID].rwf file:
grep PID g03.g03out
Entering Link 1 = /share/apps/gaussian/g09/l1.exe PID= 24286.

You can now craft your restart file, g09_freq.restart -- you'll need to make sure that the paths are appropriate for your system:
%nprocshared=8 %Mem=900000000 %rwf=/scratch/Gau-24286.rwf %Chk=/home/me/jobs/testing/delta_631gplusstar-freq/delta_631gplusstar-freq.chk #P restart
(having empty lines at the end of the file is important) and a qsub file, g09_freq.qsub:
#$ -S /bin/sh #$ -cwd #$ -l h_rt=999:30:00 #$ -l h_vmem=8G #$ -j y #$ -pe orte 8 export GAUSS_SCRDIR=/tmp export GAUSS_EXEDIR=/share/apps/gaussian/g09/bsd:/share/apps/gaussian/g09/local:/share/apps/gaussian/g09/extras:/share/apps/gaussian/g09 /share/apps/gaussian/g09/g09 g09_freq.restart > g09_freq.out
Then submit it to the correct queue by doing
qsub -q all.q@compute-0-6.local g09_freq.qsub

The output goes to g09_freq.log. You know if the restart worked properly if it says
Skip MakeAB in pass 1 during restart.
Resume CPHF with iteration 214.

Note that restarting analytical frequency jobs in g09 can be a hit and miss affair. Jobs that run out of time are easy to restart, and some jobs that die silently have also been restarted successfully. On the other hand, a job that died because my resource allocations ran out couldn't be restarted i.e. restart started the freq job from scratch. The same happened with one a node of mine that has what seems like a dodgy PSU. Finally, I also couldn't restart jobs that died silently due to allocation all the RAM to g09 without leaving any to the OS (or at least that's the current best theory). It may thus be a good idea to back up the rwf file every now and again, in spite of the unwieldy size.

25 June 2012

200. How long will your nwchem frequency calc take?

Update 19/12/12: Having done a lot more frequency calculations since I posted this I sincerely doubt that this approach works.

Original post
Because I'm stuck waiting for the results of frequency calcs on some large transition metal clusters, I've become interested in understanding the output of frequency calculations in progress. After all, why wait 15 days for a results if there are early signs that the calculation has gone haywire?

Also, it might just be me, but frequency calculations are not that easy to restart, so you want to make sure that you give them enough wall time to finish if you use a queue manager.

I'm sure most of this could be appreciated by RTFM, but who has time for that?

So this is what the calc does:
After the usual boredom of reading in the geometry and doing an energy calculation, followed by an MO analysis, the computational fun starts.

Each cycle contains the following reports:

  1. Total Density - Mulliken Population Analysis
  2. Spin Density - Mulliken Population Analysis
  3. Total Density - Lowdin Population Analysis
  4. Spin Density - Lowdin Population Analysis
  5. Expectation value of S2:  
  6. NWChem DFT Module
  7.   Caching 1-el integrals 
  8.       Total Density - Mulliken Population Analysis
  9.       Spin Density - Mulliken Population Analysis
with the exception of the first cycle, which also look at the alpha-beta orbital overlaps, the centre of mass, moments of inertia and does a multipole analysis of density and save an initial hessian.

Each cycle ends with a report of the energy for that vibration:

         Total DFT energy =    -3297.032399945703
      One electron energy =   -26618.764098759657
           Coulomb energy =    12938.745973154924
    Exchange-Corr. energy =     -382.742230868704
 Nuclear repulsion energy =    10765.727956527733

 Numeric. integr. density =      441.999974968347

     Total iterative time =   7947.4s

If you do cat nwch.nwout|egrep "Total iterative time|Total DFT energy" you can see the progress:
        Total DFT energy =    -3297.032416366805
     Total iterative time =  12146.0s
         Total DFT energy =    -3297.032399945703
     Total iterative time =   7947.4s
         Total DFT energy =    -3297.032399544749
     Total iterative time =   7946.0s
         Total DFT energy =    -3297.032406934719
     Total iterative time =   7945.8s
         Total DFT energy =    -3297.032405026814

You now have an idea of how long each step takes. But how many steps in total? I think it's 3N steps, where N is the number of atoms.

For my 50 atoms POM using the values above it'd be roughly 8000 s * 150 = 13 days 22 hours.

Which seems about right...

cat nwch.nwout|grep 'Total DFT'|gawk 'END {print NR}'
so I've got another 8 days before I can get my hand on some juicy thermochemical data...

Time to start preparing lectures...