I've spent the past few days trying to get to grips with the Sun Grid Engine (SGE) but have given up for now. While it seems capable, it's overkill for my purposes, especially given how difficult it is simply to configure. It's a bit like my experience with OpenDX, a very capable plotting program which I couldn't get to work to my satisfaction in spite of being one of the lucky few in possession of the "Open DX -- Paths to Visualisation" book.
Long story short -- I wrote a small script in Python. It
- reads a file, list, containing the names of shell scripts
- executes the shell scripts, job1.sh..jobn.sh, sequentially: when one script has finished, the next one is started
- allows jobs to be added to and removed from list during execution
It's a 'dumb' script -- it does not try to balance jobs across nodes or look for idle cpus/cores. It just executes one job after the other, and marks each job as done after execution.
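The list handling described above can be sketched as two small functions that work on the lines of the list file. This is only an illustration, not the script itself: the names next_pending and mark_done are made up here, and marking a finished job with a leading "*" is an assumption based on the "*" test in the script listing. The loop at the end fakes a run in which a job is added while the queue is being worked through.

```python
def next_pending(lines):
    """Return the first job name that is not marked done, or "" if none."""
    for line in lines:
        name = line.strip()
        if name != "" and not name.startswith("*"):
            return name
    return ""


def mark_done(lines, job):
    """Return a copy of lines with the first entry equal to job starred."""
    out = []
    marked = False
    for line in lines:
        if not marked and line.strip() == job:
            out.append("*" + line.strip())   # assumed convention: "*" = done
            marked = True
        else:
            out.append(line.rstrip("\n"))
    return out


# Simulate a queue that is edited while it is being worked through.
queue = ["pi40.sh", "pi400.sh"]
done_order = []
while True:
    job = next_pending(queue)
    if job == "":
        break
    done_order.append(job)           # a real manager would run the script here
    queue = mark_done(queue, job)
    if job == "pi40.sh":
        queue.append("pi200.sh")     # a job added during execution
print(done_order)
print(queue)
```

Because the next job is looked up afresh after every finished job, anything appended to or deleted from the list in the meantime is picked up without restarting the queue.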
To test it:
Create a file called list and put the following lines in it:
pi40.sh
The scripts are the following:
pi40.sh
echo "pi to 40 decimals"
echo "scale=40; 4*a(1)" | bc -l -q
pi400.sh
echo "scale=400; 4*a(1)" | bc -l -q
pi200.sh
echo "scale=2000; 4*a(1)" | bc -l -q
The Python code for vspqm.py is given further down.
I've aliased my vspqm (edit ~/.bashrc):
alias vspqm='/home/me/work/vspqm/vspqm.py'
Then I sourced ~/.bashrc.
Launch it in the directory where you keep your list file using
me@beryllium:~/work/vspqm/jobs$ vspqm list > log &
me@beryllium:~/work/vspqm/jobs$ cat log
pi to 40 decimals
An nwchem example would be
mpirun -n 4 nwchem ac.nw>ac.out
mpirun -n 4 nwchem bn.nw>bn.out
Our Python queue manager (which we'll call vspqm.py and chmod +x to make executable) is below. Don't forget to change #!/usr/bin/python2.4 if necessary -- I use 2.4 on ROCKS and 2.7 on Debian testing/wheezy.
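#!/usr/bin/python2.4
# rudimentary queue manager. Handles a single node, submitting a series of jobs in sequence. use python v2.4-2.7
# NOTE: rebuilt around the surviving fragments; the "*" done-marker and the "sh <job>" call are assumptions
import os, sys, time
def main(listfile):
    print("pyqm v 0.0.3")
    while True:
        # pending jobs are the non-empty lines that do not start with "*"
        pending = [l.strip() for l in open(listfile) if l.strip()[:1] not in ("", "*")]
        if pending == []:
            print("No more jobs found at " + str(time.asctime()))
            return
        job = pending[0]
        print("Launching: " + job)
        if os.system("sh " + job) == 0:
            print("Job successful")
        else:
            print("Job failed")
        bakfile = [i.strip() for i in open(listfile)]  # re-read: the list may have been edited meanwhile
        if job in bakfile:
            bakfile[bakfile.index(job)] = "*" + job
            print("Marked as done: " + job)
        qfile = open(listfile, "w")
        qfile.write("\n".join(bakfile) + "\n")
        qfile.close()
if __name__ == "__main__":
    main(sys.argv[1])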