Showing posts with label chemistry. Show all posts

15 July 2013

475. How to get into a Chemistry PhD program in Australia -- or at least a reply from a prospective supervisor

Here's yet another non-linux post. I'm currently getting ready for the start of the new semester and teaching, and so haven't had much time to work on improving my computer skills.

Anyway.

I've been advertising for an international PhD student for the past 9 months and have so far only had one great applicant and three acceptable applicants. That's out of ca 200 applicants in total.

So what does 'acceptable' mean? In this case my use actually agrees with the literal meaning -- students who stand a chance of being accepted to the PhD program. It also means students I could imagine working with.

The formal requirements will likely differ between institutions, and between supervisors. In addition, some supervisors may be looking for different personalities in their prospective hires than others.

I don't think that I'm being unnecessarily harsh in evaluating applicants: I've had colleagues review my shortlists, and they thought I had even been a bit too optimistic in my evaluations.

At any rate, if you are looking for a PhD, be aware that there are a lot of applicants out there, and only a limited amount of money and places, so you will want to spend some time on your application.

So here are a few of my thoughts:
Before reading, keep in mind that I understand that applying for a PhD, especially if you are from the developing world and applying for a PhD position in the industrialised world, can be very tough, and sometimes depressing. You don't receive a reply to most of your applications, and when you do, the responses are normally negative.


* Try to familiarise yourself with the formal requirements, and address them in the first paragraph in your email to a prospective supervisor. In the case of my uni, there are two main requirements:
-- an undergraduate degree equivalent to a first-class honours degree in Australia
-- a sufficiently good score on the IELTS

That's it. However, the hubris of many universities in Australia means that the first requirement is a significant hurdle. Typically, good grades are just the beginning. In addition to that, the applicant needs to hold a master's degree (by research) and have a couple of papers in ISI rated journals. Obviously almost none of our own undergraduate students would meet that, but there you go.

So in your first paragraph, state what unis you did your degrees at, what your cumulative GPAs (or equivalent) were, how many papers you have published and what your overall band score AND section scores on the IELTS (or TOEFL) are.

At this stage, that's much more important than your background, your hobbies, or anything else. If you can't meet the minimum requirements for entry to the PhD program, everything else doesn't matter.

* Read the advertisement, and follow any instructions
I ask applicants to submit all their documents as PDFs. Yet, I get plenty of applications with .doc, .docx, jpeg etc attached. You didn't read the instructions -- will you be more careful as a PhD student? Remember that you are competing against plenty of applicants who did read the instructions.

Did I ask for your IELTS results? Didn't attach or mention them in your email/CV? Not a good sign. Also, it means that you're probably not a candidate.

* Address the supervisor and the supervisor's research
I get way, way too many emails that start with  'Dear Sir', or 'Dear Professor' or even worse: 'Dear Sir/Madam'. Put my name in there. It'll show me that you spent at least a few minutes on personalising your email. If you don't make that effort, why should I make the effort of reading your email and looking at your documents?

Also, please do mention the research of the supervisor you are applying to. It doesn't need to be anything insightful or special, but just write something like: 'I find your research into catalytic activation of molecules in ionic liquids very interesting.' or 'I read your article in Green Chemistry, 2013, 10, 2345 and found it very interesting. In particular, I liked how it showed how the selectivity of blah blah blah'.

The reason is not that you are showing off your great scientific skills (you've got an undergraduate degree -- we don't expect much), but that it shows you spent a bit of effort writing your email and personalising it. Also, flattery -- in moderation -- can occasionally help (don't go overboard, so be careful -- too much makes you seem insincere).

* Don't cold-call
This should go without saying. I've had one student email me in the morning, then call me in the afternoon. That kind of behaviour is probably correct if you are applying for certain jobs in the Real World (marketing?), but not for a PhD in chemistry. It's a sure-fire way of annoying people.

* Don't send a LinkedIn invite
I don't have time to scroll through your profile and try to compile a CV for you. Send me your CV in PDF format instead. Also, I don't know you, and have no incentive to add you to my 'network'.

* Be careful about 'hobbies' and 'interests'.
To me as a potential supervisor they really don't matter (again, this is my personal opinion). I know that the idea is to show that you are a well-rounded individual, but knowing that you like 'travel' or that you consider 'internet browsing' a skill will not be the edge that gets you into a PhD programme.

* 'It can't help you, only harm you'.
Keep this in mind. Unless it's a piece of information required in the advertisement, or that you are absolutely certain will help your application, consider leaving it out. You may include it to highlight a particular skill or trait, but remember that a CV can be interpreted ambiguously, and your intent may not be obvious. What you feel shows how independent and committed you are can instead come across as being unfocussed or difficult to work with, or simply draw attention away from more important aspects of your CV.

* Attending lectures, conferences
In their CVs, some applicants include lectures by famous people that they've attended, or conferences that they've gone to.

Here's the problem for me: most first year PhD students struggle with the notion that doing the work is no longer enough. Doing the experiments, or following your supervisor's instructions, is not enough. To get a PhD you need to make that extra effort to make things work. And if it doesn't work, you put in 150% effort -- the extra 50% being extra-curricular work on finding a related project that will work. Life as a PhD student can be easy if you are lucky, but most often is not -- life is incredibly good when your project is working, but on the flip-side it can be hard, depressing and demoralising when it isn't. Your supervisor can alleviate some of that, but remember that your supervisor is only there to point you in a general direction -- the PhD is all about making the transition to becoming an INDEPENDENT researcher.

So be careful -- simply having attended something says little about your ability to work independently. If you've presented posters or given talks at conferences or at other universities, you should definitely list them, but under a suitable heading -- NOT publications. They'll draw attention away from the publications, and the publications are what will get you an offer of acceptance.

* Do not make things conditional
I had an applicant who was borderline (in terms of meeting the requirements), and in those cases occasionally the supervisor putting in extra effort into cajoling the university administration MAY be enough to get a student accepted (don't count on it). If your prospective supervisor asks you to re-take IELTS, don't write something along the lines of  'I will, but only if this is the last hurdle'.

I understand it's expensive, but remember: even if you meet all the requirements I cannot guarantee that you get accepted. And I can't wait months for each student to pass through the application system -- I need to hire someone now. So be proactive.

* Face-to-face (or skype/video) interview is a good sign
If your supervisor asks for a skype interview, this is a great sign. And likely this isn't really done in order to gauge your scientific skills, but just to get a feel for your personality. Also, it's a way of making sure that your English is good enough for you to communicate with your supervisor. Finally, if you are borderline in terms of IELTS/TOEFL, your supervisor may be able to argue that your English is good enough based on that interview. So take the opportunity.

And send an email a few hours after the interview to say thank you for the opportunity. 1-2 lines is enough. It will show that you're a decent human being.

* Be prompt in replying to emails
It doesn't matter what stage of the application you are at -- until the paperwork has been signed you are still on probation. If you take several days to reply to any of my emails, then you are likely to be dropped. The reason is simple: if you take a week to get things done when you are a PhD student, then you will be a disaster for me. A disaster that I'll have to live with for the next 3-4 years, and who will be using up my research grant, and potentially ruining my career.

I understand that the reason for you being slow may be different -- maybe you are just nervous, maybe you have nothing to say, maybe you feel you are intruding. Still, be prompt.

So:
if you can show that you can read and follow instructions, and if you can make my life easy by addressing the selection criteria in a clear way, and if you seem like a person I might enjoy working with for the next 3-4 years, then you stand a fair chance of getting an offer.

If I think you'll need constant supervision, are sloppy and won't follow instructions, or that our personalities will clash, I'll probably avoid you no matter how good your grades are.


13 January 2013

318. qmol 0.3.2: A molecular weight calculator for Linux

Over a year ago I complained about the lack of a decent molecular weight calculator in linux in general, and in Debian Testing in particular.  I eventually managed to hack together a molecular weight calculator in Python as part of an isotopic pattern calculator in Python.

However, interpreted languages like python tend to be a bit slower than compiled languages (generally not critical for a molecular weight calculator, but could be for an isotopic pattern calculator), and, perhaps more importantly, my scripts don't feature a GUI.

I vaguely remember trying to compile Kmol by Tomislav Gountchev over a year ago, and as far as I can recall it wasn't working out since it depended on packages (kde-3) that were too old.

But things are changing.

Thomas Mitterfellner has revived Kmol in the form of qmol. Since it's very recent (version 0.1 was created in November 2012, and we're now at version 0.3.2) it's not found in the Debian repos, and may not be for some time given that Debian Testing/Wheezy is frozen.

There is, however, a pre-built .deb file for Debian Squeeze/stable (and packages for Suse, Ubuntu, Fedora etc.) -- so if you're on stable you do not have to compile. Instead go here: http://download.opensuse.org/repositories/home:/lineinthesand/

qmol is a fairly complete solution, and importantly is highly configurable while at the same time being straightforward to use. In particular I like the ability to define your own chemical groups AND the ability to run it from the command line. It's basically what I've been waiting for with the exception of the lack of an isotopic pattern calculator -- but that may come by version 1.0.

Also, the documentation -- or qmol handbook -- is quite extensive and is available under help.


Enough talking -- time for compiling.

sudo apt-get install bzip2 build-essential cmake libqtcore4 libqtgui4 qt4-qmake libqt4-dev
mkdir ~/tmp 
cd ~/tmp
wget http://downloads.sourceforge.net/project/qmol/qmol-0.3.2/qmol-0.3.2.tar.bz2
tar xvf qmol-0.3.2.tar.bz2
mkdir buildqmol
cd buildqmol/
cmake ../qmol-0.3.2
make
sudo make install

Don't worry if you get
-- Looking for Q_WS_X11
-- Looking for Q_WS_X11 - found
-- Looking for Q_WS_WIN
-- Looking for Q_WS_WIN - not found.
-- Looking for Q_WS_QWS
-- Looking for Q_WS_QWS - not found.
-- Looking for Q_WS_MAC
-- Looking for Q_WS_MAC - not found.

during the cmake stage.

Usage: Either run qmol from the command line:
qmol 'N(CH3)4'
N(CH3)4 = C4H12N: 74.146 g/mol C 64.80 H 16.31 N 18.89
The output format can be configured when qmol is in gui mode (Edit/Options).

or launch it by typing
qmol
A virgin window

Previous formulae aggregate at the bottom of the  window

It doesn't handle 0, but then neither does my calculator (yet)

It's very easy to define your own group -- but  only the first letter can be  upper case

Options menu -- you can format the command line output here

It works!
There are only two small things to watch out for: the inability to handle 0 (but you get an error message -- my calculator just gives you an erroneous result, which is arguably worse...) and the requirement that only the first letter in an abbreviation can be upper case (for reasons of ambiguity -- cf. e.g. CHO vs C, H, O).
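
The command-line result for N(CH3)4 shown above can be cross-checked by hand. What follows is a minimal sketch and not part of qmol: the atomic weights are ordinary textbook values that I'm assuming for the illustration (qmol's own element table may differ in the last digits), and the function name composition is made up.

```python
# Standard atomic weights in g/mol. These are textbook values assumed for
# this illustration; they are not read from qmol's own element table.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.00794, "N": 14.0067}


def composition(counts):
    """Return (molar mass, {element: mass percent}) for an atom-count dict."""
    total = sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())
    percent = {}
    for el, n in counts.items():
        percent[el] = 100.0 * ATOMIC_WEIGHT[el] * n / total
    return total, percent


# N(CH3)4 expands to C4H12N
mass, percent = composition({"C": 4, "H": 12, "N": 1})
print("%.3f g/mol" % mass)
for el in ("C", "H", "N"):
    print("%s %.2f" % (el, percent[el]))
```

With these weights the numbers agree with the qmol output to the digits shown.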

08 October 2012

252. Molecular weight calculator in python

Here's the molecular weight part of the isotopic pattern calculator in a previous post.

Most people won't need a full molecular weight calculator with plotting of isotopic composition, so I'm publishing the molecular weight part as a separate program.

The actual algorithm is fairly simple and is more or less contained in the formulaExpander function below. It looks messy because of the definition of the PeriodicTable dictionary at the beginning, but it's simple.

Copy the code, paste it into a file (call it e.g. molcalc), put it in e.g. /usr/local/bin and chmod +x it.

Usage:
molcalc 'Mg2(PO4)3'
returns
The mass of Mg2P3O12 is 333.524247 and the calculated charge is -5.
The charge is based on my default oxidation states -- depending on what kind of chemistry you do the oxidation states you encounter are likely to differ.

#!/usr/bin/python2.7
#########################################################################
# Principal author of current version: Me
# Isotopic abundances and masses were copied from Wsearch32.
#
#
# Dependencies:
# To be honest I'm not quite certain. At a minimum you will need python2.7,
# python-numpy
#
#########################################################################

import re #for regular expressions
import sys
from numpy import matrix,transpose # for molw calc
try:
 molecules=sys.argv[1]
except:
 quit()

#slowly changed to IUPAC 1997 isotopic compositions and IUPAC 2007 masses
# see http://pac.iupac.org/publications/pac/pdf/1998/pdf/7001x0217.pdf for
# natural variations in isotopic composition
PeriodicTable ={
   'H':[1,1,[1.0078250321,2.0141017780],[0.999885,0.000115]], # iupac '97 in water
   'He':[2,0,[3.0160293097,4.0026032497],[0.00000137,0.99999863]], # iupac iso '97
   'Li':[3,1,[6.0151233,7.0160040],[0.0759,0.9241]], # iupac '97
   'Be':[4,2,[9.0121821],[1.0]], # iupac '97
   'B':[5,3,[10.0129370,11.0093055],[0.199,0.801]], # iupac' 97
                        'C':[6,4,[12.0,13.0033548378],[0.9893,0.0107]], # iupac '97
                        'N':[7,5,[14.0030740052,15.0001088984],[0.99632,0.00368]], # iupac '97
                        'O':[8,-2,[15.9949146221,16.99913150,17.9991604],[0.99757,0.00038,0.00205]], # iupac '97
                        'F':[9,-1,[18.99840320],[1.0]], # iupac '97
                        'Ne':[10,0,[19.9924401759,20.99384674,21.99138551],[0.9048,0.0027,0.0925]], # iupac '97 in air
                        'Na':[11,1,[22.98976967],[1.0]], #iupac '97
                        'Mg':[12,2,[23.98504190,24.98583702,25.98259304],[0.7899,0.10,0.1101]], #iupac '97
                        'Al':[13,3,[26.98153844],[1.0]], #iupac '97
                        'Si':[14,4,[27.9769265327,28.97649472,29.97377022],[0.922297,0.046832,0.030872]],#iupac '97
                        'P':[15,5,[30.97376151],[1.0]], #iupac '97
                        'S':[16,-2,[31.97207069,32.97145850,33.96786683,35.96708088],[0.9493,0.0076,0.0429,0.0002]], #iupac '97
                        'Cl':[17,-1,[34.96885271,36.96590260],[0.7578,0.2422]], #iupac '97
                        'Ar':[18,0,[35.96754628,37.9627322,39.962383123],[0.003365,0.000632,0.996003]],#iupac '97 in air
                        'K':[19,1,[38.9637069,39.96399867,40.96182597],[0.932581,0.000117,0.067302]], #iupac '97
                        'Ca':[20,2,[39.9625912,41.9586183,42.9587668,43.9554811,45.9536928,47.952534],[0.96941,0.00647,0.00135,0.02086,0.00004,0.00187]], #iupac '97
                        'Sc':[21,3,[44.9559102],[1.0]], #iupac '97
                        'Ti':[22,4,[45.9526295,46.9517638,47.9479471,48.9478708,49.9447921],[0.0825,0.0744,0.7372,0.0541,0.0518]], #iupac '97
                        'V':[23,5,[49.9471628,50.9439637],[0.00250,0.99750]], #iupac '97
                        'Cr':[24,2,[49.9460496,51.9405119,52.9406538,53.9388849],[0.04345,0.83789,0.09501,0.02365]], #iupac '97
                        'Mn':[25,2,[54.9380496],[1.0]], #iupac '97
                        'Fe':[26,3,[53.9396148,55.9349421,56.9353987,57.9332805],[0.05845,0.91754,0.02119,0.00282]], #iupac '97
                        'Ni':[28,3,[57.9353479,59.9307906,60.9310604,61.9283488,63.9279696],[0.680769,0.262231,0.011399,0.036345,0.009256]], #iupac '97
                        'Co':[27,2,[58.933195],[1.0]], #iupac '97
                        'Cu':[29,2,[62.9296011,64.9277937],[0.6917,0.3083]], #iupac '97
                        'Zn':[30,2,[63.9291466,65.9260368,66.9271309,67.9248476,69.925325],[0.4863,0.2790,0.0410,0.1875,0.0062]], #iupac '97
                        'Ga':[31,3,[68.925581,70.9247050],[0.60108,0.39892]], #iupac '97
                        'Ge':[32,2,[69.9242504,71.9220762,72.9234594,73.9211782,75.9214027],[0.2084,0.2754,0.0773,0.3628,0.0761]], #iupac '97
                        'As':[33,3,[74.9215964],[1.0]], #iupac '97
                        'Se':[34,4,[73.9224766,75.9192141,76.9199146,77.9173095,79.9165218,81.9167000],[0.0089,0.0937,0.0763,0.2377,0.4961,0.0873]], #iupac '97
                        'Br':[35,-1,[78.9183376,80.916291],[0.5069,0.4931]],#iupac '97
                        'Kr':[36,0,[77.920386,79.916378,81.9134846,82.914136,83.911507,85.9106103],[0.0035,0.0228,0.1158,0.1149,0.5700,0.1730]], #iupac '97 in air
                        'Rb':[37,1,[84.9117893,86.9091835],[0.7217,0.2783]], #iupac '97
                        'Sr':[38,2,[83.913425,85.9092624,86.9088793,87.9056143],[0.0056,0.0986,0.0700,0.8258]], #iupac '97
                        'Y': [39,3,[88.9058479],[1.0]], #iupac '97
                        'Zr': [40,4,[89.9047037,90.9056450,91.9050401,93.9063158,95.908276],[0.5145,0.1122,0.1715,0.1738,0.0280]],#iupac '97
                        'Nb':[41,5,[92.9063775],[1.0]], #iupac '97
                        'Mo':[42,6,[91.906810,93.9050876,94.9058415,95.9046789,96.9060210,97.9054078,99.907477],[0.1484,0.0925,0.1592,0.1668,0.0955,0.2413,0.0963]], #checked, iupac '97
                        'Tc': [43,2,[96.906365,97.907216,98.9062546],[0.0,0.0,1.0]], #no natural abundance; all weight put on 99Tc as a placeholder so both lists have equal length
                        'Ru': [44,3,[95.907598,97.905287,98.9059393,99.9042197,100.9055822,101.9043495,103.905430],[0.0554,0.0187,0.1276,0.1260,0.1706,0.3155,0.1862]], #iupac '97
                        'Rh':[45,2,[102.905504],[1.0]], #iupac '97
                        'Pd':[46,2,[101.905608,103.904035,104.905084,105.903483,107.903894,109.905152],[0.0102,0.1114,0.2233,0.2733,0.2646,0.1172]], #iupac '97
                        'Ag':[47,1,[106.905093,108.904756],[0.51839,0.48161]], #iupac '97
                        'Cd':[48,2,[105.906458,107.904183,109.903006,110.904182,111.9027572,112.9044009,113.9033581,115.904755],[0.0125,0.0089,0.1249,0.1280,0.2413,0.1222,0.2873,0.0749]],#iupac '97
                        'In':[49,3,[112.904061,114.903878],[0.0429,0.9571]], #iupac '97
                        'Sn':[50,4,[111.904821,113.902782,114.903346,115.901744,116.902954,117.901606,118.903309,119.9021966,121.9034401,123.9052746],[0.0097,0.0066,0.0034,0.1454,0.0768,0.2422,0.0859,0.3258,0.0463,0.0579]], #iupac '97
                        'Sb':[51,3,[120.9038180,122.9042157],[0.5721,0.4279]], #iupac '97
                        'Te':[52,4,[119.904020,121.9030471,122.9042730,123.9028195,124.9044247,125.9033055,127.9044614,129.9062228],[0.0009,0.0255,0.0089,0.0474,0.0707,0.1884,0.3174,0.3408]],#iupac '97
                        'I':[53,-1,[126.904468],[1.0]], #iupac '97
                        'Xe':[54,0,[123.9058958,125.904269,127.9035304,128.9047795,129.9035079,130.9050819,131.9041545,133.9053945,135.907220],[0.0009,0.0009,0.0192,0.2644,0.0408,0.2118,0.2689,0.1044,0.0887]], #iupac '97
                        'Cs':[55,1,[132.905447],[1.0]], #iupac '97
   'Ba':[56,2,[129.906310,131.905056,133.904503,134.905683,135.904570,136.905821,137.905241],[0.00106,0.00101,0.02417,0.06592,0.07854,0.11232,0.71698]], #iupac '97
   'La':[57,3,[137.907107,138.906348],[0.00090,0.99910]],#iupac '97
   'Ce':[58,3,[135.907140,137.905986,139.905434,141.909240],[0.00185,0.00251,0.88450,0.11114]],#iupac '97
                        'Pr':[59,3,[140.907648],[1.0]], #iupac '97
   'Nd':[60,3,[141.907719,142.909810,143.910083,144.912569,145.913112,147.916889,149.920887],[0.272,0.122,0.238,0.083,0.172,0.057,0.056]],#iupac '97
   'Pm':[61,3,[144.91270],[1.0]], #no natural occurence
   'Sm':[62,3,[143.911995,146.914893,147.914818,148.917180,149.917271,151.919728,153.922205],[0.0307,0.1499,0.1124,0.1382,0.0738,0.2675,0.2275]], #iupac '97
   'Eu':[63,3,[150.919846,152.921226],[0.4781,0.5219]], #iupac '97
   'Gd':[64,3,[151.919788,153.920862,154.922619,155.922120,156.923957,157.924101,159.927051],[0.0020,0.0218,0.1480,0.2047,0.1565,0.2484,0.2186]],#iupac '97
                        'Tb':[65,4,[158.925343],[1.0]], #iupac '97
   'Dy':[66,3,[155.924278,157.924405,159.925194,160.926930,161.926795,162.928728,163.929171],[0.0006,0.0010,0.0234,0.1891,0.2551,0.2490,0.2818]], #iupac '97
   'Ho':[67,3,[164.930319],[1.0]], #iupac '97
   'Er':[68,3,[161.928775,163.929197,165.930290,166.932045,167.932368,169.935460],[0.0014,0.0161,0.3361,0.2293,0.2678,0.1493]], #iupac '97
   'Tm':[69,3,[168.934211],[1.0]], #iupac '97
                        'Yb':[70,3,[167.933894,169.934759,170.936322,171.9363777,172.9382068,173.9388581,175.942568],[0.0013,0.0304,0.1428,0.2183,0.1613,0.3183,0.1276]], #iupac '97
   'Lu':[71,3,[174.9407679,175.9426824],[0.9741,0.0259]],#iupac '97
   'Hf':[72,4,[173.940040,175.9414018,176.9432200,177.9436977,178.9458151,179.9465488],[0.0016,0.0526,0.1860,0.2728,0.1362,0.3508]], #iupac '97
   'Ta':[73,5,[179.947466,180.947996],[0.00012,0.99988]], #iupac '97
   'W':[74,6,[179.946704,181.9482042,182.9502230,183.9509312,185.9543641],[0.0012,0.2650,0.1431,0.3064,0.2843]], #iupac  '97
                        'Re':[75,2,[184.9529557,186.9557508],[0.3740,0.6260]],#iupac '97
   'Os':[76,4,[183.952491,185.953838,186.9557479,187.9558360,188.9581449,189.958445,191.961479],[0.0002,0.0159,0.0196,0.1324,0.1615,0.2626,0.4078]],#iupac '97
   'Ir':[77,4,[190.960591,192.962924],[0.373,0.627]], #iupac '97
   'Pt':[78,4,[189.959930,191.961035,193.962664,194.964774,195.964935,197.967876],[0.00014,0.00782,0.32967,0.33832,0.25242,0.07163]],#iupac '97
   'Au':[79,3,[196.966552],[1.0]], #iupac '97
                        'Hg':[80,2,[195.965815,197.966752,198.968262,199.968309,200.970285,201.970626,203.973476],[0.0015,0.0997,0.1687,0.2310,0.1318,0.2986,0.0687]], #iupac '97
   'Tl':[81,1,[202.972329,204.974412],[0.29524,0.70476]], #iupac '97
   'Pb':[82,2,[203.973029,205.974449,206.975881,207.976636],[0.014,0.241,0.221,0.524]],#
   'Bi':[83,3,[208.980383],[1.0]], #iupac '97
   'Po':[84,4,[209.0],[1.0]],
   'At':[85,7,[210.0],[1.0]],
                        'Rn':[86,0,[220.0],[1.0]],
   'Fr':[87,1,[223.0],[1.0]],
   'Ra':[88,2,[226.0],[1.0]],
   'Ac':[89,3,[227.0],[1.0]],
   'Th':[90,4,[232.0380504],[1.0]], #iupac '97
   'Pa':[91,4,[231.03588],[1.0]],
                        'U':[92,6,[234.0409456,235.0439231,236.0455619,238.0507826],[0.000055,0.007200,0.0,0.992745]], #iupac '97
   'Np':[93,5,[237.0],[1.0]],
   'Pu':[94,3,[244.0],[1.0]],
   'Am':[95,2,[243.0],[1.0]],
   'Cm':[96,3,[247.0],[1.0]],
   'Bk':[97,3,[247.0],[1.0]],
   'Cf':[98,0,[251.0],[1.0]],
                        'Es':[99,0,[252.0],[1.0]],
   'Fm':[100,0,[257.0],[1.0]],
   'Md':[101,0,[258.0],[1.0]],
   'No':[102,0,[259.0],[1.0]],
   'Lr':[103, 0,[262.0],[1.0]],
   'Rf':[104, 0,[261.0],[1.0]],
   'Db':[105, 0,[262.0],[1.0]],
   'Sg':[106, 0,[266.0],[1.0]]
}

#######################################
# Collect properties
#######################################
def getMass(x):
 atom=re.findall('[A-Z][a-z]*',x)
 number=re.findall('[0-9]+', x)
 if len(number) == 0:
  multiplier = 1
 else:
  multiplier = float(number[0])
 atomic_mass=float(matrix(PeriodicTable[atom[0]][2])*transpose(matrix(PeriodicTable[atom[0]][3])))
# That's right -- the molecular weight is based on the isotopes and ratios
 return (atomic_mass*multiplier)

def getCharge(x):
 atom=re.findall('[A-Z][a-z]*',x)
 number=re.findall('[0-9]+', x)
 if len(number) == 0:
  multiplier = 1
 else:
  multiplier = float(number[0])
 atomic_charge=float(PeriodicTable[atom[0]][1])
 return (atomic_charge*multiplier)


#####################################################
# Iterate over expanded formula to collect property
#####################################################
def molmass(formula):
 # Sum the masses of all 'element + optional count' segments
 mass=0
 for segment in re.findall('[A-Z][a-z]*[0-9]*',formula):
  mass+=getMass(segment)
 return mass

def molcharge(formula):
 # Sum the default oxidation states of all segments
 charge=0
 for segment in re.findall('[A-Z][a-z]*[0-9]*',formula):
  charge+=getCharge(segment)
 return charge


################################################################################
#expands ((((M)N)O)P)Q to M*N*O*P*Q
################################################################################

def formulaExpander(formula):
 while len(re.findall('\(\w*\)',formula))>0:
  parenthetical=re.findall('\(\w*\)[0-9]+',formula)
  for i in parenthetical:
   p=re.findall('[0-9]+',str(re.findall('\)[0-9]+',i)))
   j=re.findall('[A-Z][a-z]*[0-9]*',i)
   oldj=j
   for n in range(0,len(j)):
    numero=re.findall('[0-9]+',j[n])
    if len(numero)!=0:
     for k in numero:
      nu=re.sub(k,str(int(int(k)*int(p[0]))),j[n])
    else:
     nu=re.sub(j[n],j[n]+p[0],j[n])
    j[n]=nu
   newphrase=""
   for m in j:
    newphrase+=str(m)
   formula=formula.replace(i,newphrase)
  if (len((re.findall('\(\w*\)[0-9]+',formula)))==0) and (len(re.findall('\(\w*\)',formula))!=0):
   formula=formula.replace('(','')
   formula=formula.replace(')','')
 return formula


#######
# main #
########
if __name__ == '__main__':
 molecules=molecules.split(',')
 for element in molecules:
  element=formulaExpander(element)
  print ('The mass of %(substance)s is %(Mass)f and the calculated charge is %(Charge)i.' % {'substance': \
   element, 'Mass': molmass(element), 'Charge': molcharge(element)})
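
The regex-based expansion in formulaExpander can also be written with an explicit stack, which some readers find easier to follow. This is only a sketch of that alternative, not the code used by molcalc: count_atoms and TOKEN are hypothetical names, and it returns atom counts rather than an expanded formula string.

```python
import re

# One token is an element symbol, a run of digits, or a parenthesis.
TOKEN = re.compile(r"[A-Z][a-z]*|[0-9]+|[()]")


def count_atoms(formula):
    """Return {element: count} for a formula such as 'Mg2(PO4)3'."""
    tokens = TOKEN.findall(formula)
    stack = [{}]  # one dict of counts per open parenthesis level
    for i, tok in enumerate(tokens):
        # An optional multiplier may directly follow an element or a ')'
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        mult = int(nxt) if nxt.isdigit() else 1
        if tok == "(":
            stack.append({})
        elif tok == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced parentheses in %r" % formula)
            group = stack.pop()
            # Fold the closed group into its parent, scaled by the multiplier
            for el, n in group.items():
                stack[-1][el] = stack[-1].get(el, 0) + n * mult
        elif tok.isdigit():
            continue  # already used as the multiplier of the previous token
        else:
            stack[-1][tok] = stack[-1].get(tok, 0) + mult
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses in %r" % formula)
    return stack[0]


print(count_atoms("Mg2(PO4)3"))
```

Nested groups such as ((CH3)2N)3 come for free, since each closing parenthesis folds its group into the level above.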



02 October 2012

251. Isotopic pattern and molecular weight calculator in Python for Linux

UPDATE: I've moved this code to https://sourceforge.net/projects/pyisocalc/

I'm not answering questions about this code -- it's a work in progress (updated every other day) and if you can't figure out how to use it  on your own, you're not the (currently) intended audience. For example, I've only had time to add a small subsection of the elements.

I originally implemented a very different solution -- a very exact and shiny one. The problem is that the number of permutations increases too rapidly, so that anything larger than e.g. B3(NO3)4 would use up 8 GB of RAM or more. 'Easy' molecules like C18 didn't use that much RAM, but still introduced a noticeable delay. Trimming the list of permutations introduces errors (small, hopefully) but speeds things up orders of magnitude.

In other words: this calculator is moderately fast (python), and very accurate (as far as I can tell). As I keep on looking at more and more complex examples for validation I find that I need to introduce various trimming functions to keep the matrices small.
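
To make the trimming idea concrete, here is a minimal sketch of one common way to build an isotopic pattern: add one atom at a time and drop peaks below a cutoff after every step. It is not the pyisocalc code; the function names, the cutoff and the rounding of masses to four decimals are assumptions made for the illustration. The chlorine isotope data are copied from the PeriodicTable dictionary in the molecular weight calculator post above.

```python
# Chlorine isotopes as (mass, abundance) pairs
CL = [(34.96885271, 0.7578), (36.96590260, 0.2422)]


def add_atom(pattern, isotopes, cutoff=1e-6, decimals=4):
    """Combine a {mass: abundance} pattern with one more atom."""
    new = {}
    for mass, abundance in pattern.items():
        for iso_mass, iso_abundance in isotopes:
            # Rounding merges peaks reached via different isotope orderings;
            # it also costs a little mass accuracy, which a sketch can afford
            key = round(mass + iso_mass, decimals)
            new[key] = new.get(key, 0.0) + abundance * iso_abundance
    # Trimming step: drop negligible peaks so the pattern stays short
    return dict((m, a) for m, a in new.items() if a >= cutoff)


def pattern_for(isotopes, n_atoms, cutoff=1e-6):
    """Isotopic pattern of n_atoms identical atoms."""
    pattern = {0.0: 1.0}
    for _ in range(n_atoms):
        pattern = add_atom(pattern, isotopes, cutoff)
    return pattern


for mass, abundance in sorted(pattern_for(CL, 2).items()):
    print("%.4f  %.4f" % (mass, abundance))
```

Because peaks are merged and trimmed after every atom, the working set grows roughly with the number of distinct masses rather than with the number of permutations. The price is the small error mentioned above: whatever falls under the cutoff is lost for good.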

Having said that, it's still kind of neat. Here's RuCl5^2- by my program and Matt Monroe's calculator (which I trust):


Monroe's output:


And plotting on top (scaled Monroe's by 1.08 to compensate for the error in scaling in Monroe's program which gives 108% abundance):


I removed the figures of W6O19^- since the error in the y axis scale in Monroe's program (went to 120%) made it a less good example, and the list of peaks is too long for easy comparison.
Here's another figure:
A hypothetical W6^- molecule


Anyway, here are a couple of syntax examples:

Usage:
./isocalc 'Al2(NO3)3'
./isocalc 'Al2(NO3)3' -1
./isocalc 'Al2(NO3)3' -1 output.dat
./isocalc Al2N3O9
./isocalc 'Al(NO3)3(OH)1'
./isocalc 'Al(NO3)3(OH)'
./isocalc Al

See here for the source code:
https://sourceforge.net/projects/pyisocalc/

20 March 2012

114. Nwchem 6.0 with openmpi support on debian testing

I still haven't managed to compile a working version of Nwchem 6.1 on 64-bit Debian, regardless of whether I'm using mpich or openmpi. The number of posts relating to compiling nwchem is steadily growing, but I'd rather have posts which are almost, but not quite, identical if that makes it unambiguous for the average user how to build and use nwchem.

Anyway, since I'm using openmpi on my rocks cluster(s), I figure I might as well start using openmpi on debian too. In addition, the only way you can get nwchem 6.0 to work with mpich2 on debian seems to be by using the old v1.2 package which causes problems of its own (see apt-pinning).

Note: See here for information about python support: http://verahill.blogspot.com.au/2012/04/adding-python-support-to-nwchem-under.html

Long story short -- nwchem with openmpi:
mkdir ~/tmp
cd ~/tmp
sudo apt-get install openmpi-bin libopenmpi-dev
wget http://www.nwchem-sw.org/images/Nwchem-6.0.tar.gz
tar -xvf Nwchem-6.0.tar.gz
cd nwchem-6.0/

export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=/home/me/tmp/nwchem-6.0
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

This will take a good 20-30 minutes.


Your binary will be in nwchem-6.0/bin/LINUX64/

Finally, see whether openmpi is already in your LD_LIBRARY_PATH

echo $LD_LIBRARY_PATH
/lib/openmm:/usr/lib/nvidia-cuda-toolkit:/usr/lib/nvidia
If not, edit ~/.bashrc and add
export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
export PATH=$PATH:/home/me/tmp/nwchem-6.0/bin/LINUX64
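
If you'd rather not eyeball the echo output, the same check can be scripted. A small sketch, assuming the library directory is /usr/lib/openmpi/lib as in the exports above; on_search_path is a made-up helper name.

```python
import os
import posixpath


def on_search_path(directory, path_value):
    """True if directory is one of the colon-separated entries in path_value."""
    wanted = posixpath.normpath(directory)
    entries = [e for e in path_value.split(":") if e]
    return any(posixpath.normpath(e) == wanted for e in entries)


current = os.environ.get("LD_LIBRARY_PATH", "")
if on_search_path("/usr/lib/openmpi/lib", current):
    print("openmpi is already on LD_LIBRARY_PATH")
else:
    print("add /usr/lib/openmpi/lib to LD_LIBRARY_PATH in ~/.bashrc")
```

Comparing normalised entries rather than searching for a substring avoids false hits from paths that merely contain the same text, and tolerates a trailing slash.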


113. Using ECCE to run nwchem jobs

EDIT: This post is getting messier as I'm hammering things out...but I've gotten everything to work in the end, so please persist.  The workflow described below is not the ideal one, but it'll get you started. I'll link here when I put up a newer, more reasonable tutorial.

EDIT2: I'm really warming to ECCE as I'm learning more about it. I still think it'd be nice if it was open source, and I can't understand why it has to be reliant on csh (which is pretty much broken on ROCKS, and uncomfortable at the best of times), but it's pretty neat once you've got all the details ironed out. Error feedback/report could be better though.

EDIT 3: ECCE is going open source the (northern) summer of 2012! As users we no longer have any excuses to complain.

Here's a quick introduction to getting started with using ECCE as the interface to nwchem, similar to how gaussview can be used to set up gaussian jobs.

This presumes that you've set up ECCE and preferably compiled your own version of nwchem:
http://verahill.blogspot.com.au/2012/03/ecce-on-debian-but-not-on-rockscentos.html
http://verahill.blogspot.com.au/2012/03/nwchem-61-with-openmpi-on-rocks.html
http://verahill.blogspot.com.au/2012/01/debian-testing-64-wheezy-nwhchem.html


##Important##
Once I had figured all of this out I rebuilt nwchem and re-installed ecce in the proper locations. You might want to do the same.

A. If you're going to use several nodes you should put nwchem in the same position in the file system hierarchy on all nodes e.g.
/opt/nwchem/nwchem-6.0/bin/LINUX64/nwchem

Also, make sure you share a folder (see how to use NFS) between the nodes which you can use for run time files e.g. /work

EDIT 4: This (probably) isn't necessary. In fact, using NFS in the wrong way will slow things down.

Set the permissions right (chown your user and set to 777 -- 755 is enough for nfs sharing between debian nodes, but between ROCKS and Debian you seem to need 777), and open your firewall on all ports for communication between the nodes.

B. Make sure that ECCE_HOME has been set in ~/.bashrc e.g.
export ECCE_HOME=/opt/ecce/apps

and in ~/.cshrc
setenv ECCE_HOME /opt/ecce/apps
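
If you set this up on several nodes it's easy to end up with the same line added over and over. Here's a small sketch of a helper that only appends a line if it isn't there already. append_once is a hypothetical name of my own. Note that csh's setenv takes the variable name and the value separated by a space, not an equals sign.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = line to add, $2 = file
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage, with the install location from the example above:
# append_once 'export ECCE_HOME=/opt/ecce/apps' ~/.bashrc
# append_once 'setenv ECCE_HOME /opt/ecce/apps' ~/.cshrc
```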

C.
edit /opt/ecce/apps/siteconfig/submit.site (location depends on where you install ecce)
Change lines 65+ from
#NWChemCommand {
#  $nwchem $infile > $outfile
#}
to (for multiple nodes)
NWChemCommand {
mpirun -hostfile /work/hosts.list -n $totalprocs --preload-binary /opt/nwchem/nwchem-6.0/bin/LINUX64/nwchem $infile > $outfile
}
This uses mpirun for parallel job submission and assumes that you have a hosts file in /work. For running on a single node you can use


NWChemCommand {
mpirun  -n $totalprocs $nwchem  $infile > $outfile
}

Use either --preload-binary /opt/nwchem/nwchem-6.0/bin/LINUX64/nwchem or $nwchem -- see what works for you. You probably can't do preload if you're running different linux distros (e.g. debian and centos)
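
Before wiring mpirun into submit.site it may save some head scratching to check by hand that mpirun can reach all the nodes in the hosts file. A common trick is to launch hostname instead of nwchem. The sketch below does that. test_hostfile is a made-up name, and the only mpirun options used are the -hostfile and -n ones that already appear above.

```shell
# Run `hostname` through mpirun to see which nodes actually receive processes.
# $1 = hostfile, $2 = total number of processes
test_hostfile() {
    command -v mpirun >/dev/null 2>&1 || { echo "mpirun not found in PATH"; return 1; }
    [ -r "$1" ] || { echo "cannot read hostfile: $1"; return 1; }
    # Each process prints the name of its node; count how many landed on each
    mpirun -hostfile "$1" -n "$2" hostname | sort | uniq -c
}
```

For example test_hostfile /work/hosts.list 8 should print one line per node with the number of processes started there.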

My hosts.list looks like this:

tantalum slots=4 max_slots=4
beryllium slots=4 max_slots=5

Make sure that you don't accidentally put 2 jobs on node 0, then 2 jobs on node 1, then another 2 jobs on node 0, since they won't be consecutively numbered and will crash armci. You can avoid this by setting slots and max_slots to the same number.
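
If the hosts file grows beyond a couple of nodes, the slots/max_slots rule is easy to check with awk. This is a rough sketch under my own assumptions: check_hostfile is a hypothetical name, and it only understands the simple 'host slots=N max_slots=M' layout shown above.

```shell
# Print every host whose slots and max_slots differ; return 1 if any do.
# $1 = hostfile in the 'host slots=N max_slots=M' layout
check_hostfile() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            slots = ""; max = ""
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "slots") slots = kv[2]
                if (kv[1] == "max_slots" || kv[1] == "max-slots") max = kv[2]
            }
            if (slots != max) {
                print $1 ": slots=" slots " max_slots=" max
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Run against the hosts.list shown above, it would flag beryllium, since its slots and max_slots differ.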


D.
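You may have to edit /etc/openmpi/openmpi-mca-params.conf if you have several (real or virtual) interfaces and add e.g. the lines below. As far as I'm aware Open MPI treats btl_tcp_if_include and btl_tcp_if_exclude as mutually exclusive, so you will probably want to keep only one of the two.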


btl=tcp,sm,self
btl_tcp_if_include=eth1,eth2
btl_tcp_if_exclude=eth0,virtbr0
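
To find out which interface names a node actually has, you can list them from sysfs. Here's a minimal sketch, assuming a Linux box where /sys/class/net exists. list_interfaces is just a name I picked.

```shell
# List network interface names, one per line.
# $1 = directory to read, defaults to /sys/class/net (Linux sysfs)
list_interfaces() {
    local dir="${1:-/sys/class/net}"
    local path
    for path in "$dir"/*; do
        if [ -e "$path" ]; then
            basename "$path"
        fi
    done
}

# Join the names with commas, which is the format the btl_tcp_if_* settings use:
# list_interfaces | paste -sd, -
```

On my understanding, ompi_info --param btl tcp will also show what Open MPI currently has configured, but check that against your openmpi version.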


Start ECCE:
First start the server
csh /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
then launch ecce

ecce

This will launch what the ecce people call the 'gateway':
The Gateway

0. Make sure you've got your machine set up
Click on Machine browser
Make sure that you can connect to the node e.g. by clicking on disk usage

Set the application paths. Don't fiddle with nodes -- just change number of processors to the total for all nodes.



1. Draw SiCl4 
Click on the Builder in the Gateway, which gives you the following:
The builder window

Click on More to get the periodic table which gives you access to Si

Select Geometry -- here, Tetrahedral

Si -- with four 'nubs' (yup, that's what the ecce ppl call them)

Time to attach Cl atoms to the nubs. Select Cl and pick Terminal geometry.

Click on a 'nub' to replace it with a Cl

And do it until you've replaced all 'nubs'. Hold down right mouse button to rotate

Click on the broom next to the bond menu on the right to pre-optimize  the structure using MM

And save. You will probably be limited to saving your jobs in folders below the ecce  folder.


2. Set up your job
Click on the Organizer icon in the 'gateway', which takes you here:

Click on the first icon, Editor

Focus on selecting Theory and Run type. Here we'll do a geometry optimisation.

Click on Details for Theory

Click on Details for Run type

Constraints are optional

In the organizer, click on the third icon to set the basis set. Defined atoms for a particular basis set are indicated by an orange lower right corner

You can get Details about the basis set

If you don't have a Navy Triangle you can't run. Click on Editor and see what might be wrong.

Ready to run. Click on Launch.
4. Running
I'm still working on enabling more than a single core...
Once you've clicked on launch you'll get

 If you click on viewer you can monitor the job

Optimization in progress
5. Re-launch a job at higher theory
In the Organizer, select your last job and then click on Edit, Duplicate Setup with Last Geometry
You then get a copy to edit

Change the basis set, save, then click on Final Edit

This is the nwchem input file in a vim instance

Add a line at the end saying task scf freq to calculate the vibrations (there's another job option called geovib which does optim+freq, but here we do it by hand)

Launch

Running...

You can now look at the vibrations

And you can visualise MOs -- here's the HOMO which looks like all isolated p orbitals on the chlorine

You can also calculate 'properties'

These include GIAO shielding

Performance:
Here's phenol (scf/6-31g*) across three gigabit-linked nodes. The dotted line denotes node boundaries.


Here's a number of alkanes (scf/6-31g) on 4 cores on a single node:


19 March 2012

111. Ecce (nwchem) on Debian, and ROCKS/Centos

If you're using nwchem chances are that you've considered using ECCE to parse the output:
http://ecce.emsl.pnl.gov/

First of all you'll need to register at https://eus.emsl.pnl.gov/Portal/ -- and you can only do that if you're faculty. Postdocs and PhD students need not apply. Other than that, it's free, but you'll have to wait a couple of days to get your registration approved.

As much as I like nwchem owing to the clear syntax, I feel less warmly about ecce. Don't get me wrong -- it's pretty. It just feels archaic and cobbled together. Even worse is that it's not open source and that its workings feel a bit opaque at times. Still, there's no better program for visually parsing nwchem output at this point. Anyway...

--start here --
Debian:
Download the install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh file to ~/tmp/ecce

There's no md5sum supplied but here's what I got:
2ee70cc817dee9f80b11be5eac6e53e5
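
To compare your download against that sum without squinting at 32 hex characters, something like the following should do. It's a sketch only. verify_md5 is a hypothetical helper, and md5sum comes with coreutils.

```shell
# Compare the md5 checksum of a file against an expected value.
# $1 = file, $2 = expected md5 hex digest
verify_md5() {
    local file="$1" expected="$2" actual
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $actual"
        return 1
    fi
}

# Usage with the installer and the sum quoted above:
# verify_md5 install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh 2ee70cc817dee9f80b11be5eac6e53e5
```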

If you haven't already
sudo apt-get install csh 

OK, moving on...
cd ~/tmp/ecce
chmod +x  install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
./install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh


Main ECCE installation menu
===========================
0) Help on main menu options
1) Full install
2) Full upgrade
3) Application software install
4) Application software upgrade
5) Server install
6) Server upgrade

Pick 1 if you're installing on your desktop and there's no server that you know of. 

Once the installation is over you get:
***************************************************************
!! You MUST perform the following steps in order to use ECCE !!
-- Unless only the user 'me' will be running ECCE,
   start the ECCE server as 'me' with:
     /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
-- To register machines to run computational codes, please see
   the installation and compute resource registration manuals
   at http://ecce.pnl.gov/using/installguide.shtml
-- To run ECCE each user must source either the runtime_setup
   (csh/tcsh) or runtime_setup.sh (sh/bash/ksh) script in the
   directory /home/me/tmp/ecce/ecce-v6.2/apps/scripts
   from their shell environment setup script.  For example,
   with csh or tcsh, add the following to ~/.cshrc:
     if (-e /home/me/tmp/ecce/ecce-v6.2/apps/scripts/runtime_setup) then
       source /home/me/tmp/ecce/ecce-v6.2/apps/scripts/runtime_setup
     endif
***************************************************************
Which translates to:
1. sh  /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
2. Sourcing that file makes no sense. Instead, add the following to your ~/.bashrc
export ECCE_HOME=/home/me/tmp/ecce/ecce-v6.2/apps
export PATH=${ECCE_HOME}/scripts:${PATH}

Assuming you've sourced your ~/.bashrc, start ecce by typing
ecce

...which takes an unreasonably long time (ca 1 min) after which you're greeted by
Press Any Key
Type in a password -- any password -- which will be your password from now on.
You're then taken to
Click on Viewer (assuming you've got something to look at)
Pay attention to the fine print
Have a look at the text box in the bottom right corner... and pay attention. In my particular case I have 6 cores and an mpi aware nwchem 6.0 version compiled. I bet that's better than whatever comes bundled with ecce. Also, the

To change you go to the machine browser (see screen shot #2), click on set up remote access and make sure that everything is working by clicking on e.g. processes:

Then click on the Machine menu (top left), select Register Machine while your machine is selected.
You can now change your options.

Running:
So, before using ecce you always need to
sh  /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
first. The server will run until you stop it or reboot.
Next, start ecce
ecce

Integration with nwchem
Most people would probably set up their nwchem jobs by hand, because it's so simple. All you need to do is to include the statement
ecce_print ecce.out
in the beginning, and you'll get an ecce.out file which you can then IMPORT (not open regularly, but import) into ecce.
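
For illustration, here's a sketch of what such a hand-written input could look like, written out from the shell with a here-document. Everything specific in it is my own assumption: the function name write_sicl4_input, the choice of SiCl4 with scf/6-31G*, and the coordinates, which are just a rough tetrahedral starting guess with Si-Cl around 2.02 Å and not an optimised geometry.

```shell
# Write a small nwchem input file that also produces an ecce.out file.
# $1 = name of the input file to create
write_sicl4_input() {
    cat > "$1" <<'EOF'
title "SiCl4 scf optimisation"
start sicl4
ecce_print ecce.out

geometry units angstroms
  Si   0.000   0.000   0.000
  Cl   1.166   1.166   1.166
  Cl   1.166  -1.166  -1.166
  Cl  -1.166   1.166  -1.166
  Cl  -1.166  -1.166   1.166
end

basis
  * library 6-31G*
end

task scf optimize
EOF
}
```

After write_sicl4_input sicl4.nw you would run nwchem sicl4.nw > sicl4.out as usual, and ecce.out ends up next to the other run files.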

Click on Viewer, Import Calculation From Output File, select your ecce.out and voilà:
ECCE: homo (benzene)
If you're running debian, you're done now.



ROCKS 5.4.3/Centos 5.6:
This isn't a fix as much as a rant. The problem with ROCKS 5.4.3 is that csh is so broken that it's a struggle just to install ecce. I mean, I do show how to get ecce running in the end, but ROCKS feels like an unfinished piece of work compared to a normal debian install.

--Demonstration only -- don't do --
First back up ssh-key.sh and ssh-key.csh in /etc/profile.d

So...you start by
chmod +x install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
./install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
...and nothing's happening.

You then try just typing in
csh

/etc/profile.d/ssh-key.sh: line 211: return: can only `return' from a function or sourced script
It appears that you have not set up your ssh key.
This process will make the files:
     /export/home/me/.ssh/id_rsa.pub
     /export/home/me/.ssh/id_rsa
     /export/home/me/.ssh/authorized_keys
Generating public/private rsa key pair.
/export/home/me/.ssh/id_rsa already exists.
Overwrite (y/n)? 

Turns out there's a bug in ROCKS 5.4.3.  You can fix that by:
rpm -Uvh ftp://www.rocksclusters.org/pub/rocks/updates/5.4.3/x86_64/RPMS/rocks-config-server-5.4.3-1.x86_64.rpm

So far so good.
csh
...and nothing. It just exits. Or so you think. But the problem is bigger than that --  try opening a new terminal in e.g. gnome (gnome-terminal or xterm) -- it exits immediately. No error message or anything.

You can get csh to start by moving /etc/csh.cshrc out of the way, but you're still screwed as to opening a new terminal. The only way to get back a working system is to restore ssh-key.sh and ssh-key.csh.

--- Demonstration over ---

--Start here --
 You could also get around all this by running
csh -f
But then you don't have any env. variables loading and it can lead to problems of its own.

Anyway:
csh -f install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh

The install starts. Just follow the instructions.

After installation, start the server:
csh -f ecce-v6.2/server/ecce-utils/start_ecce_server

Hit enter until you get a workable prompt back...
Edit your ~/.bashrc and add

export ECCE_HOME=/home/me/tmp/ecce/ecce-v6.2/apps
export PATH=${ECCE_HOME}/scripts:${PATH}

Don't bother sourcing your ~/.bashrc. It's easier to just open a new terminal.
Type
ecce
and you should be up and running...sort of. Under ROCKS I had problems importing ecce.out files since I had problems actually connecting to the server. Don't know why, but it came down to not being able to open a remote shell on the host.

NOTE:
This worked fine on one box, but not on another one which I was setting up remotely. On that one I had to edit

ecce/apps/siteconfig/Dataservers
and
ecce/apps/siteconfig/jndi.properties 

In particular, I had to change references to eccetera.emsl.pnl.gov.
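
To see where those references are before editing anything, a plain grep does the job. Here's a small sketch. find_server_refs is a name I made up, and what you replace the host name with depends entirely on your own setup, so I've left that part out.

```shell
# Print file name, line number and content of every line mentioning a host.
# $1 = host name to look for, remaining arguments = files to search
find_server_refs() {
    local host="$1"
    shift
    grep -HnF -- "$host" "$@"
}

# Usage with the two files mentioned above:
# find_server_refs eccetera.emsl.pnl.gov \
#     ecce/apps/siteconfig/Dataservers ecce/apps/siteconfig/jndi.properties
```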