25 November 2011

20. Is there no molecular weight calculator in the Debian repos?

UPDATE: see here for an isotopic pattern calculator written in Python, which also calculates mass: http://verahill.blogspot.com.au/2012/10/isotopic-pattern-caculator-in-python.html

The best molecular weight calculator which I've encountered is Matthew Monroe's Molecular Weight Calculator, which can be found at http://ncrr.pnnl.gov/software/

It does everything. Molecular weights. Isotopic patterns. And so, so, so much more.

It has one major drawback though - it's written for Windows. Luckily, it sort of works under Wine after you've done a bit of wine-trick-ery.

As great as the calculator is, sometimes you only need to calculate the molecular weight of something, and nothing else. Searching the Debian repos I can't find a single dedicated molecular weight calculator. In particular, a command-line driven calculator would be nice.

Seriously - it's a crying shame that the distribution with the largest repos, i.e. Debian, does not have a single passable molecular weight calculator. It is even more surprising given the number of chemistry-related packages which are present.

So, here's what I did:
A quick Google search for "python molecular weight calculator" brought me to http://pygments.org

A little bit of editing gave the code below, which was saved as molcalc, copied to /usr/bin, followed by sudo chmod +x /usr/bin/molcalc. It can now be called using
molcalc "(Co(CO)5)2"
and returns
The mass of (Co(CO)5)2 is 257.916890.

Here's the code, which is 99.9% the original and 0.1% my modification. All credit thus due to Lee, Freitas and Tucker.

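NOTE: it doesn't handle nested parentheses: the outer multiplier is only applied to the atoms sitting directly inside the outer parentheses. (Al(NO3)3)2 gets interpreted as Al2(NO3)3, and the (Co(CO)5)2 example above is likewise computed as Co2(CO)5.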

#!/usr/bin/python2.6
#########################################################################
# Author: Toni Lee with the help of Guilherme Freitas and Becky Tucker. Minor changes by Lindqvist
# Copyright: This module has been placed in the public domain
#########################################################################

#Import regular expressions
import re
import sys
try:
    test=sys.argv[1]
except IndexError:
    #No formula was given on the command line
    sys.exit()

#Create the dictionary (From Becky with a value of 0 inserted for Uus(mass not measurable))
TableofElements ={ 'H':1.00794,'He':4.002602,'Li':6.941,'Be':9.012182,
                        'B':10.811,'C':12.0107,'N':14.0067,'O':15.9994,'F':18.9984032,'Ne':20.1797,
                        'Na':22.98976928,'Mg':24.3050,'Al':26.9815386,'Si':28.0855,
                        'P':30.973762,'S':32.065,'Cl':35.453,'Ar':39.948,'K':39.0983,'Ca':40.078,
                        'Sc':44.955912,'Ti':47.867,'V':50.9415,'Cr':51.9961,'Mn':54.938045,
                        'Fe':55.845,'Ni':58.6934,'Co':58.933195,'Cu':63.546,'Zn':65.38,'Ga':69.723,
                        'Ge':72.64,'As':74.92160,'Se':78.96,'Br':79.904,'Kr':83.798,'Rb':85.4678,
                        'Sr':87.62,'Y':88.90585,'Zr':91.224,'Nb':92.90638,'Mo':95.96,'Tc':98,
                        'Ru':101.07,'Rh':102.90550,'Pd':106.42,'Ag':107.8682,'Cd':112.411,
                        'In':114.818,'Sn':118.710,'Sb':121.760,'Te':127.60,'I':126.90447,
                        'Xe':131.293,'Cs':132.9054519,'Ba':137.327,'La':138.90547,'Ce':140.116,
                        'Pr':140.90765,'Nd':144.242,'Pm':145,'Sm':150.36,'Eu':151.964,'Gd':157.25,
                        'Tb':158.92535,'Dy':162.500,'Ho':164.93032,'Er':167.259,'Tm':168.93421,
                        'Yb':173.054,'Lu':174.9668,'Hf':178.49,'Ta':180.94788,'W':183.84,
                        'Re':186.207,'Os':190.23,'Ir':192.217,'Pt':195.084,'Au':196.966569,
                        'Hg':200.59,'Tl':204.3833,'Pb':207.2,'Bi':208.98040,'Po':210,'At':210,
                        'Rn':220,'Fr':223,'Ra':226,'Ac':227,'Th':232.03806,'Pa':231.03588,
                        'U':238.02891,'Np':237,'Pu':244,'Am':243,'Cm':247,'Bk':247,'Cf':251,
                        'Es':252,'Fm':257,'Md':258,'No':259,'Lr':262,'Rf':261,'Db':262,'Sg':266,
                        'Bh':264,'Hs':277,'Mt':268,'Ds':271,'Rg':272, 'Uus':0
}


#######################################
#Computes the MW of an atom-number pair
#######################################
def getMass(x):
    atom=re.findall('[A-Z][a-z]*',x)
    number=re.findall('[0-9]+', x)
    if len(number) == 0:
        multiplier = 1
    else:
        multiplier = float(number[0])
    atomic_mass=TableofElements[atom[0]]
    return (atomic_mass*multiplier)

################################################################
#Segments formula into atom-number sections (i.e. 'H3' or 'N10')
################################################################
def parseFormula(fragment):
    segments=re.findall('[A-Z][a-z]*[0-9]*',fragment)
    return (segments)

##################################################################################
#Computes total mass of both parenthetical and nonparenthetical formula components
##################################################################################
def molmass(formula):
    parenMass=0
    nonparenMass=0
    while (len(formula)>0):
        #First computes the molecular weight of all parenthetical formulas from left to right
        while (len(re.findall(r'\(\w*\)[0-9]+', formula))!=0):
            parenthetical=re.findall(r'\(\w*\)[0-9]+',formula)
            for group in parenthetical:
                parenMult1 = re.findall(r'\)[0-9]+', group)
                parenMult2 = re.findall('[0-9]+', parenMult1[0])
                segments =parseFormula(group)
                for segment in segments:
                    parenMass= parenMass + ((getMass(segment))*(float(parenMult2[0])))
            formula=re.sub(r'\(\w*\)[0-9]+', '', formula)
        #Sums nonparenthetical molecular weights when all parenthetical molecular weights have been summed
        segments = parseFormula(formula)
        for segment in segments:
            nonparenMass=nonparenMass + getMass(segment)
        #Everything has been consumed: clear the string directly instead of using it as a regex pattern
        formula=''

    Mass=parenMass+nonparenMass
    return Mass
     
if __name__ == '__main__':
    test=test.split(',')
    for element in test:
        print ('The mass of %(substance)s is %(Mass)f.' % {'substance': element, 'Mass': molmass(element)})
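
The NOTE above says nested parentheses are not handled. As a rough sketch of one way around that (not part of the original script, and not how qmol or the later isotopic calculator do it), a single pass with a stack of running totals is enough. The function name nested_molmass and the MASSES dictionary are made up for this illustration; MASSES is just a small subset of the values in TableofElements.

```python
import re

# Hypothetical subset of the TableofElements values above; extend as needed.
MASSES = {'H': 1.00794, 'C': 12.0107, 'N': 14.0067, 'O': 15.9994,
          'Al': 26.9815386, 'Co': 58.933195}

# One token is an element symbol, a number, or a single parenthesis.
TOKEN = re.compile(r'([A-Z][a-z]*)|(\d+)|([()])')

def nested_molmass(formula):
    """Return the mass of a formula, allowing nested parentheses."""
    stack = [0.0]  # one running total per open parenthesis level
    last = 0.0     # mass of the most recent atom or closed group
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError('Unexpected character at position %d' % pos)
        symbol, number, paren = match.groups()
        if symbol:
            last = MASSES[symbol]
            stack[-1] += last
        elif number:
            # The atom or group was already counted once, so add the rest.
            stack[-1] += last * (int(number) - 1)
            last = 0.0
        elif paren == '(':
            stack.append(0.0)
        else:
            if len(stack) == 1:
                raise ValueError('Unbalanced parentheses')
            last = stack.pop()
            stack[-1] += last
        pos = match.end()
    if len(stack) != 1:
        raise ValueError('Unbalanced parentheses')
    return stack[0]

if __name__ == '__main__':
    for f in ('(Al(NO3)3)2', '(Co(CO)5)2'):
        print('The mass of %s is %f.' % (f, nested_molmass(f)))
```

With this approach the multiplier after a closing parenthesis applies to everything inside the group, including inner groups, so (Al(NO3)3)2 is treated as Al2(NO3)6.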
 

8 comments:

  1. I wrote qmol, derived from Tomislav Gountchev's KMol. At the moment, it does not do command line calculations, but it's on my TODO list.
    It does evaluate expressions like (Al(NO3)3)2 correctly (thanks to Tomislav's flexible formula parser).

    There's a package for debian 6.0 at:
    http://download.opensuse.org/repositories/home:/lineinthesand/Debian_6.0/
    Source code is located here:
    http://sourceforge.net/projects/qmol/files/

    1. I wrote an isotopic pattern calculator last year and rewrote the parser so that it can handle (almost) anything you throw at it: http://verahill.blogspot.com.au/2012/10/isotopic-pattern-caculator-in-python.html

      qmol: I couldn't install the .deb file (amd64) on wheezy (there's no libqt4 package -- maybe replaced by libqt4-core?), but compiling was straightforward. It looks like a nice piece of software. I'll make a post about it next week.

    2. Thanks for the feedback. I had that package made by the open build service and I assumed the package would be ok if the build on their virtual debian machines was successful. So you say the package's name is libqt4-core? I'll try to research what the problem is about (I'm using an rpm-based distribution myself, so I'll have to rely on any feedback).

    3. Correction: that should probably be libqt4core, without the hyphen (there are both libqt4-core and libqt4core, but the former is a dummy package which pulls in the latter, and more). I'll build in a chroot next week which should tell me for sure -- but for now:

      ldd `which qmol`
      linux-vdso.so.1 => (0x00007fff849ff000)
      libQtGui.so.4 => /usr/lib/x86_64-linux-gnu/libQtGui.so.4 (0x00002b480df44000)
      libQtCore.so.4 => /usr/lib/x86_64-linux-gnu/libQtCore.so.4 (0x00002b480ebfc000)
      [..]

      apt-file find /usr/lib/x86_64-linux-gnu/libQtGui.so.4
      libqtgui4: /usr/lib/x86_64-linux-gnu/libQtGui.so.4

      apt-file find /usr/lib/x86_64-linux-gnu/libQtCore.so.4
      libqtcore4: /usr/lib/x86_64-linux-gnu/libQtCore.so.4

      i A libqtcore4 - Qt 4 core module
      i A libqtgui4 - Qt 4 GUI module

  2. I added command line calculation to qmol 0.3.2.

    Unfortunately, I don't know enough about Debian, but Wheezy seems to be 7.0 (which is testing). The package at the open build service is for Squeeze, which is version 6.0 (stable). Maybe obs will offer to build packages for 7.0 when it becomes stable.

    Your isotopic pattern calculator looks very interesting. Actually I had in mind incorporating one into qmol (possibly on a separate tab). I'll have a look at your code/the algorithm for doing this. Let's see if I'm able to do it (will surely take quite some time). (o;

    1. Thomas, I put a quick write-up here:
      http://verahill.blogspot.com.au/2013/01/318-qmol-032-molecular-weight.html

      Let me know if I got anything wrong.

      As for the isotopic pattern calculator, the science is easy (get the cartesian product), and since you already have a formula parser half the work is done. The trick is to keep the product matrix from ballooning out of control by continuously sorting, removing duplicates and trimming the isotopic data as you're generating it. I did that in Python by using the dictionary data type -- not sure about C++.

    2. Thanks for that write-up! I'd just write qmol in lower case, but it's not so important. Could you possibly mention that KMol (which you were trying to compile; btw. I still have the latest KMol version installed on my system with its kde3 dependencies) was written by Tomislav Gountchev (http://gountchev.net/ although he took the source code off his page)?

      I will look into that 0 coefficient handling.

      As for the upper/lower case handling: yes, that's because of ambiguities that might occur (you may define an all-lowercase group, but as you stated, no mixed-case). Tomislav wrote that up in the KMol handbook, which I based the qmol handbook (Help > qmol Handbook) on.

      Did you notice you can configure what is output in command-line mode? You can read about that in the handbook, or, for a short summary, just hover over the text field for the 'Command line output format' in the GUI preferences dialog.

      @isotopic pattern calculator: as soon as I have the time, I'll analyze your source. Thanks for the short explanation (I still have a paper lying around which deals with isotopic patterns, C. Hsu 1984, but I haven't yet had the time to read through it). I guess with C++/Qt there's a type QMap for sorted entries (I'm already quite curious if there'll be speed/memory issues).

    3. I've updated the qmol post. Thanks for the feedback.

      To make it easier for the day when you want to look at the isotope calculator I've uploaded the python source here: http://sourceforge.net/projects/pyisocalc/?source=navbar

      I figure it's better than copying from a webpage which will screw up the formatting.

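
To make the dictionary idea from the comments above a bit more concrete, here is a minimal sketch of building an isotopic pattern one atom at a time. It is not the pyisocalc source: the names ISOTOPES, add_atom and isotopic_pattern are invented for this example, the abundances are approximate, and nominal (integer) mass numbers are used as keys so that duplicates merge exactly.

```python
# Approximate natural abundances keyed by nominal mass number.
# Illustrative values only; a real calculator would use exact isotope masses.
ISOTOPES = {
    'C': {12: 0.9893, 13: 0.0107},
    'Cl': {35: 0.7576, 37: 0.2424},
}

def add_atom(pattern, isotopes, cutoff):
    """Combine an existing pattern with the isotopes of one more atom."""
    combined = {}
    for mass, prob in pattern.items():
        for iso_mass, iso_prob in isotopes.items():
            key = mass + iso_mass
            # Using the mass as dictionary key merges duplicate peaks.
            combined[key] = combined.get(key, 0.0) + prob * iso_prob
    # Trim negligible peaks so the pattern does not balloon.
    return dict((m, p) for m, p in combined.items() if p >= cutoff)

def isotopic_pattern(composition, cutoff=1e-6):
    """composition maps element symbol to atom count, e.g. {'C': 1, 'Cl': 4}."""
    pattern = {0: 1.0}
    for element, count in composition.items():
        for _ in range(count):
            pattern = add_atom(pattern, ISOTOPES[element], cutoff)
    return pattern

if __name__ == '__main__':
    # Carbon tetrachloride as a small example
    for mass, prob in sorted(isotopic_pattern({'C': 1, 'Cl': 4}).items()):
        print('%d  %.4f' % (mass, prob))
```

Because peaks are merged and trimmed after every atom, the dictionary never grows to the size of the full cartesian product. With exact isotope masses the keys would be floats, so some rounding or binning would be needed before merging.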