Anyway, I found a fairly detailed presentation in which they were using Gaussian 98 here: https://www.uow.edu.au/~adamt/Trevitt_Research/Links_files/pKa%20workshop%20slides.pdf

While that should be ok to reproduce, it's not that straightforward to do even with Gaussian, since G09 and G03 don't report solvation parameters in the same way (or detail) as G98.

I'm also a lot keener on NWChem than Gaussian for various reasons, not least that it's 'free' (both libre and gratis) while Gaussian inc. has been accused of doing somewhat unfriendly things in the name of protecting their business interests.

See here and here for an example, and then here for a rebuttal from Gaussian inc. I know that EMSL/Pacific Northwest National Lab that develop NWChem and ECCE are prohibited from using Gaussian since they are considered as being competitors.

**Back to science.**

Our test example will be acetate, and we'll use formic acid to correct our results.

The fact that this post is very long is due to the amount of detail supplied -- I prefer to show some of the more obvious things so that people can learn from what I post -- and I learn by writing the post.

But first let's just do everything using direct methods.

We work with a thermochemical cycle:

IF we can't calculate the DG_solution directly (i.e. too expensive) we can optimise our structures in the gas phase, and then calculate the solvation energy for those structures.

Then DG_sol=DG_gas+DG_solvation(B)-DGsolvation(A).

(more generally sum of DG_solv(prod) - sum of DG_solv(reactants)).

**1. pKa of Acetic acid using direct methods**

We can either do

H3CCOOH -> H3COO- + H+

or

H3CCOOH + H2O -> H3COO- + H3O+

Steps:

Optimise acetic acid and acetate in the gas phase and do frequency calculations to get the enthalpy and entropy. Then use the gas phase structures and do single point calculations using COSMO to get the electrostatic solvation energies. Finally, use standard state corrections.

**A. Optimise acetic acid in the gas phase and do frequency calculation**

Title "aceticacid_gas" Start aceticacid_gas echo charge 0 geometry autosym units angstrom C 0.0402340 0.0308110 0.0402340 H -0.600803 -0.611482 0.679201 H 0.679201 -0.611482 -0.600803 H -0.607055 0.659335 -0.607055 C 0.903928 0.890481 0.903928 O 0.814831 2.23989 0.814831 O 1.78275 0.299052 1.78275 H 2.25438 1.03276 2.25438 end basis "ao basis" spherical print H library "6-31++G**" O library "6-31++G**" C library "6-31++G**" END dft mult 1 direct noio XC b3lyp grid fine mulliken end driver default end task dft optimize task dft freq

which gives

```
Total DFT energy = -229.102415550663
One electron energy = -550.833155571059
Coulomb energy = 230.555870626149
Exchange-Corr. energy = -29.497734561879
Nuclear repulsion energy = 120.672603956126
Numeric. integr. density = 31.999999816803
Total iterative time = 3.6s
```

and
Temperature = 298.15K frequency scaling parameter = 1.0000 Zero-Point correction to Energy = 38.683 kcal/mol ( 0.061646 au) Thermal correction to Energy = 41.568 kcal/mol ( 0.066243 au) Thermal correction to Enthalpy = 42.160 kcal/mol ( 0.067186 au) Total Entropy = 69.198 cal/mol-K - Translational = 38.179 cal/mol-K (mol. weight = 60.0211) - Rotational = 23.855 cal/mol-K (symmetry # = 1) - Vibrational = 7.164 cal/mol-K Cv (constant volume heat capacity) = 14.327 cal/mol-K - Translational = 2.979 cal/mol-K - Rotational = 2.979 cal/mol-K - Vibrational = 8.368 cal/mol-K

So that G=-229.102415550663*(627.503 kcal/Hartree)+(42.160 kcal/mol-298.15*(69.198 cal/molK)/1000)=-1.4372e+05 kcal/mol

B. Optimise acetate in the gas phase and do frequency calculation

Title "acetate_gas" Start acetate_gas echo charge -1 geometry autosym units angstrom C 0.0405721 0.0285481 0.0405721 H -0.601438 -0.613690 0.678620 H 0.678620 -0.613690 -0.601438 H -0.605857 0.658809 -0.605857 C 0.904975 0.886806 0.904975 O 0.825186 2.23364 0.825186 O 1.77103 0.316179 1.77103 end basis "ao basis" spherical print H library "6-31++G**" O library "6-31++G**" C library "6-31++G**" END dft mult 1 direct noio XC b3lyp grid fine mulliken end driver default end task dft optimize task dft freq

which gives

```
Total DFT energy = -228.540046314754
One electron energy = -539.474204198295
Coulomb energy = 229.063209931484
Exchange-Corr. energy = -29.339040486703
Nuclear repulsion energy = 111.209988438759
Numeric. integr. density = 32.000000595562
Total iterative time = 2.9s
```

and

Temperature = 298.15K frequency scaling parameter = 1.0000 Zero-Point correction to Energy = 30.023 kcal/mol ( 0.047845 au) Thermal correction to Energy = 32.271 kcal/mol ( 0.051427 au) Thermal correction to Enthalpy = 32.863 kcal/mol ( 0.052371 au) Total Entropy = 64.022 cal/mol-K - Translational = 38.129 cal/mol-K (mol. weight = 59.0133) - Rotational = 23.766 cal/mol-K (symmetry # = 1) - Vibrational = 2.127 cal/mol-K Cv (constant volume heat capacity) = 11.112 cal/mol-K - Translational = 2.979 cal/mol-K - Rotational = 2.979 cal/mol-K - Vibrational = 5.153 cal/mol-Kso that G=-228.540046314754*(627.503 kcal/Hartree)+(32.271 kcal/mol-298.15*(64.022 cal/molK)/1000)=-1.4338e+05 kcal/mol

**Putting A and B together: (-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))=344.54 kcal/mol**

We haven't accounted for solvation or the proton yet.

**C. Solvation of acetic acid**

Title "aceticacid_solvation" Start aceticacid_solvation echo charge 0 geometry autosym units angstrom C -0.313400 -1.37257 0.00000 H -0.932151 -1.56367 -0.882188 H -0.932151 -1.56367 0.882188 H 0.551887 -2.03461 0.00000 C 0.149260 0.0607660 0.00000 O 1.30165 0.439500 0.00000 O -0.897630 0.927778 0.00000 H -0.523912 1.82535 0.00000 end basis "ao basis" spherical print H library "6-31++G**" O library "6-31++G**" C library "6-31++G**" END dft mult 1 direct noio XC b3lyp grid fine mulliken end cosmo end task dft energy

which gives

```
COSMO solvation results
-----------------------
gas phase energy = -229.1024156124
sol phase energy = -229.1172649438
(electrostatic) solvation energy = 0.0148493315 ( 9.32 kcal/mol)
```

**D. Solvation of acetate**

Title "acetate_solvation" Start acetate_solvation echo charge -1 geometry autosym units angstrom C -0.0308736 -1.36399 0.00000 H 0.503418 -1.74042 0.882388 H 0.503418 -1.74042 -0.882388 H -1.05531 -1.75261 0.00000 C -0.00485953 0.199667 0.00000 O -1.12642 0.778065 0.00000 O 1.14855 0.713544 0.00000 end basis "ao basis" spherical print H library "6-31++G**" O library "6-31++G**" C library "6-31++G**" END dft mult 1 direct noio XC b3lyp grid fine mulliken end cosmo end task dft energy

which gives

```
COSMO solvation results
-----------------------
gas phase energy = -228.5400463128
sol phase energy = -228.6567490452
(electrostatic) solvation energy = 0.1167027324 ( 73.23 kcal/mol)
```

**Putting A, B and solvation energies together:**

**[**

**(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23-9.32]=344.54-63.910=280.63 kcal/mol**

**E. The Proton**

If you try to do any calculations on an isolated proton you get and SCFE of zero, and you won't do much better in terms of thermochemical data. Yet, monoatomic gases obviously still posses entropy and enthalpy. Instead, the document I cite above uses the ideal gas partition function for ideal monoatomic gases which gives a value of 6.28 kcal/mol for the free energy of a proton. The last reference states that the free energy for a proton in the gas phase is experimentally determined to be

**-6.28 kcal/mol**and that the free energy of hydration is

**-264.61**

**kcal/mol**(experimental).

**[**

**(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))-6.28-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))]-[73.23+264.61-9.32]=338.26-328.52=9.74 kcal/mol =40.752 kJ/mol**

We're not done yet though -- make sure to continue to 'F. Standard States'

**F. Standard States**

We know that

DG=DG^0+RT ln (Q).

We have a bit of a problem. We're doing calculations in the gas phase (pressure) but looking at predicting solution values (concentration). Also, I don't fully get it yet, so my explanation is probably a bit fuzzy.

So if A+B-> C+D we get Q=(C*D)/(A*B) but for A -> C+D we get Q=(C*D)/(A) or (1 bar x1 bar/1 bar)= 1 bar = 101350 Pa => nRT/P=1*8.314*298.15/101350=0.024458 m3= 24.46 L. The concentration of each species is 1 mol/bar which in volume terms means 1/24.46 L.

And the A -> C + D situation is what we have if we look at

HA -> H+ + A-

So,

Q=((1/24.46)*(1/24.46)/(1/24.46))=1/2.4.46 which gives

DG=DG^0-RTln(24.46)=DG^0-7924.9 J/mol

Thus, we need to correct for the standard state: 40.752 -7924.9/1000=32.827 kJ/mol

**G. Calculating pKa**

Since (in solution so that concentrations are 1 M)

DG=DG^0+RTln(K), where K=([H3COO][H+])/([H3COOH])

DG=DG^0+(RT ln([HCOO]/[H3COOH])+RTln (10^(-pH)), where RT ln([HCOO]/[H3COOH])=RTln(1/1)=0 so

DG=DG^0+RTln (10^(-pH))=DG^0+RT log (10^(-pH)/log(e)=DG^0-(RT/log(e)) * pH

Which for equilbrium, where pH=pKa and DG=0, turns into

pKa=DG^0*log(e)/RT

**pKa**= DG*log(e)/(RT)=(32.827*1000)*log(e)/(8.314*298.15)=

**5.75**

Not that great as predictions go (exp: pKa=4.75). Looking at some of the literature one error lies in the size of the solvation energies. Possibly one should tune the parameters used in the COSMO.

**2. Isodemic reaction/correction**

**Using formic acid**

This approach is based on 1) the similarity between two compounds and 2) us knowing the DG_solution parameter for one of them.

Assuming that we know that formic acid has a pKa of 3.75, then DG_solution=pKa*RT/log(e)=3.75*8.314*298.15/log10(e)/1000=21.404 kJ/mol. The reverse reaction is -21.404 kJ/mol.

We skip a few steps.

Here are the calculated parameters for formic acid (using the same method as above):

**Formic acid**

SCFE: -189.772804709496 Hartree

Enthalpy correction: 23.773 kcal/mol

Entropy correction: 59.339 cal/mol

Solvation energy: 9.99 kcal/mol

**Formate**

SCFE: -189.217943605798 Hartree

Enthalpy correction: 15.016 kcal/mol

Entropy correction: 56.992 cal/mol

Solvation energy: 72.47 kcal/mol

*[Just for kicks we quickly look at what the prediction is:*

*accounting for everything (solvation, proton etc.)*

*((-189.217943605798*627.503+(15.016-298.15*56.992/1000)-72.47) +(-6.28-264.61))-(-189.772804709496*627.503+(23.773-298.15*59.339/1000)-9.99)=6.7498 kcal/mol*

*6.7498*4.184-7924.9/1000=20.316 kJ/mol <=>*

**pKa**=1000*20.316*log10(e)/(8.314*298.15)=**3.56]****The isodesmic approach:**

Here we look at

H3CCOOH + -OOCH -> H3CCOO- +HOOCH

This combined reaction has a

DG_solution=DG_solution(acetic acid/acetate)+DG_solution(formate/formic acid)

<=>

DG_solution(acetic acid/acetate)= DG_solution-DG_solution(formic acid/formate)

=DG_solution-(-21.404 kJ/mol)

=DG_gas+[DG_sol(acetate)+DG_sol(formic acid)-DG_sol(acetic acid)-DG_sol(formate)]+21.404 kJ/mol

=

(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-189.772804709496*627.503+(23.773-298.15*(59.339/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-189.217943605798*627.503+(15.016-298.15*(56.992/1000)))+(-73.23-9.99+9.32+72.47)+5.116 kcal/mol= 8.11 kcal/mol= 33.93 kJ/mol

Here we don't need to fiddle with standard states or experimental values for solvation of the proton.

**pKa**= DG*log(e)/(RT)=(33.93*1000)*log(e)/(8.314*298.15)=

**5.94**

which is even worse than before...

We want ca 27 kJ/mol = 6.48 kcal/mol. Paradoxically this may be due to the ab initio approach to the pKa of formic acid actually giving a very reasonable value.

**Using Propanoic acid**

So let's try the isodesmic approach using propanoic acid as our reference instead.

**Propanoic acid**

SCFE: -268.419515785389 Hartree

Enthalpy correction: 60.894 kcal/mol

Entropy correction: 75.184 cal/mol

Solvation energy: 8.15 kcal/mol

**Propanoate**

SCFE: -267.857478200414 Hartree

Enthalpy correction: 52.096 kcal/mol

Entropy correction: 75.392 cal/mol

Solvation energy: 72.14 kcal/mol

pKa=4.86 <=> 27.739 kJ/mol= 6.63 kcal/mol

(-228.540046314754*(627.503)+(32.271-298.15*(64.022/1000)))+(-268.419515785389*627.503+(60.894-298.15*(75.184/1000)))-(-229.102415550663*(627.503)+(42.160-298.15*(69.198/1000)))-(-267.857478200414*627.503+(52.096-298.15*(75.392/1000)))+(-73.23-8.15+9.32+72.14)+6.63 kcal/mol=7.4324 kcal/mol=31.097

**pKa**= DG*log(e)/(RT)=(31.097*1000)*log(e)/(8.314*298.15)=5.45

It's a bit better, but still a bit off.

**3. Conclusion:**

The isodesmic approach is not magic and it relies on the similarity of two compounds, one for which there are experimental data, causing similar computational issues. Under the right conditions it's a useful approach, whereas under other conditions -- where a body of experimental data exists -- it might just be easier to determine the correlation between experimental and calculated data via fitting.

The approach worked better for the acetate/propanoate pair than the formate/acetate pair -- and one would consider acetic acid and propanoic acid to be more similar than formic acid and the higher acids. We're still far off from obtaining a perfect result though.

An additional problem is obviously the sensitivity of pKa to the DG -- one pH unit is about 1.36 kcal/mol, which is very small given the usual errors in DFT level calculations. I've seen indications online (google!) that the accuracy of b3lyp is about 3 kcal/mol, and one can always debate the accuracy of a highly empirical method like COSMO.

First of all, thank you for sharing this information. I have taken a look on your workflow and I would like to ask you some questions, all of them related to the solvation calculations: (1) Why do you not directly consider the “sol phase energy” as solvation energy?; (2) Why are you only considering the electrostatic component? What about the terms of cavitation and dispersion? (3) For the protonated and deprotonated forms, the electrostatic component is positive. However, in order to calculate DDG(solvation) you have used the DG(A-) with a changed sign. Why? Thank you in advance and looking forward to hearing from you!

ReplyDeleteThank you for your comment. It's been a long time since I posted this, and back then I knew less than I do now (or rather, back then I didn't know how little I knew), so I might well have made a few mistakes. However, let's see if I can address your questions.

DeleteQuestion 3 is the most straightforward one to answer:

we're working with a thermodynamic cycle (see figure above). We start with the solution state for A, remove it from solution (the cost of which is that we lose the solvation energy i.e. -DG(A-)), transform A to B, then solvate B (so we gain +DG(B)).

DG is a state function, so it doesn't matter which path we take to get DGsolution -- we can either get DGsolution directly (which used to be prohibitively expensive computationally), or indirectly by desolvating A (-DG(A)), doing the reaction in the gas phase (DG(A->B)), and solvation the product (+DG(B)).

Answer to Question 1: the 'sol phase energy' is the gas phase energy plus (well, minus) the solvation energy.

DeleteDGsolv=Egas-Esolv=-228.5400463128-(-228.6567490452)=.1167027324

Using DG vs E is bit unfortunate, but I stuck to the notation that the literature used.

Note that the structure is optimised in the gas phase, so we are just looking at the lowering in energy that we get from solvating the molecule.

Questions 2 is the most difficult one. I actually don't know whether the cavitation energy should be included or not. Looking at http://www.nwchem-sw.org/index.php/Release62:COSMO_Solvation_Model it seems that it should be:

Delete"It should be noted that one must in general take into account the standard state correction besides the electrostatic and cavitation/dispersion contribution to the solvation free energy, when a comparison to experimental data is made"

If you're using Gaussian, the following might be of interest:

Delete"SMD

Do an IEFPCM calculation with radii and non-electrostatic terms for Truhlar and coworkers’ SMD solvation model [Marenich09]. This is the recommended choice for computing ΔG of solvation, which accomplished by performing gas phase and SCRF=SMD calculations for the system of interest and taking the difference the resulting energies. "

See http://www.gaussian.com/g_tech/g_ur/k_scrf.htm

Thank you, your post help me to calculate enthalpies and Gibbs energies at different temperatures. However, for considering the effect of solvatation I used SMD which calculates the free energy of solvatation at 298K. How can I calculate it at others temperatures?

ReplyDeleteCongratulation for your blog, I frequently read it.

This answer is very late reply. You can easily set the temperature by adding keyword temp in freq block. For example,

Deletefreq

temp 1 298.15 ## default.

end

freq

temp 1 398.15

end

freq

temp 2 298.15 398.15

end