Thursday, August 6, 2015

Getting useful XYZ geometries from G09 and transforming them (to angstroms)

I finally got to the point when optimizing geometries of many, many molecules led me to needing more automation to create input files for the subsequent single-point calculations. Methinks this is as good of a time as any to learn python (and by learn I mean cobble together something functional, but ugly). Ok, with that truth out of the way, on to the goal of this exercise. I wanted to optimize geometries in G09, export the geometry (some of the geometry optimizations were in cartesian space and some used an input Z-matrix, so XYZ coordinates are the common output), and use the exported geometries to calculate single-point energies using MOLPRO.

1) First hurdle, getting a useful XYZ coordinates from G09 (also known as, why is there no easy way to grep a final set of cartesian coordinates from the log file?). There are lots of punch options and IOPs, but none that dump the geometry to the log file. This means using "punch=coord" and dealing with the fort.7 file that is generated. Here is the line from my bash script that submits all of the geometry optimizations.

$ for i in $( ls *pbe1pbe-vtz*.com ) ; do echo ${i}; NAME=$(basename $i ".com" ) ; /cluster/software/g09/g09 <./${NAME}.com >./${NAME}.log ; mv fort.7 ${NAME}.xyz ; done

2) Ok, that really was not so terrible, and I will stop complaining now. Second hurdle, transforming the fort.7 file into something useful because it writes the XYZ coordinates using the atomic number (instead of the associated element abbreviation) and the units are bohr. The basic parts of the python script that follow are reading and parsing the input file (this includes substituting "C" for "6", etc), finding the center of mass, shifting all of the atoms so the center of mass is at the origin, converting the distances to angstroms, and writing a new XYZ file. HUGE NOTE: Any improvements would be greatly appreciated.

#!/bin/python

bohr_per_ang = 1.88971616463207
ang_per_bohr = 0.5291772109217

from sys import argv

script, filename = argv

infile1 = open(filename,"r") #opens file with name of "test.txt"

geom1 = []

count = 0
for temp_line in infile1 :
        temp_line = temp_line.strip()
        line = temp_line.split()
        if int(line[0]) == 1 :
                atom = "H"
        elif int(line[0]) == 5 :
                atom = "B"
        elif int(line[0]) == 6 :
                atom = "C"
        parsed_line = [ atom, int(line[0]), float(line[1].replace("D","E")), float(line[2].replace("D","E")), float(line[3].replace("D","E")) ]
        count += 1
        geom1.append(parsed_line)

infile1.close()

#calculate the center of mass in x, y, and z directions
mx = 0
my = 0
mz = 0
mass_total = 0

for i in range(0,count) :
        mx = mx + geom1[i][1]*geom1[i][2]
        my = my + geom1[i][1]*geom1[i][3]
        mz = mz + geom1[i][1]*geom1[i][4]
        mass_total = mass_total + geom1[i][1]

com1 = [mx/mass_total, my/mass_total, mz/mass_total]

#shift all atoms so the center of mass is at 0,0,0
geom1_shifted=[[0 for j in range(0,3)] for i in range(0,count)]

for i in range(0, count) :
        geom1_shifted[i][0] = geom1[i][2] - com1[0]
        geom1_shifted[i][1] = geom1[i][3] - com1[1]
        geom1_shifted[i][2] = geom1[i][4] - com1[2]


outfile1 = open(filename, "w")
outfile1.write("%d\n" % (count) )
outfile1.write("\n")
for i in range(0, count) :
        outfile1.write( "%s %14.10f %14.10f %14.10f\n" % (geom1[i][0]  ,  geom1[i][2]*ang_per_bohr  ,  geom1[i][3]*ang_per_bohr  ,  geom1[i][4]*ang_per_bohr ))

outfile1.close()

And the bash line:

$ for i in $( ls c6h7-int*ub3lyp-6311ppg*.xyz ) ; do echo $i ; python bohr_to_ang.py ./${i} ; done

3) Now that we have that out of the way all we need to do is create a MOLPRO input file. The top part of my template is below, along with the bash line I use to create all of the input files for a given set of geometries (usually belonging to a particular method/basis set).

***,template
memory,800,M
gthresh,oneint=1.d-14,twoint=1.d-14,zero=1.d-14

angstrom
symmetry,nosym
geomtyp=xyz
geom={
}
    basis=6-31G*;
 {multi;canon,3100.2;
 occ,22;closed,21}

    basis=6-311G**;
 {multi;canon,3101.2;
 occ,22;closed,21}

    basis=aug-cc-pvtz;
 {multi;canon,3102.2;
 occ,22;closed,21}

basis={
default,vtz-f12
...

$ for i in $( ls geom-method-tests/c6h7-int*ub3lyp-6311ppg*.xyz ) ; do outname="$( basename $i "-a.xyz" )-uccsdtf12-vtzf12-ad.inp" ; echo $outname; cp c6h7-template-uccsdtf12-vtzf12-ad.inp $outname ; tail -n13 $i >tmpfile ; sed -i -e '/geom={/r tmpfile' $outname ; done

This last bash line uses the bulk of the coordinate filename and concatenates the single-point method and basis set onto the end when making the MOLPRO input file name. I then use sed to put the geometry into the newly minted template just after geom={. Voila!