Files prepared and density convergence / ETotal calculation put into the queue on 54 nodes with 8 ppn and a walltime of 10 hours.
The calculation ran but failed because of high memory demands. A summary of the problems, taken from my e-mail to Josh:
I'm using exactly the same input file as before, but with angstrom added after the unit cell metrics, followed on the next line by the angdeg token and the three angles. Unfortunately all of the double cells (4, 2, and 3) return the same error message, one I've never seen before. I've attached an image of the error message, seen below:
Since the requested memory seemed significantly higher than normal, I tried decreasing the planewave cutoff (ecut) from 35 hartrees to 25 hartrees, as that has been a successful strategy for reducing memory usage in the past. Running the job interactively, I got the same error message, but with lower requested memory, as seen below:
I tried running the job with 4 processors per node so that each running process could use the memory share of the idle processors, but I got the same error with the numbers unchanged.
Josh reminded me that in order for the memory of the four unused processors on each node to be given to the running processes, I needed to add #PBS -l pvmem=5GB to the submission file, as per http://www.nersc.gov/nusers/systems/carver/running_jobs/memory.php . With this change, the job runs successfully, even with an ecut of 35 hartrees. All compounds have been resubmitted with a walltime of 24 hours to account for only half the processors being used.
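For reference, a sketch of the relevant resource lines in the submission file (the node count and walltime are from this entry; the rest of the script is assumed, not copied from the actual file):

```shell
#PBS -l nodes=54:ppn=4     # use 4 of the 8 cores per node, leaving memory for the rest
#PBS -l walltime=24:00:00  # doubled walltime since only half the processors are used
#PBS -l pvmem=5GB          # per-process virtual memory limit, per the NERSC Carver docs
```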
Compounds 1-4 ran, but all ran out of memory during the first iteration, even though the estimated memory requirement was 3.5 GB and 5 GB were available. After lowering ecut to 25 hartrees and running interactively, the estimated memory requirement dropped to 2.5 GB. I'll resubmit all compounds with the lowered ecut. If memory still gets in the way, we can go to 2 ppn for 10 GB per processor.
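As a sanity check on the ecut change: in a planewave code the basis size (and hence the dominant memory cost) grows roughly as ecut to the 3/2 power. The 35 and 25 hartree cutoffs and the 3.5/2.5 GB figures are from this log; the scaling law is a standard estimate, not an abinit-reported number.

```python
# Rough estimate of how planewave-basis memory scales with the cutoff.
# The number of planewaves below a kinetic-energy cutoff grows roughly
# as ecut**1.5, so the dominant memory cost should scale the same way.

def pw_memory_ratio(ecut_high, ecut_low):
    """Approximate ratio of planewave counts between two cutoffs (hartree)."""
    return (ecut_high / ecut_low) ** 1.5

ratio = pw_memory_ratio(35.0, 25.0)
print(f"predicted memory ratio: {ratio:.2f}")
# Observed ratio was 3.5 GB / 2.5 GB = 1.4, somewhat below the basis-size
# estimate, presumably because some allocations don't scale with the basis.
```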
All compounds ran and memory problems were dodged; however, the jobs are now simply running out of time. They were killed during the fifth iteration. Here are some ideas:
- Run the job in chunks, only calling four iterations each time.
Jobs are taking a long time to get through the queue right now, and this would be extremely computationally expensive as well.
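If we did go the chunked route, the relevant abinit input variables would be something like the following (the four-iteration figure is from the idea above; variable values are placeholders, not tested settings):

```
nstep 4       # stop after four SCF iterations per submission
irdwfk 1      # on each restart, read the _WFK wavefunctions written by the previous chunk
```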
- Use a gaussian basis set.
This would require the adoption of new software, such as Crystal 09.
- Use ultrasoft pseudopotentials.
This would also require the adoption of new software.
- Use fewer k-points.
Since we're currently parallelizing over k-points, each one would still have the same length calculation. This would, however, get us through the queue faster, as we wouldn't need as many processors.
- Use fewer k-points and parallelize over bands, too.
This would allow us to use more processors at once; however, apparently using twice the processors only calculates about 1.5 times as fast, so this would be computationally inefficient and expensive. Also, I would need to take the time to learn how to parallelize over bands. There is a tutorial on the abinit website.
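For when I get to it, band parallelization in abinit is controlled by the KGB scheme; a sketch of the relevant input variables (the values here are illustrative placeholders, not tested settings):

```
paral_kgb 1    # enable combined k-point / band / FFT parallelization
npkpt  6       # MPI processes distributed over k-points (placeholder)
npband 4       # MPI processes distributed over bands (placeholder)
npfft  1       # MPI processes for FFT-level parallelism
# the total MPI process count must equal npkpt * npband * npfft
```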
We've decided that I should learn band parallelization, and I will be implementing it soon.
Alex has decided that we don't really care about the Etot from the double cell structures, because we don't really believe the Ising model anymore. Therefore this project is done, and I won't be learning band parallelization at this time.