Code standard
General considerations
The PySCF code base is designed to provide a convenient environment for the development of new computational methods, ranging from proof-of-concept implementations to calculations on moderate-size systems. Our emphasis is first on simplicity, next on generality, and last on efficiency. We favor implementations that have a clear structure, with optimization done only at the Python level. If Python performance becomes a major bottleneck, parts of the implementation can be written in C to improve efficiency. The following guidelines (not strict rules!) have been followed in the development of PySCF. Please refer to them when suggesting new contributions.
- 90/10 functional/OOP: unless performance-critical, functions are pure.
- 90/10 Python/C: only computational hot spots are written in C.
- To extend Python functions with C:
  - Follow the C89 (gnu89) standard for C code, except for complex numbers and variable-length arrays. http://flash-gordon.me.uk/ansi.c.txt
  - Use ctypes to call C functions.
- Be conservative in using advanced language features.
- Minimal dependence principle:
  - Minimal requirements on third-party programs or libraries.
  - Loose coupling between modules, so that the failure of one module has minimal effect on other modules.
  - Imports of third-party Python libraries need either fallback implementations or error/exception handling to avoid breaking the import chain.
- Guidelines for the use of external C and Fortran libraries within C extensions to PySCF. The extensions are compiled and linked into PySCF, with compile/link flags resolved by CMake.
  - BLAS, FFTW: Yes.
  - LAPACK: Yes, but not recommended. LAPACK can be used in the PySCF C-level library. However, we recommend restructuring your code by moving all linear algebra and sparse matrix operations to NumPy operations in pure Python.
  - MPI and other parallel libraries: No. MPI communications should be implemented in Python through the mpi4py library.
- Code format. Code should comply with the [PEP8](https://www.python.org/dev/peps/pep-0008/) style.
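The ctypes and loose-coupling guidelines can be illustrated together with a small, self-contained sketch. It calls a C function through ctypes and falls back to pure Python when the shared library cannot be loaded, so the module stays importable either way. The C math library (libm) and its `hypot` function stand in for a compiled PySCF extension. The helper names `_load_c_library` and `norm2` are hypothetical.

```python
import ctypes
import ctypes.util
import math


def _load_c_library(name):
    """Return a ctypes handle for a shared C library, or None if unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


# libm stands in for a compiled extension library in this sketch
_libm = _load_c_library('m')
_c_hypot = None
if _libm is not None:
    try:
        _c_hypot = _libm.hypot
        # Declare the C signature: double hypot(double, double)
        _c_hypot.argtypes = [ctypes.c_double, ctypes.c_double]
        _c_hypot.restype = ctypes.c_double
    except AttributeError:
        _c_hypot = None


def norm2(x, y):
    """Euclidean norm of (x, y): C implementation if loaded, Python otherwise."""
    if _c_hypot is not None:
        return _c_hypot(x, y)
    # Pure-Python fallback keeps this module usable without the C library
    return math.sqrt(x * x + y * y)
```

The same pattern applies to optional third-party Python imports: catch `ImportError` and bind the name to `None` or to a fallback, so that one missing dependency does not break the import chain.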
Naming conventions
A prefix or suffix underscore in a function name has a special meaning:
- Functions with a prefix underscore, like `_fn`, are private functions. They are typically not documented, and their use is not recommended.
- Functions with a suffix underscore, like `fn_`, have side effects. The side effects include changing the input arguments, modifying class definitions (attributes or members) at runtime, or modifying module definitions (global variables or functions).
- Regular (pure) functions have no underscore as a prefix or suffix.
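A minimal sketch of the three naming conventions follows. `Settings`, `_check_tol`, `tighten` and `tighten_` are hypothetical names chosen for illustration, not PySCF functions.

```python
import copy


class Settings:
    """Hypothetical options container used to illustrate the naming rules."""
    def __init__(self, conv_tol=1e-9):
        self.conv_tol = conv_tol


def _check_tol(tol):
    # Prefix underscore: private helper, not part of the public API
    if tol <= 0:
        raise ValueError('tolerance must be positive')
    return tol


def tighten(settings, factor=10.0):
    # No underscore: pure function, returns a new object and leaves the input alone
    new = copy.copy(settings)
    new.conv_tol = _check_tol(settings.conv_tol / factor)
    return new


def tighten_(settings, factor=10.0):
    # Suffix underscore: side effect, the input object itself is modified
    settings.conv_tol = _check_tol(settings.conv_tol / factor)
    return settings
```

A caller who sees the trailing underscore on `tighten_` knows that the object passed in will be changed. Calling `tighten` any number of times leaves the original untouched.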
API conventions
`gto.Mole` (or `gto.Cell` for PBC calculations) holds all global parameters, such as the log level and the maximum memory usage. They are used as the default values for all other classes.
Classes for quantum chemistry models or algorithms:
Most QC method classes (like HF, CASSCF, FCI, …) have three attributes, `verbose`, `stdout` and `max_memory`, which are copied directly from `gto.Mole` (or `gto.Cell`). Overwriting these attributes only affects the behavior of the local instance of that method class. In the following example, setting `mf.verbose = 0` mutes all messages produced by the `RHF` method, and the output of `MP2` is written to the log file `example.log`:
>>> from pyscf import gto, scf, mp
>>> mol = gto.M(atom='H 0 0 0; H 0 0 1', verbose=5)
>>> mf = scf.RHF(mol)
>>> mf.verbose = 0
>>> mf.kernel()
>>> mp2 = mp.MP2(mf)
>>> mp2.stdout = open('example.log', 'w')
>>> mp2.kernel()
Method classes only hold the options or environments (like the convergence threshold, max iterations, …) that control the behavior/convergence of the method. Intermediate states at runtime are not supposed to be saved in the method class (in contrast to the object-oriented paradigm). However, the final results or outputs can be kept in the method object so that they can be easily accessed in subsequent steps. Assume that the result attributes will be used as default inputs or environments by other objects in the rest of the program. Result attributes should therefore be immutable once they have been generated and stored in an object (after the `kernel()` method is called).
In the `__init__` function, initialize/define the problem size. The problem-size parameters (like `num_orbitals`) can be considered environments. They should be immutable.
Kernel functions: classes for QC models should provide a `kernel()` method as the entry point (main function). The `kernel()` function then calls other code to finish the calculation. Although not required, it is recommended that the kernel function return certain key results. If your class inherits from `pyscf.lib.StreamObject`, the class has a `run()` method, which calls the `kernel()` function and returns the object itself. One can simply call the `kernel()` method or the `run()` method to start the flow of a QC method.
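A minimal sketch of a method class that follows these conventions, written without PySCF, is shown below. Options and the problem size are set in `__init__`. Intermediates stay local to `kernel()`. Only the final results are stored on the object, and `kernel()` returns the key result. The class `PowerIteration` and its attributes are hypothetical. A real PySCF class would typically inherit from `pyscf.lib.StreamObject` to also get `run()`.

```python
class PowerIteration:
    """Hypothetical method class: largest eigenvalue of a symmetric matrix."""

    def __init__(self, matrix, conv_tol=1e-10, max_cycle=200):
        # Problem size: defined here and treated as immutable afterwards
        self.matrix = [list(row) for row in matrix]
        self.norb = len(self.matrix)
        # Options that control the convergence of the method
        self.conv_tol = conv_tol
        self.max_cycle = max_cycle
        # Results: filled in by kernel(), then left unchanged
        self.e = None
        self.converged = False

    def _matvec(self, v):
        return [sum(a * x for a, x in zip(row, v)) for row in self.matrix]

    def kernel(self, v0=None):
        # Intermediates (trial vector, previous eigenvalue) stay local
        v = list(v0) if v0 is not None else [1.0] * self.norb
        e = e_old = 0.0
        converged = False
        for _ in range(self.max_cycle):
            w = self._matvec(v)
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
            # Rayleigh quotient of the normalized vector
            e = sum(x * y for x, y in zip(v, self._matvec(v)))
            if abs(e - e_old) < self.conv_tol:
                converged = True
                break
            e_old = e
        # Store only the final results on the object and return the key one
        self.e = e
        self.converged = converged
        return self.e
```

Keeping the trial vector out of the object means that calling `kernel()` again, possibly with a different `v0`, always starts from a clean state.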
Function arguments: the first argument is a handler, which is a `gto.Mole` object, a mean-field object, or a post-Hartree-Fock object.
Return value: functions should return a value whenever possible. For methods defined in a class, return `self` instead of `None` if the method has no particular value to return.
Unit Tests and Example Scripts
Examples for modules should be placed in the appropriate directory inside the /examples directory. The examples should be light enough to run on a modest personal computer, but they should not be trivial: the point of the examples is to showcase the functionality of the module. The format for naming examples is:
/examples/name_of_module/XX-function_name.py
where XX is a two-digit numeric string.
Test cases are placed in the /test/name_of_module directory and run with nosetests (https://nose.readthedocs.io/en/latest/). These tests ensure the robustness of both simple functions and more complex drivers across version changes.
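A minimal test module in the `unittest` style, which nose can collect, might look like the sketch below. The function `dot` and the test names are hypothetical placeholders for a real module's API.

```python
import unittest


def dot(a, b):
    """Hypothetical module function under test."""
    return sum(x * y for x, y in zip(a, b))


class KnownValues(unittest.TestCase):
    def test_dot(self):
        # Compare against a reference value computed by hand
        self.assertAlmostEqual(dot([1.0, 2.0], [3.0, 4.0]), 11.0, places=12)

    def test_dot_is_pure(self):
        # Pure functions must not modify their arguments
        a = [1.0, 2.0]
        dot(a, a)
        self.assertEqual(a, [1.0, 2.0])
```

Saved as a `test_*.py` file under /test/name_of_module, it can be run with `nosetests` or with `python -m unittest`.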
General designs
Kernel and Stream functions
Every class has a `kernel` method, which serves as the entry point or driver of the method. Once an object of a method has been created, you can always call `.kernel()` to start or restart a calculation.
The return value of the kernel method differs from class to class. To unify the return values, the package introduces stream methods to pipe the computation stream. A stream method of an object only returns the object itself. Three general stream methods are available for most method classes:
1. The `.set` method updates object attributes. For example,
mf = scf.RHF(mol).set(conv_tol=1e-5)
is identical to the two statements
mf = scf.RHF(mol)
mf.conv_tol = 1e-5
2. The `.run` method passes the call to the `.kernel` method. If positional arguments are given to `.run`, they are passed to the kernel function. If keyword arguments are given, `.run` first calls `.set` to update the attributes and then executes `.kernel`. For example,
mf = scf.RHF(mol).run(dm_init, conv_tol=1e-5)
is identical to the three statements
mf = scf.RHF(mol)
mf.conv_tol = 1e-5
mf.kernel(dm_init)
3. The `.apply` method passes the current object (as the first argument) to the given function/class and returns a new object. Any positional and keyword arguments given to `.apply` are passed on to that function/class. For example,
mc = mol.apply(scf.RHF).run().apply(mcscf.CASSCF, 6, 4, frozen=4)
is identical to:
mf = scf.RHF(mol)
mf.kernel()
mc = mcscf.CASSCF(mf, 6, 4, frozen=4)
Besides the three general stream methods, regular class methods may also return the object itself when they have no particular value to return. Using stream methods, you can evaluate certain quantities in one line of code:
dm = gto.M(atom='H 0 0 0; H 0 0 1') \
.apply(scf.RHF) \
.dump_flags() \
.run() \
.make_rdm1()
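To make the stream semantics concrete, the sketch below re-implements `.set`, `.run` and `.apply` in a plain mixin, following the behavior described above. It is not PySCF's `StreamObject`. `StreamMixin`, `NewtonSqrt` and `Square` are hypothetical names, with `Square` playing the role of a "post" method built on a converged object (as `mcscf.CASSCF` is built on `scf.RHF`).

```python
class StreamMixin:
    """Hypothetical stand-in for the three general stream methods."""

    def set(self, **kwargs):
        # Update attributes, then return the object itself
        for key, val in kwargs.items():
            setattr(self, key, val)
        return self

    def run(self, *args, **kwargs):
        # Keyword arguments go through .set; positional ones go to .kernel
        self.set(**kwargs)
        self.kernel(*args)
        return self

    def apply(self, fn, *args, **kwargs):
        # Pass self as the first argument and return whatever fn creates
        return fn(self, *args, **kwargs)


class NewtonSqrt(StreamMixin):
    """Hypothetical method class: Newton iteration for sqrt(a)."""

    def __init__(self, a):
        self.a = a
        self.conv_tol = 1e-12
        self.max_cycle = 100
        self.x = None

    def kernel(self, x0=1.0):
        x = x0
        for _ in range(self.max_cycle):
            x_new = 0.5 * (x + self.a / x)
            converged = abs(x_new - x) < self.conv_tol
            x = x_new
            if converged:
                break
        self.x = x
        return self.x


class Square(StreamMixin):
    """Hypothetical post-method that uses the result of a NewtonSqrt object."""

    def __init__(self, parent):
        self.parent = parent
        self.value = None

    def kernel(self):
        self.value = self.parent.x ** 2
        return self.value
```

With these definitions, `NewtonSqrt(2.0).run().apply(Square).run().value` evaluates the whole chain in one line, just like the PySCF example above.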
Pure function and Class
Classes are designed to hold only the final results and the control parameters, such as the maximum number of iterations and the convergence threshold. Intermediates are NOT saved in the class. After the `.kernel()` or `.run()` method is called, results are generated and saved in the object. For example:
from pyscf import gto, scf, cc
mol = gto.M(atom='H 0 0 0; H 0 0 1.1', basis='ccpvtz')
mf = scf.RHF(mol).run()
mycc = cc.CCSD(mf).run()
print(mycc.e_tot)
print(mycc.e_corr)
print(mycc.t1.shape)
print(mycc.t2.shape)
Many useful functions are defined at both the module level and the class level. They can be accessed either as module functions or as class methods, and the return values should be the same:
vj, vk = scf.hf.get_jk(mol, dm)
vj, vk = scf.hf.SCF(mol).get_jk(mol, dm)
Note that some module functions may require an instance of the class as the first argument.
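The module-level/class-level pairing can be sketched as a pure module function plus a class method that delegates to it, using the object's attributes as defaults. `kinetic_energy` and `ToySystem` are hypothetical names. This is a sketch of the pattern, not PySCF's `get_jk` implementation.

```python
def kinetic_energy(masses, velocities):
    """Module-level pure function (hypothetical example)."""
    return 0.5 * sum(m * v * v for m, v in zip(masses, velocities))


class ToySystem:
    def __init__(self, masses, velocities):
        self.masses = list(masses)
        self.velocities = list(velocities)

    def kinetic_energy(self, masses=None, velocities=None):
        # The class method delegates to the module function; attributes are defaults
        if masses is None:
            masses = self.masses
        if velocities is None:
            velocities = self.velocities
        return kinetic_energy(masses, velocities)
```

Because the method only fills in defaults and forwards the call, both entry points return the same value for the same inputs.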
Most functions and classes are pure, i.e. no intermediate states are held within the classes, and the arguments of methods and functions are not modified during calculations. These functions can be called any number of times in any order, and their return values should always be the same.
Exceptions are usually marked by an underscore suffix in the function name, e.g. `mcscf.state_average_(mc)`, where the attributes of the `mc` object may be changed or overwritten by the `state_average_` method. Take care when you see functions or methods with this suffix.
Global configurations
The global configuration file is a Python script that contains PySCF configurations.
When the `pyscf` module is imported in a Python program (or Python interpreter), the package preloads the global configuration file and uses its configurations as the default values of function parameters or class attributes during initialization. For example, the configuration file below detects the available memory in the operating system at runtime and sets the maximum memory for PySCF:
$ cat ~/.pyscf_conf.py
import psutil
MAX_MEMORY = int(psutil.virtual_memory().available / 1e6)
By setting `MAX_MEMORY` in the global configuration file, you don't need a statement to set the `max_memory` attribute in every script. The dynamically determined `max_memory` is loaded automatically during program initialization.
There are two ways to let PySCF load the global configurations. One is to create a configuration file `.pyscf_conf.py` in the home directory or in the working directory. The other is to set the environment variable `PYSCF_CONFIG_FILE` to point to the configuration file (with an absolute path). The environment variable `PYSCF_CONFIG_FILE` has higher priority than the configuration file in the default locations (home directory or working directory).
If the environment variable `PYSCF_CONFIG_FILE` is set, the program reads the configurations from `$PYSCF_CONFIG_FILE`. If `PYSCF_CONFIG_FILE` is not set or the file it points to does not exist, the program looks for `.pyscf_conf.py` in the default locations. If none of the configuration files exists, the program uses the built-in configurations, which are generally conservative settings.
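The lookup rules above can be sketched as follows. This is not PySCF's loader (which lives in `pyscf/__config__.py`). The helpers `find_config_file` and `load_config` are hypothetical. The order in which the working and home directories are checked is an assumption, because the text does not specify it.

```python
import os
import runpy


def find_config_file(name='.pyscf_conf.py'):
    """Return the path of the configuration file to load, or None."""
    # 1. The environment variable has the highest priority
    path = os.environ.get('PYSCF_CONFIG_FILE')
    if path and os.path.isfile(path):
        return path
    # 2. Default locations (assumed order: working directory, then home)
    for folder in (os.getcwd(), os.path.expanduser('~')):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    # 3. No file found: the caller keeps its built-in defaults
    return None


def load_config(defaults):
    """Overlay upper-case names defined in the config file on the defaults."""
    config = dict(defaults)
    path = find_config_file()
    if path is not None:
        namespace = runpy.run_path(path)  # executes the file as a Python script
        config.update({k: v for k, v in namespace.items() if k.isupper()})
    return config
```

Running the file as an ordinary Python script is what allows dynamic settings such as the `psutil`-based `MAX_MEMORY` shown earlier.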
In the source code, global configurations are loaded by importing the `pyscf.__config__` module:
from pyscf import __config__
MAX_MEMORY = getattr(__config__, 'MAX_MEMORY')
Please refer to the source code for the available configurations.
Scanner
A scanner is a function that takes a `Mole` (or `Cell`) object as input and returns the energy or nuclear gradients of the given `Mole` (or `Cell`) object. A scanner can be considered a shortcut for a sequence of statements: initializing the required calculation model with the necessary precomputation, updating the attributes based on the settings of the referenced object, calling the kernel function, and finally returning the results. For example:
from pyscf import gto, scf, cc
cc_scanner = gto.M().apply(scf.RHF).apply(cc.CCSD).as_scanner()
for r in (1.0, 1.1, 1.2):
print(cc_scanner(gto.M(atom='H 0 0 0; H 0 0 %g'%r)))
Equivalent but slightly more verbose code is:
for r in (1.0, 1.1, 1.2):
mol = gto.M(atom='H 0 0 0; H 0 0 %g'%r)
mf = scf.RHF(mol).run()
mycc = cc.CCSD(mf).run()
print(mycc.e_tot)
Two types of scanners are available in the package: energy scanners and nuclear gradients scanners. The example above uses an energy scanner. An energy scanner returns only the energy of the given molecular structure, while a nuclear gradients scanner also returns the nuclear gradients.
A scanner is a special object derived from the caller. Most methods defined in the caller class can be used with the scanner object. For example:
mf_scanner = gto.M().apply(scf.RHF).as_scanner()
mf_scanner(gto.M(atom='H 0 0 0; H 0 0 1.2'))
mf_scanner.analyze()
dm1 = mf_scanner.make_rdm1()
mf_grad_scanner = mf_scanner.nuc_grad_method().as_scanner()
mf_grad_scanner(gto.M(atom='H 0 0 0; H 0 0 1.2'))
As shown in the example above, the scanner works much like the corresponding class object, except that it does not need the `kernel` or `run` methods to run a calculation. Given a molecular structure, the scanner automatically checks and updates the necessary object dependencies and passes the workflow to the `kernel` method. The computational results are held in the scanner object, just as in a regular class object.
To make the structure of scanner objects uniform across all methods, two attributes (`.e_tot` and `.converged`) are defined for all energy scanners, and three attributes (`.e_tot`, `.de` and `.converged`) are defined for all nuclear gradients scanners.
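The scanner idea can be sketched without PySCF. `as_scanner()` builds a subclass of the caller's class on the fly, copies the caller's settings, and makes the object callable on a new structure. The result is stored in `.e_tot` and `.converged`, as for energy scanners. `ToyMol`, `ToyModel` and the harmonic energy expression are hypothetical stand-ins. PySCF's own scanners do more work, such as handling the object dependencies mentioned above.

```python
class ToyMol:
    """Hypothetical stand-in for Mole: a diatomic described by its bond length."""
    def __init__(self, r):
        self.r = r


class ToyModel:
    """Hypothetical method class with a harmonic bond energy k * (r - r0)**2."""

    def __init__(self, mol, k=1.0, r0=1.0):
        self.mol = mol
        self.k = k
        self.r0 = r0
        self.e_tot = None
        self.converged = False

    def kernel(self):
        self.e_tot = self.k * (self.mol.r - self.r0) ** 2
        self.converged = True
        return self.e_tot

    def as_scanner(self):
        base = self.__class__

        # The scanner is derived from the caller's class, so its methods remain usable
        class Scanner(base):
            def __init__(self, method):
                # Copy the caller's settings instead of re-running __init__
                self.__dict__.update(method.__dict__)

            def __call__(self, mol):
                # Update the structure-dependent state, then run the kernel
                self.mol = mol
                return self.kernel()

        return Scanner(self)
```

Scanning a bond length then reads like the PySCF example: `scanner = ToyModel(ToyMol(1.0), k=2.0).as_scanner()`, followed by `scanner(ToyMol(r))` for each `r`.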