ensemble_md.utils
ensemble_md.utils.gmx_parser
The gmx_parser module provides functions for parsing GROMACS files.
- ensemble_md.utils.gmx_parser.parse_log(log_file)
Parses a log file generated by a GROMACS expanded ensemble simulation and extracts the weights, histogram counts, Wang-Landau incrementor, and equilibration time. This function is especially useful for extracting information from each iteration of a REXEE simulation on the fly. A log file from an expanded ensemble simulation falls into one of three cases:
Case 1: The weights are still updating in the simulation and have never been equilibrated. In this case, the output equil_time should be -1.
Case 2: The weights were equilibrated during the simulation. The output equil_time is the time (in ps) it took for the weights to equilibrate. The final weights (weights) will just be the equilibrated weights.
Case 3: The weights were fixed in the simulation. In this case, the output equil_time should be 0. The final weights (which never change during the simulation) and the final counts will still be returned.
- Parameters
log_file (str) – The file path of the input log file
- Returns
weights (list) – In all cases, weights should be a list of lists of weights.
In Case 1, a list of lists of weights as a function of time since the last update of the Wang-Landau incrementor will be returned.
In Case 2, a list of lists of weights as a function of time since the last update of the Wang-Landau incrementor, up to equilibration, will be returned.
In Case 3, the returned list will have only one list inside, which is the list of values at which the weights were fixed.
That is, for all cases, weights[-1] will be the final weights, which are needed to seed the next iteration of a REXEE simulation.
counts (list) – The final histogram counts.
wl_delta (float) – The final Wang-Landau incrementor. In Cases 2 and 3, None will be returned.
equil_time (int or float) –
In Case 1, -1 will be returned, which means that the weights have not been equilibrated.
In Case 2, the equilibration time in ps will be returned.
In Case 3, 0 will be returned, which means that the weights were fixed during the simulation.
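For illustration, the sketch below shows one way a caller might branch on the documented equil_time conventions (-1, a positive time in ps, or 0) and pick out weights[-1] to seed the next iteration. The helper describe_parse_log_result is hypothetical and not part of ensemble_md; it only restates the three cases described above.

```python
def describe_parse_log_result(weights, equil_time):
    """Hypothetical helper that interprets the outputs documented for parse_log.

    Returns a short label for the case and the final weights (weights[-1]).
    """
    if equil_time == -1:
        case = 'Case 1: weights still updating'
    elif equil_time == 0:
        case = 'Case 3: weights fixed'
    else:
        case = f'Case 2: weights equilibrated at {equil_time} ps'
    # In every case, the last list in weights holds the final weights,
    # which can be used to seed the next REXEE iteration.
    return case, weights[-1]
```

With the package installed, the inputs would come from something like weights, counts, wl_delta, equil_time = gmx_parser.parse_log(log_file), assuming the return values are unpacked in the documented order.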
- class ensemble_md.utils.gmx_parser.MDP(input_mdp=None, **kwargs)
A class that represents a GROMACS MDP file. Note that an MDP instance is an ordered dictionary, with the i-th key corresponding to the i-th line in the MDP file. Comments and blank lines are also preserved, e.g., with keys ‘C0001’ and ‘B0001’, respectively. The value corresponding to a ‘C’ key is the comment itself, while the value corresponding to a ‘B’ key is an empty string. Comments after a parameter on the same line are discarded. Leading and trailing spaces are always stripped.
- Parameters
input_mdp (str, Optional) – The path of the input MDP file. The default is None.
**kwargs (Optional) – Additional key-value pairs to add to the MDP instance. Note that no sanity checks are performed on key-value pairs passed this way. This also does not work for keys that are not valid Python identifiers, such as any key that contains a minus sign (‘-’) or starts with a number.
- Variables
COMMENT (re.Pattern object) – A compiled regular expression pattern for comments in MDP files.
PARAMETER (re.Pattern object) – A compiled regular expression pattern for parameters in MDP files.
input_mdp (str) – The real path of the input MDP file returned by os.path.realpath(input_mdp), which resolves any symbolic links in the path.
Example
>>> from ensemble_md.utils import gmx_parser
>>> gmx_parser.MDP("em.mdp")
MDP([('C0001', 'em.mdp - used as input into grompp to generate em.tpr'), ('C0002', 'All unspecified parameters adopt their own default values.'), ('B0001', ''), ('C0003', 'Run Control'), ('integrator', 'steep'), ('nsteps', 500000), ('B0002', ''), ('C0004', 'Energy minimization'), ('emtol', 100.0), ('emstep', 0.01), ('B0003', ''), ('C0005', 'Neighbor searching/Electrostatics/Van der Waals'), ('cutoff-scheme', 'Verlet'), ('nstlist', 10), ('ns_type', 'grid'), ('pbc', 'xyz'), ('coulombtype', 'PME'), ('rcoulomb', 1.0), ('rvdw', 1.0)])
- COMMENT = re.compile('\\s*;\\s*(?P<value>.*)')
- PARAMETER = re.compile('\\s*(?P<parameter>[^=]+?)\\s*=\\s*(?P<value>[^;]*)(?P<comment>\\s*;.*)?', re.VERBOSE)
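To make the key scheme concrete, here is a minimal sketch (not the package's implementation) of how the COMMENT and PARAMETER patterns listed above could turn MDP lines into an ordered dictionary with ‘C’ and ‘B’ keys. The function parse_mdp_lines and the helper _convert are hypothetical; converting values to int or float is an assumption inferred from the example output (e.g., nsteps parsed as 500000 and emtol as 100.0).

```python
import re
from collections import OrderedDict

# Patterns as listed in the MDP class documentation.
COMMENT = re.compile(r'\s*;\s*(?P<value>.*)')
PARAMETER = re.compile(
    r'\s*(?P<parameter>[^=]+?)\s*=\s*(?P<value>[^;]*)(?P<comment>\s*;.*)?',
    re.VERBOSE,
)


def _convert(value):
    """Hypothetical helper: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_mdp_lines(lines):
    """Sketch of the documented key scheme for comments and blank lines."""
    mdp = OrderedDict()
    n_comment = n_blank = 0
    for line in lines:
        line = line.strip()  # leading and trailing spaces are stripped
        if not line:
            n_blank += 1
            mdp[f'B{n_blank:04d}'] = ''
            continue
        match = COMMENT.match(line)
        if match:
            n_comment += 1
            mdp[f'C{n_comment:04d}'] = match.group('value')
            continue
        match = PARAMETER.match(line)
        if match:
            # The trailing comment (group 'comment'), if any, is discarded.
            mdp[match.group('parameter')] = _convert(match.group('value').strip())
        # Lines matching neither pattern are silently skipped in this sketch.
    return mdp
```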
- write(output_mdp=None, skipempty=False)
Writes the MDP instance (the ordered dictionary) to an output MDP file.
- Parameters
output_mdp (str, Optional) – The file path of the output MDP file. The default is the path of the MDP file the instance was built from; that is, if output_mdp is not specified, the input MDP file will be overwritten.
skipempty (bool, Optional) – Whether to skip parameters with empty values when writing the MDP file. If True, any parameter lines that contain empty values are omitted from the output. The default is False.
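The reverse direction can be sketched as rendering the ordered dictionary back to text. In the hypothetical format_mdp below, the recognition of special keys by the pattern ‘C’ or ‘B’ plus four digits, the "; comment" and "key = value" layouts, and returning a string instead of writing a file are all assumptions for illustration; only the skipempty behavior is taken from the description above.

```python
import re

# Assumed naming of special keys: 'C' or 'B' followed by four digits.
SPECIAL_KEY = re.compile(r'^(?P<kind>[CB])\d{4}$')


def format_mdp(mdp, skipempty=False):
    """Hypothetical helper that renders an MDP-like ordered dict as MDP text."""
    lines = []
    for key, value in mdp.items():
        match = SPECIAL_KEY.match(key)
        if match and match.group('kind') == 'C':
            lines.append(f'; {value}')  # comment line
        elif match and match.group('kind') == 'B':
            lines.append('')  # blank line
        elif skipempty and (value == '' or value is None):
            continue  # drop parameters with empty values
        else:
            lines.append(f'{key} = {value}')
    return '\n'.join(lines) + '\n'
```

Writing the result to disk would then be a matter of opening output_mdp in write mode and writing the returned string.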
- ensemble_md.utils.gmx_parser.compare_MDPs(mdp_list, print_diff=False)
Identifies the parameters that differ among a given list of MDP files. Note that this function is not aware of the default values of GROMACS parameters. (Currently, this function is not used in the workflow adopted in run_REXEE.py, but it might be useful elsewhere, so we decided to keep it.)
- Parameters
mdp_list (list) – A list of MDP files.
print_diff (bool, Optional) – Whether to print the parameters that are different among the MDP files in a more readable format. The default is False.
- Returns
diff_params – A dictionary of parameters that differ among the MDP files. The keys are the parameter names, and each value is a list of that parameter's values in the MDP files, in the same order as mdp_list.
- Return type
dict
Example
>>> from ensemble_md.utils import gmx_parser
>>> mdp_list = ['A.mdp', 'B.mdp']
>>> diff_params = gmx_parser.compare_MDPs(mdp_list, print_diff=True)
The following parameters are different among the MDP files:
wl_scale
  - A.mdp: None
  - B.mdp: 0.8
...
>>> print(diff_params)
{'wl_scale': [None, 0.8], ...}
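compare_MDPs takes file paths; the sketch below isolates only the comparison step, applied to already-parsed, MDP-like dictionaries. The function diff_mdp_dicts is hypothetical. Excluding the ‘C’/‘B’ keys and reporting a missing parameter as None are assumptions based on the MDP key scheme and the example above. Like compare_MDPs, it knows nothing about GROMACS default values.

```python
import re

# Assumed naming of comment/blank-line keys, e.g. 'C0001' or 'B0002'.
SPECIAL_KEY = re.compile(r'^[CB]\d{4}$')


def diff_mdp_dicts(mdps):
    """Sketch: collect parameters whose values differ across MDP-like dicts."""
    # Gather parameter names in first-seen order, skipping comments and blanks.
    keys = []
    for mdp in mdps:
        for key in mdp:
            if not SPECIAL_KEY.match(key) and key not in keys:
                keys.append(key)

    diff_params = {}
    for key in keys:
        values = [mdp.get(key) for mdp in mdps]  # None if a file lacks the key
        if any(value != values[0] for value in values[1:]):
            diff_params[key] = values
    return diff_params
```

Because the values are compared with ==, 100 and 100.0 count as equal here; a stricter comparison would be a different design choice.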
ensemble_md.utils.utils
The utils module provides useful utility functions for running or analyzing REXEE simulations.
- class ensemble_md.utils.utils.Logger(logfile)
A logger class that redirects STDOUT and STDERR to a specified output file while preserving the output on screen. This is useful for logging terminal output to a file for later analysis while still seeing the output in real time during execution.
- Parameters
logfile (str) – The path of the file to which standard output and standard error should be logged.
- Variables
terminal (io.TextIOWrapper object) – The original standard output object, typically sys.stdout.
log (io.TextIOWrapper object) – File object used to log the output in append mode.
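The behavior described above is the classic "tee" pattern: every write goes to both the original stream and a file opened in append mode. The class TeeLogger below is a minimal, hypothetical sketch of that pattern, not the package's Logger; the optional terminal argument is an addition that makes the sketch easy to test.

```python
import sys


class TeeLogger:
    """Hypothetical tee-style logger: echoes writes to a terminal and a log file."""

    def __init__(self, logfile, terminal=None):
        self.terminal = terminal if terminal is not None else sys.stdout
        self.log = open(logfile, 'a')  # append mode, as described for Logger.log

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        # Required so the object can stand in for sys.stdout or sys.stderr.
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()
```

In a script, one would typically assign sys.stdout = TeeLogger('run.log') and sys.stderr = sys.stdout, then restore sys.__stdout__ and sys.__stderr__ (and close the logger) when done.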
- ensemble_md.utils.utils.run_gmx_cmd(arguments, prompt_input=None)
Runs a GROMACS command through a subprocess call.
- Parameters
arguments (list) – A list of arguments that compose the GROMACS command to run, e.g., ['gmx', 'mdrun', '-deffnm', 'sys'].
prompt_input (str or None, Optional) – The input to be passed to the interactive prompt launched by the GROMACS command, if any.
- Returns
return_code (int) – The exit code of the GROMACS command. Any number other than 0 indicates an error.
stdout (str or None) – The STDOUT of the process.
stderr (str or None) – The STDERR of the process.
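A subprocess call of this kind can be sketched with the standard library's subprocess.run, feeding prompt_input to the command's standard input and capturing both output streams as text. The function run_cmd_sketch is a hypothetical stand-in, not the package's run_gmx_cmd; unlike the documented function, it always returns strings for stdout and stderr, never None.

```python
import subprocess


def run_cmd_sketch(arguments, prompt_input=None):
    """Hypothetical stand-in for run_gmx_cmd built on subprocess.run."""
    result = subprocess.run(
        arguments,
        input=prompt_input,   # text sent to the command's interactive prompt
        capture_output=True,  # collect STDOUT and STDERR
        text=True,            # decode output bytes to str
    )
    return result.returncode, result.stdout, result.stderr
```

For the example in the parameter list, this would be called as run_cmd_sketch(['gmx', 'mdrun', '-deffnm', 'sys']), and a nonzero first return value would signal an error.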
- ensemble_md.utils.utils.format_time(t)
Converts time in seconds to a more readable format.
- Parameters
t (float) – The time in seconds.
- Returns
t_str – A string representing the time duration in the format “X hour(s) Y minute(s) Z second(s)”, with the units adjusted to the input duration, e.g., “1 hour(s) 0 minute(s) 0 second(s)” for 3600 seconds and “15 minute(s) 30 second(s)” for 930 seconds.
- Return type
str
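The conversion amounts to two divmod steps (seconds to hours, then the remainder to minutes), dropping leading units that are zero. The function format_time_sketch is a hypothetical re-implementation that reproduces the two documented examples; truncating fractional seconds and printing only seconds for durations under a minute are assumptions.

```python
def format_time_sketch(t):
    """Hypothetical re-implementation of the documented format_time output."""
    t = int(t)  # assumption: fractional seconds are truncated
    hours, remainder = divmod(t, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours > 0:
        return f'{hours} hour(s) {minutes} minute(s) {seconds} second(s)'
    if minutes > 0:
        return f'{minutes} minute(s) {seconds} second(s)'
    return f'{seconds} second(s)'
```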
- ensemble_md.utils.utils.weighted_mean(vals, errs)
Calculates the inverse-variance-weighted mean. Note that if any error is 0, the simple mean will be returned.
- Parameters
vals (list) – A list of values to be averaged.
errs (list) – A list of errors corresponding to the given values.
- Returns
mean (float) – The inverse-variance-weighted mean.
err (float) – The propagated error of the mean.
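The standard inverse-variance weighting uses w_i = 1/σ_i², which gives mean = Σ w_i x_i / Σ w_i and err = 1/sqrt(Σ w_i). The function weighted_mean_sketch below is a hypothetical sketch of those formulas. It follows the documented fallback to the simple mean when any error is 0; the error it reports in that case (standard propagation for an unweighted mean, sqrt(Σ σ_i²)/N) is an assumption, since the description above does not specify it.

```python
import math


def weighted_mean_sketch(vals, errs):
    """Inverse-variance-weighted mean and its propagated error (a sketch)."""
    if any(e == 0 for e in errs):
        # Documented fallback: the simple mean. The error below is an
        # assumption of this sketch (propagation for an unweighted mean).
        mean = sum(vals) / len(vals)
        err = math.sqrt(sum(e ** 2 for e in errs)) / len(vals)
        return mean, err
    weights = [1 / e ** 2 for e in errs]
    mean = sum(w * v for w, v in zip(weights, vals)) / sum(weights)
    err = math.sqrt(1 / sum(weights))
    return mean, err
```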
- ensemble_md.utils.utils.calc_rmse(data, ref)
Calculates the root mean square error (RMSE) of the given data with respect to the reference data.
- Parameters
data (list) – A list of values to be compared with the reference data.
ref (list) – A list of reference values.
- Returns
rmse – The root mean square error.
- Return type
float
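The RMSE is sqrt((1/N) Σ (data_i − ref_i)²), taken over N paired values. The function rmse_sketch is a hypothetical, dependency-free version of that formula; the length check is an addition of this sketch.

```python
import math


def rmse_sketch(data, ref):
    """Root mean square error of data with respect to ref (equal-length sequences)."""
    if len(data) != len(ref):
        raise ValueError('data and ref must have the same length')
    squared = [(d - r) ** 2 for d, r in zip(data, ref)]
    return math.sqrt(sum(squared) / len(squared))
```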
- ensemble_md.utils.utils.get_time_metrics(log)
Gets the time-based metrics from a log file of a REXEE simulation, including the core time, wall time, and performance (ns/day).
- Parameters
log (str) – The file path of the input log file.
- Returns
t_metrics – A dictionary with the following keys: t_core, t_wall, and performance.
- Return type
dict
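GROMACS writes these metrics in a timing footer at the end of its log file, with a "Time:" row (core time and wall time in seconds) and a "Performance:" row (ns/day first). The parser parse_time_metrics below is a hypothetical sketch that assumes that footer layout and takes the log text rather than a file path; it is not the package's get_time_metrics.

```python
def parse_time_metrics(log_text):
    """Hypothetical parser for the timing footer of a GROMACS log file.

    Assumes lines of the form
        Time:          <core t (s)>   <wall t (s)>   <%>
        Performance:   <ns/day>       <hour/ns>
    """
    t_metrics = {}
    for line in log_text.splitlines():
        fields = line.split()
        if fields[:1] == ['Time:']:
            t_metrics['t_core'] = float(fields[1])
            t_metrics['t_wall'] = float(fields[2])
        elif fields[:1] == ['Performance:']:
            t_metrics['performance'] = float(fields[1])
    return t_metrics
```

Reading a real log would then be parse_time_metrics(open(log).read()).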
- ensemble_md.utils.utils.analyze_REXEE_time(n_iter=None, log_files=None)
Performs simple data analysis on the wall times and performances of all iterations of a REXEE simulation.
- Parameters
n_iter (None or int, Optional) – The number of iterations in the REXEE simulation. If None, the function will try to determine the number of iterations by counting the directories named iteration_* in the simulation directory (specifically sim_0), located either in the current working directory or where the log files are located.
log_files (None or list, Optional) – A list of lists of log file paths with the shape (n_iter, n_replicas). If None, the function will try to find the log files by searching the current working directory.
- Returns
t_wall_tot (float) – The total wall time GROMACS spent to finish all iterations of the REXEE simulation.
t_sync (float) – The total time spent synchronizing all replicas, which is the sum, over all iterations, of the difference between the longest and the shortest time elapsed to finish an iteration.
t_wall_list (list) – The list of wall times for finishing each GROMACS mdrun command.
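Given per-replica wall times arranged like log_files, with shape (n_iter, n_replicas), the synchronization time described for t_sync reduces to one max-minus-min per iteration. The function sync_time_sketch is hypothetical and covers only t_sync; how the package aggregates t_wall_tot is not reproduced here.

```python
def sync_time_sketch(wall_times):
    """Sum over iterations of (longest - shortest) replica wall time.

    wall_times is a nested list with shape (n_iter, n_replicas), e.g. the
    wall times parsed from each replica's log file in each iteration.
    """
    t_sync = 0.0
    for iteration in wall_times:
        # Faster replicas wait for the slowest one before the next exchange.
        t_sync += max(iteration) - min(iteration)
    return t_sync
```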