ensemble_md.utils

ensemble_md.utils.gmx_parser

The gmx_parser module provides functions for parsing GROMACS files.

ensemble_md.utils.gmx_parser.parse_log(log_file)

Parses a log file generated by a GROMACS expanded ensemble simulation and extracts the weights, the histogram counts, the Wang-Landau incrementor, and the weight equilibration time. This function is especially useful for extracting information from each iteration in a REXEE simulation on the fly. There are three types of log files from an expanded ensemble simulation:

  • Case 1: The weights are still updating in the simulation and have never been equilibrated.

    • In this case, the output equil_time should be -1.

  • Case 2: The weights were equilibrated during the simulation.

    • The output equil_time is the time (in ps) it took to get the weights equilibrated.

    • The final weights (weights) will just be the equilibrated weights.

  • Case 3: The weights were fixed in the simulation.

    • In this case, the output equil_time should be 0.

    • The final weights (which never change during the simulation) and the final counts will still be returned.

Parameters

log_file (str) – The file path of the input log file

Returns

  • weights (list) – In all cases, weights should be a list of lists (of weights).

    • In Case 1, a list of lists of weights as a function of time since the last update of the Wang-Landau incrementor will be returned.

    • In Case 2, a list of lists of weights as a function of time since the last update of the Wang-Landau incrementor up to equilibration will be returned.

    • In Case 3, the returned list will only have one list inside, which is the list of values at which the weights were fixed.

    That is, for all cases, weights[-1] will be the final weights, which are important for seeding the next iteration in a REXEE simulation.

  • counts (list) – The final histogram counts.

  • wl_delta (float) – The final Wang-Landau incrementor. In Cases 2 and 3, None will be returned.

  • equil_time (int or float) –

    • In Case 1, -1 will be returned, which means that the weights have not been equilibrated.

    • In Case 2, the equilibration time in ps will be returned.

    • In Case 3, 0 will be returned, which means that the weights were fixed during the simulation.
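
The three cases can be told apart from the returned equil_time alone. The following is a minimal sketch of that logic. The helper describe_case is hypothetical (it is not part of ensemble_md), and the commented usage assumes that the four values are returned in the order documented above.

```python
def describe_case(equil_time):
    """Map the equil_time value documented for parse_log to its case."""
    if equil_time == -1:
        return "Case 1: weights still updating"
    if equil_time == 0:
        return "Case 3: weights fixed"
    return f"Case 2: weights equilibrated after {equil_time} ps"


# Hypothetical usage (assumes a GROMACS log file named md.log exists):
# weights, counts, wl_delta, equil_time = gmx_parser.parse_log("md.log")
# final_weights = weights[-1]  # used to seed the next REXEE iteration
print(describe_case(-1))
print(describe_case(1234.5))
```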

class ensemble_md.utils.gmx_parser.MDP(input_mdp=None, **kwargs)

A class that represents a GROMACS MDP file. Note that an MDP instance is an ordered dictionary, with the i-th key corresponding to the i-th line in the MDP file. Comments and blank lines are also preserved, e.g., with keys ‘C0001’ and ‘B0001’, respectively. The value corresponding to a ‘C’ key is the comment itself, while the value corresponding to a ‘B’ key is an empty string. Comments after a parameter on the same line are discarded. Leading and trailing spaces are always stripped.

Parameters
  • input_mdp (str, Optional) – The path of the input MDP file. The default is None.

  • **kwargs (Optional) – Additional keyword arguments used to add key-value pairs to the MDP instance. Note that no sanity checks will be performed for the key-value pairs passed in this way. This also does not work for keys that are not legal Python variable names, such as anything that includes a minus ‘-’ sign or starts with a number.

Variables
  • COMMENT (re.Pattern object) – A compiled regular expression pattern for comments in MDP files.

  • PARAMETER (re.Pattern object) – A compiled regular expression pattern for parameters in MDP files.

  • input_mdp (str) – The real path of the input MDP file returned by os.path.realpath(input_mdp), which resolves any symbolic links in the path.

Example

>>> from ensemble_md.utils import gmx_parser
>>> gmx_parser.MDP("em.mdp")
MDP([('C0001', 'em.mdp - used as input into grompp to generate em.tpr'), ('C0002', 'All unspecified parameters adopt their own default values.'), ('B0001', ''), ('C0003', 'Run Control'), ('integrator', 'steep'), ('nsteps', 500000), ('B0002', ''), ('C0004', 'Energy minnimization'), ('emtol', 100.0), ('emstep', 0.01), ('B0003', ''), ('C0005', 'Neighbor searching/Electrostatics/Van der Waals'), ('cutoff-scheme', 'Verlet'), ('nstlist', 10), ('ns_type', 'grid'), ('pbc', 'xyz'), ('coulombtype', 'PME'), ('rcoulomb', 1.0), ('rvdw', 1.0)])

COMMENT = re.compile('\\s*;\\s*(?P<value>.*)')

PARAMETER = re.compile('\\s*(?P<parameter>[^=]+?)\\s*=\\s*(?P<value>[^;]*)(?P<comment>\\s*;.*)?', re.VERBOSE)

read()

Reads and parses the input MDP file.
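
To illustrate how the COMMENT and PARAMETER patterns can yield the ordered dictionary described for the MDP class, here is a minimal sketch. It is not the library's implementation: parse_mdp_lines is a hypothetical helper, it takes a list of lines rather than a file, and it keeps every value as a string, whereas the class example above shows values converted to numbers.

```python
import re
from collections import OrderedDict

# The two patterns documented for the MDP class.
COMMENT = re.compile(r'\s*;\s*(?P<value>.*)')
PARAMETER = re.compile(
    r'\s*(?P<parameter>[^=]+?)\s*=\s*(?P<value>[^;]*)(?P<comment>\s*;.*)?',
    re.VERBOSE,
)


def parse_mdp_lines(lines):
    """Parse MDP lines into an ordered dictionary (values stay strings)."""
    params = OrderedDict()
    n_comment = n_blank = 0
    for line in lines:
        line = line.strip()
        if not line:
            n_blank += 1
            params[f"B{n_blank:04d}"] = ""
            continue
        # Test for comments first: a comment containing '=' would
        # otherwise be picked up by the PARAMETER pattern.
        match = COMMENT.match(line)
        if match:
            n_comment += 1
            params[f"C{n_comment:04d}"] = match.group("value")
            continue
        match = PARAMETER.match(line)
        if match:
            # The trailing same-line comment is discarded.
            params[match.group("parameter")] = match.group("value").strip()
    return params


example = ["; Run Control", "integrator = steep", "", "nsteps = 500000 ; steps"]
print(parse_mdp_lines(example))
```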

write(output_mdp=None, skipempty=False)

Writes the MDP instance (the ordered dictionary) to an output MDP file.

Parameters
  • output_mdp (str, Optional) – The file path of the output MDP file. The default is the path of the file the MDP instance was built from; that is, if output_mdp is not specified, the input MDP file will be overwritten.

  • skipempty (bool, Optional) – Whether to skip empty values when writing the MDP file. If True, any parameter lines from the output that contain empty values will be removed. The default is False.
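
The effect of skipempty can be sketched as follows. This is an illustration only: format_mdp_lines is a hypothetical helper, and the exact line formatting produced by write() (for example, column alignment) is not specified here, so the "key = value" layout below is an assumption.

```python
import re


def format_mdp_lines(params, skipempty=False):
    """Turn an MDP-like ordered mapping into a list of output lines."""
    lines = []
    for key, value in params.items():
        if re.fullmatch(r"B\d{4}", key):
            lines.append("")              # blank line
        elif re.fullmatch(r"C\d{4}", key):
            lines.append(f"; {value}")    # comment line
        elif skipempty and value in ("", None):
            continue                      # drop parameters with empty values
        else:
            lines.append(f"{key} = {value}")
    return lines


params = {"C0001": "Run Control", "integrator": "steep", "B0001": "", "define": ""}
print(format_mdp_lines(params))
print(format_mdp_lines(params, skipempty=True))
```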

ensemble_md.utils.gmx_parser.compare_MDPs(mdp_list, print_diff=False)

Identifies the parameters differing between a given list of MDP files. Note that this function is not aware of the default values of GROMACS parameters. (Currently, this function is not used in the workflow adopted in run_REXEE.py but it might be useful in some places, so we decided to keep it.)

Parameters
  • mdp_list (list) – A list of MDP files.

  • print_diff (bool, Optional) – Whether to print the parameters that are different among the MDP files in a more readable format. The default is False.

Returns

diff_params – A dictionary of parameters differing between MDP files. The keys are the parameter names and each value is a list of the values of that parameter in the MDP files.

Return type

dict

Example

>>> from ensemble_md.utils import gmx_parser
>>> mdp_list = ['A.mdp', 'B.mdp']
>>> diff_params = gmx_parser.compare_MDPs(mdp_list, print_diff=True)
The following parameters are different among the MDP files:
wl_scale
- A.mdp: None
- B.mdp: 0.8
...
>>> print(diff_params)
{'wl_scale': [None, 0.8], ...}
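
The comparison itself can be sketched with plain dictionaries. This is a hedged illustration rather than the function's actual code: diff_params_sketch is a hypothetical helper that works on dictionaries holding parameters only (no comment or blank-line keys), and it uses None for a parameter missing from a file, as in the output above.

```python
def diff_params_sketch(param_dicts):
    """Return {name: [value in each dict]} for parameters that differ."""
    all_keys = []
    for params in param_dicts:
        for key in params:
            if key not in all_keys:
                all_keys.append(key)

    diff = {}
    for key in all_keys:
        # dict.get returns None when a parameter is absent from a file.
        values = [params.get(key) for params in param_dicts]
        if any(v != values[0] for v in values):
            diff[key] = values
    return diff


a = {"nsteps": 500, "wl_ratio": 0.8}
b = {"nsteps": 500, "wl_ratio": 0.8, "wl_scale": 0.8}
print(diff_params_sketch([a, b]))
```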

ensemble_md.utils.utils

The utils module provides useful utility functions for running or analyzing REXEE simulations.

class ensemble_md.utils.utils.Logger(logfile)

A logger class that redirects the STDOUT and STDERR to a specified output file while preserving the output on screen. This is useful for logging terminal output to a file for later analysis while still seeing the output in real-time during execution.

Parameters

logfile (str) – The path of the file to which the standard output and standard error should be logged.

Variables
  • terminal (io.TextIOWrapper object) – The original standard output object, typically sys.stdout.

  • log (io.TextIOWrapper object) – File object used to log the output in append mode.

write(message)

Writes a message to the terminal and to the log file.

Parameters

message (str) – The message to be written to STDOUT and the log file.

flush()

Handles the flush command by doing nothing. This method is needed for Python 3 compatibility. Extra behavior may be added here if needed.
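
A logger of this kind is commonly written as a small "tee" object. The sketch below follows the attributes and methods documented above (terminal, log, write, flush) but is an assumption about the general pattern, not the library's source; TeeLogger is a hypothetical name.

```python
import sys


class TeeLogger:
    """Write every message to both the terminal and a log file."""

    def __init__(self, logfile):
        self.terminal = sys.stdout       # original standard output
        self.log = open(logfile, "a")    # log file opened in append mode

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        # Intentionally a no-op, matching the documented flush() behavior.
        pass


# Hypothetical usage: route both streams through the same logger.
# sys.stdout = TeeLogger("run.log")
# sys.stderr = sys.stdout
```

Because print() and tracebacks only need write() and flush(), assigning such an object to sys.stdout and sys.stderr is enough to duplicate the output.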

ensemble_md.utils.utils.run_gmx_cmd(arguments, prompt_input=None)

Runs a GROMACS command through a subprocess call.

Parameters
  • arguments (list) – A list of arguments that compose the GROMACS command to run, e.g., ['gmx', 'mdrun', '-deffnm', 'sys'].

  • prompt_input (str or None, Optional) – The input to be passed to the interactive prompt launched by the GROMACS command, if any.

Returns

  • return_code (int) – The exit code of the GROMACS command. Any number other than 0 indicates an error.

  • stdout (str or None) – The STDOUT of the process.

  • stderr (str or None) – The STDERR of the process.
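
A subprocess call with this interface might look like the sketch below. It is an assumption about the general approach, not the function's actual code: run_cmd_sketch is a hypothetical name, and the demonstration uses the Python interpreter in place of gmx so that it runs without GROMACS installed.

```python
import subprocess
import sys


def run_cmd_sketch(arguments, prompt_input=None):
    """Run a command and return (return_code, stdout, stderr)."""
    result = subprocess.run(
        arguments,
        input=prompt_input,     # text fed to an interactive prompt, if any
        capture_output=True,    # collect STDOUT and STDERR
        text=True,              # work with str instead of bytes
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in for a GROMACS command that asks for a selection at a prompt.
return_code, stdout, stderr = run_cmd_sketch(
    [sys.executable, "-c", "print(input())"], prompt_input="1\n"
)
print(return_code, stdout.strip())
```

For a real run, arguments would be a list such as ['gmx', 'mdrun', '-deffnm', 'sys'], as in the parameter description above.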

ensemble_md.utils.utils.format_time(t)

Converts time in seconds to a more readable format.

Parameters

t (float) – The time in seconds.

Returns

t_str – A string representing the time duration in a format of “X hour(s) Y minute(s) Z second(s)”, adjusting the units as necessary based on the input duration, e.g., 1 hour(s) 0 minute(s) 0 second(s) for 3600 seconds and 15 minute(s) 30 second(s) for 930 seconds.

Return type

str
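
The unit adjustment can be sketched as follows. The helper format_time_sketch is hypothetical and reproduces only the two examples given above; rounding to whole seconds and the output for durations under one minute are assumptions of this sketch.

```python
def format_time_sketch(t):
    """Format a duration in seconds, dropping leading zero-valued units."""
    total = int(round(t))  # assumption: round to whole seconds
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours > 0:
        return f"{hours} hour(s) {minutes} minute(s) {seconds} second(s)"
    if minutes > 0:
        return f"{minutes} minute(s) {seconds} second(s)"
    return f"{seconds} second(s)"


print(format_time_sketch(3600))
print(format_time_sketch(930))
```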

ensemble_md.utils.utils.weighted_mean(vals, errs)

Calculates the inverse-variance-weighted mean. Note that if any error is 0, the simple mean will be returned.

Parameters
  • vals (list) – A list of values to be averaged.

  • errs (list) – A list of errors corresponding to the given values.

Returns

  • mean (float) – The inverse-variance-weighted mean.

  • err (float) – The propagated error of the mean.
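
For values x_i with errors s_i, the standard inverse-variance weighting uses weights w_i = 1 / s_i^2, giving mean = sum(w_i * x_i) / sum(w_i) and err = sqrt(1 / sum(w_i)). A minimal sketch under these definitions follows. The helper weighted_mean_sketch is hypothetical, and the error it reports in the zero-error fallback (the sample standard deviation) is an assumption, since only the fallback mean is documented above.

```python
import math
import statistics


def weighted_mean_sketch(vals, errs):
    """Return the inverse-variance-weighted mean and its propagated error."""
    if any(e == 0 for e in errs):
        # Documented fallback: the simple mean. Using the sample standard
        # deviation as the error is an assumption of this sketch.
        return statistics.mean(vals), statistics.stdev(vals)
    weights = [1.0 / e ** 2 for e in errs]
    mean = sum(w * v for w, v in zip(weights, vals)) / sum(weights)
    err = math.sqrt(1.0 / sum(weights))
    return mean, err


mean, err = weighted_mean_sketch([1.0, 2.0], [1.0, 2.0])
print(mean, err)
```

The more precise value (smaller error) pulls the mean toward itself, which is why the result lies closer to 1.0 than to 2.0.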

ensemble_md.utils.utils.calc_rmse(data, ref)

Calculates the root mean square error (RMSE) of the given data with respect to the reference data.

Parameters
  • data (list) – A list of values to be compared with the reference data.

  • ref (list) – A list of reference values.

Returns

rmse – The root mean square error.

Return type

float
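
The RMSE is the square root of the mean squared difference between the data and the reference, sqrt(sum((data_i - ref_i)^2) / N). A minimal sketch follows; calc_rmse_sketch is a hypothetical name, and the length check is an addition of this sketch.

```python
import math


def calc_rmse_sketch(data, ref):
    """Root mean square error of data with respect to ref."""
    if len(data) != len(ref):
        raise ValueError("data and ref must have the same length")
    squared_errors = [(d - r) ** 2 for d, r in zip(data, ref)]
    return math.sqrt(sum(squared_errors) / len(data))


print(calc_rmse_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```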

ensemble_md.utils.utils.get_time_metrics(log)

Gets the time-based metrics from a log file of a REXEE simulation, including the core time, wall time, and performance (ns/day).

Parameters

log (str) – The file path of the input log file.

Returns

t_metrics – A dictionary with the following keys: t_core, t_wall, performance.

Return type

dict
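
The extraction might be sketched as below, assuming the log ends with the timing summary that GROMACS mdrun writes (a "Time:" line with core and wall times in seconds, and a "Performance:" line whose first number is in ns/day). The helper time_metrics_sketch is hypothetical, it works on the log text rather than a file path, and the numbers in the sample are made up for illustration.

```python
def time_metrics_sketch(log_text):
    """Extract t_core, t_wall (s) and performance (ns/day) from log text."""
    t_metrics = {}
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Time:":
            t_metrics["t_core"] = float(fields[1])
            t_metrics["t_wall"] = float(fields[2])
        elif fields[0] == "Performance:":
            t_metrics["performance"] = float(fields[1])
    return t_metrics


# Illustrative sample only; the values are not from a real simulation.
sample = """
               Core t (s)   Wall t (s)        (%)
       Time:     1234.567      154.321      800.0
                 (ns/day)    (hour/ns)
Performance:      111.975        0.214
"""
print(time_metrics_sketch(sample))
```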

ensemble_md.utils.utils.analyze_REXEE_time(n_iter=None, log_files=None)

Performs simple data analysis on the wall times and performances of all iterations of a REXEE simulation.

Parameters
  • n_iter (None or int, Optional) – The number of iterations in the REXEE simulation. If None, the function will try to find the number of iterations by counting the directories named in the format iteration_* in the simulation directory (specifically sim_0) in the current working directory or where the log files are located.

  • log_files (None or list, Optional) – A list of lists of log paths with the shape of (n_iter, n_replicas). If None, the function will try to find the log files by searching the current working directory.

Returns

  • t_wall_tot (float) – The total wall time GROMACS spent to finish all iterations for the REXEE simulation.

  • t_sync (float) – The total time spent in synchronizing all replicas, which is the sum of the differences between the longest and the shortest time elapsed to finish an iteration.

  • t_wall_list (list) – The list of wall times for finishing each GROMACS mdrun command.
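
The definition of t_sync can be sketched directly from the wall times. This is an illustration only: sync_time_sketch is a hypothetical helper, and it assumes the wall times are arranged as a list of lists with the shape (n_iter, n_replicas), mirroring the shape described for log_files.

```python
def sync_time_sketch(t_wall_list):
    """Sum over iterations of (longest - shortest) replica wall time."""
    return sum(max(times) - min(times) for times in t_wall_list)


# Two iterations, three replicas each (made-up wall times in seconds).
t_wall_list = [[10.0, 12.0, 11.0], [20.0, 25.0, 22.0]]
print(sync_time_sketch(t_wall_list))
```

In each iteration the fastest replica waits for the slowest one before the next iteration starts, so the per-iteration spread is the time lost to synchronization.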

ensemble_md.utils.exceptions

exception ensemble_md.utils.exceptions.ParameterError

Error raised when detecting improperly specified parameters in the YAML file.

exception ensemble_md.utils.exceptions.ParseError

Error raised while parsing a file.