ensemble_md.replica_exchange_EE

The replica_exchange_EE module provides functions for setting up and performing replica exchange and expanded ensemble (REXEE) simulations.

class ensemble_md.replica_exchange_EE.ReplicaExchangeEE(yaml_file, analysis=False)[source]

A class that provides a variety of functions useful for setting up and running a replica exchange (REX) of expanded ensemble (EE) simulation, or a REXEE simulation. Upon instantiation, all parameters in the input YAML file will be assigned to an attribute in the class. In addition to these variables, below is a list of attributes of the class. (All the the attributes are assigned by set_params except that yaml is assigned by __init__.)

Variables
  • gmx_path – The absolute path of the GROMACS exectuable.

  • gmx_version – The version of the GROMACS executable.

  • yaml – The input YAML file used to instantiate the class. The file should contain necessary REXEE parameters. For more details, please check the 3. Input YAML parameters.

  • warnings – Warnings about parameter specification in either YAML or MDP files.

  • template – The template MDP file on which the instance of the MDP class is based.

  • reformatted_mdp – Whether the template MDP file has been reformatted by replacing hyphens with underscores or not.

  • dt – The simulation timestep in ps.

  • temp – The simulation temperature in Kelvin.

  • fixed_weights – Whether the weights will be fixed during the simulation.

  • updating_weights – The list of weights as a function of time (since the last update of the Wang-Landau incrementor) for different replicas. The length is equal to the number of replicas. This is only relevant for weight-updating simulations.

  • equilibrated_weights – The equilibrated weights of different replicas. For weight-updating simulations, this list is initialized as a list of empty lists. Otherwise (i.e., in fixed-weight simulations), it is initialized as a list of None.

  • current_wl_delta – The current value of the Wang-Landau incrementor. This is only relevent for weight-updating simulations.

  • kT – 1 kT in kJ/mol at the simulation temperature.

  • lambda_types – The types of lambda variables involved in expanded ensemble simulations, e.g., fep_lambdas, mass_lambdas, coul_lambdas, etc.

  • n_tot – The total number of states for all replicas.

  • n_sub – The numbmer of states of each replica. The current implementation assumes homogenous replicas.

  • state_ranges – A list of list of (global) state indices for each replica.

  • equil – A list of times it took to equilibrate the weights for different replicas. This list is initialized with a list of -1, where -1 means that the weights haven’t been equilibrated. Also, a value of 0 means that the simulation is a fixed-weight simulation.

  • n_rejected – The number of proposed exchanges that have been rejected. Updated by accept_or_reject.

  • n_swap_attempts – The number of swaps attempted so far. This does not include the cases where there is no swappable pair. Updated by get_swapping_pattern.

  • n_emtpy_swappable – The number of times when there was no swappable pair.

  • rep_trajs – The replica-space trajectories of all configurations.

  • configs – A list that thows the current configuration index that each replica is sampling.

  • g_vecs – The time series of processed (e.g., combined across replicas) alchemical weights for the entire state space. If no weight combination scheme is applied, this list will just be a list of None’s.

  • df_data_type – The type of data (either \(u_{nk}\) or \(dH/dλ\)) that will be used for free energy calculations. This depends on the free energy estimator specified in the parameter df_method.

  • modify_coords_fn – The function (callable) in an external module (specified as modify_coords in the input YAML file) for modifying coordinates at exchanges. This parameter is only relevant to multi-topology REXEE (i.e., MT-REXEE) simulations.

set_params(analysis)[source]

Sets up or reads in the user-defined parameters from an input YAML file and an MDP template. This function is called to instantiate the class in the __init__ function of class. Specifically, it does the following:

  1. Sets up constants.

  2. Reads in REXEE parameters from a YAML file.

  3. Handles YAML parameters.

  4. Checks if the parameters in the YAML file are well-defined.

  5. Reformats the input MDP file to replace all hyphens with underscores.

  6. Reads in parameters from the MDP template.

After instantiation, the class instance will have an attribute corresponding to each of the parameters specified in the YAML file. For a full list of the parameters that can be specified in the YAML file, please refer to 3. Input YAML parameters.

Parameters
  • yaml_file (str) – The file path of the input YAML file that specifies REXEE parameters.

  • analysis (bool, Optional) – Whether the instantiation of the class is for data analysis of REXEE simulations. The default is False.

Raises

ParameterError

  • If a required parameter is not specified in the input YAML file.

  • If a specified parameter is not recognizable.

  • If a specified option is not available for a parameter.

  • If the data type or range (e.g., positive or negative) of a parameter is not correct.

  • If any MDP parameter invalid for the REXEE simulation is detected.

check_gmx_executable()[source]

Checks if the GROMACS executable can be used and gets its absolute path and version.

print_params(params_analysis=False)[source]

Prints important parameters relevant to the REXEE simulation to be performed.

Parameters

params_analysis (bool, Optional) – Whether additional parameters for data analysis should be printed. The default is False.

static reformat_MDP(mdp_file)[source]

Reformats the input MDP file so that all hyphens in the parameter names are replaced by underscores. If the input MDP file contains hyphens in its parameter names, the class attribue self.reformatted will be set to True. In this case, the new MDP object with reformatted parameter names will be written to the original file path of the file, while the original file will be renamed with a _backup suffix. If the input MDP file is not reformatted, the function sets the attribute self.reformatted_mdp to False.

Parameters

mdp_file (str) – The file path of the MDP file to be reformatted.

Returns

reformatted – Whether the file was reformatted.

Return type

bool

initialize_MDP(idx)[source]

Initializes the MDP object for generating an MDP file for a specific replica based on the MDP template. This function should be called only for generating MDP files for the FIRST iteration and it has nothing to do with whether the weights are fixed or equilibrating. It is assumed that the MDP template has all the parameters shared by all replicas.

Parameters

idx (int) – Index of the simulation whose MDP parameters need to be initialized.

Returns

MDP – A gmx_parser.MDP object that can be used to write the MDP file.

Return type

gmx_parser.MDP obj

get_ref_dist(pullx_file=None)[source]

Gets the initial COM distance between the pull groups in the input GRO file. Importantly, this distance will serve as the reference distance starting from the second iteration. This function initializes the attribute ref_dist and is only relevant when a distance restraint is applied in the GROMACS pull code.

Parameters

pullx_file (str, Optional) – The path of the pullx file whose initial value will be used as the reference distance. Usually, this should be the path of the pullx file of the first iteration. The default is sim_0/iteration_0/pullx.xvg.

update_MDP(new_template, sim_idx, iter_idx, states, wl_delta, weights, counts=None)[source]

Updates the MDP file for a new iteration based on the new MDP template, which is the MDP file from the previous iteration. Note that if the weights got equilibrated in the previous iteration, the weights will be fixed at these equilibrated values for all the following iterations.

Parameters
  • new_template (str) – The new MDP template file, which typically is the MDP file of the previous iteration.

  • sim_idx (int) – The index of the simulation whose MDP parameters need to be updated.

  • iter_idx (int) – The index of the iteration to be performed later.

  • states (list) – A list of last sampled states of all simulaitons in the previous iteration.

  • wl_delta (list) – A list of fina Wang-Landau incrementors of all simulations.

  • weights (list) – A list of lists final weights of all simulations.

  • counts (list, Optional) – A list of lists final counts of all simulations. If the value is None, then the MDP parameter init-histogram-counts won’t be specified in the next iteration. Note that this parameter is only supported by GROMACS with versions later than 2022.3.

Returns

MDP – A gmx_parser.MDP object that can be used to write the MDP file.

Return type

gmx_parser.MDP obj

extract_final_dhdl_info(dhdl_files)[source]

Extracts the last sampled states for all replica simulations.

Parameters

dhdl_files (list) – A list of file paths to GROMACS DHDL files of different replicas. Note that the order of the files should be consistent with the order of the replicas.

Returns

states – A list of the global state indices of the last sampled states of all simulaitons.

Return type

list

extract_final_log_info(log_files)[source]

Extracts the following information for each replica simulation from a given list of GROMACS LOG files:

  • The final Wang-Landau incrementors.

  • The final lists of weights.

  • The final lists of counts.

Note that the order of the files should be consistent with the order of the replicas.

Parameters

log_files (list) – A list of file paths to GROMACS LOG files of different replicas.

Returns

  • wl_delta (list) – A list of final Wang-Landau incrementors of all simulations.

  • weights (list) – A list of lists of final weights of all simulations.

  • counts (list) – A list of lists of final counts of all simulations.

static identify_swappable_pairs(states, state_ranges, neighbor_exchange, add_swappables=None)[source]

Identifies swappable pairs. By definition, a pair of simulation is considered swappable only if their last sampled states are in the alchemical ranges of both simulations. This is required to ensure that the values of involved ΔH and Δg can always be looked up from the DHDL and LOG files. This also automatically guarantees that the simulations to be swapped have overlapping state sets.

Parameters
  • states (list) – A list of the global state indices of the last sampled states of all simulations. This list can be generated by extract_final_dhdl_info. Notably, the input list should not be a list that has been updated/modified by get_swapping_pattern, or the result will be incorrect.

  • state_ranges (list of lists) – A list of global state indices for all replicas. The input list can be a list updated by get_swapping_pattern, especially in the case where there is a need to re-identify the swappable pairs after an attempted swap is accepted.

  • neighbor_exchange (bool) – Whether to exchange only between neighboring replicas.

  • add_swappables (list) – A list of lists that additionally consider states (in global indices) that can be swapped. For example, add_swappables=[[4, 5], [14, 15]] means that if a replica samples state 4, it can be swapped with another replica that samples state 5 and vice versa. The same logic applies to states 14 and 15. This parameter is only relevant to MT-REXEE simulations.

Returns

swappables – A list of tuples representing the simulations that can be swapped.

Return type

list

Example

Below is an example where the REXEE simulation is composed of four replicas sampling states 0-3, 1-4, 2-5, and 3-6, respectively. At exchanges, these replicas are respectively at states 2, 3, 3, and 4. Therefore, the swappable pairs are [(0, 1), (1, 2), (2, 3)]. If only neighboring swaps are considered, the swappable pairs will be [(1, 2), (2, 3)].

>>> from ensemble_md.replica_exchange_EE import ReplicaExchangeEE as REXEE
>>> states = [2, 3, 2, 5]
>>> state_ranges = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
>>> REXEE.identify_swappable_pairs(states, state_ranges, neighbor_exchange=False)
[(0, 1), (1, 2), (2, 3)]
>>> REXEE.identify_swappable_pairs(states, state_ranges, neighbor_exchange=True)
[(1, 2), (2, 3)]
static propose_swap(swappables)[source]

Proposes a swap of coordinates between replicas by drawing a pair from the list of swappable pairs.

Parameters

swappables (list) – A list of tuples representing the simulations that can be swapped.

Returns

swap – A tuple of simulation indices to be swapped. If there is no swappable pair, an empty list will be returned.

Return type

tuple or an empty list

get_swapping_pattern(dhdl_files, states)[source]

Generates a list (swap_pattern) that represents how the configurations should be swapped in the next iteration. The indices of the output list correspond to the simulation/replica indices, and the values represent the configuration indices in the corresponding simulation/replica. For example, if the swapping pattern is [0, 2, 1, 3], it means that in the next iteration, replicas 0, 1, 2, 3 should sample configurations 0, 2, 1, 3, respectively, where configurations 0, 1, 2, 3 here are defined as whatever configurations are in replicas 0, 1, 2, 3 in the CURRENT iteration (not iteration 0), respectively.

Notably, when this function is called (e.g., once every iteration in a REXEE simulation), the output list swap_pattern is always initialized as [0, 1, 2, 3, ...] and gets updated once every attempted swap. This is different from the attribute configs, which is only initialized at the very beginning of the entire REXEE simulation (iteration 0), though configs also gets updated with swap_pattern once every attempted swap in this function.

Parameters
  • dhdl_files (list) – A list of paths to the DHDL files. The indicies in the DHDL filenames should be in an ascending order, e.g. [dhdl_0.xvg, dhdl_1.xvg, ..., dhdl_N.xvg].

  • states (list) – A list of last sampled states (in global indices) of ALL simulations. states[i]=j means that the configuration in replica i is at state j at the time when the exchange is performed. This list can be generated extract_final_dhdl_info.

Returns

  • swap_pattern (list) – A list showing the configuration of replicas after swapping.

  • swap_list (list) – A list of tuples showing the accepted swaps.

calc_prob_acc(swap, dhdl_files, states, shifts)[source]

Calculates the acceptance ratio for swapping simulations.

Parameters
  • swap (tuple) – A tuple of indices corresponding to the simulations to be swapped.

  • dhdl_files (list) – A list of DHDL files, e.g. dhdl_files = ['dhdl_2.xvg', 'dhdl_1.xvg', 'dhdl_0.xvg', 'dhdl_3.xvg'] means that configurations 2, 1, 0, and 3 are now in replicas 0, 1, 2, 3. This can happen in multiple swaps when a previous swap between configurations 0 and 2 has just been accepted. Otherwise, the list of filenames should always be in the ascending order, e.g., ['dhdl_0.xvg', 'dhdl_1.xvg', 'dhdl_2.xvg', dhdl_3.xvg].

  • states (list) – A list of last sampled states (in global indices) in the DHDL files corresponding to configurations 0, 1, 2, … (e.g., dhdl_0.xvg, dhdl_1.xvg, dhdl_2.xvg, …) This list can be generated by extract_final_dhdl_info.

  • shifts (list) – A list of state shifts for converting global state indices to the local ones. Specifically, states substracted by shifts should be the local state indices of the last sampled states in dhdl_files[0], dhdl_files[1], … (which importantly, are not necessarily dhdl_0.xvg, dhdl_1.xvg, …). If multiple swaps are not used, then this should always be list(REXEE.s * np.arange(REXEE.n_sim)).

Returns

prob_acc – The acceptance ratio.

Return type

float

accept_or_reject(prob_acc)[source]

Returns a boolean variable indicating whether the proposed swap should be acceepted or not given the acceptance ratio.

Parameters

prob_acc (float) – The acceptance ratio.

Returns

swap_bool – A boolean variable indicating whether the swap should be accepted.

Return type

bool

get_averaged_weights(log_files)[source]

For each replica, calculates the averaged weights (and the associated error) from the time series of the weights since the previous update of the Wang-Landau incrementor. This is only relevant for weight-updating REXEE simulations.

Parameters

log_files (list) – A list of file paths to GROMACS LOG files of different replicas.

Returns

  • weights_avg (list) – A list of lists of weights averaged since the last update of the Wang-Landau incrementor. The length of the list should be the number of replicas.

  • weights_err (list) – A list of lists of errors corresponding to the values in weights_avg.

weight_correction(weights, counts)[source]

Adjusts the lambda weights based on the histogram counts by using the following equation: \(g_k' = g_k + ln(N_{k-1}/N_k)\), where \(g_k\) and \(g_k'\) are the lambda weight before and after the correction, respectively. Notably, in any of the following situations, we don’t do any correction.

  • Either \(N_{k-1}\) or \(N_k\) is \(0\).

  • Either \(N_{k-1}\) or \(N_k\) is smaller than the histogram cutoff specified by N_cutoff in the input YAML file.

Parameters
  • weights (list) – A list of lists of weights (of ALL simulations) to be corrected. The i-th element corresponds to the list of weights of the i-th replica.

  • counts (list) – A list of lists of counts (of ALL simulations).

Returns

weights – An updated list of lists of corected weights.

Return type

list

histogram_correction(hist, print_values=True)[source]

Adjusts the histogram counts. For example, if replicas A and B both sample states 1 and 2 and have histogram counts \(N^A_1\), \(N^A_2\), \(N^B_1\), and \(N^B_2\), the corrected histogram counts for states 1 and 2 for BOTH replicas will be adjusted according to the following equation: \(N_1'/N_2'=((N_1^A N_1^B)/(N_2^A N_2^B))^{1/2}\). Namely, the ratio of the corrected histogram counts for adjacent states is the geometric mean of the ratio of the original histogram counts for the same states. Note that if any histogram count is 0, histogram correction will not be performed and the original histogram counts will be returned.

Parameters
  • hist (list) – A list of lists of histogram counts of ALL simulations. The i-th element corresponds to the list of histogram counts of the i-th replica.

  • print_values (bool, Optional) – Whether to print the histogram counts for each replica before and after histogram correction. The default is True.

Returns

hist_modified – A list of lists of modified histogram counts of ALL simulations.

Return type

list

combine_weights(weights, weights_err=None, print_values=True)[source]

Combines alchemical weights across multiple replicas. Note that if weights_err is provided, inverse-variance weighting will be used. Care must be taken since inverse-variance weighting can lead to slower convergence if the provided errors are not accurate. (See 5.1. Weight combination for more details.)

Parameters
  • weights (list) – A list of lists of alchemical weights of ALL simulations. The i-th element corresponds to the list of weights of the i-th replica.

  • weights_err (list, Optional) – A list of lists of errors corresponding to the values in weights. The default is None.

  • print_values (bool, Optional) – Whether to print the weights for each replica before and after weight combination for each replica. The default is True.

Returns

  • weights_modified (list) – A list of modified alchemical weights of ALL simulations.

  • g_vec (numpy.ndarray) – An array of alchemical weights of the whole range of states.

run_REXEE(n, swap_pattern=None)[source]

Performs one iteration of a REXEE simulation, which includes generating the TPR files using the GROMACS grompp command and running the expanded ensemble simulations in parallel using the GROMACS mdrun command. The GROMACS commands are launched as subprocesses. The function assumes that the GROMACS executable is available.

Parameters
  • n (int) – The iteration index (starting from 0).

  • swap_pattern (list, Optional) – A list generated by get_swapping_pattern. It represents how the replicas should be swapped. This parameter is not needed only if n is 0. The default is None.