ensemble_md.replica_exchange_EE
The replica_exchange_EE
module provides functions for setting up and
performing replica exchange and expanded ensemble (REXEE) simulations.
- class ensemble_md.replica_exchange_EE.ReplicaExchangeEE(yaml_file, analysis=False)[source]
A class that provides a variety of functions useful for setting up and running a replica exchange (REX) of expanded ensemble (EE) simulation, or a REXEE simulation. Upon instantiation, all parameters in the input YAML file will be assigned to an attribute in the class. In addition to these variables, below is a list of attributes of the class. (All the the attributes are assigned by
set_params
except thatyaml
is assigned by__init__
.)- Variables
gmx_path – The absolute path of the GROMACS exectuable.
gmx_version – The version of the GROMACS executable.
yaml – The input YAML file used to instantiate the class. The file should contain necessary REXEE parameters. For more details, please check the 3. Input YAML parameters.
warnings – Warnings about parameter specification in either YAML or MDP files.
template – The template MDP file on which the instance of the
MDP
class is based.reformatted_mdp – Whether the template MDP file has been reformatted by replacing hyphens with underscores or not.
dt – The simulation timestep in ps.
temp – The simulation temperature in Kelvin.
fixed_weights – Whether the weights will be fixed during the simulation.
updating_weights – The list of weights as a function of time (since the last update of the Wang-Landau incrementor) for different replicas. The length is equal to the number of replicas. This is only relevant for weight-updating simulations.
equilibrated_weights – The equilibrated weights of different replicas. For weight-updating simulations, this list is initialized as a list of empty lists. Otherwise (i.e., in fixed-weight simulations), it is initialized as a list of
None
.current_wl_delta – The current value of the Wang-Landau incrementor. This is only relevent for weight-updating simulations.
kT – 1 kT in kJ/mol at the simulation temperature.
lambda_types – The types of lambda variables involved in expanded ensemble simulations, e.g.,
fep_lambdas
,mass_lambdas
,coul_lambdas
, etc.n_tot – The total number of states for all replicas.
n_sub – The numbmer of states of each replica. The current implementation assumes homogenous replicas.
state_ranges – A list of list of (global) state indices for each replica.
equil – A list of times it took to equilibrate the weights for different replicas. This list is initialized with a list of -1, where -1 means that the weights haven’t been equilibrated. Also, a value of 0 means that the simulation is a fixed-weight simulation.
n_rejected – The number of proposed exchanges that have been rejected. Updated by
accept_or_reject
.n_swap_attempts – The number of swaps attempted so far. This does not include the cases where there is no swappable pair. Updated by
get_swapping_pattern
.n_emtpy_swappable – The number of times when there was no swappable pair.
rep_trajs – The replica-space trajectories of all configurations.
configs – A list that thows the current configuration index that each replica is sampling.
g_vecs – The time series of processed (e.g., combined across replicas) alchemical weights for the entire state space. If no weight combination scheme is applied, this list will just be a list of
None
’s.df_data_type – The type of data (either \(u_{nk}\) or \(dH/dλ\)) that will be used for free energy calculations. This depends on the free energy estimator specified in the parameter
df_method
.modify_coords_fn – The function (callable) in an external module (specified as
modify_coords
in the input YAML file) for modifying coordinates at exchanges. This parameter is only relevant to multi-topology REXEE (i.e., MT-REXEE) simulations.
- set_params(analysis)[source]
Sets up or reads in the user-defined parameters from an input YAML file and an MDP template. This function is called to instantiate the class in the
__init__
function of class. Specifically, it does the following:Sets up constants.
Reads in REXEE parameters from a YAML file.
Handles YAML parameters.
Checks if the parameters in the YAML file are well-defined.
Reformats the input MDP file to replace all hyphens with underscores.
Reads in parameters from the MDP template.
After instantiation, the class instance will have an attribute corresponding to each of the parameters specified in the YAML file. For a full list of the parameters that can be specified in the YAML file, please refer to 3. Input YAML parameters.
- Parameters
- Raises
If a required parameter is not specified in the input YAML file.
If a specified parameter is not recognizable.
If a specified option is not available for a parameter.
If the data type or range (e.g., positive or negative) of a parameter is not correct.
If any MDP parameter invalid for the REXEE simulation is detected.
- check_gmx_executable()[source]
Checks if the GROMACS executable can be used and gets its absolute path and version.
- print_params(params_analysis=False)[source]
Prints important parameters relevant to the REXEE simulation to be performed.
- Parameters
params_analysis (bool, Optional) – Whether additional parameters for data analysis should be printed. The default is
False
.
- static reformat_MDP(mdp_file)[source]
Reformats the input MDP file so that all hyphens in the parameter names are replaced by underscores. If the input MDP file contains hyphens in its parameter names, the class attribue
self.reformatted
will be set toTrue
. In this case, the new MDP object with reformatted parameter names will be written to the original file path of the file, while the original file will be renamed with a_backup
suffix. If the input MDP file is not reformatted, the function sets the attributeself.reformatted_mdp
toFalse
.- Parameters
mdp_file (str) – The file path of the MDP file to be reformatted.
- Returns
reformatted – Whether the file was reformatted.
- Return type
- initialize_MDP(idx)[source]
Initializes the MDP object for generating an MDP file for a specific replica based on the MDP template. This function should be called only for generating MDP files for the FIRST iteration and it has nothing to do with whether the weights are fixed or equilibrating. It is assumed that the MDP template has all the parameters shared by all replicas.
- Parameters
idx (int) – Index of the simulation whose MDP parameters need to be initialized.
- Returns
MDP – A
gmx_parser.MDP
object that can be used to write the MDP file.- Return type
gmx_parser.MDP
obj
- get_ref_dist(pullx_file=None)[source]
Gets the initial COM distance between the pull groups in the input GRO file. Importantly, this distance will serve as the reference distance starting from the second iteration. This function initializes the attribute
ref_dist
and is only relevant when a distance restraint is applied in the GROMACS pull code.- Parameters
pullx_file (str, Optional) – The path of the pullx file whose initial value will be used as the reference distance. Usually, this should be the path of the pullx file of the first iteration. The default is
sim_0/iteration_0/pullx.xvg
.
- update_MDP(new_template, sim_idx, iter_idx, states, wl_delta, weights, counts=None)[source]
Updates the MDP file for a new iteration based on the new MDP template, which is the MDP file from the previous iteration. Note that if the weights got equilibrated in the previous iteration, the weights will be fixed at these equilibrated values for all the following iterations.
- Parameters
new_template (str) – The new MDP template file, which typically is the MDP file of the previous iteration.
sim_idx (int) – The index of the simulation whose MDP parameters need to be updated.
iter_idx (int) – The index of the iteration to be performed later.
states (list) – A list of last sampled states of all simulaitons in the previous iteration.
wl_delta (list) – A list of fina Wang-Landau incrementors of all simulations.
weights (list) – A list of lists final weights of all simulations.
counts (list, Optional) – A list of lists final counts of all simulations. If the value is
None
, then the MDP parameterinit-histogram-counts
won’t be specified in the next iteration. Note that this parameter is only supported by GROMACS with versions later than 2022.3.
- Returns
MDP – A
gmx_parser.MDP
object that can be used to write the MDP file.- Return type
gmx_parser.MDP
obj
- extract_final_dhdl_info(dhdl_files)[source]
Extracts the last sampled states for all replica simulations.
- Parameters
dhdl_files (list) – A list of file paths to GROMACS DHDL files of different replicas. Note that the order of the files should be consistent with the order of the replicas.
- Returns
states – A list of the global state indices of the last sampled states of all simulaitons.
- Return type
- extract_final_log_info(log_files)[source]
Extracts the following information for each replica simulation from a given list of GROMACS LOG files:
The final Wang-Landau incrementors.
The final lists of weights.
The final lists of counts.
Note that the order of the files should be consistent with the order of the replicas.
- Parameters
log_files (list) – A list of file paths to GROMACS LOG files of different replicas.
- Returns
wl_delta (list) – A list of final Wang-Landau incrementors of all simulations.
weights (list) – A list of lists of final weights of all simulations.
counts (list) – A list of lists of final counts of all simulations.
- static identify_swappable_pairs(states, state_ranges, neighbor_exchange, add_swappables=None)[source]
Identifies swappable pairs. By definition, a pair of simulation is considered swappable only if their last sampled states are in the alchemical ranges of both simulations. This is required to ensure that the values of involved ΔH and Δg can always be looked up from the DHDL and LOG files. This also automatically guarantees that the simulations to be swapped have overlapping state sets.
- Parameters
states (list) – A list of the global state indices of the last sampled states of all simulations. This list can be generated by
extract_final_dhdl_info
. Notably, the input list should not be a list that has been updated/modified byget_swapping_pattern
, or the result will be incorrect.state_ranges (list of lists) – A list of global state indices for all replicas. The input list can be a list updated by
get_swapping_pattern
, especially in the case where there is a need to re-identify the swappable pairs after an attempted swap is accepted.neighbor_exchange (bool) – Whether to exchange only between neighboring replicas.
add_swappables (list) – A list of lists that additionally consider states (in global indices) that can be swapped. For example,
add_swappables=[[4, 5], [14, 15]]
means that if a replica samples state 4, it can be swapped with another replica that samples state 5 and vice versa. The same logic applies to states 14 and 15. This parameter is only relevant to MT-REXEE simulations.
- Returns
swappables – A list of tuples representing the simulations that can be swapped.
- Return type
Example
Below is an example where the REXEE simulation is composed of four replicas sampling states 0-3, 1-4, 2-5, and 3-6, respectively. At exchanges, these replicas are respectively at states 2, 3, 3, and 4. Therefore, the swappable pairs are [(0, 1), (1, 2), (2, 3)]. If only neighboring swaps are considered, the swappable pairs will be [(1, 2), (2, 3)].
>>> from ensemble_md.replica_exchange_EE import ReplicaExchangeEE as REXEE >>> states = [2, 3, 2, 5] >>> state_ranges = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]] >>> REXEE.identify_swappable_pairs(states, state_ranges, neighbor_exchange=False) [(0, 1), (1, 2), (2, 3)] >>> REXEE.identify_swappable_pairs(states, state_ranges, neighbor_exchange=True) [(1, 2), (2, 3)]
- static propose_swap(swappables)[source]
Proposes a swap of coordinates between replicas by drawing a pair from the list of swappable pairs.
- Parameters
swappables (list) – A list of tuples representing the simulations that can be swapped.
- Returns
swap – A tuple of simulation indices to be swapped. If there is no swappable pair, an empty list will be returned.
- Return type
tuple or an empty list
- get_swapping_pattern(dhdl_files, states)[source]
Generates a list (
swap_pattern
) that represents how the configurations should be swapped in the next iteration. The indices of the output list correspond to the simulation/replica indices, and the values represent the configuration indices in the corresponding simulation/replica. For example, if the swapping pattern is[0, 2, 1, 3]
, it means that in the next iteration, replicas 0, 1, 2, 3 should sample configurations 0, 2, 1, 3, respectively, where configurations 0, 1, 2, 3 here are defined as whatever configurations are in replicas 0, 1, 2, 3 in the CURRENT iteration (not iteration 0), respectively.Notably, when this function is called (e.g., once every iteration in a REXEE simulation), the output list
swap_pattern
is always initialized as[0, 1, 2, 3, ...]
and gets updated once every attempted swap. This is different from the attributeconfigs
, which is only initialized at the very beginning of the entire REXEE simulation (iteration 0), thoughconfigs
also gets updated withswap_pattern
once every attempted swap in this function.- Parameters
dhdl_files (list) – A list of paths to the DHDL files. The indicies in the DHDL filenames should be in an ascending order, e.g.
[dhdl_0.xvg, dhdl_1.xvg, ..., dhdl_N.xvg]
.states (list) – A list of last sampled states (in global indices) of ALL simulations.
states[i]=j
means that the configuration in replicai
is at statej
at the time when the exchange is performed. This list can be generatedextract_final_dhdl_info
.
- Returns
swap_pattern (list) – A list showing the configuration of replicas after swapping.
swap_list (list) – A list of tuples showing the accepted swaps.
- calc_prob_acc(swap, dhdl_files, states, shifts)[source]
Calculates the acceptance ratio for swapping simulations.
- Parameters
swap (tuple) – A tuple of indices corresponding to the simulations to be swapped.
dhdl_files (list) – A list of DHDL files, e.g.
dhdl_files = ['dhdl_2.xvg', 'dhdl_1.xvg', 'dhdl_0.xvg', 'dhdl_3.xvg']
means that configurations 2, 1, 0, and 3 are now in replicas 0, 1, 2, 3. This can happen in multiple swaps when a previous swap between configurations 0 and 2 has just been accepted. Otherwise, the list of filenames should always be in the ascending order, e.g.,['dhdl_0.xvg', 'dhdl_1.xvg', 'dhdl_2.xvg', dhdl_3.xvg]
.states (list) – A list of last sampled states (in global indices) in the DHDL files corresponding to configurations 0, 1, 2, … (e.g.,
dhdl_0.xvg
,dhdl_1.xvg
,dhdl_2.xvg
, …) This list can be generated byextract_final_dhdl_info
.shifts (list) – A list of state shifts for converting global state indices to the local ones. Specifically,
states
substracted byshifts
should be the local state indices of the last sampled states indhdl_files[0]
,dhdl_files[1]
, … (which importantly, are not necessarilydhdl_0.xvg
,dhdl_1.xvg
, …). If multiple swaps are not used, then this should always belist(REXEE.s * np.arange(REXEE.n_sim))
.
- Returns
prob_acc – The acceptance ratio.
- Return type
- accept_or_reject(prob_acc)[source]
Returns a boolean variable indicating whether the proposed swap should be acceepted or not given the acceptance ratio.
- Parameters
prob_acc (float) – The acceptance ratio.
- Returns
swap_bool – A boolean variable indicating whether the swap should be accepted.
- Return type
- get_averaged_weights(log_files)[source]
For each replica, calculates the averaged weights (and the associated error) from the time series of the weights since the previous update of the Wang-Landau incrementor. This is only relevant for weight-updating REXEE simulations.
- Parameters
log_files (list) – A list of file paths to GROMACS LOG files of different replicas.
- Returns
weights_avg (list) – A list of lists of weights averaged since the last update of the Wang-Landau incrementor. The length of the list should be the number of replicas.
weights_err (list) – A list of lists of errors corresponding to the values in
weights_avg
.
- weight_correction(weights, counts)[source]
Adjusts the lambda weights based on the histogram counts by using the following equation: \(g_k' = g_k + ln(N_{k-1}/N_k)\), where \(g_k\) and \(g_k'\) are the lambda weight before and after the correction, respectively. Notably, in any of the following situations, we don’t do any correction.
Either \(N_{k-1}\) or \(N_k\) is \(0\).
Either \(N_{k-1}\) or \(N_k\) is smaller than the histogram cutoff specified by
N_cutoff
in the input YAML file.
- Parameters
weights (list) – A list of lists of weights (of ALL simulations) to be corrected. The i-th element corresponds to the list of weights of the i-th replica.
counts (list) – A list of lists of counts (of ALL simulations).
- Returns
weights – An updated list of lists of corected weights.
- Return type
- histogram_correction(hist, print_values=True)[source]
Adjusts the histogram counts. For example, if replicas A and B both sample states 1 and 2 and have histogram counts \(N^A_1\), \(N^A_2\), \(N^B_1\), and \(N^B_2\), the corrected histogram counts for states 1 and 2 for BOTH replicas will be adjusted according to the following equation: \(N_1'/N_2'=((N_1^A N_1^B)/(N_2^A N_2^B))^{1/2}\). Namely, the ratio of the corrected histogram counts for adjacent states is the geometric mean of the ratio of the original histogram counts for the same states. Note that if any histogram count is 0, histogram correction will not be performed and the original histogram counts will be returned.
- Parameters
hist (list) – A list of lists of histogram counts of ALL simulations. The i-th element corresponds to the list of histogram counts of the i-th replica.
print_values (bool, Optional) – Whether to print the histogram counts for each replica before and after histogram correction. The default is
True
.
- Returns
hist_modified – A list of lists of modified histogram counts of ALL simulations.
- Return type
- combine_weights(weights, weights_err=None, print_values=True)[source]
Combines alchemical weights across multiple replicas. Note that if
weights_err
is provided, inverse-variance weighting will be used. Care must be taken since inverse-variance weighting can lead to slower convergence if the provided errors are not accurate. (See 5.1. Weight combination for more details.)- Parameters
weights (list) – A list of lists of alchemical weights of ALL simulations. The i-th element corresponds to the list of weights of the i-th replica.
weights_err (list, Optional) – A list of lists of errors corresponding to the values in
weights
. The default isNone
.print_values (bool, Optional) – Whether to print the weights for each replica before and after weight combination for each replica. The default is
True
.
- Returns
weights_modified (list) – A list of modified alchemical weights of ALL simulations.
g_vec (numpy.ndarray) – An array of alchemical weights of the whole range of states.
- run_REXEE(n, swap_pattern=None)[source]
Performs one iteration of a REXEE simulation, which includes generating the TPR files using the GROMACS
grompp
command and running the expanded ensemble simulations in parallel using the GROMACSmdrun
command. The GROMACS commands are launched as subprocesses. The function assumes that the GROMACS executable is available.- Parameters
n (int) – The iteration index (starting from 0).
swap_pattern (list, Optional) – A list generated by
get_swapping_pattern
. It represents how the replicas should be swapped. This parameter is not needed only ifn
is 0. The default isNone
.