.. _doc_cli: 1. Command-line interface (CLI) =============================== :code:`ensemble_md` provides three command-line interfaces (CLI), including :code:`explore_REXEE`, :code:`run_REXEE` and :code:`analyze_REXEE`. :code:`explore_REXEE` helps the user to figure out possible combinations of REXEE parameters, while :code:`run_REXEE` and :code:`analyze_REXEE` can be used to perform and analyze REXEE simulations, respectively. Below we provide more details about each of these CLIs. .. _doc_explore_REXEE: 1.1. CLI :code:`explore_REXEE` ------------------------------ Here is the help message of :code:`explore_REXEE`: :: usage: explore_REXEE [-h] -N N [-r R] [-n N] [-s S] [-c] [-e] This CLI explores the parameter space of a homogenous REXEE simulation to help you figure out all possible combinations of the number of replicas, the number of states in each replica, and the number of overlapping states, given the total number of states. optional arguments: -h, --help show this help message and exit -N N, --N N The total number of states of the REXEE simulation. -r R, --r R The number of replicas that compose the REXEE simulation. -n N, --n N The number of states per replica. -s S, --s S The state shift between adjacent replicas. -c, --cnst Whether to apply the constraint such that the number of overlapping states does not exceed 50% of the number of states in adjacent replicas. (Default: False) -e, --estimate Whether to provide estimates of the chance of not having any swappable pairs for each solution. (Default: False) 1.2. CLI :code:`run_REXEE` -------------------------- Here is the help message of :code:`run_REXEE`: :: usage: run_REXEE [-h] [-y YAML] [-c CKPT] [-g G_VECS] [-o OUTPUT] [-m MAXWARN] This CLI runs a REXEE simulation given necessary inputs. optional arguments: -h, --help show this help message and exit -y YAML, --yaml YAML The file path of the input YAML file that contains REXEE parameters. (Default: params.yaml) -c CKPT, --ckpt CKPT The file path of the NPY file containing the replica-space trajectories. This file is a necessary checkpoint file for extending the simulation. (Default: rep_trajs.npy) -g G_VECS, --g_vecs G_VECS The file path of the NPY file containing the time series of the whole-range alchemical weights. This file is a necessary input if one wants to update the file when extending a weight- updating simulation. (Default: g_vecs.npy) -e EQUIL, --equil EQUIL The file path of the NPY file containing the equilibration times for all simulations when completing a variable weight REXEE simulation. (Default: equil.npy) -o OUTPUT, --output OUTPUT The file path of the output file for logging how replicas interact with each other. (Default: run_REXEE_log.txt) -m MAXWARN, --maxwarn MAXWARN The maximum number of warnings in parameter specification to be ignored. (Default: 0) In our current implementation, it is assumed that all replicas of a REXEE simulation are performed in parallel using MPI. Naturally, performing a REXEE simulation using :code:`run_REXEE` requires a command-line interface to launch MPI processes, such as :code:`mpirun` or :code:`mpiexec`. For example, on a 128-core node in a cluster, one may use :code:`mpirun -np 4 run_REXEE` (or :code:`mpiexec -n 4 run_REXEE`) to run a REXEE simulation composed of 4 replicas with 4 MPI processes. Note that in this case, it is often recommended to explicitly specify more details about resources allocated for each replica. For example, one can specify :code:`{'-nt': 32}` for the REXEE parameter :code:`runtime_args` in the input YAML file (see :ref:`doc_REXEE_parameters`), so each of the 4 replicas will use 32 threads (assuming thread-MPI GROMACS), taking full advantage of 128 cores. 1.3. CLI :code:`analyze_REXEE` ------------------------------ Finally, here is the help message of :code:`analyze_REXEE`: :: usage: analyze_REXEE [-h] [-y YAML] [-o OUTPUT] [-rt REP_TRAJS] [-st STATE_TRAJS] [-sts STATE_TRAJS_FOR_SIM] [-d DIR] [-m MAXWARN] This CLI analyzes a REXEE simulation. optional arguments: -h, --help show this help message and exit -y YAML, --yaml YAML The file path of the input YAML file used to run the REXEE simulation. (Default: params.yaml) -o OUTPUT, --output OUTPUT The file path of the output log file that contains the analysis results of REXEE. (Default: analyze_REXEE_log.txt) -rt REP_TRAJS, --rep_trajs REP_TRAJS The file path of the NPY file containing the replica-space trajectory. (Default: rep_trajs.npy) -st STATE_TRAJS, --state_trajs STATE_TRAJS The file path of the NPY file containing the stitched state- space trajectory. If the specified file is not found, the code will try to find all the trajectories and stitch them. (Default: state_trajs.npy) -sts STATE_TRAJS_FOR_SIM, --state_trajs_for_sim STATE_TRAJS_FOR_SIM The file path of the NPY file containing the stitched state- space time series for different state sets. If the specified file is not found, the code will try to find all the time series and stitch them. (Default: state_trajs.npy) -d DIR, --dir DIR The path of the folder for storing the analysis results. (Default: analysis) -m MAXWARN, --maxwarn MAXWARN The maximum number of warnings in parameter specification to be ignored. (Default: 0) 2. Implemented workflow ======================= In this section, we introduce the workflow implemented in the CLI :code:`run_REXEE` that can be used to launch REXEE simulations. While this workflow is made as flexible as possible, interested users can use functions defined :class:`ReplicaExchangeEE` to develop their own workflow, or consider contributing to the source code of the CLI :code:`run_REXEE`. As an example, a hands-on tutorial that uses the CLI :code:`run_REXEE` can be found in `Tutorial 1: Launching a REXEE simulation`_. .. _`Tutorial 1: Launching a REXEE simulation`: examples/run_REXEE.ipynb Step 1: Set up parameters ------------------------- To run a REXEE simulation in GROMACS using the CLI :code:`run_REXEE`, one at least needs the following four files. (Check :ref:`doc_input_files` for more details.) * One YAML file that specifies REXEE parameters, as specified via the CLI :code:`run_REXEE`. * One GRO file of the system of interest, as specified in the input YAML file. * One TOP file of the system of interest, as specified in the input YAML file. * One MDP template for customizing MDP files for different replicas, as specified in the input YAML file. Note that multiple GRO/TOP files can be provided to initiate different replicas with different configurations/topologies, in which case the number of GRO/TOP files must be equal to the number of replicas. Also, the MDP template should contain parameters shared by all replicas and define the coupling parameters for all intermediate states. Moreover, additional care needs to be taken for specifying some MDP parameters need additional care to be taken, which we describe in :ref:`doc_mdp_params`. Lastly, to extend a REXEE simulation, one needs to additionally provide the following two files (generated by the existing simulation) as necessary checkpoints: * One NPY file containing the replica-space trajectories of different configurations, as specified in the input YAML file. * One NPY file containing the time series of the whole-range alchemical weights, as specified in the input YAML file. This is only needed for extending a weight-updating REXEE simulation. * One NPY file containing the equilibration times for all simulations or -1 if the simulation is not yet equilibrated. This is only needed if extending a weight-updating REXEE simulation. In the CLI :code:`run_REXEE`, the class :class:`.ReplicaExchangeEE` is instantiated with the given YAML file, where the user needs to specify how the replicas should be set up or interact with each other during the simulation ensemble. Check :ref:`doc_parameters` for more details. Step 2: Run the 1st iteration ----------------------------- After setting things up in the previous step, the CLI :code:`run_REXEE` uses the function :obj:`.run_REXEE` to run subprocess calls to launch GROMACS :code:`grompp` and :code:`mdrun` commands in parallel for the first iteration. Step 3: Set up the new iteration -------------------------------- In the CLI :code:`run_REXEE`, this step can be further divided into the following substeps. Step 3-1: Extract the final status of the previous iteration ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To calculate the acceptance ratio and modify the MDP files in later steps, we first need to extract the information of the final status of the previous iteration. Specifically, for all the replica simulations, we need to * Find the last sampled state and the corresponding lambda values from the DHDL files, which are necessary for both fixed-weight and weight-updating simulations. * Find the final Wang-Landau incrementors and weights from the LOG files, which are necessary for a weight-updating simulation. These two tasks are done by :obj:`.extract_final_dhdl_info` and :obj:`.extract_final_log_info`. .. _doc_swap_basics: Step 3-2: Identify the swapping pattern ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Given the information of the final status of the previous simulation, the CLI :code:`run_REXEE` runs the function :obj:`.get_swapping_pattern` to figure out how the coordinates should be swapped between replicas. Specifically, the function does the following: - Identify swappable pairs using the function :obj:`.identify_swappable_pairs`. Notably, replicas can be swapped only if the states to be swapped are present in both of the state sets corresponding to the two replicas. This definition automatically implies one necessary but not sufficient condition that the replicas to be swapped should have overlapping state sets. Practically, if the states to be swapped are not present in both state sets, potential energy differences required for the calculation of :math:`\Delta` will not be available, which makes the calculation of the acceptance ratio technically impossible. - Propose a swap using the function :obj:`.propose_swap`. - Calculates the acceptance ratio using :math:`\Delta u` values obtained from the DHDL files using the function :obj:`.calc_prob_acc`. - Use the function :obj:`.accept_or_reject` to draw a random number and compare it with the acceptance ratio to decide whether the swap should be accepted or not. - Propose and evaluate multiple swaps if needed (e.g., when the exhaustive exchange proposal scheme is used), and finally return a list that represents how the configurations should be swapped in the next iteration. For more details, please refer to the API documentation of the involved functions. Step 3-3: Apply correction schemes if needed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For a weight-updating REXEE simulation, correction schemes may be applied if specified. Specifically, the CLI :code:`run_REXEE` applies the weight combination scheme using the function :obj:`.combine_weights` and the histogram correction scheme using the function :obj:`.histogram_correction`. For more details about correction schemes, please refer to the section :ref:`doc_correction`. Step 3-4: Set up the input files for the next iteration ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After the final configuration has been figured out by :obj:`.get_swapping_pattern` (and the weights/counts have been adjusted by the specified correction schemes, if any), the CLI :code:`run_REXEE` sets up input files for the next iteration. In principle, the new iteration should inherit the final status of the previous iteration. This means: * For each replica, the input configuration for initializing a new iteration should be the output configuration of the previous iteration. For example, if the final configurations are represented by :code:`[1, 2, 0, 3]` (returned by :obj:`.get_swapping_pattern`), then in the next iteration, replica 0 should be initialized by the output configuration of replica 1 in the previous iteration, while replica 3 can just inherit the output configuration from the previous iteration of the same replica. Notably, instead of exchanging the MDP files, the CLI :code:`run_REXEE` swaps out the coordinate files to exchange replicas, which is equivalent to exchanging the MDP files. * For each replica, the MDP file for the new iteration should be the same as the one used in the previous iteration of the same replica except that parameters like :code:`tinit`, :code:`init_lambda_state`, :code:`init_wl_delta`, and :code:`init_lambda_weights` should be modified to the final values in the previous iteration. In the CLI :code:`run_REXEE`, this is done by :obj:`.update_MDP`. Step 4: Run the new iteration ----------------------------- After the input files for a new iteration have been set up, we use the procedure in Step 2 to run a new iteration. Then, the CLI :code:`run_REXEE` loops between Steps 3 and 4 until the desired number of iterations (:code:`n_iterations` specified in the input YAML file) is reached. .. _doc_parameters: 3. Input YAML parameters ======================== In the current implementation of the algorithm, 30 parameters can be specified in the input YAML file. Note that the two CLIs :code:`run_REXEE` and :code:`analyze_REXEE` share the same input YAML file, so we also include parameters for data analysis here. 3.1. GROMACS executable ----------------------- - :code:`gmx_executable`: (Optional, Default: :code:`'gmx_mpi'`) The GROMACS executable to be used to run the REXEE simulation. The value could be as simple as :code:`gmx` or :code:`gmx_mpi` if the executable has been sourced. Otherwise, the full path of the executable (e.g., :code:`/usr/local/gromacs/bin/gmx`, the path returned by the command :code:`which gmx`) should be used. Currently, our implementation only works with thread-MPI GROMACS. An implementation that works with MPI-enabled GROMACS will be released soon. (Check `Issue 20`_ for the current progress.) .. _`Issue 20`: https://github.com/wehs7661/ensemble_md/issues/20 .. _doc_input_files: 3.2. Input files ---------------- - :code:`gro`: (Required) The path of the input system configuration in the form of GRO file(s) used to initiate the REXEE simulation. If only one GRO file is specified, it will be used to initiate all the replicas. If multiple GRO files are specified (using the YAML syntax), the number of GRO files has to be the same as the number of replicas. - :code:`top`: (Required) The path of the input system topology in the form of TOP file(s) used to initiate the REXEE simulation. If only one TOP file is specified, it will be used to initiate all the replicas. If multiple TOP files are specified (using the YAML syntax), the number of TOP files has to be the same as the number of replicas. In the case where multiple TOP and GRO files are specified, the i-th TOP file corresponds to the i-th GRO file. - :code:`mdp`: (Required) The path of the input MDP file used to initiate the REXEE simulation. Specifically, this input MDP file will serve as a template for customizing MDP files for all replicas. Therefore, the MDP template must specify the whole range of :math:`λ` values and :math:`λ`-relevant parameters. This holds for REXEE simulations for multiple serial mutations as well. For example, in a REXEE simulation that mutates methane to ethane in one replica and ethane to propane in the other replica, if exchanges only occur in the end states, then one could have :math:`λ` values like :code:`0.0 0.3 0.7 1.0 0.0 0.3 ...`. Notably, unlike the parameters :code:`gro` and :code:`top`, only one MDP file can be specified for the parameter :code:`mdp`. If you wish to use different parameters for different replicas, please use the parameter :code:`mdp_args`. - :code:`modify_coords`: (Optional, Default: :code:`None`) The file path of the Python module for modifying the output coordinates of the swapping replicas before the coordinate exchange, which is generally required in multi-topology REXEE simulations. For the CLI :code:`run_REXEE` to work, here is the predefined contract for the module/function based on the assumptions :code:`run_REXEE` makes. Modules/functions not obeying the contract are unlikely to work. - Multiple functions can be defined in the module, but the function for coordinate manipulation must have the same name as the module itself. - The function must only have two compulsory arguments, which are the two GRO files to be modified. The function must not depend on the order of the input GRO files. - The function must return :code:`None` (i.e., no return value). - The function must save the modified GRO file as :code:`confout.gro`. Specifically, if :code:`directory_A/output.gro` and :code:`directory_B/output.gro` are input, then :code:`directory_A/confout.gro` and :code:`directory_B/confout.gro` must be saved. (For more information, please visit `Tutorial 3: Multi-topology REXEE (MT-REXEE) simulations`_.) Note that in the CLI :code:`run_REXEE`, :code:`confout.gro` generated by GROMACS will be automatically renamed with a :code:`_backup` suffix to prevent overwriting. .. _`Tutorial 3: Multi-topology REXEE (MT-REXEE) simulations`: examples/run_REXEE_modify_inputs.ipynb .. _doc_REXEE_parameters: 3.3. REXEE parameters --------------------- - :code:`n_sim`: (Required) The number of replica simulations. - :code:`n_iter`: (Required) The number of iterations. In a REXEE simulation, one iteration means one exchange interval, which can involve multiple proposed swaps (if the exhaustive exchange proposal scheme is used). Note that when extending a simulation is desired and the necessary checkpoint files are provided, this parameter takes into account the number of iterations that have already been performed. That is, if a simulation has already been performed for 100 iterations, and one wants to extend it for 50 more iterations, then the value of this parameter should be 150. - :code:`s`: (Required) The shift in the state sets between adjacent replicas. For example, if replica 1 samples states 0, 1, 2, 3 and replica 2 samples states 2, 3, 4, 5, then :code:`s = 2` should be specified. - :code:`nst_sim`: (Optional, Default: :code:`nsteps` in the template MDP file) The exchange period, i.e., the number of simulation steps to carry out for one iteration. The value specified here will overwrite the :code:`nsteps` parameter in the MDP file of each iteration. Note that this option assumes replicas with homogeneous simulation lengths. - :code:`add_swappables`: (Optional, Default: :code:`None`) A list of lists that additionally consider states (in global indices) that can be swapped. For example, :code:`add_swappables=[[4, 5], [14, 15]]` means that if a replica samples state 4, it can be swapped with another replica that samples state 5 and vice versa. The same logic applies to states 14 and 15. This could be useful for multi-topology REXEE (MT-REXEE) simulations, where we enforce the consideration of exchanges between states 4 and 5 (and 14 and 15) and perform coordinate manipulation when necessary. - :code:`proposal`: (Optional, Default: :code:`'exhaustive'`) The method for proposing simulations to be swapped. Available options include :code:`single`, :code:`neighboring`, and :code:`exhaustive`. For more details, please refer to :ref:`doc_proposal`. - :code:`w_combine`: (Optional, Default: :code:`False`) Whether to perform weight combination or not. Note that weights averaged over from the last update of the Wang-Landau incrementor (instead of the final weights) will be used for weight combination. For more details about, please refer to :ref:`doc_w_schemes`. - :code:`w_mean_type`: (Optional, Default: :code:`'simple'`) The type of mean to use when combining weights. Available options include :code:`simple` and :code:`weighted`. For the latter case, inverse-variance weighted means are used. For more details about, please refer to :ref:`doc_w_schemes`. - :code:`N_cutoff`: (Optional, Default: -1) The histogram cutoff for weight corrections. A cutoff of 1000 means that weight corrections will be applied only if the counts of the involved states are both larger than 1000. A value of -1 means that no weight correction will be performed. For more details, please please refer to :ref:`doc_weight_correction`. - :code:`hist_corr` (Optional, Default: :code:`False`) Whether to perform histogram correction. For more details, please refer to :ref:`doc_hist_correction`. - :code:`mdp_args`: (Optional, Default: :code:`None`) A dictionary that contains MDP parameters differing across replicas. For each key in the dictionary, the value should always be a list of the length of the number of replicas. For example, :code:`{'ref_p': [1.0, 1.01, 1.02, 1.03]}` means that the MDP parameter :code:`ref_p` will be set as 1.0 bar, 1.01 bar, 1.02 bar, and 1.03 bar for replicas 0, 1, 2, and 3, respectively. Note that while this feature allows high flexibility in parameter specification, not all parameters are suitable to be varied across replicas. Users should use this parameter with caution, as there is no check for the validity of the MDP parameters. Additionally, this feature is a work in progress and differing :code:`ref_t` or :code:`dt` across replicas would not work. - :code:`grompp_args`: (Optional: Default: :code:`None`) A dictionary that contains additional arguments to be appended to the GROMACS :code:`grompp` command. For example, one could have :code:`{'-maxwarn', '1'}` to specify the :code:`maxwarn` argument for the :code:`grompp` command. - :code:`runtime_args`: (Optional, Default: :code:`None`) A dictionary that contains additional runtime arguments to be appended to the GROMACS :code:`mdrun` command. For example, one could have :code:`{'-nt': 16}` to run the simulation using tMPI-enabled GROMACS with 16 threads. 3.4. Output settings -------------------- - :code:`verbose`: (Optional, Default: :code:`True`) Whether a verbose log file is desired. - :code:`n_ckpt`: (Optional, Default: 100) The number of iterations between each checkpoint. Specifically, the CLI :code:`run_REXEE` will save the replica-space trajectories and the time series of the whole-range alchemical weights (in a weight-updating simulation) every :code:`n_ckpt` iterations. This is useful for extending a simulation. - :code:`rm_cpt`: (Optional, Default: :code:`True`) Whether the GROMACS checkpoint file (:code:`state.cpt`) from each iteration should be deleted. Normally we don't need GROMACS CPT files for REXEE simulations (even for extension) so we recommend just deleting the CPT files (which could save a lot of space if you perform a huge number of iterations). If you wish to keep them, specify this parameter as :code:`False`. .. _doc_analysis_params: 3.5. Data analysis ------------------ Note that parameters in this section are ignored by the CLI :code:`run_REXEE` and are only considered by the CLI :code:`analyze_REXEE`. - :code:`msm`: (Optional, Default: :code:`False`) Whether to build Markov state models (MSMs) for the REXEE simulation and perform relevant analysis. - :code:`free_energy`: (Optional, Default: :code:`False`) Whether to perform free energy calculations or not. - :code:`subsampling_avg`: (Optional, Default: :code:`False`) Whether to take the arithmetic average of the truncation fractions and the geometric average of the statistical inefficiencies over replicas when subsampling data for free energy calculations. For systems where the sampling is challenging, the truncation fraction or statistical inefficiency may vary largely across state sets, in which case this option could be useful. - :code:`df_spacing`: (Optional, Default: 1) The spacing (in the number of data points) to consider when subsampling the data, which is assumed to be the same for all replicas. - :code:`df_ref`: (Optional, Default: :code:`None`) The reference free energy profile for the whole range of states. The input should be a list having the length of the total number of states. - :code:`df_method`: (Optional, Default: :code:`'MBAR'`) The free energy estimator to use in free energy calculations. Available options include :code:`'TI'`, :code:`'BAR'`, and :code:`'MBAR'`. - :code:`err_method`: (Optional, Default: :code:`'propagate'`) The method for estimating the uncertainty of the free energy combined across multiple replicas. Available options include :code:`'propagate'` and :code:`'bootstrap'`. The bootstrapping method is more accurate but much more computationally expensive than simple error propagation. - :code:`n_bootstrap`: (Optional, Default: 50) The number of bootstrap iterations to perform when estimating the uncertainties of the free energy differences. - :code:`seed`: (Optional, Default: :code:`None`) The random seed to use in bootstrapping. 3.6. A template input YAML file ------------------------------- For convenience, here is a template of the input YAML file, with each optional parameter specified with the default and required parameters left blank. Note that specifying :code:`null` is the same as leaving the parameter unspecified (i.e., :code:`None`). .. code-block:: yaml # Section 1: Runtime configuration gmx_executable: 'gmx_mpi' # Section 2: Input files gro: top: mdp: modify_coords: null # Section 3: REXEE parameters n_sim: n_iter: s: nst_sim: null add_swappables: null proposal: 'exhaustive' w_combine: False w_mean_type: 'simple' N_cutoff: 1000 hist_corr: False mdp_args: null grompp_args: null runtime_args: null # Section 4: Output settings verbose: True n_ckpt: 100 rm_cpt: True # Section 5: Data analysis msm: False free_energy: False subsampling_avg: False df_spacing: 1 df_ref: null df_method: 'MBAR' err_method: 'propagate' n_bootstrap: 50 seed : null .. _doc_mdp_params: 4. Input MDP parameters ======================= As mentioned above, a template MDP file should have all the parameters commonly shared across all replicas. It should also define the coupling parameters for the whole range of states so that different MDP files can be customized for different replicas. For a REXEE simulation launched by the CLI :code:`run_REXEE`, any GROMACS MDP parameter that could potentially lead to issues in the REXEE simulation will raise a warning. If the number of warnings is larger than the value specified for the flag :code:`-m`/:code:`--maxwarn` in the CLI :code:`run_REXEE`, the simulation will error out. To avoid warnings arisen from MDP specification, users need to take extra care for the following MDP parameters: - We recommend setting :code:`lmc_seed = -1` so that a different random seed for Monte Carlo moves in the state space will be used for each iteration. - We recommend setting :code:`gen_vel = yes` to re-generate new velocities for each iteration to avoid potential issues with detailed balance. - We recommend setting :code:`gen_seed = -1` so that a different random seed for velocity generation will be used for each iteration. - The MDP parameter :code:`nstlog` must be a factor of the YAML parameter :code:`nst_sim` so that the final status of the simulation can be correctly parsed from the LOG file. - The MDP parameter :code:`nstdhdl` must be a factor of the YAML parameter :code:`nst_sim` so that the time series of the state index can be correctly parsed from the DHDL file. - In REXEE, the MDP parameter :code:`nstdhdl` must be a factor of the MDP parameter :code:`nstexpanded`, or the calculation of the acceptance ratio may be wrong. - Be careful with the pull code specification if you want to apply a distance restraint between two pull groups. Specifically, in a REXEE simulation, all iterations should use the same reference distance. Otherwise, poor sampling can be observed in a fixed-weight REXEE simulation and the equilibration time may be much longer for a weight-updating REXEE simulation. To ensure the same reference distance across all iterations in a REXEE simulation, consider the following scenarios: - If you would like to use the COM distance between the pull groups in the input GRO file as the reference distance for all the iterations (whatever that value is), then specify :code:`pull_coord1_start = yes` with :code:`pull_coord1_init = 0` in your input MDP template. In this case, :obj:`.update_MDP` will parse :code:`pullx.xvg` from the first iteration to get the initial COM distance (:code:`d`) and use it as the reference distance for all the following iterations using :code:`pull_coord1_start = no` with :code:`pull_coord1_init = d`. Note that this implies that the MDP parameter :code:`pull_nstxout` should not be 0. - If you want to explicitly specify a reference distance (:code:`d`) to use for all iterations, simply use :code:`pull_coord1_start = no` with :code:`pull_coord1_init = d` in your input MDP template. 5. Some rules of thumb ====================== Here are some rules of thumb for specifying some key YAML parameters, as discussed/concluded from our paper [Hsu2024]_. - **Number of replicas** (:code:`n_sim`): Just like other replica exchange methods, it is generally recommended that the number of replicas be a factor of available computational resources, such as the number of CPU cores. For example, if you have 128 CPU cores, you may consider using 4, 8, 16, or 32 replicas. - **Total number of states**: One advantage of the REXEE method over other replica exchange methods is that it completely decouples the number of replicas from the number of states. Therefore, once the number of replicas is decided, the total number of states can be arbitrary. Still, just like other generalized ensemble methods, the total number of states should be large enough to ensure sufficient overlap between adjacent states, but not too large to make the simulation computationally expensive. - **Number of states per replica**/**State shift**: After deciding the total number of states and the number of replicas, one can use the CLI :code:`explore_REXEE` to list all possible REXEE configurations, from which one can decide the number of states per replica or the state shift depending on how much overlap is desired between adjacent replicas. For example, if one has decided to use 4 replicas to sample 12 alchemical intermediate states, one can run :code:`explore_REXEE -N 12 -r 4`, which returns the following: .. code-block:: Exploration of the REXEE parameter space ======================================= [ REXEE parameters of interest ] - N: The total number of states - r: The number of replicas - n: The number of states for each replica - s: The state shift between adjacent replicas [ Solutions ] - Solution 1: (N, r, n, s) = (12, 4, 6, 2) - Replica 0: [0, 1, 2, 3, 4, 5] - Replica 1: [2, 3, 4, 5, 6, 7] - Replica 2: [4, 5, 6, 7, 8, 9] - Replica 3: [6, 7, 8, 9, 10, 11] - Solution 2: (N, r, n, s) = (12, 4, 9, 1) - Replica 0: [0, 1, 2, 3, 4, 5, 6, 7, 8] - Replica 1: [1, 2, 3, 4, 5, 6, 7, 8, 9] - Replica 2: [2, 3, 4, 5, 6, 7, 8, 9, 10] - Replica 3: [3, 4, 5, 6, 7, 8, 9, 10, 11] Generally, a higher overlap would lead to a higher sampling efficiency in both replica-space and state-space sampling, but will also lead to a higher computational cost. However, the influence of different overlaps on the accuracy of free energy calculations in a fixed-weight REXEE simulation is usually negligible. If one decides to use "Solution 1" in the above example, then the YAML parameters :code:`n_sim` and :code:`s` should be set as 4, and 2, respectively. Note that the total number of states should be reflected in the input MDP templated (specified through the YAML parameter :code:`mdp`) and the number of states per replica will be automatically calculated by the CLI :code:`run_REXEE` (given the other three REXEE configurational parameters) when customizing MDP files for different replicas. - **Exchange period** (:code:`nst_sim`): Generally, a higher swapping frequency (i.e., lower exchange period) would lead to a higher sampling efficiency in both replica-space and state-space sampling, as well as higher accuracy in free energy calculations. However, it would also lead to a higher computational cost. According to our experience, an exchange period between 500 to 2000 steps is usually a good choice for most systems. - **Number of iterations**: After deciding the exchange period, the number of iterations should be decided only based on the desired effective simulation length. For example, for a REXEE simulation running 4 replicas with an exchange period of 1000 steps (or 2 ps given a 2 fs time step), one may consider running 12500 iterations to achieve a total simulation length of 100 ns. - **Correction schemes**: As discussed in our paper, for a weight-updating REXEE simulation, there has been no evidence showing any advantages of using any implemented correction schemes, including weight combination, weight correction, and histogram correction schemes, which can be enabled by YAML parameters :code:`w_combine`, :code:`N_cutoff`, and :code:`hist_corr`, respectively. To converge alchemical weights, we recommend just using weight-updating EE simulations, or weight-updating REXEE simulations without any correction schemes, i.e., using default values for these parameters.