User manual for LigBuilder v2.0

How to use BUILD

Introduction
Parameter sets
Basic parameters
Advanced parameters
Result analysis

1. Introduction

The major function of BUILD is to construct ligand molecules for the target protein using a fragment-based design strategy. All molecules are developed and evolved by a Genetic Algorithm (GA) procedure. The synthesis accessibility of the designed compounds can be analyzed with the aid of an embedded chemical reaction database and a retro-synthesis analyzer. Moreover, using a set of evaluation rules, ligands can be designed to best fit the identified cavities. Drug-like and privileged fragments can be used to construct ligands, aided by internal and external absorption, distribution, metabolism, excretion, and toxicity (ADME/T) filters and drug-likeness filters. In addition to these new functions, BUILD inherits all the algorithms, features, and libraries of the GROW and LINK modules of LigBuilder v1.2. All resulting molecules are collected in a file, and a design report is produced at the end.

   Please note:
    1. BUILD relies on the binding-site description file generated by CAVITY, so you must obtain the CAVITY result first.
    2. Although BUILD is a stand-alone program, a task may take many months to finish on a single CPU. You can run a number of BUILD processes simultaneously to accelerate it; the speedup is nearly linear in the number of processes. In addition, BUILD applies a "pseudo-parallel" trick in automatic mode to simplify operation. On a mainstream workstation, a typical design task takes about 1,000 to 5,000 CPU hours. The running time increases if the binding site has low ligandability.
    3. BUILD consumes a lot of memory: a typical BUILD process uses 1 to 2 GB. You can control memory consumption by loading different memory control parameter sets.
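
As a rough illustration of the near-linear speedup described in note 2, the following Python sketch estimates wall-clock time from a CPU-hour budget and the number of simultaneous processes. The function name and the `efficiency` parameter are assumptions introduced for this example; they are not part of BUILD.

```python
def estimate_wall_hours(cpu_hours, n_processes, efficiency=1.0):
    """Estimate wall-clock hours for a task needing `cpu_hours` of CPU time.

    `efficiency` is a hypothetical factor (0 < efficiency <= 1) describing
    how close the speedup is to perfectly linear; 1.0 means ideal scaling.
    """
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_hours / (n_processes * efficiency)


# Typical task range quoted above: 1,000 to 5,000 CPU hours.
for cpu_hours in (1000, 5000):
    for n in (1, 100):
        hours = estimate_wall_hours(cpu_hours, n)
        print(f"{cpu_hours} CPU hours on {n} process(es): "
              f"{hours:.0f} h (about {hours / 24:.1f} days)")
```

On a single process the upper end of the range amounts to several months of wall-clock time, which is why running many processes is recommended.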

Synopsis of running BUILD:

build -Function Parameter_file [Id]

For example:

./build -Automatic build.input

The parameter file assembles all the information necessary to run BUILD as you wish. You should edit the parameter file before running BUILD. The keywords that appear in the parameter file are explained below.
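
If you start BUILD from a script, the synopsis above can be assembled programmatically. The following Python sketch only builds the argument list; the helper name `build_command` is hypothetical, and whether an Id is meaningful for a given function is not covered here, so treat the optional argument as an assumption.

```python
import shlex


def build_command(function, parameter_file, run_id=None, executable="./build"):
    """Assemble 'build -Function Parameter_file [Id]' as an argument list."""
    argv = [executable, f"-{function}", parameter_file]
    if run_id is not None:
        # The Id is optional in the synopsis; append it only when given.
        argv.append(str(run_id))
    return argv


# Reproduces the documented example command line.
print(shlex.join(build_command("Automatic", "build.input")))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` from the directory that contains the `build` executable.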

2. Parameter sets

For convenience, we provide default parameter sets suited to different tasks. You can load them into your parameter file with the keyword "INCLUDE". They are located under the path "LigBuilderV2/default/" and include the following sets:

Overall default set:
default.input: The overall default parameter set of BUILD, including all default settings.

Default path:
path.input: The default paths of all input and output files. Avoid modifying them directly, because BUILD needs matching file names to run in automatic mode. Instead, set the paths through the predefined names.

Diversity set:
diversity_low/moderate/high.input: The default sets for molecular diversity. Increasing the diversity may decrease the synthesis accessibility. Default: low.

Synthesize set:
synthesize_knowledge/rapid/moderate/accurate.input: The default sets for balancing the speed and accuracy of synthesis analysis. The embedded retro-synthesis analyzer is extremely resource-consuming if the analysis accuracy is not constrained. In our experience, a rough synthesis accessibility estimate made alongside the design process is much more effective than an accurate strategy. Although a rapid strategy decreases the reliability of the estimate, this effect can be offset by post-analysis of the large number of design results. The knowledge strategy provides a basic estimate and is very useful if your computing resources are limited. Default: rapid.

Optimization set:
optimize_none/soft/moderate/intense.input: The default sets for the intensity of molecular optimization. Default: intense (exploring mode), soft (growing/linking modes).

Memory control set:
design-memory_500M/1G/2G.input: The parameter sets for the maximal design memory. Increase the maximal memory to enhance the genetic algorithm. Default: 1G per session.
synthesize-memory_600M/1G/2G.input: The parameter sets for the maximal synthesis analysis memory (embedded analyzer). Increase the maximal memory to speed up the analysis. Default: 1G per session.
analysis-memory_1G/2G/3G.input: The parameter sets for the maximal synthesis analysis memory (post analyzer). Increase the maximal memory to speed up the analysis. Default: 1G per session.
ATTENTION: The maximal memory of a session equals the sum of the design memory and the synthesize memory of the embedded analyzer (1st + 2nd items) in the design step, or the maximal synthesis memory of the post analyzer (3rd item) in the post-analysis step. Please select a memory allocation appropriate to your system resources.
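
The memory rule in the ATTENTION note can be written out as a short calculation. The Python sketch below uses hypothetical function names, and the multiplication by the number of sessions assumes that all sessions are resident in memory at the same time, which may overestimate the real footprint.

```python
def session_memory_gb(design_gb, synthesize_gb, analysis_gb):
    """Return the maximal memory of one session in each step, in GB.

    Design step: design memory + embedded-analyzer synthesize memory.
    Post-analysis step: post-analyzer memory only.
    """
    return {
        "design_step": design_gb + synthesize_gb,
        "post_analysis_step": analysis_gb,
    }


def total_memory_gb(per_session, n_sessions):
    """Upper bound if `n_sessions` sessions are in memory simultaneously."""
    return {step: gb * n_sessions for step, gb in per_session.items()}


# Defaults quoted above: 1G design, 1G synthesize, 1G analysis per session.
per_session = session_memory_gb(1.0, 1.0, 1.0)
print(per_session)
# 8 concurrent sessions is an arbitrary example value.
print(total_memory_gb(per_session, n_sessions=8))
```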

Recommend set:
recommend_redock/inclusiveness/moderate/exactness.input: The default sets of the LigBuilder v2.0 recommender. BUILD outputs a huge number of molecules in a design task (typically 1 to 10 million), and the recommender helps you select the most promising ones. The recommender applies a dynamic estimation equation, based on the statistics of the molecules, to refine a large molecule library; it is therefore not suitable for processing a small molecule set in standard mode. Nonetheless, we provide some solutions for special cases: try the 'moderate' or 'inclusiveness' set if the 'exactness' strategy fails, and try the 'redock' set if you want to apply the recommender to refine docking results.

Special design strategy set:
lead.input: The default set for lead design mode. Designs lead structures with low molecular weight and high atomic efficiency.
mimic.input: The default set for mimic design mode. Designs mimic inhibitors based on known inhibitors (growing/linking mode).

Speedup strategy set:
speedup.input: If you have limited computing resources or want the work finished as soon as possible, we strongly recommend the speedup strategy. It explores a smaller chemical space and is therefore much more efficient. Compared with the normal strategy, it gives approximately the same design quality but less diversity.

User settings:
usersettings.input: You can configure your design strategy by editing this file.

3. Basic parameters

DESIGN_MODE: The central function of LigBuilder v2.0 is constructing ligand molecules within the constraints of the target protein. LigBuilder v2.0 supports three strategies to do this: 0: exploring strategy, 1: growing strategy, and 2: linking strategy.

To apply the growing strategy, you need to provide a pre-placed "seed" structure inside the binding pocket, and LigBuilder v2.0 will subsequently add fragments onto it to build molecules. This strategy may be helpful when you have an interesting lead compound and want to develop derivatives to improve its bioactivity (lead optimization).

To apply the linking strategy, you also need to provide a starting structure, which consists of several separate chemical fragments. These fragments should be pre-placed inside the binding pocket and should preferably form favorable interactions with the target protein. LigBuilder v2.0 will then try to build molecular frameworks that link these fragments into integrated molecules. This strategy may be helpful when you want to merge several key interactions into one ligand (fragment linking). Furthermore, LigBuilder v2.0 provides a fragment extractor that extracts "seeds" from known inhibitors and designs molecules based on these fragments. This strategy may be helpful when you are looking for mimic compounds.

To apply the exploring strategy, no pre-placed seed structure is needed. LigBuilder v2.0 generates the initial seed structures and extracts new seed structures automatically during the design process. This strategy is very powerful for finding novel lead compounds (lead discovery) and for designing de novo scaffolds and structures (de novo design).

Illustration of design process

POCKET_ATOM_FILE: The corresponding file given by CAVITY.

POCKET_GRID_FILE: The corresponding file given by CAVITY.

SEED_LIGAND_LIST: The list of Mol2 files that contain the "seed" structures (DESIGN_MODE: 1 or 2). As mentioned above, this is absolutely necessary for growing mode and linking mode because the "seed" is the mother of all the resulting molecules. Ideally, the seed structure should hold the key features of the prospective ligands. LigBuilder v2.0 can generate seed structures from known inhibitors with the "Extract" function; the inhibitors should be well docked or extracted directly from the crystal structure. For more instructions on how to prepare a seed structure, please refer to the Advanced Skills section.

ADD_HSPC: YES: Treat all hydrogen atoms of the seed structure as growing sites. NO: Only use user-defined growing sites. (DESIGN_MODE: 1 or 2)

The following parameters define the automatic mode, which applies a pseudo-parallel trick:

CONTINUE_MODE: YES: Continue with the previous seeds and results. NO: Overwrite the results and regenerate the seeds.

SESSION_NUMBER: The number of sessions to run at one time. At least 50; default 100; the more, the better. If your computing resources are limited, please apply the speedup strategy, or LigBuilder may not finish your job within a reasonable time.

MOLECULE_NUMBER: The goal of the design task. At least 100K; default 1M. We suggest designing 10M molecules in a task if you have plenty of computing resources. (Reference design speed: 1K compounds per CPU hour.)
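
The reference design speed gives a quick way to budget CPU time for a chosen MOLECULE_NUMBER. The Python sketch below is only an estimate under the assumption that the quoted speed of 1K compounds per CPU hour holds for your target; the function name is hypothetical.

```python
REFERENCE_SPEED = 1000  # compounds per CPU hour, the reference value quoted above


def estimate_cpu_hours(molecule_number, speed=REFERENCE_SPEED):
    """Estimate the CPU hours needed to design `molecule_number` molecules."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return molecule_number / speed


# Minimum, default, and suggested large design goals.
for target in (100_000, 1_000_000, 10_000_000):
    print(f"MOLECULE_NUMBER {target}: about {estimate_cpu_hours(target):.0f} CPU hours")
```

The actual speed depends on the binding site and on the chosen parameter sets, so use the result only as an order-of-magnitude guide.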

4. Advanced parameters

Input Section:

OPTIMIZE_SEED: YES: Optimize the seed structure in the force field of LigBuilder v2.0. NO: Keep the original coordinates; this may cause collisions if the seed structure is not positioned well. (DESIGN_MODE: 1 or 2)

Output Section:

LIGAND_COLLECTION_FILE: The output file of BUILD. During the GA procedure, all molecules that meet the rules you set are collected into this file. It is in LigBuilder LIG format, so it can be analyzed with the "Process" function. Please note that these molecules come not only from the final generation but from every generation. Together they form a structure-based virtual library for the target protein.

LEAD_COLLECTION_FILE: This file collects the seed structures generated by BUILD. It is in Mol2 format.

SEED_RECORD_FILE: This file collects the current seed structure library. It is in Mol2 format.

Random seed:

RANDOM_SEED: The seed of the random number generator. Natural number: manually assigned seed. -1: time-dependent seed (default). -2: Id-dependent seed (requires the Id argument). A time-dependent seed is intrinsically random and cannot be reproduced in separate runs, even if they are executed in exactly the same environment and with the same starting time. An Id-dependent seed is repeatable with the same Id. A natural-number seed is repeatable with the same number.
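
The three seeding modes can be illustrated with a small Python sketch. It only mirrors the behavior described above; how BUILD actually derives its time-dependent and Id-dependent seeds is not documented here, so `resolve_seed` and its use of the system clock are stand-in assumptions. Treating 0 as a manual seed is also an assumption.

```python
import random
import time


def resolve_seed(random_seed, run_id=None):
    """Map a RANDOM_SEED setting to a concrete integer seed (illustrative only)."""
    if random_seed >= 0:
        # Manual seed: repeatable with the same number.
        return random_seed
    if random_seed == -1:
        # Time-dependent seed: differs from run to run (stand-in: nanosecond clock).
        return time.time_ns()
    if random_seed == -2:
        # Id-dependent seed: repeatable with the same Id.
        if run_id is None:
            raise ValueError("RANDOM_SEED -2 requires an Id")
        return run_id
    raise ValueError("RANDOM_SEED must be a natural number, -1, or -2")


# Two generators built from the same manual seed produce the same sequence.
first = random.Random(resolve_seed(42))
second = random.Random(resolve_seed(42))
print(first.random() == second.random())
```

For reproducible test runs, a manual or Id-dependent seed is the practical choice; the default time-dependent seed suits production runs where independent sessions should not repeat each other.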

Outputting rules:

There is a set of chemical viability rules, including "APPLY_CHEMICAL_RULES", "APPLY_FORBIDDEN_STRUCTURE_CHECK", "APPLY_TOXIC_STRUCTURE_CHECK", "MAXIMAL_MOLECULAR_WEIGHT", "MINIMAL_MOLECULAR_WEIGHT", "MAXIMAL_LOGP", "MINIMAL_LOGP", "MAXIMAL_HB_DONOR_ATOM", "MINIMAL_HB_DONOR_ATOM", "MAXIMAL_HB_ACCEPTOR_ATOM", "MINIMAL_HB_ACCEPTOR_ATOM", "MAXIMAL_PKD", "MINIMAL_PKD" and "MINIMAL_AVER_PKD". To learn what these parameters mean and how to set them, please refer to the Advanced Skills section.

MAXIMAL_RESULTS: The design goal of each stand-alone session. 0: Unlimited design. Natural number: Stop when enough molecules have been collected.
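
To show how paired MINIMAL_/MAXIMAL_ rules and MAXIMAL_RESULTS could interact, here is a minimal Python sketch. The threshold values, the inclusive range check, and the dictionary-based molecule records are all assumptions made for illustration; BUILD's own rule evaluation is described in the Advanced Skills section.

```python
# Hypothetical (MINIMAL, MAXIMAL) thresholds; real values come from your parameter file.
RULES = {
    "MOLECULAR_WEIGHT": (200.0, 500.0),
    "LOGP": (-1.0, 5.0),
    "HB_DONOR_ATOM": (0, 5),
    "HB_ACCEPTOR_ATOM": (0, 10),
}


def passes_rules(props, rules=RULES):
    """True if every property lies within its [MINIMAL, MAXIMAL] range."""
    return all(low <= props[name] <= high for name, (low, high) in rules.items())


def collect(molecules, maximal_results=0, rules=RULES):
    """Collect passing molecules; 0 means unlimited, otherwise stop at the limit."""
    collected = []
    for props in molecules:
        if passes_rules(props, rules):
            collected.append(props)
            if maximal_results and len(collected) >= maximal_results:
                break
    return collected


# Made-up property records, not real design results.
candidates = [
    {"name": "a", "MOLECULAR_WEIGHT": 350.0, "LOGP": 2.1, "HB_DONOR_ATOM": 2, "HB_ACCEPTOR_ATOM": 5},
    {"name": "b", "MOLECULAR_WEIGHT": 620.0, "LOGP": 3.0, "HB_DONOR_ATOM": 1, "HB_ACCEPTOR_ATOM": 4},
    {"name": "c", "MOLECULAR_WEIGHT": 410.0, "LOGP": 4.0, "HB_DONOR_ATOM": 1, "HB_ACCEPTOR_ATOM": 6},
]
print([m["name"] for m in collect(candidates)])
print([m["name"] for m in collect(candidates, maximal_results=1)])
```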

Genetic algorithm parameters:

BUILD uses a Genetic Algorithm (GA) to develop and evolve molecules. It runs in generational-replacement mode. The rough procedure is as follows: (1) Generate the initial population based on the seed structure; (2) Choose "parent" molecules from the current population into the mating pool; (3) Copy the "elite" of the current population into the new population; (4) Fill out the new population by performing structural manipulations on the molecules in the mating pool; (5) Return to Step 2 until the preset number of generations has been reached.

Flow chart of the GA procedure implemented in BUILD
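
Steps (1) to (5) can be sketched as a toy generational-replacement loop in Python. This is not BUILD's implementation: the "molecule" is a tuple of integers, the manipulation is a random replacement standing in for growing or linking, parents are chosen by simply taking the fittest individuals, and all function names are hypothetical.

```python
import random


def run_ga(seed, fitness, manipulate, n_generations=10, n_population=200,
           n_parents=20, elitism_ratio=0.10, rng=None):
    """Generational-replacement GA with elitism (toy sketch, not BUILD's code)."""
    rng = rng or random.Random(0)
    # (1) Generate the initial population based on the seed structure.
    population = [manipulate(seed, rng) for _ in range(n_population)]
    history = []
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # (2) Choose parents into the mating pool (assumed: take the fittest).
        mating_pool = ranked[:n_parents]
        # (3) Copy the elite of the current population unchanged.
        n_elite = int(elitism_ratio * n_population)
        new_population = ranked[:n_elite]
        # (4) Fill out the new population by manipulating mating-pool members.
        while len(new_population) < n_population:
            new_population.append(manipulate(rng.choice(mating_pool), rng))
        # (5) Continue with the next generation.
        population = new_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(molecule):
    """Toy score: the sum of the integer 'fragments'."""
    return sum(molecule)


def toy_manipulate(molecule, rng):
    """Replace one randomly chosen fragment (stand-in for growing/linking)."""
    position = rng.randrange(len(molecule))
    changed = list(molecule)
    changed[position] = rng.randint(0, 9)
    return tuple(changed)


best, history = run_ga(seed=(0,) * 6, fitness=toy_fitness,
                       manipulate=toy_manipulate)
print("best toy molecule:", best)
print("best fitness per generation:", history)
```

Because the elite is copied unchanged, the best fitness never decreases from one generation to the next in this sketch.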

Therefore, there are a few parameters for setting up this GA procedure:

NUMBER_OF_GENERATION: Number of generations (GA cycles) to go through. Typically 8 to 15 generations are enough.

NUMBER_OF_POPULATION: Size of the population. This parameter is the main determinant of how much memory the program uses while running. It is usually several thousand.

NUMBER_OF_PARENTS: Number of molecules to be selected into the mating pool. We recommend that the ratio NUMBER_OF_POPULATION / NUMBER_OF_PARENTS be larger than 10.

SIMILARITY_CUTOFF: Maximal similarity allowed between any two molecules in the mating pool. For instance, if it is set to 1.00, duplicates are not allowed. Setting this parameter enforces diversity in the mating pool and thus makes diverse offspring more likely. The lower this parameter, the higher the diversity. We recommend a value of 0.80 to 1.00.
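
One way to picture the effect of SIMILARITY_CUTOFF is a greedy filter over fitness-ranked candidates. The Python sketch below is an assumption-laden illustration: it represents each molecule as a set of features and uses Tanimoto similarity as a stand-in, since the similarity measure used by BUILD is not specified here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets (a stand-in similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fill_mating_pool(ranked, n_parents, similarity_cutoff):
    """Greedily pick parents from best to worst, skipping any candidate whose
    similarity to an already chosen parent reaches the cutoff."""
    pool = []
    for candidate in ranked:
        if all(tanimoto(candidate, chosen) < similarity_cutoff for chosen in pool):
            pool.append(candidate)
            if len(pool) == n_parents:
                break
    return pool


# Made-up feature sets, ordered from best to worst fitness.
ranked = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
print(fill_mating_pool(ranked, n_parents=3, similarity_cutoff=1.00))
print(fill_mating_pool(ranked, n_parents=3, similarity_cutoff=0.50))
```

With a cutoff of 1.00 only the exact duplicate is rejected; lowering the cutoff also rejects near-duplicates, which leaves a smaller but more diverse pool.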

ELITISM_RATIO: Fraction of "elitism", between 0 and 1. For example, if it is set to 0.10, the top 10% of the current population is copied directly to the new population, so the elite molecules are not lost by accident. We recommend a value around 0.10.

GROWING_PROBABILITY: In the current version of BUILD, growing manipulation is an essential function, so please fix this parameter to 1.00.

LINKING_PROBABILITY: In the current version of BUILD, linking manipulation is an essential function, so please fix this parameter to 1.00.

Libraries:

BUILDING_BLOCK_LIBRARY: The path of the building-block fragment library. The default one is "fragment.mdb".

BUILDING_BLOCK_LIBRARY_EXTEND: The path of the extended building-block fragment library. The default one is "fragment.mdb/extend".

BUILDING_BLOCK_LIBRARY_ROTATABLE: The path of the rotatable building-block fragment library. The default one is "fragment.mdb/rotatable".

FORBIDDEN_STRUCTURE_LIBRARY: The path of the forbidden substructure library. The default one is "forbidden.mdb".

TOXIC_STRUCTURE_LIBRARY: The path of the toxic substructure library. The default one is "toxicity.mdb".

SYNTHESIZE_LIBRARY: The path of the synthesis library. The default one is "synthesize.mdb".

If you are new to LigBuilder v2.0, you can simply adopt the default settings for the above libraries. If you want to learn how to edit them to suit your own purposes, please refer to the Advanced Skills section.

5. Result analysis

The output files will be named according to the predefined receptor name and result directory.

result/output_name/ligand.lig: Contains the molecules designed by BUILD in LigBuilder LIG format.

result/process_name/INDEX: Lists the molecules extracted by the "Process" function from the LIG file.

result/cluster_name/INDEX: Lists the cluster information generated by the "Cluster" function.

result/synthesize_name/synthesize.log: Lists the synthesis analysis information generated by the "Synthesize" function.

result/report_name/report.html: The final report HTML file.

(These web pages are edited by Yaxia Yuan. Latest update: Jan. 2021)