Precise statistical distribution theory then provides reliable P-values for making the decision. design-island runs in two phases: a first phase and a refinement phase. In the first phase, it identifies islands at different locations on the chromosome, determines the extent of those islands, and carries out the statistical analysis using a probing window.
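To illustrate how a Monte Carlo test on a probing window can yield a P-value, the following is a minimal Python sketch. It is not the design-island implementation: the function names, the distance statistic (sum of absolute differences in word frequencies), and the use of genome-wide word frequencies as the reference are all assumptions made for illustration. The defaults (word size 4, 200 random fragments) simply mirror the values reported for the first phase.

```python
import random
from collections import Counter


def word_freqs(seq, k=4):
    """Relative frequencies of all overlapping k-letter words in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def distance(freqs, background):
    """Sum of absolute differences between two word-frequency profiles.

    Assumed statistic for this sketch only; any measure of
    compositional deviation could be substituted.
    """
    words = set(freqs) | set(background)
    return sum(abs(freqs.get(w, 0.0) - background.get(w, 0.0)) for w in words)


def monte_carlo_p(genome, start, window, k=4, n_random=200, rng=None):
    """Monte Carlo P-value for the window genome[start:start + window].

    The window's deviation from the genome-wide word composition is
    compared with that of n_random randomly placed fragments of the
    same size.
    """
    rng = rng or random.Random(0)
    background = word_freqs(genome, k)
    observed = distance(word_freqs(genome[start:start + window], k), background)

    exceed = 0
    for _ in range(n_random):
        pos = rng.randrange(len(genome) - window + 1)
        d = distance(word_freqs(genome[pos:pos + window], k), background)
        if d >= observed:
            exceed += 1

    # Standard Monte Carlo estimate; the +1 terms keep the P-value above zero.
    return (exceed + 1) / (n_random + 1)
```

Under these assumptions, a window would be flagged as a putative island when the returned value falls below the chosen threshold P0. Scanning a chromosome then amounts to calling `monte_carlo_p` at successive window positions.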
This leads to the identification of ‘putative GIs’ of varying sizes and locations on the chromosome, detected with P-values generated by Monte Carlo tests carried out at variable locations of a fixed-size probing window. Following the first phase, the refinement phase commences; it takes random samples of genomic segments, excluding the regions detected in the first phase. Some of the putative GIs identified in the first phase are further
refined into smaller segments containing horizontally acquired genes in the refinement phase. design-island was run on the chromosomes of the three completely sequenced V. cholerae genomes under study in order to identify the putative GIs in their genomes. In the first phase, design-island was run with P0 = 0.05, a word size of 4 and an initial window size of 5000, with subsequent window increments of 500. Two hundred randomly selected fragments were tested for each window, with a sliding window of 500. In the refinement phase (the second phase), design-island was run with the same parameter values as in the first phase, except that the initial window size was reduced to 2000 and the sliding window was increased to 1000. The statistical analysis in the refinement phase is similar to that used in the first phase, except that P0 was set to 0.001. The results thus obtained were tabulated using customized Perl scripts where
the cut-off E-value was set to 0.001. The final results obtained from design-island were fed into another Perl program to generate a circular map of the chromosome showing the putative GIs identified by design-island in the separate phases in different colors. The algorithm is described in Fig. 1. Coordinates of statistically significant genomic segments of the three V. cholerae strains under study were determined by design-island in the two separate phases. Within these predicted regions, the coding regions were marked out using a customized Perl script, with the protein table available at the NCBI database as the reference. The results show that, among the three strains under study, the maximum coverage by GIs after the refinement phase was 50.90% for V. cholerae MJ1236 (large chromosome), while the lowest coverage was 33.11% for V. cholerae El Tor N16961 (small chromosome), as is evident from Table 1. design-island identified all the known GIs of V.