| 1 | +--- |
| 2 | +title: "ML trees with bootstraps" |
| 3 | +objectives: |
| 4 | +- Creating maximum likelihood trees with RAxML |
| 5 | +- Creating bootstrapped trees with RAxML |
| 6 | +- Creating a bipartition tree (ML + bootstrap) |
| 7 | +--- |
| 8 | + |
| 9 | +We showed how to generate a simple tree in the lesson "Phylogenetic tree". |
| 10 | +However, for publication, you are expected to show more evidence, |
| 11 | +such as the stability of tree topology. |
| 12 | +One way to show this is with bootstrap values. |
| 13 | +A bootstrap value is the percentage of trees in which a particular node |
| 14 | +(i.e. a dichotomous branching with a particular set of samples in each branch) |
| 15 | +appears, among a large number of trees generated |
| 16 | +by resampling sites within the sequence alignment with replacement. |
| 17 | +Bootstrap values are often used in the same fashion as confidence intervals. |
| 18 | + |
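To make the resampling idea concrete, here is a minimal sketch in bash. It is not how RAxML implements bootstrapping; `resample_columns` and `bootstrap_support` are hypothetical helper names used only for illustration. The first draws alignment column indices with replacement, and the second turns a count of supporting trees into a percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: draw one bootstrap replicate of column indices by
# sampling n_sites positions from 1..n_sites with replacement.
# The seed makes the draw repeatable within a given awk implementation.
resample_columns() {
    local n_sites=$1 seed=$2
    awk -v n="$n_sites" -v seed="$seed" 'BEGIN {
        srand(seed)
        for (i = 1; i <= n; i++) {
            printf "%d%s", int(rand() * n) + 1, (i < n ? " " : "\n")
        }
    }'
}

# Hypothetical helper: support for a node is the percentage of replicate
# trees that contain it (integer division).
bootstrap_support() {
    local hits=$1 total=$2
    echo $(( 100 * hits / total ))
}

resample_columns 10 144        # ten indices between 1 and 10, repeats allowed
bootstrap_support 930 1000     # prints 93
```

Some columns appear more than once in a replicate and others not at all, which is what lets the replicate trees differ from one another.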
| 19 | +A common method for creating bootstrapped trees using RAxML is a |
| 20 | +three-step approach: |
| 21 | +- Generate a few maximum likelihood (ML) trees. |
| 22 | +- Generate many bootstrapped trees. |
| 23 | +- Apply the bootstrap information to the best ML tree. |
| 24 | + |
| 25 | +### Generating an ML tree |
| 26 | + |
| 27 | +Maximum likelihood tree generation is computationally expensive, |
| 28 | +but the resulting tree is generally considered superior to trees from faster methods. |
| 29 | +Thus, we'll generate a small number of ML trees. |
| 30 | + |
| 31 | +~~~ |
| 32 | +$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -# 16 -s input.fasta -n treeML -w outdir |
| 33 | + |
| 34 | +$ ls -t outdir/*treeML* |
| 35 | +~~~ |
| 36 | +{: .language-bash} |
| 37 | + |
| 38 | +> Output directory specified by `-w` must be an absolute path in RAxML. |
| 39 | +> `$(pwd)/outdir` may be used if **outdir** is a relative path. |
| 40 | +> This directory needs to be created prior to running the command. |
| 41 | +> Alternatively, you can use the current directory as output directory |
| 42 | +> by not including this `-w` parameter. |
| 43 | +{: .caution} |
| 44 | + |
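One way to satisfy both requirements in the note above (an existing directory and an absolute path) is sketched below. `prepare_outdir` is a hypothetical helper, not part of RAxML; the RAxML call reuses the arguments from the command above and is guarded so that the snippet does nothing harmful when RAxML or the alignment is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the directory if needed and print its absolute path.
prepare_outdir() {
    local dir=$1
    mkdir -p "$dir" && (cd "$dir" && pwd)
}

OUTDIR=$(prepare_outdir outdir)
echo "RAxML output directory: $OUTDIR"

# Run the ML search only if RAxML is installed and the alignment exists.
if command -v raxmlHPC >/dev/null 2>&1 && [ -f input.fasta ]; then
    raxmlHPC -T 8 -m GTRGAMMA -p 144 -# 16 -s input.fasta -n treeML -w "$OUTDIR"
fi
```

Resolving the path once and reusing `$OUTDIR` keeps later commands consistent even if the working directory changes.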
| 45 | +~~~ |
| 46 | +RAxML_parsimonyTree.treeML.RUN.0 RAxML_parsimonyTree.treeML.RUN.1 |
| 47 | +RAxML_log.treeML.RUN.0 RAxML_parsimonyTree.treeML.RUN.2 |
| 48 | +RAxML_log.treeML.RUN.1 RAxML_parsimonyTree.treeML.RUN.3 |
| 49 | +... |
| 50 | +... |
| 51 | +RAxML_result.treeML.RUN.0 RAxML_result.treeML.RUN.1 |
| 52 | +RAxML_result.treeML.RUN.2 RAxML_result.treeML.RUN.3 |
| 53 | +... |
| 54 | +... |
| 55 | +RAxML_info.treeML RAxML_bestTree.treeML |
| 56 | +~~~ |
| 57 | +{: .output} |
| 58 | + |
| 59 | +RAxML will automatically select the best tree among the outputs |
| 60 | +and store it in the file **RAxML_bestTree.xxx**, where **xxx** is the name given with `-n`. |
| 61 | + |
| 62 | +In the command above, |
| 63 | +- `-T` specifies the number of CPU threads to use. |
| 64 | +- `-m` specifies the substitution model. |
| 65 | +- `-p` specifies the random seed for the starting parsimony trees. |
| 66 | +- `-#` specifies the number of trees to generate, each from a distinct starting tree. |
| 67 | +- `-s` specifies the input file containing the sequence alignment. |
| 68 | +- `-n` specifies the suffix for output files. |
| 69 | +- `-w` specifies the output directory. |
| 70 | + |
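Before moving on, it can be worth confirming that every requested ML search finished. The sketch below counts the `RAxML_result` files shown in the listing above; `count_ml_runs` is a hypothetical helper, and the directory and run name are the ones used in this lesson.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count RAxML_result files for a given run name.
count_ml_runs() {
    local dir=$1 name=$2
    find "$dir" -maxdepth 1 -type f -name "RAxML_result.${name}.RUN.*" | wc -l | tr -d ' '
}

expected=16   # the value passed to -# in the ML search
if [ -d outdir ]; then
    found=$(count_ml_runs outdir treeML)
    echo "Found $found of $expected ML result files"
    # The best tree should exist and be non-empty once all runs are done.
    [ -s outdir/RAxML_bestTree.treeML ] || echo "RAxML_bestTree.treeML is missing or empty"
fi
```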
| 71 | +### Generating bootstraps |
| 72 | + |
| 73 | +Next, we can generate a large number of less computationally demanding trees |
| 74 | +for calculating bootstrap values. |
| 75 | + |
| 76 | +~~~ |
| 77 | +$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -b 144 -# 1000 -s input.fasta -n treeBS -w outdir |
| 78 | + |
| 79 | +$ ls outdir/*treeBS* |
| 80 | +~~~ |
| 81 | +{: .language-bash} |
| 82 | + |
| 83 | +~~~ |
| 84 | +RAxML_info.treeBS RAxML_bootstrap.treeBS |
| 85 | +~~~ |
| 86 | +{: .output} |
| 87 | + |
| 88 | +In the command above, `-b` specifies bootstrapping with the supplied random seed, |
| 89 | +and `-#` specifies the number of bootstrap replicates. |
| 90 | + |
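A quick check that all requested replicates were written is sketched below. It assumes the bootstrap file holds one Newick tree per line, each terminated by a semicolon; `count_trees` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count trees in a file with one Newick tree per line.
# Every Newick tree ends with a semicolon, so count the lines containing one.
count_trees() {
    grep -c ';' "$1"
}

bs_file=outdir/RAxML_bootstrap.treeBS
if [ -f "$bs_file" ]; then
    echo "Bootstrap trees written: $(count_trees "$bs_file") (requested: 1000)"
fi
```

A count below the requested number usually means the run was interrupted and should be repeated or resumed before the support values are calculated.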
| 91 | +> A newer rapid bootstrap method |
| 92 | +> [↗](https://doi.org/10.1080/10635150802429642){:target="_blank"} |
| 93 | +> can be employed in place of standard bootstrapping |
| 94 | +> [↗](https://doi.org/10.1111/j.1558-5646.1985.tb00420.x){:target="_blank"} |
| 95 | +> by using the argument `-x` instead of `-b`. |
| 96 | +{: .tips} |
| 97 | + |
| 98 | +RAxML can also perform an a posteriori bootstrap convergence analysis to determine |
| 99 | +whether the number of bootstraps is adequate. |
| 100 | + |
| 101 | +~~~ |
| 102 | +$ raxmlHPC -m GTRGAMMA -p 144 -z outdir/RAxML_bootstrap.treeBS -I autoMRE -n BStest -w outdir |
| 103 | + |
| 104 | +$ tail -n1 outdir/RAxML_info.BStest |
| 105 | +~~~ |
| 106 | +{: .language-bash} |
| 107 | + |
| 108 | +~~~ |
| 109 | +Converged after 900 replicates |
| 110 | +~~~ |
| 111 | +{: .output} |
| 112 | + |
| 113 | +In the command above, `-I` initiates convergence testing and |
| 114 | +specifies which criterion to use for the test. |
| 115 | +`-z` specifies the file of bootstrap trees to test. |
| 116 | + |
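For scripted workflows, the replicate count can be pulled out of the info file. The sketch below assumes the file contains a line worded like the output shown above ("Converged after 900 replicates"); the wording may differ between RAxML versions, and `replicates_at_convergence` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the replicate count from a line such as
# "Converged after 900 replicates", or nothing if no such line exists.
replicates_at_convergence() {
    sed -n 's/^Converged after \([0-9][0-9]*\) replicates.*$/\1/p' "$1" | tail -n 1
}

info_file=outdir/RAxML_info.BStest
if [ -f "$info_file" ]; then
    n=$(replicates_at_convergence "$info_file")
    if [ -n "$n" ]; then
        echo "Bootstrap support converged after $n replicates"
    else
        echo "No convergence line found; consider generating more bootstrap trees"
    fi
fi
```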
| 117 | +### Applying bootstrap values to the best ML tree |
| 118 | + |
| 119 | +The final step is to apply the bootstrap values to the best ML tree. |
| 120 | + |
| 121 | +~~~ |
| 122 | +$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -f b -t outdir/RAxML_bestTree.treeML -z outdir/RAxML_bootstrap.treeBS -n treeBP -w outdir |
| 123 | + |
| 124 | +$ ls outdir/*treeBP* |
| 125 | +~~~ |
| 126 | +{: .language-bash} |
| 127 | + |
| 128 | +~~~ |
| 129 | +RAxML_bipartitionsBranchLabels.treeBP RAxML_bipartitions.treeBP |
| 130 | +~~~ |
| 131 | +{: .output} |
| 132 | + |
| 133 | +In the command above, `-f b` instructs RAxML to create a bipartition tree from |
| 134 | +the best ML tree (supplied with `-t`) and |
| 135 | +the bootstrap trees (supplied with `-z`). |
| 136 | + |
| 137 | +The output files can be used to visualize the trees. |
| 138 | +The two output files contain the same tree; |
| 139 | +they differ only in how the branch support is written |
| 140 | +(as node labels vs. branch labels). |
| 141 | +Select the file that is correctly interpreted by your visualization |
| 142 | +program. |
| 143 | + |
| 144 | +### A single-step approach |
| 145 | + |
| 146 | +RAxML can perform all three steps above with a single command. |
| 147 | +However, only the newer rapid approach can be used for bootstrapping. |
| 148 | +By default, 20 ML trees are generated. |
| 149 | + |
| 150 | +~~~ |
| 151 | +$ raxmlHPC -T 8 -m GTRGAMMA -p 144 -f a -x 144 -# 1000 -s input.fasta -n treeALL -w outdir |
| 152 | + |
| 153 | +$ ls outdir/*treeALL* |
| 154 | +~~~ |
| 155 | +{: .language-bash} |
| 156 | + |
| 157 | +~~~ |
| 158 | +RAxML_bestTree.treeALL RAxML_bootstrap.treeALL |
| 159 | +RAxML_bipartitionsBranchLabels.treeALL RAxML_info.treeALL |
| 160 | +RAxML_bipartitions.treeALL |
| 161 | +~~~ |
| 162 | +{: .output} |
| 163 | + |
| 164 | +> ## RAxML resources |
| 165 | +> - [RAxML v8 manual](https://cme.h-its.org/exelixis/resource/download/NewManual.pdf){:target="_blank"} |
| 166 | +> - [A more extensive RAxML guide by the authors](https://cme.h-its.org/exelixis/web/software/raxml/hands_on.html){:target="_blank"} |
| 167 | +> - [ExaML - parallelized approach for whole-genome datasets](https://github.com/stamatak/ExaML){:target="_blank"} |
| 168 | +{: .notes} |