Estimating Species Trees from Unrooted Gene Trees

Publication Title

Systematic Biology




In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated as the neighbor-joining (NJ) tree built from a distance matrix in which the distance between two species is the average number of internodes separating them across gene trees, that is, the average gene-tree internode distance. The method is named NJst to distinguish it from the original NJ method. Under the coalescent model, we show that NJst is statistically consistent in estimating unrooted species trees if the gene trees are known or estimated correctly. Simulation results suggest that NJst and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating species-tree topologies, whereas the Bayesian coalescence-based method BEST outperforms both. Unlike BEST and STAR, NJst can infer species trees from unrooted gene trees without using an outgroup. In addition, NJst can handle missing data and is thus useful in phylogenomic studies, in which data sets often lack some loci for some individuals.
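
The two steps described in the abstract (averaging gene-tree internode distances, then building an NJ tree from the resulting matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one sampled individual per species, represents each unrooted gene tree as an edge list plus a set of leaf names, and takes the internode count between two leaves to be the number of edges on the path between them minus one. All function names, including the `quartet` helper, are hypothetical. Each pair is averaged only over the gene trees that contain both taxa, which is one simple way to accommodate missing loci. The sketch further assumes every pair of taxa co-occurs in at least one gene tree.

```python
from collections import deque
from itertools import combinations


def internode_distances(edges, leaves):
    """Internode count between every pair of leaves in one unrooted tree.

    Assumption: internode count = (edges on the connecting path) - 1.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {}
    for src in leaves:
        # Breadth-first search gives the edge count from src to every node.
        depth = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            if dst != src:
                dist[frozenset((src, dst))] = depth[dst] - 1
    return dist


def average_internode_matrix(gene_trees):
    """Average each pairwise distance over the gene trees containing both taxa."""
    total, count = {}, {}
    for edges, leaves in gene_trees:
        for pair, d in internode_distances(edges, leaves).items():
            total[pair] = total.get(pair, 0) + d
            count[pair] = count.get(pair, 0) + 1
    return {pair: total[pair] / count[pair] for pair in total}


def neighbor_joining(taxa, dist):
    """Standard NJ on a dict keyed by frozenset pairs.

    Returns the unrooted topology as three subtrees (nested tuples) that
    meet at a central node; branch lengths are not computed.
    """
    nodes = list(taxa)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        row = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the usual Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - row[p[0]] - row[p[1]],
        )
        new = (a, b)
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)


def quartet(a, b, c, d):
    """Hypothetical helper: four-taxon unrooted tree with cherries (a, b) and (c, d)."""
    edges = [(a, "x"), (b, "x"), ("x", "y"), (c, "y"), (d, "y")]
    return edges, {a, b, c, d}


# Toy input: two gene trees support AB|CD and one supports AC|BD.
gene_trees = [
    quartet("A", "B", "C", "D"),
    quartet("A", "B", "C", "D"),
    quartet("A", "C", "B", "D"),
]
D = average_internode_matrix(gene_trees)
species_tree = neighbor_joining(["A", "B", "C", "D"], D)
```

In this toy input the majority topology AB|CD yields the smallest average distances for the pairs (A, B) and (C, D), so NJ groups one of those pairs first and the resulting unrooted tree carries the AB|CD split. A full implementation would also need to handle multiple individuals per species and parse gene trees from a standard format such as Newick.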