The recent release of R 3.2.2 came with a small (but highly valuable) improvement to the stats:::labels.dendrogram function. For a dendrogram with (say) 1000 labels, the new function is about 70 times faster than the version in R 3.2.1. It is even faster than the Rcpp implementation of labels.dendrogram in the dendextendRcpp package.
Here is some R code to demonstrate this speed improvement:
```r
# If you are missing any of these, install them first:
install.packages("dendextend")
install.packages("dendextendRcpp")
install.packages("microbenchmark")
library(dendextend)      # for the %>% pipe used below
library(microbenchmark)  # for microbenchmark()
# Getting labels from dendextendRcpp
labelsRcpp% dist %>% hclust %>% as.dendrogram
labels(dend)
```
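For readers without dendextendRcpp, here is a minimal, self-contained sketch in base R of a similar setup. It builds a dendrogram with 1000 labels from made-up random data (an assumption; it is not the data used in this post) and checks a hypothetical helper, `labels_leafwalk()`, against `labels()`. The helper is our own illustration of collecting leaf labels in a single depth-first walk; it is not the code that R 3.2.2 actually ships. The timing step only runs if microbenchmark is installed.

```r
# Hypothetical helper: collect leaf labels with one depth-first walk.
# This is an illustrative sketch, NOT the actual R 3.2.2 source.
labels_leafwalk <- function(object) {
  out <- character(attr(object, "members"))  # one slot per leaf
  i <- 0L
  walk <- function(node) {
    if (is.leaf(node)) {
      i <<- i + 1L
      out[i] <<- as.character(attr(node, "label"))
    } else {
      for (child in node) walk(child)  # visit children left to right
    }
  }
  walk(object)
  out
}

# Made-up data: 1000 random 2-D points whose row names become the labels
set.seed(1)
x <- matrix(rnorm(2000), ncol = 2,
            dimnames = list(paste0("obs", 1:1000), NULL))
hc <- hclust(dist(x))
dend <- as.dendrogram(hc)

# The walk should match both labels() and the hclust ordering of the labels
stopifnot(identical(labels_leafwalk(dend), unname(labels(dend))),
          identical(labels_leafwalk(dend), hc$labels[hc$order]))

# Time the calls if microbenchmark is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    labels(dend), labels_leafwalk(dend), order.dendrogram(dend), times = 20))
}
```

Preallocating `out` from the root's `"members"` attribute avoids growing a vector with `c()` at every leaf.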
And here are the results:
```
> microbenchmark(labels_3.2.1(dend), labels_3.2.2(dend), labelsRcpp(dend))
Unit: milliseconds
               expr        min         lq     median         uq       max neval
 labels_3.2.1(dend) 186.522968 189.395378 195.684164 208.328365 321.98368   100
 labels_3.2.2(dend)   2.604766   2.826776   2.891728   3.006792  21.24127   100
   labelsRcpp(dend)   3.825401   3.946904   3.999817   4.179552  11.22088   100
>
> microbenchmark(labels_3.2.2(dend), order.dendrogram(dend))
Unit: microseconds
                   expr      min        lq   median        uq      max neval
     labels_3.2.2(dend) 2520.218 2596.0880 2678.677 2885.2890 9572.460   100
 order.dendrogram(dend)  665.191  712.2235  954.951  996.1055 2268.812   100
```
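The speedup factors can be checked against the median column of the benchmark output. This short sketch only divides the printed medians (the numbers are copied by hand; nothing is re-measured), so it inherits whatever noise was in that single run.

```r
# Median timings copied from the microbenchmark output above
med_ms <- c(labels_3.2.1 = 195.684164,    # milliseconds
            labels_3.2.2 = 2.891728,
            labelsRcpp   = 3.999817)
med_us <- c(labels_3.2.2     = 2678.677,  # microseconds
            order.dendrogram = 954.951)

# Ratio > 1 means the denominator (the faster call) wins by that factor
speedup_new_vs_old <- med_ms[["labels_3.2.1"]] / med_ms[["labels_3.2.2"]]
rcpp_vs_new        <- med_ms[["labelsRcpp"]]   / med_ms[["labels_3.2.2"]]
labels_vs_order    <- med_us[["labels_3.2.2"]] / med_us[["order.dendrogram"]]

round(c(speedup_new_vs_old, rcpp_vs_new, labels_vs_order), 2)
```

That gives roughly 67.7 (R 3.2.2 vs. R 3.2.1), 1.4 (Rcpp vs. R 3.2.2) and 2.8 (labels vs. order.dendrogram); using the min column instead, the last ratio is closer to 3.8.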
As we can see, the new labels function (in R 3.2.2) is about 70 times faster than the older version (from R 3.2.1), and about 1.4 times faster than the Rcpp version. When all you need is something like the number of labels, calling length() on order.dendrogram() is still about 3 times faster than calling labels().
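As a minimal base-R sketch of counting leaves (again using made-up random data rather than this post's `dend`), the three calls below should agree for a dendrogram built with `as.dendrogram()` from an `hclust` object. The last one reads the `"members"` attribute that `as.dendrogram()` stores on each node, so it does not need to walk the tree at all.

```r
# Made-up data: 1000 random 2-D points with row names as labels
set.seed(1)
x <- matrix(rnorm(2000), ncol = 2,
            dimnames = list(paste0("obs", 1:1000), NULL))
dend <- as.dendrogram(hclust(dist(x)))

# Three ways to count the leaves of the same dendrogram
n_order   <- length(order.dendrogram(dend))  # integer leaf indices
n_labels  <- length(labels(dend))            # full label vector
n_members <- attr(dend, "members")           # stored on the root node

stopifnot(n_order == 1000, n_labels == 1000, n_members == 1000)
```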
This improvement is expected to speed up various functions in the dendextend R package (a package for visualizing, adjusting, and comparing dendrograms, which relies heavily on labels.dendrogram). We expect the speedup to be even larger for bigger trees.