In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote:
Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.
So I did, and here’s the output:
> letters[c(12,NA)]
[1] "l" NA
> letters[c(NA,NA)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>
Interesting, isn’t it?
I had no clue why this happened, but luckily for us, Barry followed up with an explanation. Here is what he wrote:
My example with ‘letters’ comes from a collision of three features:
- recycling of short subscripts
- silent coercion of types (boolean NA to numeric NA)
- and the existence of five different NA values that all print the same.
[…] to really understand that letters[c(1,NA)] is different from letters[c(NA,NA)] you have to see that:
- in the first case, the NA is coerced to a numeric NA because it’s in a vector with a numeric ‘1’.
- in the first case, you are selecting elements by supplying a vector of indexes
- in the second case, your NAs are boolean (logical) NA values
- hence your subscript is a logical vector
- logical vectors are recycled
- now your subscript is a vector of TRUE/FALSE values (which are all NA) of the same length as ‘letters’.
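Barry's three points can be checked directly at the console. The snippet below is only a sketch of that check, not part of his comment: the variable names (na_values, numeric_index, logical_index, recycled_index) are mine, and I am assuming that rep() with length.out reproduces the recycling R does internally for logical subscripts.

```r
# The five NA constants in base R: they print alike but have different types.
na_values <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
na_types <- vapply(na_values, typeof, character(1))
print(na_types)

# Mixing NA with a number coerces the whole vector to numeric (double),
# while a vector of bare NAs stays logical.
numeric_index <- c(12, NA)
logical_index <- c(NA, NA)
print(typeof(numeric_index))
print(typeof(logical_index))

# A logical subscript shorter than the vector is recycled to its full length,
# so c(NA, NA) behaves like 26 logical NAs (assumption: rep() mirrors that).
recycled_index <- rep(logical_index, length.out = length(letters))
print(length(letters[logical_index]))
print(identical(letters[logical_index], letters[recycled_index]))
```

The numeric subscript picks out two positions, whereas the logical one is stretched over all 26 letters, which is why the second call returns 26 NAs.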
To make sure I understood Barry correctly, I tried the following code:
> letters[c(T,NA)]
[1] "a" NA "c" NA "e" NA "g" NA "i" NA "k" NA "m" NA "o" NA "q" NA "s" NA "u" NA "w" NA "y" NA
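To convince myself that recycling really is what produces this alternating pattern, here is a small sketch that builds the recycled subscript by hand and compares the results. The names recycled and two_missing are my own, and I use TRUE rather than T because T is an ordinary variable that can be reassigned. The last lines show one possible way to get a two-element result when that is what you wanted: use the integer NA explicitly.

```r
# Build the full-length logical subscript that recycling would produce.
recycled <- rep(c(TRUE, NA), length.out = length(letters))

# Indexing with the short subscript and with the hand-recycled one should agree.
print(identical(letters[c(TRUE, NA)], letters[recycled]))

# An integer NA is treated as a positional index, so no recycling happens
# and the result has exactly two (missing) elements.
two_missing <- letters[c(NA_integer_, NA_integer_)]
print(two_missing)
print(length(two_missing))
```

Being explicit about the NA type is a little verbose, but it removes the dependence on silent coercion.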