The difference between "letters[c(1,NA)]" and "letters[c(NA,NA)]"

Tal Galili

13 years ago

In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote:

Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.

So I did, and here’s the output:

> letters[c(12,NA)]
[1] "l" NA
>  letters[c(NA,NA)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>

Interesting, isn’t it?
I had no clue why this happened, but luckily for us Barry gave a follow-up reply with an explanation. Here is what he wrote:

My example with ‘letters’ comes from a collision of three features:

recycling of short subscripts
silent coercion of types (boolean NA to numeric NA)
and the existence of five different NA values that all print the same.
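
To make the second and third points concrete, here is a small sketch. It assumes the "five NA values" Barry refers to are R's built-in typed NA constants; the variable names `na_values` and `na_types` are mine, not his.

```r
# The five NA constants print alike but carry different types.
na_values <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
na_types <- vapply(na_values, typeof, character(1))
print(na_types)

# Silent coercion: a bare NA is logical on its own, but inside c()
# it takes on the type of the other elements.
type_mixed <- typeof(c(1, NA))   # "double"
type_bare  <- typeof(c(NA, NA))  # "logical"
print(c(type_mixed, type_bare))
```

Calling `typeof()` on a subscript before using it is a cheap way to see which kind of indexing R is about to do.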

[…] to really understand that letters[c(1,NA)] is different from letters[c(NA,NA)] you have to see that:

in the first case, the NA is coerced to a numeric NA because it’s in a vector with a numeric ‘1’.
in the first case, you are selecting elements by supplying a vector of indexes
in the second case, your NAs are boolean (logical) NA values
hence your subscript is a logical vector
logical vectors are recycled
now your subscript is a vector of TRUE/FALSE values (which are all NA) of the same length as ‘letters’.
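
The steps above can be checked one at a time. The following is only a sketch of my reading of Barry's explanation; `idx_num`, `idx_lgl` and `recycled` are illustrative names, and the `rep()` call is my way of spelling out by hand what recycling does, not a description of R's internals.

```r
idx_num <- c(12, NA)  # NA is coerced to a numeric NA next to 12
idx_lgl <- c(NA, NA)  # both NAs stay logical

print(typeof(idx_num))  # "double": positional indexing, no recycling
print(typeof(idx_lgl))  # "logical": logical indexing, recycled

# Recycling the logical subscript by hand to the length of `letters`
recycled <- rep(idx_lgl, length.out = length(letters))
res_recycled <- letters[recycled]
same_result <- identical(letters[idx_lgl], res_recycled)
print(same_result)

# Using integer NAs keeps the subscript positional, so only two
# elements come back.
res_int <- letters[c(NA_integer_, NA_integer_)]
print(res_int)
```

If you really want "two missing positions", writing `NA_integer_` explicitly avoids the surprise.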

To make sure I understood Barry correctly, I tried the following code:

>  letters[c(T,NA)]
 [1] "a" NA  "c" NA  "e" NA  "g" NA  "i" NA  "k" NA  "m" NA  "o" NA  "q" NA  "s" NA  "u" NA  "w" NA  "y" NA
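
As a further sanity check (again just a sketch, with variable names of my own choosing), the recycled pattern can be verified programmatically: the TRUE lands on every odd position and the NA on every even one.

```r
idx <- c(TRUE, NA)  # written out as TRUE rather than the T shorthand
res <- letters[idx]

odd_pos  <- seq(1, length(letters), by = 2)
even_pos <- seq(2, length(letters), by = 2)

# Odd positions keep their letter; even positions come back as NA.
odd_ok  <- identical(res[odd_pos], letters[odd_pos])
even_ok <- all(is.na(res[even_pos]))
print(c(length(res), odd_ok, even_ok))
```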

Barry gave this example to illustrate how R violates the Zen idea of “Simple is better than complex”, since (so he claims) subscript recycling ends up shooting you in the foot.

To follow up on that discussion, head over to Barry’s comment on the REvolution blog.