This is a guest post by Garrett Grolemund (mentored by Hadley Wickham).
Lubridate is an R package that makes it easier to work with dates and times. The newest release of lubridate (v 1.1.0) comes with even more tools and some significant changes over past versions. Below is a concise tour of some of the things lubridate can do for you. At the end of this post, I list some of the differences between lubridate (v 0.2.4) and lubridate (v 1.1.0). If you are an old hand at lubridate, please read this section to avoid surprises!
Lubridate was created by Garrett Grolemund and Hadley Wickham.
Parsing dates and times
Getting R to agree that your data contains the dates and times you think it does can be a bit tricky. Lubridate simplifies that. Identify the order in which the year, month, and day appear in your dates. Now arrange “y”, “m”, and “d” in the same order. The result is the name of the lubridate function that will parse your dates. For example,
library(lubridate)
ymd("20110604"); mdy("06-04-2011"); dmy("04/06/2011")
## "2011-06-04 UTC"
## "2011-06-04 UTC"
## "2011-06-04 UTC"
Parsing functions automatically handle a wide variety of formats and separators, which simplifies the parsing process.
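To see that flexibility in action, the short sketch below parses the same date written with three different separators and checks that the results agree. The date strings are illustrative; only the y-m-d order matters to ymd().

```r
library(lubridate)

# The same calendar date written three ways; ymd() only needs the
# year-month-day order, not a particular separator.
d1 <- ymd("2011-06-04")
d2 <- ymd("2011/06/04")
d3 <- ymd("20110604")

c(d1 == d2, d2 == d3)   # TRUE TRUE
```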
If your date includes time information, add h, m, and/or s to the name of the function. ymd_hms() handles what is probably the most common date-time format. To read the dates in with a certain time zone, supply the official name of that time zone (for example, "Pacific/Auckland") in the tz argument.
arrive <- ymd_hms("2011-06-04 12:00:00", tz = "Pacific/Auckland")
## "2011-06-04 12:00:00 NZST"
leave <- ymd_hms("2011-08-10 14:00:00", tz = "Pacific/Auckland")
## "2011-08-10 14:00:00 NZST"
Setting and Extracting information
Extract information from date-times with the functions second(), minute(), hour(), day(), wday(), yday(), week(), month(), year(), and tz(). You can also use each of these to set (i.e., change) the given information. Note that setting a component alters the date-time itself. wday() and month() have an optional label argument, which replaces their numeric output with the name of the weekday or month.
second(arrive)
## 0
second(arrive) <- 25
arrive
## "2011-06-04 12:00:25 NZST"
second(arrive) <- 0
wday(arrive)
## 7
wday(arrive, label = TRUE)
## Sat
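The same pattern works for the other accessors. This sketch re-creates arrive so it runs on its own, reads a few more components, and then sets the hour on a copy (the variable name later is just for illustration).

```r
library(lubridate)

arrive <- ymd_hms("2011-06-04 12:00:00", tz = "Pacific/Auckland")

hour(arrive)   # 12
yday(arrive)   # 155: day of the year (31 + 28 + 31 + 30 + 31 + 4)
tz(arrive)     # "Pacific/Auckland"

# Setting a component changes the date-time itself, so work on a copy
later <- arrive
hour(later) <- 18
later          # "2011-06-04 18:00:00 NZST"
```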
Time Zones
There are two very useful things to do with dates and time zones. First, display the same moment in a different time zone. Second, create a new moment by combining a given clock time with a new time zone. These are accomplished by with_tz() and force_tz(), respectively.
For example, I spent last summer researching in Auckland, New Zealand. I arranged to meet with my advisor, Hadley, over Skype at 9:00 in the morning Auckland time. What time was that for Hadley, who was back in Houston, TX?
meeting <- ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland")
## "2011-07-01 09:00:00 NZST"
with_tz(meeting, "America/Chicago")
## "2011-06-30 16:00:00 CDT"
So the meetings took place at 4:00 pm Hadley’s time (and on the previous day, no less). Of course, this was the same moment in time as 9:00 am in New Zealand. It only appears to be a different day because of the 17-hour time difference between the two cities.
What if Hadley had made a mistake and signed on at 9:00 his time? What time would it have been for me?
mistake <- force_tz(meeting, "America/Chicago")
## "2011-07-01 09:00:00 CDT"
with_tz(mistake, "Pacific/Auckland")
## "2011-07-02 02:00:00 NZST"
His call would arrive at 2:00 am my time! Luckily he never did that.
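The difference between the two functions can be checked directly: with_tz() returns an object equal to the original instant, while force_tz() moves the instant. A minimal self-contained sketch (variable names are illustrative):

```r
library(lubridate)

meeting <- ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland")

# with_tz(): same instant, different clock display
houston <- with_tz(meeting, "America/Chicago")
houston == meeting                            # TRUE: the same moment

# force_tz(): same clock display, different instant
mistake <- force_tz(meeting, "America/Chicago")
difftime(mistake, meeting, units = "hours")   # 17 hours later
```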
Time Intervals
You can save an interval of time as an Interval class object. This is quite useful! For example, my stay in Auckland lasted from June 4 to August 10 (which we’ve already saved as arrive and leave). We can create this interval in one of two ways:
auckland <- interval(arrive, leave)
## 2011-06-04 12:00:00 NZST--2011-08-10 14:00:00 NZST
auckland <- arrive %--% leave
## 2011-06-04 12:00:00 NZST--2011-08-10 14:00:00 NZST
My mentor at the University of Auckland, Chris, traveled to various conferences that year including the Joint Statistical Meetings (JSM). This took him out of the country from July 20 until the end of August.
jsm <- interval(ymd(20110720, tz = "Pacific/Auckland"), ymd(20110831, tz = "Pacific/Auckland"))
Will my visit overlap with his travels? Yes.
int_overlaps(jsm, auckland)
## TRUE
Then I better make hay while the sun shines! For what part of my visit will Chris be there?
setdiff(auckland, jsm)
## 2011-06-04 12:00:00 NZST--2011-07-20 NZST
Other functions that work with intervals include int_start, int_end, int_flip, int_shift, int_aligns, union, intersect, and %within%.
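A few of those helpers in one place, as a sketch that rebuilds the auckland and jsm intervals so it runs on its own. The name away and the mid-visit test time are illustrative choices.

```r
library(lubridate)

arrive   <- ymd_hms("2011-06-04 12:00:00", tz = "Pacific/Auckland")
leave    <- ymd_hms("2011-08-10 14:00:00", tz = "Pacific/Auckland")
auckland <- interval(arrive, leave)
jsm <- interval(ymd(20110720, tz = "Pacific/Auckland"),
                ymd(20110831, tz = "Pacific/Auckland"))

int_start(auckland)           # the arrival time
int_end(int_flip(auckland))   # flipping swaps the endpoints, so also arrive

# The part of my visit during which Chris was away
away <- intersect(auckland, jsm)
int_start(away)               # 2011-07-20 00:00:00 NZST
int_end(away)                 # 2011-08-10 14:00:00 NZST

ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland") %within% auckland  # TRUE
int_aligns(auckland, jsm)     # FALSE: the intervals share no endpoint
```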
Arithmetic with date times
Intervals are specific time spans (because they are tied to specific dates), but lubridate also supplies two general time span classes: durations and periods. Helper functions for creating periods are named after the units of time (plural). Helper functions for creating durations follow the same format but begin with a “d” (for duration) or, if you prefer, an “e” (for exact).
minutes(2) # period
## 2 minutes
dminutes(2) # duration
## 120s (~2 minutes)
Why two classes? Because the timeline is not as reliable as the number line. The durations class will always supply mathematically precise results. A duration year will always equal 365 days. Periods, on the other hand, fluctuate the same way the timeline does to give intuitive results. This makes them useful for modelling clock times. For example, durations will be honest in the face of a leap year, but periods may return what you want:
leap_year(2011)
## FALSE
ymd(20110101) + dyears(1)
## "2012-01-01 UTC"
ymd(20110101) + years(1)
## "2012-01-01 UTC"
leap_year(2012)
## TRUE
ymd(20120101) + dyears(1)
## "2012-12-31 UTC"
ymd(20120101) + years(1)
## "2013-01-01 UTC"
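Daylight saving time is another place where the two classes part ways. The sketch below relies on the system time zone database recording New Zealand's 2011 change to daylight time (clocks went from 02:00 NZST to 03:00 NZDT on 25 September 2011). Adding one day as a period keeps the clock time; adding one day as a duration adds exactly 86,400 seconds.

```r
library(lubridate)

# Midday on the day before New Zealand's 2011 switch to daylight time
before <- ymd_hms("2011-09-24 12:00:00", tz = "Pacific/Auckland")

before + days(1)    # period:   "2011-09-25 12:00:00 NZDT"
before + ddays(1)   # duration: "2011-09-25 13:00:00 NZDT"
```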
We can use periods and durations to do basic arithmetic with date-times. For example, if I wanted to set up a recurring weekly Skype meeting with Hadley, it would occur on:
meetings <- meeting + weeks(0:5)
## [1] "2011-07-01 09:00:00 NZST" "2011-07-08 09:00:00 NZST"
## [3] "2011-07-15 09:00:00 NZST" "2011-07-22 09:00:00 NZST"
## [5] "2011-07-29 09:00:00 NZST" "2011-08-05 09:00:00 NZST"
Hadley travelled to conferences at the same time as Chris. Which of these meetings would be affected? The last three, since July 22, July 29, and August 5 all fall between July 20 and August 31.
meetings %within% jsm
## FALSE FALSE FALSE TRUE TRUE TRUE
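Because %within% returns one logical per date, it can be used directly as an index. This self-contained sketch keeps only the meetings that fall outside the travel window (the name unaffected is illustrative).

```r
library(lubridate)

meeting  <- ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland")
meetings <- meeting + weeks(0:5)
jsm <- interval(ymd(20110720, tz = "Pacific/Auckland"),
                ymd(20110831, tz = "Pacific/Auckland"))

# Keep only the meetings that fall outside Hadley's travel window
unaffected <- meetings[!(meetings %within% jsm)]
unaffected   # the July 1, 8 and 15 meetings
```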
How long was my stay in Auckland?
auckland / edays(1)
## 67.08333
auckland / edays(2)
## 33.54167
auckland / eminutes(1)
## 96600
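The same division works with the “d” spelling of the duration helpers, and int_length() gives the raw length in seconds. A sketch that rebuilds the interval so it runs on its own:

```r
library(lubridate)

auckland <- interval(ymd_hms("2011-06-04 12:00:00", tz = "Pacific/Auckland"),
                     ymd_hms("2011-08-10 14:00:00", tz = "Pacific/Auckland"))

auckland / ddays(1)    # 67.08333: 67 days and 2 hours
auckland / dhours(1)   # 1610 hours (67 * 24 + 2)
int_length(auckland)   # 5796000 seconds
```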
And so on. Alternatively, we can use integer division and modulo. This is sometimes more sensible than plain division, because it is not obvious how to express a remainder as a fraction of a month when the length of a month keeps changing.
auckland %/% months(1)
## 2
auckland %% months(1)
## 2011-08-04 12:00:00 NZST--2011-08-10 14:00:00 NZST
Modulo with an interval returns the remainder as a new (smaller) interval. We can turn this or any interval into a generalized time span with as.period().
as.period(auckland %% months(1))
## 6 days and 2 hours
as.period(auckland)
## 2 months, 6 days and 2 hours
Vectorization
The code in lubridate is vectorized and ready to be used in both interactive settings and within functions. As an example, I offer a function for advancing a date to the last day of its month.
last_day <- function(date) {
  # round up to a month boundary, then step back one day
  ceiling_date(date, "month") - days(1)
}
# try last_day(ymd(20000101) + months(0:11))
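Vectorization also applies to the accessors and rounding functions themselves. In this sketch the dates are placed mid-month (the 15th) purely so the example does not depend on how month boundaries are handled.

```r
library(lubridate)

# Twelve dates, one in each month of 2000
dates <- ymd(20000115) + months(0:11)

month(dates)                 # 1 2 3 ... 12, one value per date
wday(dates)                  # one weekday number per date
floor_date(dates, "month")   # the first day of each month, still a vector
```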
Changes in lubridate (v 1.1.0)
To comply with changes in base R, we’ve moved lubridate from an S3 class system to an S4 system. Most of these changes are contained “under the hood,” but some will change how lubridate behaves and may break previously written code. This disruption is unavoidable: previous versions of lubridate will not work on future versions of R. For a complete list of changes, see the lubridate NEWS file.
Changes between versions 0.2.6 and 1.1.0 can be grouped into three main categories:
Subtracting two dates now follows the default behavior of R, which is to create a difftime object. Previous versions of lubridate created an Interval object. If you receive an error involving an Interval object, please check that the object is not in fact a difftime.
Intervals now have a direction. This means the order of the dates in new_interval(), interval(), and %--% matters. You can flip the direction with int_flip() and coerce all intervals to be positive with int_standardize().
Lubridate no longer automatically coerces time span input to the correct class when performing math. Automatic coercion was poor design, because users can and should explicitly control the class of their input with as.interval(), as.period(), and as.duration(). It also made lubridate needlessly chatty with messages and led to unintuitive results as we added functionality. When coercion is needed to perform an operation, lubridate now returns an informative error message.
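The first two changes in practice, as a small self-contained sketch: subtraction yields a base R difftime that you convert explicitly, and reversing the endpoints of interval() gives a negative interval that int_standardize() turns around.

```r
library(lubridate)

arrive <- ymd_hms("2011-06-04 12:00:00", tz = "Pacific/Auckland")
leave  <- ymd_hms("2011-08-10 14:00:00", tz = "Pacific/Auckland")

gap <- leave - arrive        # base R difftime, not an Interval
class(gap)                   # "difftime"
as.duration(gap)             # explicit conversion to a lubridate Duration

backwards <- interval(leave, arrive)   # endpoint order matters
int_length(backwards)                  # -5796000 seconds
int_standardize(backwards)             # same span, now running forward
```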
Further Resources
To learn more about lubridate, including the specifics of periods and durations, please read the original lubridate paper at http://www.jstatsoft.org/v40/
Comments
My bigger use for with_tz() and force_tz() is handling daylight saving time, which is flagged by the time zone. I have instrumentation data in standard time and in local time (automatically switching back and forth to DST).
with_tz() is an easy tool to create GMT (no DST) representations of all instants for merging data frames. [Match-merging on dates with two sets of times between 1am and 2am is a bad idea, and R on MS Windows doesn’t define a PST time zone.]
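A sketch of the repeated-hour problem described above, assuming US Pacific time (clocks fell back from 02:00 PDT to 01:00 PST on 6 November 2011). Two instants an hour apart print with the same local clock time, but converting them to UTC with with_tz() makes them distinct merge keys again.

```r
library(lubridate)

# Two instants one hour apart, straddling the end of US daylight time
utc   <- ymd_hms("2011-11-06 08:30:00", tz = "UTC") + hours(0:1)
local <- with_tz(utc, "America/Los_Angeles")

format(local, "%Y-%m-%d %H:%M")   # both "2011-11-06 01:30": ambiguous as keys
format(local, "%Z")               # "PDT" "PST"
with_tz(local, "UTC")             # two distinct instants, safe to merge on
```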
I did not go through your whole post, but your last_day function may have a small bug, unless this is what you want:
last_day(as.Date('2014-1-1')): "2013-12-31"
last_day(as.Date('2014-1-2')): "2014-01-31"
A working version might be something like:
last_day <- function(date) {
  ceiling_date(date + days(1), "month") - days(1)
}
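Both versions lean on how ceiling_date() treats a date that already sits exactly on a month boundary, and that behavior has varied across lubridate releases. A variant that sidesteps the question rounds down instead, since floor_date() always lands on the first of the month. The name last_day_of_month is hypothetical; this is a sketch, not the post's function.

```r
library(lubridate)

# Hypothetical helper: the last day of each date's month.
# Round down to the first of the month, move forward one month,
# then step back one day. No boundary special cases are involved.
last_day_of_month <- function(date) {
  floor_date(date, "month") + months(1) - days(1)
}

x <- as.Date(c("2014-01-01", "2014-01-02", "2014-01-31", "2012-02-10"))
last_day_of_month(x)
# "2014-01-31" "2014-01-31" "2014-01-31" "2012-02-29"
```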
How would you handle two-digit years, such as “79” for “1979”?
ymd_hm("79-08-03 10:15")
gives:
[1] "2079-08-03 10:15:00 UTC"
I do not know. It might be worth asking on Stack Overflow.
Try ymd_hm("1979-08-03 10:15"); a two-digit year may be interpreted as belonging to the current century.
Now it seems to work well.
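If the four-digit year cannot be written into the input, one workaround is to fix the century after parsing. This sketch assumes the data contain no genuine future dates, so anything that parses as later than a cutoff (by default, now) is moved back 100 years. The helper fix_century and its cutoff argument are hypothetical, not part of lubridate.

```r
library(lubridate)

# Hypothetical helper: date-times that land after `cutoff` are assumed to
# belong to the previous century and are shifted back 100 years.
fix_century <- function(x, cutoff = Sys.time()) {
  future <- !is.na(x) & x > cutoff
  x[future] <- x[future] - years(100)
  x
}

x <- ymd_hm(c("2079-08-03 10:15", "2005-01-01 00:00"))
fix_century(x)
# "1979-08-03 10:15:00 UTC" "2005-01-01 00:00:00 UTC"
```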
Hello.
When I run ymd("1989-05-17") I get "1989-05-17 UTC".
What option would let me get "1989-05-17" without the "UTC" part?
I don’t want to cut the string; I am looking for a lubridate option.
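One way to get there, assuming ymd() returns a UTC date-time as in the version discussed here, is base R's as.Date(), which produces a plain Date with no time zone. Depending on the lubridate version, ymd() may already return a Date, in which case as.Date() leaves it unchanged.

```r
library(lubridate)

x <- ymd("1989-05-17")
d <- as.Date(x)          # a plain Date, printed as "1989-05-17"
d
format(x, "%Y-%m-%d")    # a character string, if that is what you need
```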
Hi!
Your package is not available for R version 3.4.2. Can you fix it? Thx.
Hi!
I have a problem when I use the function year().
It gives me back the days instead of the years. I don’t understand.
I am using R version 3.4.0 in RStudio.
Please ask on stackoverflow.com.
Okay, thank you.