Chapter 13: Working with SAS Dates
SAS can read and write dates in almost any format, such as 5/23/2015 or 23May2015. No matter how the date appears in the input file, SAS converts all dates to the number of days from January 1, 1960. Thus, January 1, 1960 is 0; January 2, 1960 is a 1; and so on. Dates before January 1, 1960 are converted to negative numbers. For example, December 31, 1959 is equal to -1. Almost all computer languages store dates as the number of days from a fixed date. SAS chose January 1, 1960—other languages use other dates. Although SAS dates are stored internally as numbers, SAS can display the date in any of the standard formats. For those readers interested in history, SAS does not compute dates before January 1, 1582. That is because Pope Gregory the VIII decreed that October 4, 1582 would be followed by October 15, 1582 (the beginning of the Gregorian calendar).
Remember INFORMATS help read data into SAS and change how the data is stored in the SAS data set. Formats do not change how the data is stored and are used to change how the data is displayed.
SAS uses informats to convert dates expressed in any one of the possible date formats into its internal representation of dates (the number of days from January 1, 1960). Here is an example.
You have a text file Date_Data.txt containing three dates, as follows:
File 13.1: Date_Data.txt
123456789012345678901234567890 (Ruler, not part of the file)
05/23/2015 23May2015 23/05/2015
10/21/1950 21Oct1950 21/10/1950
5/7/2013 7Jul2013 7/5/2013
The first date, starting in column 1, is in the form mm/dd/yyyy (month, day, year). The second date, starting in column 12, is in the form ddMonyyyy (day of the month, month abbreviation, year). The third date, starting in column 22, is in the form dd/mm/yyyy (day, month, year). The following program reads each of these dates:
Program 13.1: Reading Dates in a Variety of Date Formats
data Read_Dates;
infile “~/MyBookFiles/Date_Data.txt” pad;
input @1 Date1 mmddyy10.
@12 Date2 date9.
@22 Date3 ddmmyy10.;
run;
title “Listing of Data Set Dates”;
proc print data=Read_Dates noobs;
run;
You use the appropriate informat to read each of the dates. The two informats mmddyy10. and ddmmyy10. are pretty obvious. By the way, the 10 at the end specifies that the program is reading 10 columns of data. If you had a date such as 5/6/2013 anywhere in the specified 10 columns, SAS would still read it correctly. The informat DATE9. reads dates in this form: day of the month (one or two digits), month abbreviation (not case-sensitive), and four-digit year.
Here is the listing.
Figure 13.1: Output from Program 13.1
This looks pretty strange. What you are seeing are the internal values for each of the dates—the number of days from January 1, 1960. To see these values as dates, you need to associate a SAS date format with each variable. Here is a list of some of the more popular date formats:
Table 13.1: List of Popular Formats for Displaying Dates
How Date is Displayed |
Date Format |
10/21/1950 |
mmddyy10. |
10-21-1950 |
mmddyyd10. |
10 21 1950 |
mmddyyb10. |
10:21:1950 |
mmddyyc10. |
21/10/1950 |
ddmmyy10. |
21Oct1950 |
DATE9. |
Saturday, October 21, 1950 |
WEEKDATE. |
The formats that end in 10d, 10b, and 10c (rows 3-5) replace the default slash with hyphens, blanks, or colons, respectively. You can add d’s, b’s, or c’s to the ddmmyy formats as well.
You can use any date format you want to display any of these date variables. The program below is identical to Program 13.1Program 13.1: Reading Dates in a Variety of Date Formats, except that each of the dates is now formatted.
Program 13.2: Adding Formats to Display the Date Values
data Read_Dates;
infile “~/MyBookFiles/Date_Data.txt” pad;
input @1 Date1 mmddyy10.
@12 Date2 date9.
@22 Date3 ddmmyy10.;
format Date1 mmddyy10. Date2 Date3 date9.;
run;
title “Listing of Data Set Dates”;
proc print data=Read_Dates noobs;
run;
The listing below shows the effect of these formats.
Figure 13.2: Output from Program 13.2
The date values are now displayed properly. Because the month-day-year format is popular in the United States and the day-month-year format is popular in most of the world outside the United States, many companies that do global business prefer the DATE9. format (used for Date2 and Date3 in the listing).
Creating a SAS Date from Month, Day, and Year Values
If you have month, day, and year values as separate variables, you can use the MDY function to create a SAS date. This function takes three numeric arguments and, as the function name suggests, the three arguments provide values for month, day, and year.
Here is an example: You have a SAS data set with variables Month, Day, and Year. To compute a SAS date, use the following statement:
Date = MDY(Month, Day, Year);
The variable Date will be a SAS date (i.e., the number of days from January 1, 1960). You will most likely want to add a FORMAT statement to associate one of the date formats with the variable Date.
If you need to refer to a specific date in a DATA step, you could always enter that date as the number of days from January 1, 1960. However, computing that value is inconvenient and a program that referred a date by its internal value would also be difficult to read. The answer: The date constant (also known as a date literal).
Suppose you want to test a date to see whether it is earlier than January 1, 2020, or later than December 31, 2021. Here’s how to do it:
Program 13.3: Demonstrating a Date Constant
data _null_;
title “Checking for Out of Range Dates”;
input @1 Date mmddyy10.;
file print;
if Date lt ‘01Jan2020’d and not missing(Date) or
Date gt ‘31Dec2021’d then put “Date “ Date “is out of range”;
format Date mmddyy10.;
datalines;
10/13/2020
5/1/2012
1/1/2015
6/5/2020
1/1/2000
;
As demonstrated in this program, a date constant consists of a date in the form ddMonyyyy, where dd is the day of the month, Mon is a three-character month abbreviation, and yyyy is the year. You place this value in single or double quotation marks and follow it (no spaces) with an uppercase or lowercase d. At compile time, SAS converts the date constants to SAS dates.
Before we get to the output, let’s discuss a few features of the program. First of all, it uses the reserved data set name _NULL_. Data _NULL_ was discussed earlier in this book, but in case you didn’t read the chapters of this book in order, here is the explanation again.
Giving a data set this special name has the effect of not creating any data set at all. That is, after the DATA step runs, there is no data set left behind. Using a _NULL_ statement saves the computer all the overhead of creating the data set and writing observations to it. Because of this, you can’t use a PROC PRINT (or any other procedure) to list the contents of the data set, because there is no data set. The solution is to use a PUT statement. Following the keyword PUT, you can enter text (in quotation marks) and variable names. SAS will print out the quoted text and the value of the variables listed in the PUT statement. Finally, the statement FILE PRINT is an instruction to print the results to the RESULTS window. If you leave off this statement, SAS, by default, sends the results of the PUT statement to the SAS log.
As a reminder, a missing value (logically the most negative value that you can have) is going to be less than any date. By using the MISSING function in this program, you are giving instructions not to print the out-of-range message if the date is a missing value.
Here is the output from Program 13.3.
Figure 13.3: Output from Program 13.3
The program worked as advertised.
Extracting the Day of the Week, Day of the Month, Month, and Year from a SAS Date
You can use the four functions, WEEKDAY, DAY, MONTH and YEAR, to compute the day of the week (a number from 1 to 7, with 1=Sunday), day of the month (a number from 1 to 31), month of the year (a number from 1 to 12), and year from a SAS date. Let’s look at an example.
You have a SAS data set with a variable called Date. You would like to generate bar charts showing frequencies for day of the week, day of the month, and year.
To save room, this program only produces a bar chart for the day of the week. You can substitute the day of the month or year to obtain bar charts for these variables. The DATA step that creates the data set containing the Date variable is included so that you can try running the program yourself.
Program 13.4: Extracting the Day of the Week, Day of the Month, and Year from a SAS Date
data Extract;
informat Date mmddyy10.;
input Date @@;
Day_of_Week = weekday(Date);
Day_of_Month = day(Date);
Year = year(Date);
format Date mmddyyd10.;
datalines;
1/5/2000 2/8/2000 4/23/2000 4/12/2000 8/21/2000 8/21/2000 8/22/2000
12/12/2000 12/15/2000 12/18/2000
2/22/2001 2/1/2001 4/18/2001 4/18/2001 4/18/2001 9/17/2001 12/25/2001
12/22/2001 3/3/2001 3/6/2001 3/7/2001
;
title “Listing of the First Eight Observations from Extract”;
proc print data=Extract (obs=8);
run;
title “Frequencies for Day of the Week”;
proc sgplot data=Extract;
vbar Day_of_Week;
run;
The two @ signs at the end of the INPUT statement prevent SAS from going to a new line when the DATA step iterates. It enables you to place data for more than one observation on a single line of data.
The three functions, WEEKDAY, DAY, and YEAR, are used to extract the day of the week, the day of the month, and the year from the variable Date.
The data set option OBS= is an instruction to stop processing when you reach observation 8. It is a convenient way to list the first n observations in a data set.
You use PROC SGPLOT to generate a vertical bar chart (VBAR) for the variable Day_of_Week.
Here is the output from Program 13.4.
Figure 13.4: Output from Program 13.4
The first eight observations are printed. Notice the date format (it uses hyphens because of the mmddyyd10. format) and the values for the other three variables.
The bar chart shows frequencies for the days of the week.
Adding a Format to the Bar Chart
To make the bar chart more readable, you can write a format that will substitute the day abbreviations (Mon, Tue, etc.) for the numbers 1–7. The program that follows does just that:
Program 13.5: Creating a Variable Representing Day of the Week Abbreviations
proc format;
value DOW 1=’Sun’ 2=’Mon’ 3=’Wed’ 4=’Thu’
5=’Fri’ 6=’Sat’ 7=’Sun’;
run;
data Extract;
informat Date mmddyy10.;
input Date @@;
Day_of_Week = weekday(Date);
Day_of_Month = day(Date);
Year = year(Date);
format Date mmddyyd10. Day_of_Week DOW.;
datalines;
1/5/2000 2/8/2000 4/23/2000 4/12/2000 8/21/2000 8/21/2000 8/22/2000
12/12/2000 12/15/2000 12/18/2000
2/22/2001 2/1/2001 4/18/2001 4/18/2001 4/18/2001 9/17/2001 12/25/2001
12/22/2001 3/3/2001 3/6/2001 3/7/2001
;
title “Listing of the First Eight Observations from EXTRACT”;
proc print data=Extract (obs=8);
run;
title “Frequencies for Day of the Week”;
proc sgplot data=Extract;
vbar Day_of_Week;
run;
The bar chart now looks like this.
Figure 13.5: Output from Program 13.5
This is a substantial improvement over the previous bar chart.
Computing Age from Date of Birth: The YRDIF Function
Computing a person’s age, given the date of birth, is a problem faced by just about every programmer. Luckily, SAS has a function called YRDIF (year difference) that computes the difference between two SAS dates, in years. For example, suppose you have a variable called DOB (date of birth) and you want to compute a person’s age as of January 1, 2020. The calculation is:
Age = yrdif(DOB,’01Jan2020’d);
The two arguments to the YRDIF function are the first and last dates from which you want to compute the interval. In this example, the first date is a SAS date and the second date is a date constant. There are many applications where you need to compute age as of the last birthday. For example, you cannot vote even if your 18th birthday is the day after Election Day. You can use the INT (integer) function to extract the integer part of a number. The expression to compute a person’s age on January 1, 2020, as of his or her last birthday is:
Age_Last = int(yrdif(DOB,’01Jan2020’d));
Notice that it is OK to have a SAS function as an argument to another SAS function. Just be careful with parentheses when you do this. You might want to draw the line at two functions in a single SAS statement. More than this can be a bit confusing to read.
If you need to round the result of the YRDIF function, use the ROUND function in place of the INT function.
SAS stores dates as the number of days from January 1, 1960. SAS has informats to read and interpret dates in almost any form. Once you have a SAS date, you can use the WEEKDAY, DAY, MONTH, and YEAR functions to extract any of these values from the date. To learn more about SAS date functions, please refer to one (or both) of the following references:
Cody, Ron. 2010. SAS Functions by Example, Second Edition. Cary, NC: SAS Institute Inc.
Morgan, Derek. 2014. The Essential Guide to SAS Dates and Times, Second Edition. Cary, NC: SAS Institute Inc.
1. You have a raw data file with the following data:
12345678901234567890 Ruler – not part of the data
10/21/2015 12Jun2015
12/25/2015 9Apr2014
The date starting in column 1 is in the form month/day/year. The date starting in column 12 is in the form 2-digit day of the month, 3-character month abbreviation, and 4-digit year.
Write an INPUT statement to read these two lines of data (you can use DATALINES). Call the first date Date1 and the second date Date2. Format both dates with the mmddyy10. format.
2. Run the program below to create a data set called Date_Test:
Program for Problem Sets 4
data Date_Test;
input Month Day Year;
datalines;
10 21 1988
3 4 2015
1 1 1960
;
Modify this program so that you have a variable called Date that is a SAS date. Format this date using the DATE9. format.
3. The SASHELP data set Retail contains the variables Month, Day, and Year. Create a new, temporary data set called Dates that has these three variables plus one other called SAS_Date that is a true SAS date. Format this variable using the mmddyy10. format and list the first five observations from this data set. The Retail data set contains other variables besides Month, Day, and Year. You do not want any of these other variables in your Dates data set.
4. Using the data from Problem 1, compute three new variables, Month, Day (day of the week), and Year based on the date starting in column 1. Compute frequencies for these variables.
5. Run the following program to create a data set called Study:
Program for Problem Sets 5
data Study;
call streaminit(13579);
do Subj = 1 to 10;
Date = ‘01Jan2015’d + int(rand(‘uniform’)*300);
output;
end;
format Dates date9.;
run;
Write a DATA step that will print out all dates in the Study data set that are before January 1, 2015 or after July 4, 2015. You can either use a DATA _NULL_ DATA step with a PUT statement or create a data set of out-of-range dates and use PROC PRINT to print it.
6. Run the program in Problem 5 except change the line that computes dates to:
Dates = ‘01Jan1950’d + int(rand(‘uniform’)*15000);
Assuming these dates represent the date of birth, compute the age of each subject as of January 1, 2015. Print out the subject numbers (variable Subj) and ages for each person. Add a FORMAT statement to your PROC PRINT to assign the format 4.1 to the variable Age (which will list each age, rounded to a tenth of a year).