Practical and Efficient SAS® Programming: The Insider’s Guide

Working with data tables is a fundamental part of SAS, which means there are many programming choices when you need to manipulate data tables. This chapter shows you some techniques that will help you work with data tables.

Copying Variable Attributes

I often find that I need to make a new table that has the same variables and attributes as an existing table, usually because I’m going to append the new table to the existing table and want the attributes to be the same. You can hardcode all the lengths, formats, informats, and labels for the variables, but this creates a maintenance headache if the original table changes. The technique I use is to add these lines at the beginning of the DATA step:

● When you run a DATA step, the program data vector (PDV) is created at compile time, before any of the code in the DATA step is actually executed. The PDV is a list of all the variables and their attributes that are used during the DATA step. The compiler gathers all these variables from the DATA step code and from the headers of any tables that are referenced in the DATA step. The compile step happens one time before executing the DATA step.

● When you use this technique, the variables and their attributes in the table on the SET statement are brought into the PDV during the compile step.

● When the DATA step is executed after the compile step, and each time it runs the IF statement, it considers the condition to be false (because 0=false in SAS), so it doesn’t execute the statement, or statements, after the THEN. This means that no data is ever read from that table.

● So the variables and their attributes are brought into the PDV by the compiler, but records in the table are never read.

In the example in Program 2.1, I want to create a new table with the same variables and attributes as the sashelp.class table.

data newClass;

if (0) then

set sashelp.class;

input name sex age height weight;

datalines;

Susan F 10 50 75

Chris M 10 53 80

;

After running the code in Program 2.1, the variables in the newClass table are the same as the ones in the sashelp.class table, as you can see in Figure 2.1 and Figure 2.2.

Be sure to put the IF statement close to the beginning of your DATA step; the variables are added to the PDV in the order that the compiler finds them.

Variable lengths can sometimes cause issues when you use this technique. If you use a LENGTH statement before the IF (0) statement, and you specify the length of a variable to be shorter than the variable in the table, then you will get warnings in your SAS log:

WARNING: Multiple lengths were specified for the variable name by input data set(s). This can cause truncation of data.

If your code is working the way you want, you can temporarily turn off these warnings in your logs with the VARLENCHK= option. You should then turn the warnings back on after the DATA step. I don’t recommend setting the option at the beginning of your code, because there are times that the warning indicates an issue that should be fixed. But if you are sure that the warning is not a problem, you can set the option to NOWARN before the DATA step and then back to WARN after the DATA step as in the example in Program 2.2 and SAS Log 2.1.

options varlenchk = NOWARN;

data newClass;

length name $4;

if (0) then

set sashelp.class;

input name sex age height weight;

datalines;

Susan F 10 50 75

Chris M 10 53 80

;

options varlenchk = WARN;

1 options varlenchk = NOWARN;

2 data newClass;

3 length name $4;

4 if (0) then

5 set sashelp.class;

6 input name sex age height weight;

7 datalines;

NOTE: The data set WORK.NEWCLASS has 2 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds

8 ;

9 options varlenchk = WARN;

Reading Data with a SET Statement

Reading an existing data table into your DATA step is a simple process using the SET statement. You can also use the SET statement to read multiple tables and to read the data out of sequence.

Concatenating Tables

If you want to add multiple tables together, one after another, you can simply list all the tables on a single SET statement. This causes the DATA step to read all the records from the first table, then all the records from the second table, and so on until it has read all the records from all the tables. The tables don’t even have to have the same variables, though usually it is more useful if they do.

data class_2016;

set sashelp.class (obs = 5);

run;

data class_2017;

set sashelp.class (keep = name sex age

obs = 6);

age + 1;

grade = 100;

run;

You can use a SET statement to concatenate these two tables, even though they don’t have the same variables. The example in Program 2.4 shows how to concatenate the two class tables, and Figure 2.5 shows a listing of the output table. I also use the IN= option to help create a year variable to indicate which table the record came from.

data class;

set class_2016 (in = in_2016)

class_2017 (in = in_2017);

if (in_2016) then

year = 2016;

else if (in_2017) then

year = 2017;

run;

The Height and Weight variables are only in the class_2016 table, so the values are missing on the 2017 records. The grade variable is only in the class_2017 table, so the values are missing on the 2016 records.

Interleaving Tables

If you have multiple tables that are sorted by the same variables, then you can interleave the tables using a SET statement with a BY statement. The records are read in sorted order from all the input tables. Program 2.5 and SAS Log 2.2 show how to interleave a table containing the males in sashelp.class with the females from sashelp.class.

proc sort data = sashelp.class

out = male;

where sex = "M";

by name;

run;

proc sort data = sashelp.class

out = female;

where sex = "F";

by name;

run;

data _null_;

set male

female;

by name;

putlog name= sex=;

run;

Name=Alfred Sex=M

Name=Alice Sex=F

Name=Barbara Sex=F

Name=Carol Sex=F

Name=Henry Sex=M

Name=James Sex=M

Name=Jane Sex=F

Name=Janet Sex=F

...

You can see that the records are read in sorted order from both tables, rather than all the records being read from the first table and then all the records from the second table.

Using Multiple SET Statements

You can use more than one SET statement in a single DATA step. If you do this, the DATA step reads a record from the data table each time the SET statement is executed. So if you have two SET statements one after another in the DATA step, it reads a record from the table on the first SET statement and then a record from the table on the second SET statement in the same iteration of the DATA step. This means that you must be careful about the variables in each table so that values from a second table don’t overwrite those from the first table. The other thing to watch out for is the end of file. The DATA step stops iterating when it reaches the end of any file that is being read, so whichever table has the smallest number of records causes the DATA step to stop iterating before all the records in the other tables have been read.

Using NOBS= and POINT=

When I use multiple SET statements, I usually use the NOBS= and POINT= options on at least one of the SET statements.

● The NOBS= option creates a variable that contains the number of records in the incoming table.

● The POINT= option points to a variable that you create that contains the record number that is read from the table.

So if you are reading one table, and you want to look up values in a second table, you can use the POINT= and NOBS= options to let you loop through the second table. Using this technique means that you don’t hit the end-of-file condition that stops your DATA step iteration. Even if you read the last record, it doesn’t trigger the end-of-file flag.

Program 2.6 and SAS Log 2.3 are an example of using the POINT= and NOBS= options for looking up values in a second table. Using the sashelp.class table and 10 records from the sashelp.classfit table as a lookup table, you can look up the predicted weight for each person.

data classfit (keep = name predict);

set sashelp.classfit (obs = 10);

run;

data _null_;

set sashelp.class;

do p = 1 to nobs; ❶

set classfit (rename = (name = name_fit)) ❷

point = p nobs = nobs;

if (name eq name_fit) then

do;

putlog name= weight= predict= 5.1;

leave; ❸

end;

run;

❶ The p variable is used in the POINT= option and is incremented so that each record of the classfit table is read. The nobs variable is automatically set to the number of records in the classfit table by the NOBS= option.

❷ It is necessary to rename the name variable when reading the classfit table, because the name value from sashelp.classfit overwrites the name value from sashelp.class.

❸ The LEAVE statement causes the execution to exit the DO loop when it finds the match so that you don’t read the entire lookup table for each record in the sashelp.class table. If you are looking for multiple matches, then you would not use the LEAVE statement.

Name=Alice Weight=84 predict=77.3

Name=Carol Weight=102.5 predict=101.8

Name=James Weight=83 predict=80.4

Name=Jane Weight=84.5 predict=90.1

Name=Janet Weight=112.5 predict=100.7

Name=Jeffrey Weight=84 predict=100.7

Name=John Weight=99.5 predict=87.0

Name=Joyce Weight=50.5 predict=57.0

Name=Louise Weight=77 predict=76.5

Name=Thomas Weight=85 predict=81.2

I tend to use p as the variable name for the POINT= option, and nobs for the NOBS= option. These are easy to remember. If you are setting multiple tables using the POINT= and NOBS= options, then you should use a different variable for each POINT= and NOBS= option.

Using KEY=

The KEY= option enables you to read matching records from a second table using an index. The index must be set up on the second table and the variables included in the index must be set to appropriate values before executing the SET statement with the KEY= option.

Program 2.7 and SAS Log 2.4 use the previous example for POINT= and NOBS=, except this time I’ve added an index to the classfit lookup table so that I can use the KEY= option.

data classfit (index = (name)

keep = name predict);

set sashelp.classfit (obs = 10);

run;

data _null_;

set sashelp.class;

set classfit key = name;

if (_iorc_ eq 0) then ❶

putlog name= weight= predict= 5.1;

_error_ = 0; ❷

run;

❶ The automatic variable, _IORC_, is set to a return code when you use a SET statement with a KEY= option. A value of 0 indicates that the record was found.

❷ If you are using a KEY= option, you should always set the automatic variable _ERROR_ to 0 at the end of the DATA step. Otherwise all the variable values are written to the SAS log whenever _IORC_ is not 0.

Name=Alice Weight=84 predict=77.3

Name=Carol Weight=102.5 predict=101.8

Name=James Weight=83 predict=80.4

Name=Jane Weight=84.5 predict=90.1

Name=Janet Weight=112.5 predict=100.7

Name=Jeffrey Weight=84 predict=100.7

Name=John Weight=99.5 predict=87.0

Name=Joyce Weight=50.5 predict=57.0

Name=Louise Weight=77 predict=76.5

Name=Thomas Weight=85 predict=81.2

If you want to use the KEY= option to look up multiple records from the lookup table, you need to use the KEYRESET= option. This option indicates when to start at the top of the index and when to read the next value in the index. The KEYRESET= option points to a variable that you should set to 1 when you want to start at the top of the index, and set to 0 when you want to read the next value in the index. The option will very nicely set the variable to 0 each time it reads a record.

The example in Program 2.8 and SAS Log 2.5 looks up all the values associated with a given id value.

data lookup (index = (id));

input id $1. value $char10.;

datalines;

1 Pear

1 Peach

1 Banana

2 Apple

3 Grape

3 Lemon

;

data main;

input id $1.;

datalines;

;

data _null_;

set main;

reset = 1; ❶

putlog id= reset=;

do until (_iorc_ ne 0);

set lookup key = id keyreset = reset;

if (_iorc_ ne 0) then

leave;

putlog id= value= reset=;

end;

_error_ = 0;

run;

❶ You must set the reset variable to 1 to force the program to start at the top of the index. If you don’t have reset=1, the program does not find any records when it tries to look up the second id=2 value, because it has already read the one record in the lookup table for id=2.

id=1 reset=1

id=1 value=Pear reset=0

id=1 value=Peach reset=0

id=1 value=Banana reset=0

id=2 reset=1

id=2 value=Apple reset=0

id=2 reset=1

id=2 value=Apple reset=0

Determining Which Table a Record Is From

If you use a SET statement or a MERGE statement with multiple data tables, you can take advantage of the IN= data set option. This option enables you to specify a variable that is automatically set to 1 (true) when the current record comes from that data table, and is set to 0 (false) when the current record does not come from that data table.

Program 2.9 and SAS Log 2.6 show an example of using the IN= option with an interleaving SET statement. Records are read from the tables in the order specified on the BY statement.

proc sort data = sashelp.class

out = male;

where sex = "M";

by name;

run;

proc sort data = sashelp.class

out = female;

where sex = "F";

by name;

run;

data _null_;

set male (in = inM)

female (in = inF);

by name;

putlog inF= inM= name= sex=;

run;

inF=0 inM=1 Name=Alfred Sex=M

inF=1 inM=0 Name=Alice Sex=F

inF=1 inM=0 Name=Barbara Sex=F

inF=1 inM=0 Name=Carol Sex=F

inF=0 inM=1 Name=Henry Sex=M

inF=0 inM=1 Name=James Sex=M

inF=1 inM=0 Name=Jane Sex=F

inF=1 inM=0 Name=Janet Sex=F

inF=0 inM=1 Name=Jeffrey Sex=M

inF=0 inM=1 Name=John Sex=M

inF=1 inM=0 Name=Joyce Sex=F

inF=1 inM=0 Name=Judy Sex=F

inF=1 inM=0 Name=Louise Sex=F

inF=1 inM=0 Name=Mary Sex=F

inF=0 inM=1 Name=Philip Sex=M

inF=0 inM=1 Name=Robert Sex=M

inF=0 inM=1 Name=Ronald Sex=M

inF=0 inM=1 Name=Thomas Sex=M

inF=0 inM=1 Name=William Sex=M

The IN= option is particularly useful in the MERGE statement to determine whether the current record came from one or multiple tables. In fact, I rarely use a MERGE statement without also using the IN= option.

The example in Program 2.10 and SAS Log 2.7 shows merging two tables together. One table has all the students from the sashelp.class table who are 13 and older, and the second table has all the students who are 14 and older. This means that some of the records from the 13+ table do not have matches in the 14+ table. Using the IN= option on each table in the MERGE statement enables you to determine whether each record came from both tables or only one table.

proc sort data = sashelp.class

out = thirteen_plus;

where age ge 13;

by name;

run;

proc sort data = sashelp.class

out = fourteen_plus (rename = (age = age14));

where age gt 13;

by name;

run;

data _null_;

merge thirteen_plus (in = in13_plus)

fourteen_plus (in = in14_plus);

by name;

putlog in13_plus= in14_plus= name= age=;

run;

in13_plus=1 in14_plus=1 Name=Alfred Age=14

in13_plus=1 in14_plus=0 Name=Alice Age=13

in13_plus=1 in14_plus=0 Name=Barbara Age=13

in13_plus=1 in14_plus=1 Name=Carol Age=14

in13_plus=1 in14_plus=1 Name=Henry Age=14

in13_plus=1 in14_plus=1 Name=Janet Age=15

in13_plus=1 in14_plus=0 Name=Jeffrey Age=13

in13_plus=1 in14_plus=1 Name=Judy Age=14

in13_plus=1 in14_plus=1 Name=Mary Age=15

in13_plus=1 in14_plus=1 Name=Philip Age=16

in13_plus=1 in14_plus=1 Name=Ronald Age=15

in13_plus=1 in14_plus=1 Name=William Age=15

Using PROC SQL

PROC SQL is a great tool for working with data tables and particularly for joining tables. However, it is not necessarily the best tool all the time, just as DATA step is not always the best tool. I think that PROC SQL is hard to debug. I also don’t find it intuitive to write, because it isn’t the top-down programming that I’m used to. But those issues notwithstanding, I still get a lot of use out of PROC SQL.

Choosing to Use PROC SQL

There are many ways to join data tables together. Should you use PROC SQL or should you use a DATA step with a merge? Unfortunately, the answer is, “It depends.” I use both methods regularly, depending on the situation. Here are some of my reasons for choosing one method over the other.

● You are joining tables that don’t have the same variables to merge by, or you want to use a more complex way of matching the variables in the tables (not just equals).

● You are joining more than two tables and you have different matching criteria for each join.

● You want to keep records on the basis of which table they came from (using IN=).

● You need to do some additional DATA step processing, and you don’t another step to read and write the data again after joining it.

Joining

Using PROC SQL to join tables is great because it lets you do the actual join in a much more complex way than a DATA step merge, which is a simple matching of variable values.

I tend to use the full join, left join, right join, and inner join rather than doing a Cartesian Product join (using a comma between the tables). I can get the same results, and I don’t have to remember when to use a WHERE clause and when to use an ON clause. The cheat sheet in Figure 2.6 can help you to remember what each join does.

Using a Subquery

Using subqueries is a great feature of PROC SQL because it enables you to use the output of one query in an expression.

In the example in Program 2.11, if you want to find all the students who are older than the average age in the class, you can do it with a subquery. Figure 2.7 is the output table.

proc sql;

create table tallest as

select name, sex, height

from sashelp.class

where height gt (select avg(height)

from sashelp.class);

quit;

Another good use of a subquery is to compare a value to a set of values from a subquery using the IN operator. For example, the code in Program 2.12 enables you to keep all the students from the sashelp.class table who have the predict value in the sashelp.classfit table greater than 110. Figure 2.8 shows the output table.

proc sql;

create table predict as

select name, age, height

from sashelp.class

where name in (select name

from sashelp.classfit

where predict gt 110);

quit;

Using a Correlated Subquery

A correlated subquery is a subquery that refers to the parent query in the WHERE clause. This means that the subquery is run for each row of the parent query, so it is relatively inefficient.

Program 2.13 is an example of a correlated subquery that finds all the students who are taller than the average height by sex. So it compares each girl’s height to the average of all the girls’ heights, not to the average of all the heights. Figure 2.9 is a listing of the output table.

proc sql;

create table tallest as

select name, sex, height

from sashelp.class as c

where height gt (select avg(height)

from sashelp.class

where sex eq c.sex);

quit;

Using Lookup Tables

Often in SAS you need to look up values from another table. For example, you might have a table that has userids in it, and you need to get the associated user name from a lookup table. There are lots of ways to do this based on what you are looking up and how big your tables are.

Using a Format

You can create a user-defined format to use if your lookup table meets the following criteria:

● It is a manageable size (you’ll have to determine this by creating the format and seeing whether it takes too long to create and use or takes up too much disk space).

● It has only a couple of variables whose values you want to retrieve (you’ll need a separate format for each variable).

The example in Program 2.14 and SAS Log 2.8 uses the sashelp.cars table and illustrates creating a lookup format of each car model to the make of car and a second format of car model to the type of car.

data fmtdata (keep = fmtname start label); ❶

set sashelp.cars;

start = left(model);

fmtname = "$make";

label = make;

output;

fmtname = "$type";

label = type;

output;

run;

proc sort data = fmtdata nodupkey; ❷

by fmtname start;

run;

proc format cntlin = fmtdata; ❸

run;

data _null_; ❹

length model $50.;

do model = "MDX", "ABC", "G500";

type = put(model, $type.);

make = put(model, $make.);

putlog model= type= make=;

end;

run;

❶ Create the table needed for PROC FORMAT. It must have a variable called fmtname that contains the name of the format, a variable called start that is the lookup variable’s value, and a variable called label that contains the value to be retrieved.

❷ Group the data by fmtname if you are creating more than one format. You can have no duplicate lookup values (start). I handle both these requirements with a PROC SORT by fmtname and start and the NODUPKEY option.

❹ Use a DATA step to test the formats. "ABC" is not a valid model, so it will not replace the model with the make and type.

model=MDX type=MDX make=MDX

model=ABC type=ABC make=ABC

model=G500 type=G500 make=G500

Using a Join

Another way to use a lookup table is to join your primary table with the lookup table. This can be done with either PROC SQL or with a DATA step and merge. You can use a join when your lookup table meets these criteria:

● It has one or more retrieval values per key (though this can be awkward because multiple records are created if there are multiple retrieval values).

The example in Program 2.15 and SAS Log 2.9 uses the sashelp.cars table as a lookup table and a straightforward DATA step to merge the data. Use the IN= option to keep only the records that come from your primary table.

data primary; ❶

input model $char50.;

datalines;

MDX

ABC

G500

;

data lookup; ❷

set sashelp.cars;

model = left(model);

run;

proc sort data = primary; ❸

by model;

run;

proc sort data = lookup;

by model;

run;

data _null_; ❹

merge primary (in = inP)

lookup;

by model;

if (inP);

putlog model= make= type=;

run;

❹ Merge the tables and keep only the records from the primary table using the IN= variable.

model=ABC Make= Type=

model=G500 Make=Mercedes-Benz Type=SUV

model=MDX Make=Acura Type=SUV

Using PROC SQL is even easier than the DATA step, because you don’t need to sort the tables. Also, in this example, you don’t need to fix the model value in the lookup table. Program 2.16 uses a left join to ensure that you keep all the records from your primary table. The output table is listed in Figure 2.10.

data primary;

input model $char50.;

datalines;

MDX

ABC

G500

;

proc sql;

create table final as

select p.model, l.make, l.type

from primary as p

left join

sashelp.cars as l

on p.model eq left(l.model);

quit;

Using a SET Statement With KEY=

In a DATA step, you can also use a SET statement with a KEY= option to retrieve your values from the lookup table. To use this method, your lookup table must meet these criteria:

● It has an index on the variable(s) that match the variable name(s) in the primary table.

The example in Program 2.17 and SAS Log 2.10 uses the sashelp.cars table as the lookup table and a SET statement with the KEY= option to look up values.

data primary; ❶

input model $char50.;

datalines;

MDX

ABC

G500

;

data lookup (index = (model)); ❷

set sashelp.cars;

model = left(model);

run;

data _null_; ❸

length make $13 type $8;

call missing(make, type);

set primary; ❹

set lookup key = model;

if (_iorc_ eq 0) then ❺

put "success: " model= make= type=;

else

put "failure: " model=;

_error_ = 0; ❻

run;

❹ Use one SET statement for the primary table and another SET statement with a KEY= option for the lookup table.

❺ Check the _IORC_ automatic variable to see if the value was found in the lookup table or not. A value of 0 means that it was found.

❻ Whenever you use a SET statement with a KEY= option, a MODIFY statement, or any other statements that cause the _IORC_ variable to be created, you should set the _ERROR_ automatic variable to 0. If you don’t do this, the DATA step puts out all the variable values in the log every time the _IORC_ value is not 0.

success: model=MDX make=Acura type=SUV

failure: model=ABC

success: model=G500 make=Mercedes-Benz type=SUV

If you have multiple records for each key value, you can use the same code and loop through the retrieved values instead of just reading one value. The code in Program 2.18 and SAS Log 2.11 shows how to do this.

data primary; ❶

input make $char50.;

datalines;

XYZ

Mercedes-Benz

Audi

;

data lookup (index = (make)); ❷

set sashelp.cars;

model = left(model);

run;

data _null_;

length model $40 type $8;

call missing(model, type);

set primary; ❸

reset = 1;

do until(_iorc_ ne 0); ❹

set lookup key = make keyreset = reset; ❺

if (_iorc_ eq 0) then ❻

put make= model= type=;

end; /* do until - loop though matching rows */

_error_ = 0; ❼

run;

❺ Set the lookup table with the KEY= option pointing to the index. Use the KEYRESET= option to tell the SET statement to go to the next record in the table. Set the reset variable to 0 before the loop begins to tell the SET statement to read the first matching record.

❻ Whenever you use a SET statement with a KEY= option, a MODIFY statement, or any other statements that cause the _IORC_ variable to be created, you should set the _ERROR_ automatic variable to 0. If you don’t do this, the DATA step puts out error messages in the log every time the _IORC_ value is not 0.

make=Mercedes-Benz model=G500 type=SUV

make=Mercedes-Benz model=ML500 type=SUV

make=Mercedes-Benz model=C230 Sport 2dr type=Sedan

make=Mercedes-Benz model=C320 Sport 2dr type=Sedan

make=Mercedes-Benz model=C240 4dr type=Sedan

...

make=Audi model=A4 1.8T 4dr type=Sedan

make=Audi model=A41.8T convertible 2dr type=Sedan

make=Audi model=A4 3.0 4dr type=Sedan

make=Audi model=A4 3.0 Quattro 4dr manual type=Sedan

make=Audi model=A4 3.0 Quattro 4dr auto type=Sedan

make=Audi model=A6 3.0 4dr type=Sedan

...

Using Hash Tables

Using a hash table in a DATA step is an excellent option if your lookup table is very large. Setting up the lookup table as a hash table puts the data into memory so that the values can be retrieved much more quickly. If the neither of your tables is very large, then using a hash table is probably less efficient than some of the other lookup methods.

Program 2.19 and SAS Log 2.12 show the same example again with sashelp.cars as the lookup table. This example shows how to create and use a basic hash table. Note that there are lots of things you can do with a hash table. This is one of the more common uses.

data primary; ❶

input model $char40.;

datalines;

MDX

ABC

G500

;

data lookup; ❷

set sashelp.cars;

model = left(model);

run;

data _null_;

if 0 then

set lookup; ❸

if _n_=1 then

do;

declare hash H (dataset: "lookup"); ❹

rc = H.defineKey("model");

rc = H.defineData("make", "type");

rc = H.defineDone();

call missing(make, type); ❺

end;

set primary; ❻

rc = H.find(); ❼

if rc eq 0 then ❽

putlog "success: " model= make= type=;

else

putlog "failure: " model=;

run;

❸ The hash table variables must be defined in the DATA step before you create the hash table. I use the if (0) syntax to bring all the variable definitions from the lookup table into the DATA step without having to define them all.

❹ Create the hash table with a DECLARE statement. Then define the key and the data.

❺ Set all the hash data values to missing so that you don’t get notes about uninitialized variables.

success: Model=MDX Make=Acura Type=SUV

failure: Model=ABC

success: Model=G500 Make=Mercedes-Benz Type=SUV

Updating Data In Place

There are times that you need to update a data table in place, meaning that you just want to update some of the records in a table without rewriting the table completely. This is useful when the table is very big, when you don’t want to lose any of the attributes of the original table, or when you are updating a DBMS table. There are several ways to do so, including PROC SQL, a DATA step with a MODIFY statement, and PROC APPEND.

For the following examples, I used the code in Program 2.20 to make a copy of the sashelp.class table in the work library, since I don’t want to permanently change the sashelp table:

proc copy in = sashelp out = work;

select class;

run;

Using PROC SQL

There are three different statements in PROC SQL that you can use to update a table in place: INSERT, DELETE and UPDATE.

Using the INSERT Clause

The INSERT clause adds records to the data table using the VALUES, SET, or SELECT clauses. Each of the following examples insert two new rows into the class table.

The example in Program 2.21 and SAS Log 2.13 uses the VALUES clause, which lets you list the values that will be inserted into the table. The values must be in the same order as the variables in the table.

proc sql;

insert into class

values("Martha", "F", 14, 64, 106)

values("Chris", "M", 13, 67, 111);

quit;

NOTE: 2 rows were inserted into WORK.CLASS.

The example in Program 2.22 and SAS Log 2.14 uses the SET clause to specify each variable and its value.

proc sql;

insert into class

set name = "Martha", sex = "F", age = 14,

height = 64, weight = 106

set name = "Chris", sex = "M", age = 13,

height = 67, weight = 111;

quit;

NOTE: 2 rows were inserted into WORK.CLASS.

The example in Program 2.23 and SAS Log 2.15 shows inserting rows from another table using a SELECT/FROM clause. The variables in the second table must be in the same order as the variables in the main table.

data second;

if (0) then

set class;

input name sex age height weight;

datalines;

Martha F 14 64 106

Chris M 13 67 111

;

proc sql;

insert into class

select *

from second;

quit;

NOTE: 2 rows were inserted into WORK.CLASS.

Using the DELETE Clause

The example in Program 2.24 and SAS Log 2.16 illustrates deleting all the records from the table where the value of name is “Jane”.

proc sql;

delete from class

where name eq "Jane";

quit;

NOTE: 1 row was deleted from WORK.CLASS.

The example in Program 2.25 and SAS Log 2.17 illustrates deleting all the records in the table, so be careful!

proc sql;

delete from class;

quit;

NOTE: 22 rows were deleted from WORK.CLASS.

Using the UPDATE Clause

The UPDATE clause updates the values for the given variables. It can update all the records or just the records that match a WHERE condition.

The example in Program 2.26 and SAS Log 2.18 sets the value of age to 15 for all the records where the value of name is “Jane”.

proc sql;

update class

set age = 15

where name eq "Jane";

quit;

NOTE: 1 row was updated in WORK.CLASS.

Using the MODIFY Statement

Using a DATA step with the MODIFY statement is quite powerful because you can use programming statements to conditionally insert, delete and update records and values in a single step.

You need to use the _IORC_ automatic variable, which is created for you by the MODIFY statement, and the %SYSRC() macro to check the return code, as follows:

● The value of _IORC_ is %SYSRC(_DSENMR) if the record was not found in the main table.

If the value of _IORC_ is not 0, then the DATA step puts all the variable values into the SAS log for this record. You can stop this information from filling up the SAS log by setting the _ERROR_ variable to 0.

The example in Program 2.27 and SAS Log 2.19 combines the main class table with a second table using a BY statement. The program performs the following actions:

● If the name is in both tables, and the sex is M, then the record is deleted from the class table.

● If the name is in both tables, and the sex is F, then the record is updated with the values from the second table.

● If the name is not in the class table, then the record from the second table is added to the class table.

data second;

if (0) then

set class;

input name sex age height weight;

datalines;

Jane F 14 64 106

Robert M 15 69 113

Mark M 13 67 111

;

data class;

modify class second;

by name;

if (_iorc_ eq 0) then

do;

if (sex eq "M") then

remove;

else

replace;

end;

else if (_iorc_ eq %sysrc(_dsenmr)) then

output;

_error_ = 0;

run;

NOTE: The data set WORK.CLASS has been updated. There were 1 observations rewritten, 1 observations added and 1 observations

deleted.

The modify statement can also be used when the main table is indexed so that you look for a key variable instead of using a BY statement. The example in Program 2.28 and SAS Log 2.20 shows the same process as the previous one, but uses an index on the class table. Note that you do need to check for a different _IORC_ return code, %SYSRC(_DSENOM), when looking for non-matches.

data second;

if (0) then

set class;

input name sex age height weight;

datalines;

Jane F 14 64 106

Robert M 15 69 113

Mark M 13 67 111

;

proc datasets lib = work nolist nowarn;

modify class;

index create name;

quit;

data class;

set second;

modify class key = name;

if (_iorc_ eq 0) then

do;

if (sex eq "M") then

remove;

else

replace;

end;

else if (_iorc_ eq %sysrc(_dsenom)) then

output;

_error_ = 0;

run;

NOTE: The data set WORK.CLASS has been updated. There were 1 observations rewritten, 1 observations added and 1 observations

deleted.

Using PROC APPEND

You can also use PROC APPEND to update a table in place. It does not read or write the records that are in the main table (BASE=); it just adds the records from the second data table (DATA=) to the end of the main table. It is much more efficient than concatenating tables using a DATA step with a SET statement, because the first table does not need to be read and written. You do need to be sure that the second table has the same structure as the first table—the same variables with the same attributes. You get error messages unless you use the FORCE option, and even with FORCE, you still get warning messages if the lengths are different.

The example in Program 2.29 and SAS Log 2.21 uses the technique described in the section “Copying Variable Attributes” to get the correct variable attributes for the second table.

data second;

if (0) then

set class;

input name sex age height weight;

datalines;

Martha F 14 64 106

Chris M 13 67 111

;

proc append base = class data = second;

run;

NOTE: Appending WORK.SECOND to WORK.CLASS.

NOTE: There were 2 observations read from the data set WORK.SECOND.

NOTE: 2 observations added.

Finding Records

“Does this table have any records?”, “How many records are in this table?”, and the related “Does this table exist?” are all very common questions that need to be answered in SAS programs.

Determining Whether a Table Exists

To check to see whether a data table exists, you can use the EXIST() function in a DATA step or in macro code with %SYSFUNC(). The function returns 1 (true) if the table exists and 0 (false) if the table does not exist.

The example in Program 2.30 and SAS Log 2.22 shows how to use the EXIST() function in a DATA step.

data _null_;

do table = "sashelp.class", "sashelp.xyz";

if (exist(table)) then

putlog table "exists";

else

putlog table "does not exist";

end;

run;

sashelp.class exists

sashelp.xyz does not exist

In a macro, you can use the EXIST() function with the %SYSFUNC() macro function. Program 2.31 and SAS Log 2.23 show how to use the EXIST() function in a macro with the %SYSFUNC() macro function.

%macro testExist(table);

%if (%sysfunc(exist(&table))) %then

%put &table exists;

%else

%put &table does not exist;

%mend testExist;

%testExist(sashelp.class);

%testExist(sashelp.xyz);

11 %testExist(sashelp.class);

sashelp.class exists

12 %testExist(sashelp.xyz);

sashelp.xyz does not exist

Getting the Number of Records

Several different methods enable you to get the number of records in a table. Some of these methods work better than others, but all are valid.

One of the things to keep in mind when you are looking at the number of records in a SAS data table is that tables can contain deleted records. You can’t see the deleted records, but you can see missing rows when you look at the OBS number that is displayed in a PROC PRINT or in the VIEWTABLE window. Deleted records occur when you use a method that deletes records without rewriting the entire table.

Using Brute Force

Just count the records. This works great if you are already reading the data table. If you are not already reading the data table, this is not very efficient. You can either use the _N_ variable, or you can create a counter. Program 2.32 and SAS Log 2.24 show both methods.

%let numRecords = 0;

data _null_;

set sashelp.class end = eof;

count + 1;

if (eof) then

do;

call symputx("numRecords", count);

call symputx("numRecords2", _n_);

end;

run;

%put &=numRecords;

%put &=numRecords2;

12 %put &=numRecords;

NUMRECORDS=19

13 %put &=numRecords2;

NUMRECORDS2=19

Using the NOBS= Option

This option is better than the brute force option, but is not very elegant, still a little clunky. However, it might make sense in certain situations. The NOBS= option on a SET statement creates a variable that contains the number of records before the DATA step even starts, so it isn’t necessary to physically read any records from the table. Program 2.33 and SAS Log 2.25 show how to do this.

%let numRecords = 0;

data _null_;

if (0) then

set sashelp.class nobs = numRecords;

call symputx("numRecords", numRecords);

stop;

run;

%put &=numRecords;

10 %put &=numRecords;

NUMRECORDS=19

Using PROC CONTENTS

There is a lot of great information available in the output from PROC CONTENTS, including the number of records in the table. The OUT= data table contains a record for each variable in the table, and the table-level information (including the number of records) is repeated on each record of the OUT= data table. The NOBS variable in the OUT= data table contains the number of records in the table.

I don’t recommend this method unless you need the PROC CONTENTS output for another purpose. It requires a procedure and a DATA step and creates a temporary table. That is a lot of work just to get the number of records.

%let numRecords = 0;

proc contents data = sashelp.class

out = contents

noprint;

run;

data _null_;

set contents(obs = 1);

call symputx("numRecords", nobs);

run;

%put &=numRecords;

1 %let numRecords = 0;

3 proc contents data = sashelp.class

4 out = contents

5 noprint;

6 run;

NOTE: The data set WORK.CONTENTS has 5 observations and 41 variables.

NOTE: PROCEDURE CONTENTS used (Total process time):

real time 0.02 seconds

cpu time 0.01 seconds

8 data _null_;

9 set contents(obs = 1);

10 call symputx("numRecords", nobs - delobs);

11 run;

NOTE: There were 1 observations read from the data set WORK.CONTENTS.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

13 %put &=numRecords;

NUMRECORDS=19

Using the sashelp.vtable View

The sashelp.vtable view contains most of the header information for all tables that are defined in your SAS session. There is one record in the view for each table in the SAS session, so you must subset the view to locate the table that you want (using the LIBNAME and MEMNAME variables), and then get the number of records from the NLOBS variable. Note that using this view might not be very efficient if you have a lot of tables defined in your SAS session.

Program 2.35 and SAS Log 2.27 show how to use the sashelp.vtable to get the number of records in the sashelp.class table.

%let numRecords = 0;

data _null_;

set sashelp.vtable;

where libname = "SASHELP" and memname = "CLASS";

call symputx("numRecords", nlobs);

run;

%put &=numRecords;

1 %let numRecords = 0;

3 data _null_;

4 set sashelp.vtable;

5 where libname = "SASHELP" and memname = "CLASS";

6 call symputx("numRecords", nlobs);

7 run;

NOTE: There were 1 observations read from the data set SASHELP.VTABLE.

WHERE (libname='SASHELP') and (memname='CLASS');

NOTE: DATA statement used (Total process time):

real time 0.13 seconds

cpu time 0.12 seconds

9 %put &=numRecords;

NUMRECORDS=19

Using the ATTRN() Function

The ATTRN() function can be used to get lots of information about a data table, and usually it does not require reading any records from the table. For the number of records, you can look at various attributes:

The total number of undeleted records with a WHERE clause, or with the OBS= or FIRSTOBS= options applied.

To calculate this value, the function must read all the records in the table. The NOBS and NLOBS attributes are saved in the data table’s header, so the function doesn’t need to read all the records for those attributes.

I would suggest using NLOBS most of the time unless you want to see the number of records in a subset. In that case, use NLOBSF.

Some DBMS tables might not provide the information that you can get from a SAS table (including the number of records). In that case, you need to use the brute force method described earlier.

%let numRecords = 0;

data _null_;

table = "sashelp.class";

dsid = open(table);

if (dsid gt 0) then

do;

nobs = attrn(dsid, "NLOBS");

call symputx("numRecords", nobs);

dsid = close(dsid);

end;

run;

%put &=numRecords;

14 %put &=numRecords;

NUMRECORDS=19

%macro numRecords(table);

%let dsid = %sysfunc(open(&table));

%if (&dsid gt 0) %then

%do;

%let numRecords = %sysfunc(attrn(&dsid, NLOBS));

%let dsid = %sysfunc(close(&dsid));

%end;

%mend numRecords;

%let numRecords = 0;

%numRecords(sashelp.class);

%put &=numRecords;

14 %put &=numRecords;

NUMRECORDS=19

Verifying That a Table Has Records

If you find out how many records are in the table, you automatically know whether there are any records, because the number is greater than 0. A few other methods can tell you whether you have any records.

Using Brute Force

The brute force method for finding out whether a table has any records involves trying to read the table in a DATA step. If there are no records, then none of the code in the DATA step runs.

The first DATA step in the example in Program 2.38 and SAS Log 2.30 checks to see whether there are any records in the sashelp.class table. The second DATA step checks the same table, this time with a WHERE clause applied, which returns no records. This method relies on setting the &ANYRECORDS macro variable to 0 before running the DATA step. If there are no records in the data table, then the DATA step doesn’t execute, and &ANYRECORDS keeps the value of 0.

%let anyRecords = 0;

data _null_;

set sashelp.class;

call symput("anyRecords", "1");

stop;

run;

%put &=anyRecords;

%let anyRecords = 0;

data _null_;

set sashelp.class;

where name eq "Martha";

call symput("anyRecords", "1");

stop;

run;

%put &=anyRecords;

9 %put &=anyRecords;

ANYRECORDS=1

...

20 %put &=anyRecords;

ANYRECORDS=0

Using the ATTRN() Function

The ATTRN() function has an ANY attribute that you can query. The ANY attribute has a value of 0 when there are no records in the table, -1 if there are no records or no variables in the table, and 1 if there are records in the table.

Program 2.39 and SAS Log 2.31 show how to use the ATTRN() function in a DATA step to query a table for any records.

data oneRecordOneVariable;

x=1;

run;

data noRecordsOneVariable;

set oneRecordOneVariable;

stop;

run;

data noRecordsNoVariables;

stop;

run;

data _null_;

do table = "oneRecordOneVariable",

"noRecordsOneVariable",

"noRecordsNoVariables",

"xxx";

dsid = open(table);

if (dsid gt 0) then

do;

any = attrn(dsid, "ANY");

if (any gt 0) then

putlog "There are records in " table;

else if (any eq 0) then

putlog "There are no records in " table;

else

putlog "There are no records/variables in " table;

dsid = close(dsid);

end;

else

putlog table " could not be opened";

end;

run;

There are records in oneRecordOneVariable

NOTE: No observations in data set WORK.NORECORDSONEVARIABLE.

There are no records in noRecordsOneVariable

NOTE: No variables in data set WORK.NORECORDSNOVARIABLES.

There are no records/variables in noRecordsNoVariables

xxx could not be opened

data oneRecordOneVariable;

x=1;

run;

data noRecordsOneVariable;

set oneRecordOneVariable;

stop;

run;

data noRecordsNoVariables;

stop;

run;

%macro anyRecords(table);

%let dsid = %sysfunc(open(&table));

%if (&dsid gt 0) %then

%do;

%let any = %sysfunc(attrn(&dsid, ANY));

%if (&any gt 0) %then

%put There are records in &table;

%else %if (&any eq 0) %then

%put There are no records in &table;

%else

%put There are no records/variables in &table;

%let dsid = %sysfunc(close(&dsid));

%end;

%else

%put &table could not be opened;

%mend anyRecords;

%anyRecords(oneRecordOneVariable);

%anyRecords(noRecordsOneVariable);

%anyRecords(noRecordsNoVariables);

%anyRecords(xxx);

31 %anyRecords(oneRecordOneVariable);

There are records in oneRecordOneVariable

32 %anyRecords(noRecordsOneVariable);

NOTE: No observations in data set WORK.NORECORDSONEVARIABLE.

There are no records in noRecordsOneVariable

33 %anyRecords(noRecordsNoVariables);

NOTE: No variables in data set WORK.NORECORDSNOVARIABLES.

There are no records/variables in noRecordsNoVariables

34 %anyRecords(xxx);

xxx could not be opened

Re-creating Indexes

Sometimes it is necessary to rewrite a data table that has an index on it, so you need to re-create the index on the updated table. But what if you don’t know what the index was? PROC CONTENTS has a second output table that can be created using the OUT2= option. The OUT2 table contains information about the indexes, including a great variable called RECREATE that has the actual PROC DATASETS code for creating the index. So as long as you run PROC CONTENTS before you modify the table, you can then re-create the index on the updated table. The RECREATE variable also contains the statement to re-create any integrity constraints that might be on the table. The example in Program 2.41 and SAS Log 2.33 shows how to use the RECREATE variable to add an index to the cars table that was retrieved from the cars_with_index table.

data cars_with_index (index = (make_type = (make type) model))

cars;

set sashelp.cars;

run;

proc contents data = work.cars_with_index

out2 = indexes

noprint;

run;

data _null_;

set indexes end = eof;

if (_n_ eq 1) then

do;

call execute("proc datasets lib = work nowarn nolist;");

call execute(" modify cars;");

end;

call execute(" " !! recreate);

if (eof) then

call execute("quit;");

run;

1 data cars_with_index (index = (make_type = (make type) model))

2 cars;

3 set sashelp.cars;

4 run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.

NOTE: The data set WORK.CARS_WITH_INDEX has 428 observations and 15 variables.

NOTE: The data set WORK.CARS has 428 observations and 15 variables.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.01 seconds

6 proc contents data = work.cars_with_index

7 out2 = indexes

8 noprint;

9 run;

NOTE: The data set WORK.INDEXES has 2 observations and 21 variables.

NOTE: PROCEDURE CONTENTS used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds

11 data _null_;

12 set indexes end = eof;

13 if (_n_ eq 1) then

14 do;

15 call execute("proc datasets lib = work nowarn nolist;");

16 call execute(" modify cars;");

17 end;

18 call execute(" " !! recreate);

19 if (eof) then

20 call execute("quit;");

21 run;

NOTE: There were 2 observations read from the data set WORK.INDEXES.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

NOTE: CALL EXECUTE generated line.

1 + proc datasets lib = work nowarn nolist;

2 + modify cars;

3 + Index create Model / Updatecentiles=5;

NOTE: Simple index Model has been defined.

4 + Index create make_type=( Make Type ) / Updatecentiles=5;

NOTE: Composite index make_type has been defined.

5 + quit;

NOTE: MODIFY was successful for WORK.CARS.DATA.

NOTE: PROCEDURE DATASETS used (Total process time):

Chapter 2: Data Tables

Introduction

Copying Variable Attributes

Reading Data with a SET Statement

Concatenating Tables

Interleaving Tables

Using Multiple SET Statements

Using NOBS= and POINT=

Using KEY=

Determining Which Table a Record Is From

Using PROC SQL

Choosing to Use PROC SQL

Joining

Using a Subquery

Using a Correlated Subquery

Using Lookup Tables

Using a Format

Using a Join

Using a SET Statement With KEY=

Using Hash Tables

Updating Data In Place

Using PROC SQL

Using the INSERT Clause

Using the DELETE Clause

Using the UPDATE Clause

Using the MODIFY Statement

Using PROC APPEND

Finding Records

Determining Whether a Table Exists

Getting the Number of Records

Using Brute Force

Using the NOBS= Option

Using PROC CONTENTS

Using the sashelp.vtable View

Using the ATTRN() Function

Verifying That a Table Has Records

Using Brute Force

Using the ATTRN() Function

Re-creating Indexes