Practical and Efficient SAS® Programming: The Insider’s Guide

In a SAS program, you sometimes need to access the operating system (OS) to run system commands and to access the file system. Reading data from external files is usually a requirement in a SAS application; you have to get the data into tables before you can analyze it. This chapter contains some tips and techniques that will help you navigate the world beyond SAS. Since Windows and Unix are the most popular operating systems with SAS programmers, I am going to focus only on those environments. Many of these programs work on other operating systems; see the relevant SAS Companion guide to work with their idiosyncrasies.

Checking the Operating System

To work with the operating system, you must know what OS the program is running on, especially if you are going to run the same program on multiple operating systems. Table 3.1 shows the automatic macro variables &SYSSCP and &SYSSCPL and their values on the different operating systems.

Operating System	&SYSSCP	&SYSSCPL
Windows	WIN	Version of Windows
Unix	Unix processor (for example, LIN X64, SUN 64, AIX 64)	Specific Unix environment (for example, HP-UX, SunOS, AIX)

If I am writing a program that might run on any of these systems, I generally check whether the value of &SYSSCP is WIN, and I use an open-ended ELSE for Unix (because the values vary on the basis of the specific Unix processor and environment). You can see an example in Program 3.1. SAS Log 3.1 shows the log when the program was run on Windows (64-bit Windows 7 Pro), and SAS Log 3.2 shows the log when the program was run on Unix (64-bit Linux).

%macro check_os;

%if (&sysscp eq WIN) %then

%put Windows: &sysscpl;

%else %if (&sysscp eq OS) %then

%put z/OS: &sysscpl;

%else

%put Unix: &sysscp &sysscpl;

%mend check_os;

%check_os;

Running Operating System Commands

To run an OS command from a SAS program, you must have the XCMD system option turned on. If you are running SAS in batch or interactively using Display Manager or line mode, XCMD is usually turned on by default. If it is not turned on, you can add it to the SAS config file, or you can specify it on the sas command.

If you are running on a workspace server (for example, from a stored process or from SAS Enterprise Guide), this option is usually set to NOXCMD. Because this is a configuration or invocation option only (it must be specified as SAS is started), the best place to set this option is in SAS Management Console, as follows:

9. The workspace server will need to be restarted after setting this option, but once it is set, it will be set for all sessions.

Using the X Statement

The X statement and the %SYSEXEC macro statement can be executed from anywhere in a SAS program to run an OS command:

This is the most efficient method of running a command and can be conditionally executed by macro statements if necessary. The return code from the command is saved in the &SYSRC automatic macro variable: 0 = success, >0 = failure. If you execute multiple commands in a single X statement, you get a negative value in the &SYSRC.

Using the SYSTEM() Function

The SYSTEM() function and the CALL SYSTEM() routine can be used in a DATA step. These are useful if you want to create commands based on the contents of a data table. The SYSTEM() function produces a return code that is 0 for success and >0 for failure. A negative value generally indicates a problem, but it depends on the operating system and the command. The CALL SYSTEM() routine does not produce a retrievable return code, which is why I never use it.

The example in Program 3.2 creates a directory for each person in the sashelp.class table. It also determines which directory to use based on the operating system. Figure 3.1 shows the results on Windows and Figure 3.2 shows the results on Unix.

options noxwait;

data _null_;

set sashelp.class;

if (_n_ eq 1) then

do;

if ("&sysscp" eq "WIN") then

rc = system("cd c:/temp");

else

rc = system("cd /tmp");

if (rc ne 0) then

do;

putlog "ERR" "OR: Unable to get to $HOME";

stop;

end;

rc = system("mkdir class");

rc = system("cd class");

end;

call system("mkdir " !! name);

run;

Using a FILENAME Pipe

Another way to run an OS command is to use a FILENAME statement with a pipe device, and a DATA step with INFILE and INPUT statements. I use this method quite often because it gives the most control. You can retrieve the results from the command, including error messages.

The code in Program 3.3 works on Windows and Unix to give you a directory listing. The output from the command is read with the DATA step and written back into the log. SAS Log 3.3 was created on Windows, and SAS Log 3.4 was created on Unix.

%macro runPipe;

%if (&sysscp eq WIN) %then

%do;

filename os_cmd pipe "cd c:\temp\class & dir";

%end;

%else

%do;

filename os_cmd pipe "cd /tmp/class; ls -l";

%end;

data _null_;

infile os_cmd;

input;

putlog _infile_;

run;

filename os_cmd;

%mend;

%runPipe

NOTE: The infile OS_CMD is:

Unnamed Pipe Access Device,

PROCESS=cd c:\temp\class & dir,RECFM=V,

LRECL=32767

Volume in drive C is Win7x64

Volume Serial Number is 7217-679F

Directory of c:\temp\class

01/01/2017 10:30 AM <DIR> .

01/01/2017 10:30 AM <DIR> ..

01/01/2017 10:30 AM <DIR> Alfred

01/01/2017 10:30 AM <DIR> Alice

01/01/2017 10:30 AM <DIR> Barbara

01/01/2017 10:30 AM <DIR> Carol

01/01/2017 10:30 AM <DIR> Henry

01/01/2017 10:30 AM <DIR> James

01/01/2017 10:30 AM <DIR> Jane

01/01/2017 10:30 AM <DIR> Janet

01/01/2017 10:30 AM <DIR> Jeffrey

01/01/2017 10:30 AM <DIR> John

01/01/2017 10:30 AM <DIR> Joyce

01/01/2017 10:30 AM <DIR> Judy

01/01/2017 10:30 AM <DIR> Louise

01/01/2017 10:30 AM <DIR> Mary

01/01/2017 10:30 AM <DIR> Philip

01/01/2017 10:30 AM <DIR> Robert

01/01/2017 10:30 AM <DIR> Ronald

01/01/2017 10:30 AM <DIR> Thomas

04/13/2017 10:30 AM <DIR> William

0 File(s) 0 bytes

21 Dir(s)

NOTE: 28 records were read from the infile OS_CMD.

The minimum record length was 0.

The maximum record length was 50.

NOTE: The infile OS_CMD is:

Pipe command="cd /tmp/class; ls -l"

total 76

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Alfred

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Alice

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Barbara

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Carol

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Henry

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 James

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Jane

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Janet

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Jeffrey

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 John

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Joyce

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Judy

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Louise

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Mary

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Philip

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Robert

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Ronald

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Thomas

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 William

NOTE: 20 records were read from the infile OS_CMD.

The minimum record length was 8.

The maximum record length was 54.

Working with the File System

From a DATA step, you might need to create external files or create a directory structure if it doesn’t already exist. Rather than using system commands, you can use SAS DATA step functions to do this.

Most of the following examples were run on a Windows system and would need a little modification to work on a Unix system.

Creating and Deleting Directories

A set of DATA step functions is available for working on directories. You can check to see whether a directory exists (FEXIST() or FILEEXIST()), you can create a new directory (DCREATE()), you can delete a directory (FDELETE()), you can get the members of a directory (DNUM(), dread()) and you can look at information about a directory (DOPTNUM(), DOPTNAME(), DINFO()).

The example in Program 3.4 uses some of these functions to create a directory tree based on values from the sashelp.cars table. It creates a top level cars directory, and then a directory for each origin value. Under each origin, it creates a directory for each type of vehicle. SAS Log 3.5 shows the directories that are created.

proc sort data = sashelp.cars

out = cars

nodupkey;

by origin type;

run;

data _null_;

set cars;

by origin type;

length originDir topDir $200;

retain originDir topDir;

if (_n_ eq 1) then

do;

topDir = dcreate("cars", "c:\");

if (topDir eq "") then

topDir = "c:\cars";

else

putlog "Created: " topDir;

end;

if (first.origin) then

do;

originDir = "";

originDir = dcreate(strip(origin), topDir);

if (originDir eq "") then

originDir = catx("\", topDir, origin);

else

putlog "Created: " originDir;

end;

typeDir = dcreate(strip(type), originDir);

if (typeDir eq "") then

typeDir = catx("\", originDir, type);

else

putlog "Created: " typeDir;

run;

Created: c:\cars

Created: c:\cars\Asia

Created: c:\cars\Asia\Hybrid

Created: c:\cars\Asia\SUV

Created: c:\cars\Asia\Sedan

Created: c:\cars\Asia\Sports

Created: c:\cars\Asia\Truck

Created: c:\cars\Asia\Wagon

Created: c:\cars\Europe

Created: c:\cars\Europe\SUV

Created: c:\cars\Europe\Sedan

Created: c:\cars\Europe\Sports

Created: c:\cars\Europe\Wagon

Created: c:\cars\USA

Created: c:\cars\USA\SUV

Created: c:\cars\USA\Sedan

Created: c:\cars\USA\Sports

Created: c:\cars\USA\Truck

Created: c:\cars\USA\Wagon

Here is something to watch out for with DCREATE(): Note that the originDir is set to blank before the DCREATE(). function executes. This is because if it has a non-blank value and the DCREATE() function does not create the directory, then the originDir does not change. So you can’t test for blank to see whether DCREATE() was successful.

If you want to delete a directory, you can use the FDELETE() function. However, the fdelete() can only be used to delete an empty directory, so you must first delete all the member directories and files. The DNUM() function (get the number of members) and the DREAD() function (get the member names) can be used to identify all the members to delete. As an example, the code in Program 3.5 deletes the cars directory structure that was created by Program 3.4. SAS Log 3.6 shows the directories as they are deleted.

data _null_;

carspath = "c:\cars";

if (fileexist(carsPath)); ❶

rc = filename("cars", carsPath); ❷

cars_did = dopen("cars");

do c = 1 to dnum(cars_did); ❸

origin = dread(cars_did, c); ❹

originPath = catx("\", carsPath, origin);

rc = filename("origin", originPath);

origin_did = dopen("origin");

do o = 1 to dnum(origin_did); ❺

type = dread(origin_did, o); ❻

typePath = catx("\", originPath, type);

rc = filename("type", typePath);

rc = fdelete("type");

rc = filename("type");

putlog "Deleted: " typePath;

end; /* do o – loop through origin directories */

origin_did = dclose(origin_did); ❼

rc = fdelete("origin");

rc = filename("origin");

putlog "Deleted: " originPath;

end; /* do c – loop through cars directory */

cars_did = dclose(cars_did); ❽

rc = fdelete("cars");

rc = filename("cars");

putlog "Deleted: " carsPath;

run;

Deleted: c:\cars\Asia\Hybrid

Deleted: c:\cars\Asia\Sedan

Deleted: c:\cars\Asia\Sports

Deleted: c:\cars\Asia\SUV

Deleted: c:\cars\Asia\Truck

Deleted: c:\cars\Asia\Wagon

Deleted: c:\cars\Asia

Deleted: c:\cars\Europe\Sedan

Deleted: c:\cars\Europe\Sports

Deleted: c:\cars\Europe\SUV

Deleted: c:\cars\Europe\Wagon

Deleted: c:\cars\Europe

Deleted: c:\cars\USA\Sedan

Deleted: c:\cars\USA\Sports

Deleted: c:\cars\USA\SUV

Deleted: c:\cars\USA\Truck

Deleted: c:\cars\USA\Wagon

Deleted: c:\cars\USA

Deleted: c:\cars

Working with Files

A set of functions is available for reading and writing external files (for example,FPUT(), FWRITE(), FGET(), FREAD()). I find using these functions to be a much more difficult way to read or write an external file than to use the FILE and INFILE statements with the PUT and INPUT statements. If you need to read or write an external file from outside a DATA step using a macro, then these file functions with %SYSFUNC work. From within a DATA step, I haven’t found much use for these functions.

There are some other functions that I find much more useful for manipulating files. You can delete files (FDELETE()), copy files (FCOPY()), look at file information (FOPTNUM(), FOPTNAME(), FINFO()), and check to see if a file exists (FEXIST(), FILEEXIST()).

The example in Program 3.6 and SAS Log 3.7 uses these file functions to see whether a file exists, to copy one file to another, to create a file and write to it, to look at the file information, and finally to delete the file.

data _null_;

oldFile = "c:\old.txt";

newFile = "c:\new.txt";

rc = filename("new", "newFile"); ❶

if (fileexist(oldFile) and not fexist("new")) then ❷

do;

rc = filename("old", oldFile);

rc = fcopy("old", "new"); ❸

rc = filename("old");

putlog "Copied old file (" oldFile ") to "

"new file (" newFile +(-1) ")";

end; /* if – old file exists, new file doesn’t exist */

else if (not fexist("new")) then ❹

do;

fid = fopen("new", "O"); ❺

rc = fput(fid, "Testing");

rc = fwrite(fid);

rc = fclose(fid);

putlog "Created new file (" newFile +(-1) ")";

end; /* else if – old file and new file don’t exist */

fid = fopen("new"); ❻

do i = 1 to foptnum(fid);

name = foptname(fid, i);

value = finfo(fid, name);

putlog name= value=;

end; /* do i – loop through file info */

fid = fclose(fid);

rc = fdelete("new"); ❼

putlog "Deleted new file (" newFile +(-1) ")";

run;

❶ Use the FILENAME() function to set a fileref pointing to the new file (which does not exist).

❷ Use the FILEEXIST() function to see whether the old file exists, and use the FEXIST() function to see whether the new file exists. The difference between them is that FILEEXIST() accepts a path, while FEXIST() accepts a fileref. In this example, the old.txt file did not exist, so this IF statement was not true.

❻ Open the new.txt file and write all the available information about the file to the SAS log.

Created new file (c:\new.txt)

name=Filename value=C:\Users\sasmih\newFile

name=RECFM value=V

name=LRECL value=32767

name=File Size (bytes) value=9

name=Last Modified value=04Feb2017:16:54:15

name=Create Time value=04Feb2017:16:54:15

Deleted new file (c:\new.txt)

Reading and Writing External Files

If you are reading from and writing to an external file using the INFILE/INPUT and FILE/PUT statements, a couple of features can make things easier.

Using the $VARYING. Format and Informat

The $VARYING. format is a great way to write values that aren’t always the same length. The syntax for this format requires that you specify the actual length of the value after $VARYING:

The example in Program 3.7 shows the difference between using a regular $CHAR. format and $VARYING. format.

data _null_;

set sashelp.class;

file "c:\class.txt";

put "$char " name $char.;

len = length(name);

put "$varying " name $varying. len;

run;

Figure 3.3 shows a listing of the output file that is created. In the output file, I have selected all the text. You can see that when using the $CHAR. format, the line is padded with blanks, while the $VARYING. line is not. In this example, it doesn’t make too much difference, but if you are creating a large file with long, varying lines of text, the extra spaces make the file even larger.

I use the $VARYING. format when I am generating code with a DATA step. It allows me to have some very long lines as well as very short lines, so that when viewed in another editor that has wrapping turned on, only the truly long lines wrap, not the short lines that are padded with blanks.

You can also use the $VARYING. informat when reading from a file. When you do this, you can use the RECLEN= option on the INFILE statement. This option creates an automatic variable that contains the length of the incoming line. You can then use that value to determine the length for the $VARYING. informat. Reading the data this way ensures that you don’t read past the end of the input line. However, I think using the TRUNCOVER option on the INFILE statement is the easiest way to handle reading past the end of the line, so I rarely use the $VARYING. informat.

Using the FILEVAR= Option

The FILEVAR= option can be used on an INFILE statement or on a FILE statement. It lets you change the input file or output file during the execution of a DATA step without having multiple INFILE or FILE statements. It lets you write (using FILE) and read (using INFILE) multiple files in a single DATA step. So instead of creating a separate DATA step to read each file, and then appending the contents together, you can just use one DATA step.

The example in Program 3.8 and SAS Log 3.8 creates three files in a directory and writes to each of them in a single DATA step.

data _null_;

dirName = "c:\filevar\";

if (not fileexist(dirName)) then

dirName = dcreate("filevar", "c:\");

rc = filename("fvar", dirName);

do f = 1 to 3;

fname = cats(dirName, "file", f, ".txt");

file fff filevar = fname;

do i = 1 to f;

put "Line " i +2 "file" f +(-1)".txt";

end; /* do i – loop through records in file */

end; /* do f – loop through files to be created */

rc = filename("fvar");

run;

NOTE: The file FFF is:

Filename=c:\filevar\file1.txt,

RECFM=V,LRECL=32767,File Size (bytes)=0,

NOTE: The file FFF is:

Filename=c:\filevar\file2.txt,

RECFM=V,LRECL=32767,File Size (bytes)=0,

NOTE: The file FFF is:

Filename=c:\filevar\file3.txt,

RECFM=V,LRECL=32767,File Size (bytes)=0,

NOTE: 1 record was written to the file FFF.

The minimum record length was 18.

The maximum record length was 18.

NOTE: 2 records were written to the file FFF.

The minimum record length was 18.

The maximum record length was 18.

NOTE: 3 records were written to the file FFF.

The minimum record length was 18.

The maximum record length was 18.

data _null_;

if (fileexist("c:\filevar"));

rc = filename("fvar", "c:\filevar");

did = dopen("fvar");

do f = 1 to dnum(did);

fname = cats("c:\filevar\", dread(did, f));

infile fff filevar = fname truncover end = eof;

do until(eof);

input @6 lineNum @9 filename $10.;

put lineNum= filename=;

end; /* do until – loop through the infile records */

end; /* do f - loop through files in directory */

did = dclose(did);

rc = filename("fvar");

stop;

run;

NOTE: The infile FFF is:

Filename=c:\filevar\file1.txt,

RECFM=V,LRECL=32767,File Size (bytes)=20,

lineNum=1 filename=file1.txt

NOTE: The infile FFF is:

Filename=c:\filevar\file2.txt,

RECFM=V,LRECL=32767,File Size (bytes)=40,

lineNum=1 filename=file2.txt

lineNum=2 filename=file2.txt

NOTE: The infile FFF is:

Filename=c:\filevar\file3.txt,

RECFM=V,LRECL=32767,File Size (bytes)=60,

lineNum=1 filename=file3.txt

lineNum=2 filename=file3.txt

lineNum=3 filename=file3.txt

NOTE: 1 record was read from the infile FFF.

The minimum record length was 18.

The maximum record length was 18.

NOTE: 2 records were read from the infile FFF.

The minimum record length was 18.

The maximum record length was 18.

NOTE: 3 records were read from the infile FFF.

The minimum record length was 18.

The maximum record length was 18.

Reading Date and Time Values

The ANYDTDTE. (for dates), ANYDTDTM. (for datetimes), and ANYDTTME. (for times) informats are incredibly useful when reading in data that has date and time values. You can use these informats to read many date and time formats, instead of having to find the informat that corresponds to your data. In addition, if you have a date column that has values in different formats (for example, mm/dd/yyyy and ddMONyy), or you don’t know what format the data will be in, you can read them all with the ANYDTDTE. informat.

data _null_;

format date date9.

time time5.

datetime datetime18.;

input @1 date anydtdte15.

@17 time anydttme13.

@31 datetime anydtdtm21.;

putlog date= time= datetime=;

datalines;

01JAN17 2:00pm 01Jan17:14:00

1/1/2017 14:00:00 170101 02:00 pm

January 1, 2017 14:00 1/1/2017 14:00

17001 02:00:00pm 01JAN2017

01Jan17:14:00 01Jan17:14:00 01Jan17:14:00

;

date=01JAN2017 time=14:00 datetime=01JAN17:14:00:00

date=01JAN2017 time=14:00 datetime=01JAN17:00:00:00

date=01JAN2017 time=14:00 datetime=01JAN17:14:00:00

Creating a CSV File from a Data Table

There are several ways of creating a comma-separated values (CSV) file from a data table, but one of the simplest might be one that you haven’t heard of: %DS2CSV. This macro utility is part of Base SAS, and all you need to do is tell it the name of the data table and the name of the CSV file, and run it. There are lots of documented options, including options that can be used if you are running from a web application and want to allow the user to download the CSV file. Program 3.11 and SAS Log 3.11 show a simple usage of %DS2CSV to create a CSV file from the sashelp.class table.

%ds2csv(data = sashelp.class,

csvfile = c:\temp\class.csv,

runmode = b);

NOTE: CSV file successfully generated for SASHELP.CLASS.

You need to set the RUNMODE= option to b if you are running the code in batch or interactive SAS; otherwise, the macro will attempt to run in a SAS workspace server.

PROC EXPORT is another easy way to create a CSV file. It has plenty of documented options that will enable you to create the CSV the way you want it. It also generates a SAS DATA step that you can save and modify if the code isn’t quite what you want. Program 3.12 and SAS Log 3.12 show how to use PROC EXPORT to create a CSV file from the sashelp.class table.

proc export data = sashelp.class

outfile = "c:\temp\class.csv"

dbms = csv

label

replace;

run;

7 /*************************************************************

8 * PRODUCT: SAS

9 * VERSION: 9.4

10 * CREATOR: External File Interface

11 * DATE: 05MAY17

12 * DESC: Generated SAS Datastep Code

13 * TEMPLATE SOURCE: (None Specified.)

14 **************************************************************/

15 data _null_;

16 %let _EFIERR_ = 0; /* set the ERROR detection variable */

17 %let _EFIREC_ = 0; /* clear export record count variable */

18 file 'c:\temp\class.csv' delimiter=',' DSD DROPOVER lrecl=32767;

19 if _n_ = 1 then /* write column names or labels */

20 do;

21 put

22 '"' "Name" '"'

23 ','

24 '"' "Sex" '"'

25 ','

26 '"' "Age" '"'

27 ','

28 '"' "Height" '"'

29 ','

30 '"' "Weight" '"'

31 ;

32 end;

33 set SASHELP.CLASS end=EFIEOD;

34 format Name $8. ;

35 format Sex $1. ;

36 format Age best12. ;

37 format Height best12. ;

38 format Weight best12. ;

39 do;

40 EFIOUT + 1;

41 put Name $ @;

42 put Sex $ @;

43 put Age @;

44 put Height @;

45 put Weight ;

46 ;

47 end;

48 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */

49 if EFIEOD then call symputx('_EFIREC_',EFIOUT);

50 run;

NOTE: The file 'c:\temp\class.csv' is:

Filename=c:\temp\class.csv,

RECFM=V,LRECL=32767,File Size (bytes)=0,

Last Modified=06May2017:08:35:33,

Create Time=17Apr2017:11:31:58

NOTE: 20 records were written to the file 'c:\temp\class.csv'.

The minimum record length was 17.

The maximum record length was 36.

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

19 records created in c:\temp\class.csv from SASHELP.CLASS.

NOTE: "c:\temp\class.csv" file was successfully created.

NOTE: PROCEDURE EXPORT used (Total process time):

real time 0.16 seconds

cpu time 0.09 seconds

Reading a CSV File with Embedded Line Feeds

A common issue when you are working in a Microsoft Excel spreadsheet is using Alt+Enter to go to a new line within the same cell thus creating an embedded line feed character. If you then turn this Excel spreadsheet into a CSV, the value contains the embedded line feed. If you view the file in some basic text editors, you will see that it goes to a new line at this point. If you try to read this file with SAS software, it will see the line feed character as an end of record, will set the rest of the fields to blank, and will go to the next iteration of the DATA step and start reading from the middle of the line.

If you are using a DATA step with an INFILE statement and INPUT statements, then this is an easy fix. You can add the TERMSTR= option to the INFILE statement and it will handle the line feed character correctly.

For this example, I used the code in Program 3.13 to create a CSV file that has an embedded line feed character (0Ax).

data _null_;

file "c:\temp\test.csv";

put '"aaa","bbb","ccc","ddd"';

put '"aa","bb","c' '0A'x 'c","dd"';

put '"a","b","c","d"';

run;

If you look at the file in an editor, you will see that the second record is split over two lines. Figure 3.4 shows the file in Notepad.

Program 3.14 and Figure 3.5 show what happens when you attempt to read this file with a DATA step and INFILE statement.

data test;

infile "c:\temp\test.csv" dlm = "," dsd truncover;

length a b c d $5;

input a b c d;

run;

This table is definitely not what you wanted. So, you can add the TERMSTR=CRLF option to the INFILE statement. This tells the INFILE statement to only use a carriage return + line feed as the end of record. Program 3.15 and Figure 3.6 show what happens when you use the TERMSTR=CRLF option.

data test;

infile "c:\temp\test.csv" dlm = "," dsd truncover

termstr = CRLF;

length a b c d $5;

input a b c d;

run;

However, if you want to read the CSV file with PROC IMPORT, or you actually have an embedded carriage return (0Dx) + line feed (0Ax), not just a line feed character in the file, the only way to handle it is to preprocess the file and change the line feed characters to another character. Additionally, I usually want to maintain the line feeds in my data, so I convert each line feed character to a value that I will be able to find again. After the data has been read, I can turn those values back into line feeds. If you don’t care about maintaining the line feed, then you can just convert them to blanks.

Program 3.16 shows the code that preprocesses the file and converts any line feed that is part of a string that is enclosed in double quotation marks into “~~”. Figure 3.7 shows the preprocessed file in Notepad.

data _null_;

infile "c:\temp\test.csv" recfm=n sharebuffers;

file "c:\temp\test_TEMP.csv" recfm=n;

input a $char1.;

retain inQuotes 0;

if a = '"' then

inQuotes = (not inQuotes);

if (a eq '0A'x and inQuotes) then

put '~~';

else

put a $char.;

run;

Now Program 3.17 can read the test_TEMP.csv file into a SAS table and put the line feeds back where they should be. Figure 3.8 shows the data table that is created from the temporary CSV file.

data test;

infile "c:\temp\test_TEMP.csv" dlm = "," dsd truncover;

length a b c_in c c_hex d $10;

input a b c_in d;

c = tranwrd(c_in, "~~", "0A"x);

c_hex = put(c, hex6.);

run;

As you can see from the output table, the second row shows that c_in (the value from the input file) contains “~~”, and then c has the value with the “~~” replaced by a line feed. The c_hex column shows the hexadecimal representation of c, and you can see the 0A in the middle. After the file has been preprocessed, you can read it with whatever tool you want and make any updates that you need.

Chapter 3: The Operating System

Introduction

Checking the Operating System

Running Operating System Commands

Using the X Statement

Using the SYSTEM() Function

Using a FILENAME Pipe

Working with the File System

Creating and Deleting Directories

Working with Files

Reading and Writing External Files

Using the $VARYING. Format and Informat

Using the FILEVAR= Option

Reading Date and Time Values

Creating a CSV File from a Data Table

Reading a CSV File with Embedded Line Feeds