Chapter 3: The Operating System

Introduction

Checking the Operating System

Running Operating System Commands

Using the X Statement

Using the SYSTEM() Function

Using a FILENAME Pipe

Working with the File System

Creating and Deleting Directories

Working with Files

Reading and Writing External Files

Using the $VARYING. Format and Informat

Using the FILEVAR= Option

Reading Date and Time Values

Creating a CSV File from a Data Table

Reading a CSV File with Embedded Line Feeds

Introduction

In a SAS program, you sometimes need to access the operating system (OS) to run system commands and to access the file system. Reading data from external files is usually a requirement in a SAS application; you have to get the data into tables before you can analyze it. This chapter contains some tips and techniques that will help you navigate the world beyond SAS. Since Windows and Unix are the most popular operating systems with SAS programmers, I am going to focus only on those environments. Many of these programs work on other operating systems; see the relevant SAS Companion guide to work with their idiosyncrasies.

Checking the Operating System

To work with the operating system, you must know what OS the program is running on, especially if you are going to run the same program on multiple operating systems. Table 3.1 shows the automatic macro variables &SYSSCP and &SYSSCPL and their values on the different operating systems.

Table 3.1: The Values of &SYSSCP and &SYSSCPL

Operating System &SYSSCP &SYSSCPL
Windows WIN Version of Windows
Unix Unix processor (for example, LIN X64, SUN 64, AIX 64) Specific Unix environment (for example, HP-UX, SunOS, AIX)

If I am writing a program that might run on any of these systems, I generally check whether the value of &SYSSCP is WIN, and I use an open-ended ELSE for Unix (because the values vary on the basis of the specific Unix processor and environment). You can see an example in Program 3.1. SAS Log 3.1 shows the log when the program was run on Windows (64-bit Windows 7 Pro), and SAS Log 3.2 shows the log when the program was run on Unix (64-bit Linux).

Program 3.1: Checking the Operating System

%macro check_os;

   %if (&sysscp eq WIN) %then

      %put Windows: &sysscpl;

   %else %if (&sysscp eq OS) %then

      %put z/OS: &sysscpl;

   %else

      %put Unix: &sysscp &sysscpl;

%mend check_os;

%check_os;

SAS Log 3.1: Checking the Operating System - Windows

Windows: X64_7PRO

SAS Log 3.2: Checking the Operating System - Unix

Unix: LIN X64 Linux

Running Operating System Commands

There are multiple ways to run an operating system command from SAS.

To run an OS command from a SAS program, you must have the XCMD system option turned on. If you are running SAS in batch or interactively using Display Manager or line mode, XCMD is usually turned on by default. If it is not turned on, you can add it to the SAS config file, or you can specify it on the sas command.

If you are running on a workspace server (for example, from a stored process or from SAS Enterprise Guide), this option is usually set to NOXCMD. Because this is a configuration or invocation option only (it must be specified as SAS is started), the best place to set this option is in SAS Management Console, as follows:

1.   On the Plug-ins tab, expand the Server Manager.

2.   Expand the App Server that you are using (SASApp is the most common one).

3.   Expand the Logical Workspace Server.

4.   Right-click the Workspace Server and choose Properties.

5.   Go to the Options tab.

6.   Click the Advanced Options button.

7.   Go to the Launch Properties tab.

8.   Select the Allow XCMD option.

9.   The workspace server will need to be restarted after setting this option, but once it is set, it will be set for all sessions.

Tip—Windows: You might need to specify the (NO)XWAIT and (NO)XSYNC system options. To return processing to your SAS session when the command finishes, you must have NOXWAIT set. Otherwise, you need to close the Command window manually in order to get your SAS program to continue running. Setting XSYNC makes the OS command finish processing before control returns to SAS. Setting NOXSYNC lets the OS command run in the background while your SAS program continues processing.

Tip—Windows: You can execute multiple commands in a single statement by using an ampersand (&) between the commands: cd c:\temp & mkdir temp2

Tip—Unix: You can execute multiple commands in a single statement by using a semicolon (;) between the commands: cd /tmp; mkdir temp2

Using the X Statement

The X statement and the %SYSEXEC macro statement can be executed from anywhere in a SAS program to run an OS command:

x "mkdir newFolder";

%sysexec "mkdir newFolder";

This is the most efficient method of running a command and can be conditionally executed by macro statements if necessary. The return code from the command is saved in the &SYSRC automatic macro variable: 0 = success, >0 = failure. If you execute multiple commands in a single X statement, you get a negative value in the &SYSRC.

Using the SYSTEM() Function

The SYSTEM() function and the CALL SYSTEM() routine can be used in a DATA step. These are useful if you want to create commands based on the contents of a data table. The SYSTEM() function produces a return code that is 0 for success and >0 for failure. A negative value generally indicates a problem, but it depends on the operating system and the command. The CALL SYSTEM() routine does not produce a retrievable return code, which is why I never use it.

The example in Program 3.2 creates a directory for each person in the sashelp.class table. It also determines which directory to use based on the operating system. Figure 3.1 shows the results on Windows and Figure 3.2 shows the results on Unix.

Program 3.2: Using the SYSTEM() Function to Create Directories

options noxwait;

 

data _null_;

   set sashelp.class;

   if (_n_ eq 1) then

   do;

      if ("&sysscp" eq "WIN") then

         rc = system("cd c:/temp");

      else

         rc = system("cd /tmp");

      if (rc ne 0) then

      do;

         putlog "ERR" "OR: Unable to get to $HOME";

         stop;

      end;

      rc = system("mkdir class");

      rc = system("cd class");

   end;

   call system("mkdir " !! name);

run;

Figure 3.1: Using the SYSTEM() Function to Create Directories—Windows

image

Figure 3.2: Using the SYSTEM Function to Create Directories—Unix

image

Using a FILENAME Pipe

Another way to run an OS command is to use a FILENAME statement with a pipe device, and a DATA step with INFILE and INPUT statements. I use this method quite often because it gives the most control. You can retrieve the results from the command, including error messages.

The code in Program 3.3 works on Windows and Unix to give you a directory listing. The output from the command is read with the DATA step and written back into the log. SAS Log 3.3 was created on Windows, and SAS Log 3.4 was created on Unix.

Program 3.3: Using a FILENAME Pipe to Get a Directory Listing

%macro runPipe;

 

   %if (&sysscp eq WIN) %then

   %do;

      filename os_cmd pipe "cd c:\temp\class & dir";

   %end;

   %else

   %do;

      filename os_cmd pipe "cd /tmp/class; ls -l";

   %end;

 

   data _null_;

      infile os_cmd;

      input;

      putlog _infile_;

   run;

 

   filename os_cmd;

 

%mend;

%runPipe

SAS Log 3.3: Using a FILENAME Pipe to Get a Directory Listing—Windows

NOTE: The infile OS_CMD is:

      Unnamed Pipe Access Device,

      PROCESS=cd c:\temp\class & dir,RECFM=V,

      LRECL=32767

 

 Volume in drive C is Win7x64

 Volume Serial Number is 7217-679F

 

 Directory of c:\temp\class

 

01/01/2017 10:30 AM  <DIR>     .

01/01/2017 10:30 AM  <DIR>     ..

01/01/2017 10:30 AM  <DIR>     Alfred

01/01/2017 10:30 AM  <DIR>     Alice

01/01/2017 10:30 AM  <DIR>     Barbara

01/01/2017 10:30 AM  <DIR>     Carol

01/01/2017 10:30 AM  <DIR>     Henry

01/01/2017 10:30 AM  <DIR>     James

01/01/2017 10:30 AM  <DIR>     Jane

01/01/2017 10:30 AM  <DIR>     Janet

01/01/2017 10:30 AM  <DIR>     Jeffrey

01/01/2017 10:30 AM  <DIR>     John

01/01/2017 10:30 AM  <DIR>     Joyce

01/01/2017 10:30 AM  <DIR>     Judy

01/01/2017 10:30 AM  <DIR>     Louise

01/01/2017 10:30 AM  <DIR>     Mary

01/01/2017 10:30 AM  <DIR>     Philip

01/01/2017 10:30 AM  <DIR>     Robert

01/01/2017 10:30 AM  <DIR>     Ronald

01/01/2017 10:30 AM  <DIR>     Thomas

04/13/2017 10:30 AM  <DIR>     William

        0 File(s)       0 bytes

       21 Dir(s)

NOTE: 28 records were read from the infile OS_CMD.

      The minimum record length was 0.

      The maximum record length was 50.

SAS Log 3.4: Using a FILENAME pipe to Get a Directory Listing—Unix

NOTE: The infile OS_CMD is:

      Pipe command="cd /tmp/class; ls -l"

 

total 76

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Alfred

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Alice

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Barbara

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Carol

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Henry

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 James

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Jane

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Janet

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Jeffrey

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 John

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Joyce

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Judy

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Louise

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Mary

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Philip

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Robert

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Ronald

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 Thomas

drwxr-xr-x. 2 userid group 4096 Jan 1 10:30 William

NOTE: 20 records were read from the infile OS_CMD.

      The minimum record length was 8.

      The maximum record length was 54.

Tip—Windows: To get a listing of just the names of the files and directories without all the extra information, use the dir /b windows command:

  filename os_cmd pipe "dir /b c:\temp";

 

Tip—Unix: To get a listing of just the names of the files and directories without all the extra information, use the ls -1 command (that is the number one, not a lowercase L after the hyphen):

  filename os_cmd pipe "ls -1 /tmp";

Working with the File System

From a DATA step, you might need to create external files or create a directory structure if it doesn’t already exist. Rather than using system commands, you can use SAS DATA step functions to do this.

Most of the following examples were run on a Windows system and would need a little modification to work on a Unix system.

Creating and Deleting Directories

A set of DATA step functions is available for working on directories. You can check to see whether a directory exists (FEXIST() or FILEEXIST()), you can create a new directory (DCREATE()), you can delete a directory (FDELETE()), you can get the members of a directory (DNUM(), dread()) and you can look at information about a directory (DOPTNUM(), DOPTNAME(), DINFO()).

The example in Program 3.4 uses some of these functions to create a directory tree based on values from the sashelp.cars table. It creates a top level cars directory, and then a directory for each origin value. Under each origin, it creates a directory for each type of vehicle. SAS Log 3.5 shows the directories that are created.

Program 3.4: Creating Directories

proc sort data = sashelp.cars

     out = cars

     nodupkey;

   by origin type;

run;

 

data _null_;

   set cars;

   by origin type;

 

   length originDir topDir $200;

   retain originDir topDir;

 

   if (_n_ eq 1) then

   do;

      topDir = dcreate("cars", "c:\");

      if (topDir eq "") then

         topDir = "c:\cars";

      else

         putlog "Created: " topDir;

   end;

 

   if (first.origin) then

   do;

      originDir = "";

      originDir = dcreate(strip(origin), topDir);

      if (originDir eq "") then

        originDir = catx("\", topDir, origin);

      else

        putlog "Created: " originDir;

   end;

 

   typeDir = dcreate(strip(type), originDir);

   if (typeDir eq "") then

      typeDir = catx("\", originDir, type);

   else

      putlog "Created: " typeDir;

 

run;

SAS Log 3.5: Creating Directories

Created: c:\cars

Created: c:\cars\Asia

Created: c:\cars\Asia\Hybrid

Created: c:\cars\Asia\SUV

Created: c:\cars\Asia\Sedan

Created: c:\cars\Asia\Sports

Created: c:\cars\Asia\Truck

Created: c:\cars\Asia\Wagon

Created: c:\cars\Europe

Created: c:\cars\Europe\SUV

Created: c:\cars\Europe\Sedan

Created: c:\cars\Europe\Sports

Created: c:\cars\Europe\Wagon

Created: c:\cars\USA

Created: c:\cars\USA\SUV

Created: c:\cars\USA\Sedan

Created: c:\cars\USA\Sports

Created: c:\cars\USA\Truck

Created: c:\cars\USA\Wagon

Here is something to watch out for with DCREATE(): Note that the originDir is set to blank before the DCREATE(). function executes. This is because if it has a non-blank value and the DCREATE() function does not create the directory, then the originDir does not change. So you can’t test for blank to see whether DCREATE() was successful.

If you want to delete a directory, you can use the FDELETE() function. However, the fdelete() can only be used to delete an empty directory, so you must first delete all the member directories and files. The DNUM() function (get the number of members) and the DREAD() function (get the member names) can be used to identify all the members to delete. As an example, the code in Program 3.5 deletes the cars directory structure that was created by Program 3.4. SAS Log 3.6 shows the directories as they are deleted.

Program 3.5: Deleting Directories

data _null_;

   carspath = "c:\cars";

 

   if (fileexist(carsPath));  

 

   rc = filename("cars", carsPath);  

   cars_did = dopen("cars");

 

   do c = 1 to dnum(cars_did);

 

      origin = dread(cars_did, c);  

      originPath = catx("\", carsPath, origin);

      rc = filename("origin", originPath);

      origin_did = dopen("origin");

 

      do o = 1 to dnum(origin_did);  

         type = dread(origin_did, o);  

         typePath = catx("\", originPath, type);

         rc = filename("type", typePath);

         rc = fdelete("type");

         rc = filename("type");

         putlog "Deleted: " typePath;

      end; /* do o – loop through origin directories */

 

      origin_did = dclose(origin_did);  

      rc = fdelete("origin");

      rc = filename("origin");

      putlog "Deleted: " originPath;

 

   end; /* do c – loop through cars directory */

 

   cars_did = dclose(cars_did);  

   rc = fdelete("cars");

   rc = filename("cars");

   putlog "Deleted: " carsPath;

 

run;

   Process only if the cars directory exists.

   Open the cars directory.

   Loop through the members of the cars directory (the origin directories).

   Open the origin directory.

   Loop through the members of the origin directory (the type directories).

   Open the type directory and delete it.

   Delete the origin directory.

   Delete the cars directory.

SAS Log 3.6: Deleting Directories

Deleted: c:\cars\Asia\Hybrid

Deleted: c:\cars\Asia\Sedan

Deleted: c:\cars\Asia\Sports

Deleted: c:\cars\Asia\SUV

Deleted: c:\cars\Asia\Truck

Deleted: c:\cars\Asia\Wagon

Deleted: c:\cars\Asia

Deleted: c:\cars\Europe\Sedan

Deleted: c:\cars\Europe\Sports

Deleted: c:\cars\Europe\SUV

Deleted: c:\cars\Europe\Wagon

Deleted: c:\cars\Europe

Deleted: c:\cars\USA\Sedan

Deleted: c:\cars\USA\Sports

Deleted: c:\cars\USA\SUV

Deleted: c:\cars\USA\Truck

Deleted: c:\cars\USA\Wagon

Deleted: c:\cars\USA

Deleted: c:\cars

See Also: The code in this example presumes that you know the exact depth of your directory structure. The Chapter 7 section headed “Simulating Recursion” contains an example  showing how to traverse a directory structure when you don’t know what is in the directory.

Working with Files

A set of functions is available for reading and writing external files (for example,FPUT(), FWRITE(), FGET(), FREAD()). I find using these functions to be a much more difficult way to read or write an external file than to use the FILE and INFILE statements with the PUT and INPUT statements. If you need to read or write an external file from outside a DATA step using a macro, then these file functions with %SYSFUNC work. From within a DATA step, I haven’t found much use for these functions.

There are some other functions that I find much more useful for manipulating files. You can delete files (FDELETE()), copy files (FCOPY()), look at file information (FOPTNUM(), FOPTNAME(), FINFO()), and check to see if a file exists (FEXIST(), FILEEXIST()).

The example in Program 3.6 and SAS Log 3.7 uses these file functions to see whether a file exists, to copy one file to another, to create a file and write to it, to look at the file information, and finally to delete the file.

Program 3.6: Working with External Files

data _null_;

 

   oldFile = "c:\old.txt";

   newFile = "c:\new.txt";

   rc = filename("new", "newFile");

 

   if (fileexist(oldFile) and not fexist("new")) then

   do;

      rc = filename("old", oldFile);

      rc = fcopy("old", "new");  

      rc = filename("old");

      putlog "Copied old file (" oldFile ") to "

             "new file (" newFile +(-1) ")";

   end; /* if – old file exists, new file doesn’t exist */

 

   else if (not fexist("new")) then

   do;

      fid = fopen("new", "O");  

      rc = fput(fid, "Testing");

      rc = fwrite(fid);

      rc = fclose(fid);

      putlog "Created new file (" newFile +(-1) ")";

   end; /* else if – old file and new file don’t exist */

 

   fid = fopen("new");     

   do i = 1 to foptnum(fid);

      name = foptname(fid, i);

      value = finfo(fid, name);

      putlog name= value=;

   end; /* do i – loop through file info */

   fid = fclose(fid);

 

   rc = fdelete("new");  

   putlog "Deleted new file (" newFile +(-1) ")";

 

run;

   Use the FILENAME() function to set a fileref pointing to the new file (which does not exist).

   Use the FILEEXIST() function to see whether the old file exists, and use the FEXIST() function to see whether the new file exists. The difference between them is that FILEEXIST() accepts a path, while FEXIST() accepts a fileref. In this example, the old.txt file did not exist, so this IF statement was not true.

   Copy the old.txt file to the new.txt file.

   Check to see whether the new.txt file does not exist.

   Use FPUT() and FWRITE() to write to the new.txt file.

   Open the new.txt file and write all the available information about the file to the SAS log.

   Delete the new.txt file.

SAS Log 3.7: Working with External Files

Created new file (c:\new.txt)

name=Filename value=C:\Users\sasmih\newFile

name=RECFM value=V

name=LRECL value=32767

name=File Size (bytes) value=9

name=Last Modified value=04Feb2017:16:54:15

name=Create Time value=04Feb2017:16:54:15

Deleted new file (c:\new.txt)

Note: For simplicity in the file and directory function examples, I have left out the return code checking. To avoid errors and unexpected results, you should check for non-zero return codes from these functions. If the function fails, you can use the SYSMSG() function to capture the error message.

Reading and Writing External Files

If you are reading from and writing to an external file using the INFILE/INPUT and FILE/PUT statements, a couple of features can make things easier.

Using the $VARYING. Format and Informat

The $VARYING. format is a great way to write values that aren’t always the same length. The syntax for this format requires that you specify the actual length of the value after $VARYING:

len = 3;

put var $varying. len;

The example in Program 3.7 shows the difference between using a regular $CHAR. format and $VARYING. format.

Program 3.7: Comparison of $CHAR. to $VARYING.

data _null_;

   set sashelp.class;

   file "c:\class.txt";

   put "$char   " name $char.;

   len = length(name);

   put "$varying " name $varying. len;

run;

Figure 3.3 shows a listing of the output file that is created. In the output file, I have selected all the text. You can see that when using the $CHAR. format, the line is padded with blanks, while the $VARYING. line is not. In this example, it doesn’t make too much difference, but if you are creating a large file with long, varying lines of text, the extra spaces make the file even larger.

Figure 3.3: Comparison of $CHAR. to $VARYING.

image

I use the $VARYING. format when I am generating code with a DATA step. It allows me to have some very long lines as well as very short lines, so that when viewed in another editor that has wrapping turned on, only the truly long lines wrap, not the short lines that are padded with blanks.

You can also use the $VARYING. informat when reading from a file. When you do this, you can use the RECLEN= option on the INFILE statement. This option creates an automatic variable that contains the length of the incoming line. You can then use that value to determine the length for the $VARYING. informat. Reading the data this way ensures that you don’t read past the end of the input line. However, I think using the TRUNCOVER option on the INFILE statement is the easiest way to handle reading past the end of the line, so I rarely use the $VARYING. informat.

Using the FILEVAR= Option

The FILEVAR= option can be used on an INFILE statement or on a FILE statement. It lets you change the input file or output file during the execution of a DATA step without having multiple INFILE or FILE statements. It lets you write (using FILE) and read (using INFILE) multiple files in a single DATA step. So instead of creating a separate DATA step to read each file, and then appending the contents together, you can just use one DATA step.

The example in Program 3.8 and SAS Log 3.8 creates three files in a directory and writes to each of them in a single DATA step.

Program 3.8: Using FILEVAR= with FILE to Create Files

data _null_;

 

   dirName = "c:\filevar\";

   if (not fileexist(dirName)) then

      dirName = dcreate("filevar", "c:\");

 

   rc = filename("fvar", dirName);

 

   do f = 1 to 3;

 

      fname = cats(dirName, "file", f, ".txt");

      file fff filevar = fname;

      do i = 1 to f;

         put "Line " i +2 "file" f +(-1)".txt";

      end; /* do i – loop through records in file */

 

   end; /* do f – loop through files to be created */

 

   rc = filename("fvar");

 

run;

SAS Log 3.8: Using FILEVAR= with FILE to Create Files

NOTE: The file FFF is:

      Filename=c:\filevar\file1.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=0,

 

NOTE: The file FFF is:

      Filename=c:\filevar\file2.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=0,

 

NOTE: The file FFF is:

      Filename=c:\filevar\file3.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=0,

 

NOTE: 1 record was written to the file FFF.

      The minimum record length was 18.

      The maximum record length was 18.

NOTE: 2 records were written to the file FFF.

      The minimum record length was 18.

      The maximum record length was 18.

NOTE: 3 records were written to the file FFF.

      The minimum record length was 18.

      The maximum record length was 18.

After Program 3.8 creates the files, Program 3.9 reads all the files. SAS Log 3.9 shows the results of reading the files.

Program 3.9: Using FILEVAR= with INFILE to Read Files

data _null_;

 

   if (fileexist("c:\filevar"));

 

   rc = filename("fvar", "c:\filevar");

   did = dopen("fvar");

 

   do f = 1 to dnum(did);

 

      fname = cats("c:\filevar\", dread(did, f));

      infile fff filevar = fname truncover end = eof;

 

      do until(eof);

         input @6 lineNum @9 filename $10.;

         put lineNum= filename=;

      end; /* do until – loop through the infile records */

 

   end; /* do f - loop through files in directory */

 

   did = dclose(did);

   rc = filename("fvar");

   stop;

 

run;

SAS Log 3.9: Using FILEVAR= with INFILE to Read Files

NOTE: The infile FFF is:

      Filename=c:\filevar\file1.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=20,

 

lineNum=1 filename=file1.txt

NOTE: The infile FFF is:

      Filename=c:\filevar\file2.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=40,

 

lineNum=1 filename=file2.txt

lineNum=2 filename=file2.txt

NOTE: The infile FFF is:

      Filename=c:\filevar\file3.txt,

      RECFM=V,LRECL=32767,File Size (bytes)=60,

 

lineNum=1 filename=file3.txt

lineNum=2 filename=file3.txt

lineNum=3 filename=file3.txt

NOTE: 1 record was read from the infile FFF.

      The minimum record length was 18.

      The maximum record length was 18.

NOTE: 2 records were read from the infile FFF.

      The minimum record length was 18.

      The maximum record length was 18.

NOTE: 3 records were read from the infile FFF.

      The minimum record length was 18.

      The maximum record length was 18.

Warning: Do not forget the STOP statement at the end. If you leave it off, the DATA step goes into an infinite loop that will drive you crazy trying to debug. If you get into an infinite loop, always remember to check that you have the STOP statement.

Reading Date and Time Values

The ANYDTDTE. (for dates), ANYDTDTM. (for datetimes), and ANYDTTME. (for times) informats are incredibly useful when reading in data that has date and time values. You can use these informats to read many date and time formats, instead of having to find the informat that corresponds to your data. In addition, if you have a date column that has values in different formats (for example, mm/dd/yyyy and ddMONyy), or you don’t know what format the data will be in, you can read them all with the ANYDTDTE. informat.

Program 3.10 and SAS Log 3.10 show how to use the ANYDT* informats.

Program 3.10: Using ANYDT* Informats to Read Date and Time Values

data _null_;

   format date date9.

          time time5.

          datetime datetime18.;

   input @1 date anydtdte15.

         @17 time anydttme13.

         @31 datetime anydtdtm21.;

   putlog date= time= datetime=;

datalines;

01JAN17     2:00pm    01Jan17:14:00

1/1/2017    14:00:00   170101 02:00 pm

January 1, 2017 14:00     1/1/2017 14:00

17001      02:00:00pm  01JAN2017

01Jan17:14:00  01Jan17:14:00 01Jan17:14:00

;

SAS Log 3.10: Using ANYDT* Informats to Read Date and Time Values

date=01JAN2017 time=14:00 datetime=01JAN17:14:00:00

date=01JAN2017 time=14:00 datetime=01JAN17:14:00:00

date=01JAN2017 time=14:00 datetime=01JAN17:14:00:00

date=01JAN2017 time=14:00 datetime=01JAN17:00:00:00

date=01JAN2017 time=14:00 datetime=01JAN17:14:00:00

Creating a CSV File from a Data Table

There are several ways of creating a comma-separated values (CSV) file from a data table, but one of the simplest might be one that you haven’t heard of: %DS2CSV. This macro utility is part of Base SAS, and all you need to do is tell it the name of the data table and the name of the CSV file, and run it. There are lots of documented options, including options that can be used if you are running from a web application and want to allow the user to download the CSV file. Program 3.11 and SAS Log 3.11 show a simple usage of %DS2CSV to create a CSV file from the sashelp.class table.

Program 3.11: Using %DS2CSV to Create a CSV File

%ds2csv(data = sashelp.class,

        csvfile = c:\temp\class.csv,

        runmode = b);

SAS Log 3.11: Using %DS2CSV to Create a CSV File

NOTE: CSV file successfully generated for SASHELP.CLASS.

You need to set the RUNMODE= option to b if you are running the code in batch or interactive SAS; otherwise, the macro will attempt to run in a SAS workspace server.

PROC EXPORT is another easy way to create a CSV file. It has plenty of documented options that will enable you to create the CSV the way you want it. It also generates a SAS DATA step that you can save and modify if the code isn’t quite what you want. Program 3.12 and SAS Log 3.12 show how to use PROC EXPORT to create a CSV file from the sashelp.class table.

Program 3.12: Using PROC EXPORT to Create a CSV File

proc export data = sashelp.class

            outfile = "c:\temp\class.csv"

            dbms = csv

            label

            replace;

run;

SAS Log 3.12: Using PROC EXPORT to Create a CSV File

7   /*************************************************************

8    *  PRODUCT:  SAS

9    *  VERSION:  9.4

10   *  CREATOR:  External File Interface

11   *  DATE:   05MAY17

12   *  DESC:   Generated SAS Datastep Code

13   *  TEMPLATE SOURCE: (None Specified.)

14   **************************************************************/

15      data _null_;

16      %let _EFIERR_ = 0; /* set the ERROR detection variable */

17      %let _EFIREC_ = 0; /* clear export record count variable */

18      file 'c:\temp\class.csv' delimiter=',' DSD DROPOVER lrecl=32767;

19      if _n_ = 1 then  /* write column names or labels */

20       do;

21         put

22           '"' "Name" '"'

23         ','

24           '"' "Sex" '"'

25         ','

26           '"' "Age" '"'

27         ','

28           '"' "Height" '"'

29         ','

30           '"' "Weight" '"'

31         ;

32       end;

33     set SASHELP.CLASS  end=EFIEOD;

34         format Name $8. ;

35         format Sex $1. ;

36         format Age best12. ;

37         format Height best12. ;

38         format Weight best12. ;

39       do;

40         EFIOUT + 1;

41         put Name $ @;

42         put Sex $ @;

43         put Age @;

44         put Height @;

45         put Weight ;

46         ;

47       end;

48      if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */

49      if EFIEOD then call symputx('_EFIREC_',EFIOUT);

50      run;

 

NOTE: The file 'c:\temp\class.csv' is:

      Filename=c:\temp\class.csv,

      RECFM=V,LRECL=32767,File Size (bytes)=0,

      Last Modified=06May2017:08:35:33,

      Create Time=17Apr2017:11:31:58

 

NOTE: 20 records were written to the file 'c:\temp\class.csv'.

      The minimum record length was 17.

      The maximum record length was 36.

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: DATA statement used (Total process time):

      real time      0.02 seconds

      cpu time      0.03 seconds

 

 

19 records created in c:\temp\class.csv from SASHELP.CLASS.

 

NOTE: "c:\temp\class.csv" file was successfully created.

NOTE: PROCEDURE EXPORT used (Total process time):

      real time      0.16 seconds

      cpu time      0.09 seconds

Reading a CSV File with Embedded Line Feeds

A common issue when you are working in a Microsoft Excel spreadsheet is using Alt+Enter to go to a new line within the same cell thus creating an embedded line feed character. If you then turn this Excel spreadsheet into a CSV, the value contains the embedded line feed. If you view the file in some basic text editors, you will see that it goes to a new line at this point. If you try to read this file with SAS software, it will see the line feed character as an end of record, will set the rest of the fields to blank, and will go to the next iteration of the DATA step and start reading from the middle of the line.

If you are using a DATA step with an INFILE statement and INPUT statements, then this is an easy fix. You can add the TERMSTR= option to the INFILE statement and it will handle the line feed character correctly.

For this example, I used the code in Program 3.13 to create a CSV file that has an embedded line feed character (0Ax).

Program 3.13: Creating a CSV with an Embedded Line Feed

data _null_;

   file "c:\temp\test.csv";

   put '"aaa","bbb","ccc","ddd"';

   put '"aa","bb","c' '0A'x 'c","dd"';

   put '"a","b","c","d"';

run;

If you look at the file in an editor, you will see that the second record is split over two lines. Figure 3.4 shows the file in Notepad.

Figure 3.4: Creating a CSV with an Embedded Line Feed

image

Program 3.14 and Figure 3.5 show what happens when you attempt to read this file with a DATA step and INFILE statement.

Program 3.14: Attempting to Read a CSV File with Standard INFILE/INPUT

data test;

   infile "c:\temp\test.csv" dlm = "," dsd truncover;

   length a b c d $5;

   input a b c d;

run;

Figure 3.5: Attempting to Read a CSV File with Standard INFILE/INPUT

image

This table is definitely not what you wanted. So, you can add the TERMSTR=CRLF option to the INFILE statement. This tells the INFILE statement to only use a carriage return + line feed as the end of record. Program 3.15 and Figure 3.6 show what happens when you use the TERMSTR=CRLF option.

Program 3.15: Reading a CSV File with the TERMSTR= Option

data test;

   infile "c:\temp\test.csv" dlm = "," dsd truncover

                             termstr = CRLF;

   length a b c d $5;

   input a b c d;

run;

Figure 3.6: Reading a CSV File with the TERMSTR= Option

image

You can see that the output table is now correct.

However, if you want to read the CSV file with PROC IMPORT, or you actually have an embedded carriage return (0Dx) + line feed (0Ax), not just a line feed character in the file, the only way to handle it is to preprocess the file and change the line feed characters to another character. Additionally, I usually want to maintain the line feeds in my data, so I convert each line feed character to a value that I will be able to find again. After the data has been read, I can turn those values back into line feeds. If you don’t care about maintaining the line feed, then you can just convert them to blanks.

Program 3.16 shows the code that preprocesses the file and converts any line feed that is part of a string that is enclosed in double quotation marks into “~~”. Figure 3.7 shows the preprocessed file in Notepad.

Program 3.16: Preprocessing a CSV File and Replacing Line Feeds

data _null_;

   infile "c:\temp\test.csv" recfm=n sharebuffers;

   file "c:\temp\test_TEMP.csv" recfm=n;

   input a $char1.;

   retain inQuotes 0;

   if a = '"' then

      inQuotes = (not inQuotes);

   if (a eq '0A'x and inQuotes) then

      put '~~';

   else

      put a $char.;

run;

Figure 3.7: Preprocessing a CSV File and Replacing Line Feeds—test_TEMP.csv

image

Now Program 3.17 can read the test_TEMP.csv file into a SAS table and put the line feeds back where they should be. Figure 3.8 shows the data table that is created from the temporary CSV file.

Program 3.17: Reading a Preprocessed CSV File

data test;

   infile "c:\temp\test_TEMP.csv" dlm = "," dsd truncover;

   length a b c_in c c_hex d $10;

   input a b c_in d;

   c = tranwrd(c_in, "~~", "0A"x);

   c_hex = put(c, hex6.);

run;

Figure 3.8: Reading a Preprocessed CSV File

image

As you can see from the output table, the second row shows that c_in (the value from the input file) contains “~~”, and then c has the value with the “~~” replaced by a line feed. The c_hex column shows the hexadecimal representation of c, and you can see the 0A in the middle. After the file has been preprocessed, you can read it with whatever tool you want and make any updates that you need.