The general form of the DELETE statement is as follows:
DELETE FROM table – name
<WHERE expression>;
Here, we have the following:
- The table name is the table whose records need to be deleted
- WHERE is an optional statement if only certain records from the table need to be deleted
We will try to subset the data before deleting it. Let's delete all the records that were created due to the looping exercise on the dealership data.
Run the following code:
Proc Sql;
Delete From Dealership_looped;
Where i gt 1;
Quit;
Did you notice anything wrong in the preceding syntax? The semicolon is in the wrong place. Due to the semicolon before the WHERE clause, all the records will be deleted from the table. SAS won't throw an error, but after deleting the records, it will put a NOTE in the LOG stating that the WHERE clause isn't supported in this manner in SAS. Therefore, be careful and don't put a semicolon after the dataset name unless you don't have a WHERE condition.
Let's remove the semicolon after the dataset name. We will get the following LOG message:
NOTE: 35999964 rows were deleted from WORK.DEALERSHIP_LOOPED.
NOTE: PROCEDURE SQL used (Total process time):
real time 2:32.45
cpu time 2:29.17
The same DELETE statement can also help delete the records from the underlying table specified in the view.
There is an alternate way to delete the records using the built-in procedure. Let's compare the time it takes to complete the task using PROC DELETE. The Proc SQL delete syntax was run without the WHERE condition so that the time that was taken was specified:
NOTE: 36000000 rows were deleted from WORK.DEALERSHIP_LOOPED.
75 Quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:31.46
cpu time 1:29.89
Deleting the dataset using PROC DELETE only took 0.16 seconds, which is considerably lower than the 1:31:46 that we had by using Proc SQL without the WHERE condition:
NOTE: Deleting WORK.DEALERSHIP_LOOPED (memtype=DATA).
NOTE: PROCEDURE DELETE used (Total process time):
real time 0.16 seconds
cpu time 0.14 seconds
The code that we used for PROC DELETE was as follows:
Proc Delete Data=Dealership_looped;
Run;
We can delete multiple datasets using the preceding procedure. The dataset names can be specified as follows:
Data = Lib1.A Lib2.B Lib.C (genum = all);
Specifying the genum option deletes all historical versions of the datasets.
We can also delete the datasets using the DATASETS procedure:
Proc Datasets Library=WORK;
Delete Dealership_Looped;
Run;
The preceding code also took approximately the same time as using the DELETE procedure.
Note that the PROC DELETE option doesn't allow for a conditional delete. A warning is produced in the following LOG message to that effect. However, what isn't stated in the LOG is that the dealership_looped dataset has already been deleted:
73 Proc Delete Data=dealership_looped;
74 Where i gt 1;
WARNING: No data sets qualify for WHERE processing.
If you are trying to delete a dataset, I recommend going for the built-in DELETE or DATASETS procedures. However, if a subset of the data needs to be deleted, try using Proc SQL with the WHERE option.