Detecting outliers and treatments

First, a word of caution: one person's waste might be another person's treasure, and this is true for outliers as well. For example, for the week of 2/5/2018 to 2/15/2018, the Dow Jones Industrial Average (DJIA) suffered a huge loss. Cheng and Hum (2018) show that the index traveled more than 22,000 points, as shown in the following table:

Weekday      Points
Monday        5,113
Tuesday       5,460
Wednesday     2,886
Thursday      3,369
Friday        5,425
Total        22,253

Table 5.1: Dow Jones Industrial Average points traveled

If we want to study the relationship between a stock and the DJIA index, the observations from that week might be treated as outliers. However, when studying the impact of the market on individual stocks, we should pay special attention to those observations. In other words, in that setting they should not be treated as outliers.

There are many different definitions of an outlier:
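Two rules that are often used in practice, given here only as an illustration and not as the definitions the text has in mind, are the standard-deviation (z-score) rule and the interquartile-range rule (Tukey's fences). The sketch below applies both to a small made-up vector. The helper names is_outlier_sd and is_outlier_iqr and the multipliers 3 and 1.5 are assumptions chosen for the example.

```r
# Hypothetical helpers; the multipliers k are conventional choices, not from the text.

# Flag values more than k standard deviations away from the mean (two-sided).
is_outlier_sd <- function(x, k = 3) {
  abs(x - mean(x)) / sd(x) > k
}

# Flag values more than k interquartile ranges outside the quartiles.
is_outlier_iqr <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

x <- c(1:20, 100)            # made-up data with one extreme value
which(is_outlier_sd(x))      # positions flagged by the z-score rule
which(is_outlier_iqr(x))     # positions flagged by the IQR rule
```

On this vector both rules flag only the last observation, but on real return data the two rules can disagree, which is one reason the chosen definition should be reported.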

Assume that we have downloaded the weekly S&P500 historical data from Yahoo! Finance at https://finance.yahoo.com/. The ticker symbol for the S&P500 market index is ^GSPC. Assume further that the dataset is saved under c:/temp with a name of ^GSPCweekly.csv. The following R program counts the weekly returns that lie more than a given number of standard deviations above their mean. In the program, this threshold is held in the variable distance, which we set to 3:

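>  distance<-3                                # threshold in standard deviations
>  x<-read.csv("c:/temp/^GSPCweekly.csv")
>  p<-x$Adj.Close
>  n<-length(p)                               # number of weekly prices
>  ret<-p[2:n]/p[1:(n-1)]-1                   # simple weekly returns
>  m<-mean(ret)
>  std<-sd(ret)
>  # one-sided: keeps only returns more than `distance` sd above the mean
>  ret2<-subset(ret,((ret-m)/std)>distance)
>  n2<-length(ret2)                           # number of flagged returns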

It is a good idea to inspect a few of the results:

> head(x,2)
        Date  Open  High   Low Close Adj.Close   Volume
1 1950-01-02 16.66 17.09 16.66 17.09     17.09  9040000
2 1950-01-09 17.08 17.09 16.65 16.65     16.65 14790000
> m
[1] 0.001628357
> std
[1] 0.02051384
> length(ret)
[1] 3554
> n2
[1] 15

Among 3,554 weekly returns, 15 could be treated as outliers if an outlier is defined as a return more than three standard deviations above the mean. Note that the condition in the program is one-sided; replacing it with abs((ret-m)/std)>distance would also count large negative returns. Of course, users could define an outlier in other ways. How to treat those outliers depends on the research topic. One way is to delete them, but the most important reminder is that researchers should detail their methods of treating outliers.
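The choice of treatment can be sketched as follows, using simulated returns in place of the downloaded file so that the example runs on its own. Deleting (trimming) drops the flagged observations; winsorizing, an alternative not discussed in the text and shown here only for comparison, caps them at the cutoffs instead. The function names trim_outliers and winsorize_sd, the simulated data, and the two injected extreme returns are all assumptions made for illustration.

```r
# Simulated weekly returns stand in for the S&P500 data (assumption:
# the CSV file from the text is not needed to run this sketch).
set.seed(123)
ret <- c(rnorm(500, mean = 0.0016, sd = 0.02), 0.15, -0.18)

# Treatment 1: delete observations more than k standard deviations from the mean.
trim_outliers <- function(x, k = 3) {
  z <- (x - mean(x)) / sd(x)
  x[abs(z) <= k]
}

# Treatment 2: cap observations at mean +/- k standard deviations.
winsorize_sd <- function(x, k = 3) {
  lo <- mean(x) - k * sd(x)
  hi <- mean(x) + k * sd(x)
  pmin(pmax(x, lo), hi)
}

trimmed <- trim_outliers(ret)
capped  <- winsorize_sd(ret)
length(ret) - length(trimmed)   # number of observations deleted
range(capped)                   # extremes now sit at the cutoffs
```

Trimming shortens the sample, while capping keeps its length and only pulls in the tails. Whichever is used, the threshold and the number of affected observations belong in the write-up.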