Outlier Detection
By Ou Zhang
November 19, 2020
This repository contains all the useful resources (personal opinion) I have found during my outlier detection study and research.
I have written at length about outlier detection philosophy and methods on my blog. The four posts are listed below.
 The philosophy of outliers
 Outlier detection univariate methods
 Outliers detection in regression
 Outlier detection multivariate methods
Besides my blog articles, I have put my technical notes in the ‘Notes’ folder. All the relevant online sources and useful links are saved in the Excel file. These sources include several useful handouts and some valuable papers, which you can find in the ‘Handout’ folder.
Among all the materials, William G. Jacoby’s handout is worthy of special mention.
Many useful R packages for outlier detection are available.
 outliers is useful for univariate outlier detection. It contains multiple statistical tests (e.g., ‘grubbs’, ‘dixon’).

EnvStats provides Rosner's test through rosnerTest().
car is especially useful and has many functions for outlier detection. The outlierTest() function from the {car} package reports the most extreme observation under the given model and tests whether it is an outlier. In addition, the car package provides a series of graphing functions for plotting outliers: residualPlots(), avPlots(), qqPlot(), and influenceIndexPlot(). Among these plot functions, influencePlot() deserves special mention. It creates a bubble plot of Studentized residuals versus hat values, with the area of each circle proportional to the observation's Cook's distance. Vertical reference lines are drawn at twice and three times the average hat value, and horizontal reference lines at -2, 0, and 2 on the Studentized-residual scale. [x-axis: hat value (with cutoffs); y-axis: Studentized residual; bubble size: Cook's D.]
mvoutlier includes a variety of functions for multivariate outlier detection.

DMwR has a useful function, lofactor(), which obtains local outlier factors using the LOF algorithm.
robustbase supports more advanced multivariate outlier criteria. For example, the function covMcd() computes robust location and scatter estimates via the Minimum Covariance Determinant (MCD).
performance offers one of the most comprehensive multivariate outlier detection functions, check_outliers(). With different option keywords, this function covers most multivariate outlier detection criteria. You can find more details through the link below: check_outliers
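The MCD-based approach mentioned above for robustbase can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, the five shifted rows are planted by hand, and the chi-squared cutoff at the 0.975 quantile is a common rule of thumb rather than anything prescribed by this repository.

```r
# Minimal sketch: flag multivariate outliers with robust (MCD) distances.
# Assumes the robustbase package is installed; data and cutoff are illustrative.
library(robustbase)

set.seed(42)                                  # covMcd() uses random subsampling
x <- cbind(a = rnorm(100), b = rnorm(100))    # simulated "clean" data
x[1:5, ] <- x[1:5, ] + 8                      # shift five rows far from the bulk

mcd <- covMcd(x)                              # robust location and scatter (MCD)

# Squared robust distances from the MCD centre and covariance
d2 <- mahalanobis(x, center = mcd$center, cov = mcd$cov)

# Rule-of-thumb cutoff: chi-squared quantile with df = number of columns
cutoff  <- qchisq(0.975, df = ncol(x))
flagged <- which(d2 > cutoff)
print(flagged)
```

Because the centre and covariance come from the MCD fit rather than the ordinary sample mean and covariance, the planted rows cannot mask themselves by inflating the scatter estimate. A few ordinary rows may also exceed the cutoff by chance, which is expected with a 0.975 quantile.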
In addition, I’ve listed all 9 useful R example scripts. These R scripts are great practice resources for you to understand the outlier detection process and some available methods. You can download them and practice on your local computer.
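As a small warm-up before working through the scripts, the three quantities that the influencePlot() description above refers to (hat values, Studentized residuals, and Cook's distance) can be computed with base R alone. This is a hedged sketch, not one of the repository's scripts: the data are simulated, one unusual point is planted by hand, and the flagging rule simply combines the reference lines quoted in the text.

```r
# Minimal sketch: the quantities behind an influence bubble plot,
# computed with base R only. Data are simulated for illustration.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
x[n] <- 6; y[n] <- -10            # plant one high-leverage point far off the line

fit <- lm(y ~ x)

diag_df <- data.frame(
  hat      = hatvalues(fit),        # leverage (x-axis of the plot)
  rstudent = rstudent(fit),         # Studentized residual (y-axis)
  cooks_d  = cooks.distance(fit)    # Cook's distance (bubble area)
)

# Reference lines from the text: 2x the average hat value,
# and +/-2 on the Studentized-residual scale
avg_hat <- mean(diag_df$hat)        # equals p / n for a linear model
flagged <- which(diag_df$hat > 2 * avg_hat & abs(diag_df$rstudent) > 2)
print(diag_df[flagged, ])
```

With the car package installed, calling influencePlot(fit) on the same model draws the bubble plot directly.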