# Statistics

## Outliers-Part 4:Finding Outliers in a multivariated way

1 Data Source 1.1 Variables in Data 2 Model-specific methods 2.1 Cook’s Distance 2.2 Pareto 3 Multivariate methods 3.1 Mahalanobis Distance 3.1.1 Details about Mahalanobis Distance 3.2 Robust Mahalanobis Distance 3.3 Minimum Covariance Determinant (MCD) 3.3.1 robust tolerance ellipsoid (RTE) 3.4 Invariant Coordinate Selection (ICS) 3.5 OPTICS 3.6 Isolation Forest 3.7 Local Outlier Factor 4 ‘check_outliers’ function in {performance} R package 4.0.1 Threshold specification 5 Reference Figure 0.

## Outliers-Part 3:Outliers in Regression

1 Types of Unusual Observations 1.1 Regression Outliers 1.2 Leverage 1.3 Influential Observations 1.4 Good vs. Bad Leverage 2 Detecting Influential Observations 2.1 Graphic diagnostics 2.1.1 A scatter plot with Confidence Ellipse 2.1.2 Quantile Comparison Plots (QQ-Plot) 2.1.2.1 Rule of Thumb 2.1.3 Added-variable plots 2.2 Numerical diagnostics 2.2.1 Hat Matrix 2.2.1.1 Rule of Thumb 2.2.2 Standardized Residuals 2.2.2.1 Rule of Thumb 2.2.3 Studentized Residuals 2.

## Outliers-Part 2:Finding Outliers in a univariated way

1 Method 1: Sorting Your Datasheet to Find Outliers 2 Method 2: Graphing Your Data to Identify Outliers 2.1 Histogram 2.2 Boxplot 2.2.1 Adjusted boxplot (Hubert and Vandervieren, 2008) 3 Method 3: Using Z-scores to Detect Outliers 3.1 Z-Score pros: 3.2 Z-Score cons: 4 Method 4: Using the Interquartile Range (IRQ) to Create Outlier Fences 5 Method 5: Percentiles 5.1 scores function from {outliers} packages 6 Method 6: Hampel filter 7 Method 7: Finding Outliers with Hypothesis Tests 7.

## Outliers-Part 1:Causes, Philosophy and General Rules

1 What are Outliers? 2 Causes for Outliers 3 Types of Outliers 4 Philosophy about Finding Outliers 5 General Rules Figure 0.1: Outliers 4 years ago (Yes, back to 2016), I was asked by a director of data science department from a very famous IT company about outliers. Basically, she asked two questions: What are outliers? How to detect them? Also in my daily research life, I have encountered noisy data all the time.

## Utilizing a Logistic Regression Model to Predict Credit Card Default

1 Data 2 Income, Balance & Default 3 Model Selection 4 Diagnosis 5 Interesting Points 6 Model Cross-Validation 7 Parameter Selection 8 Conclusion Logistic regression model is widely used for group classification. In education or social science, it has been used to classify students/individuals to different groups. In the finance industry, logistic regression model is also quite useful to identify/classify individual’s group status (i.e. Y) according his/her other features (i.