Life is full of surprises, and our goal is to distinguish them from “normal” behavior. That is called Anomaly Detection. In fact, anomalies are among the most interesting things in Data Analysis, so it is always good to have a set of handy tools at hand. Here is my toolkit.

## AnomalyDetection R package

Twitter’s AnomalyDetection is a popular and easy-to-use R package for time series anomaly analysis. The package uses a Seasonal Hybrid ESD (Extreme Studentized Deviate) algorithm to identify local and global anomalies.

As an outcome, we get a `data.frame` with the anomalous observations and, if necessary, a plot with both the time series and the estimated anoms indicated by circles:
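For illustration, a minimal call might look like the following; `raw_data` is the example dataset that ships with the package, and the parameter values here are just a starting point:

```r
# install from GitHub: devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)

data(raw_data)  # example time series bundled with the package
res <- AnomalyDetectionTs(raw_data, max_anoms = 0.02,
                          direction = "both", plot = TRUE)
head(res$anoms)  # data.frame of anomalous timestamps and values
res$plot         # the time series with the anoms circled
```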

## Outlier in psych R package

When dealing with multidimensional numeric or logical data, we can detect outliers by calculating the Mahalanobis distance for each data point and then comparing these to the expected values of *χ²*. We can do it with the `outlier` function of the psych R package:

```r
D2 <- outlier(dat, plot = TRUE, bad = 5)
```

Looking at the Q-Q plot below, we can set a threshold for *D2* to identify outliers, let’s say, above 18:

In other words, any observation whose Mahalanobis distance is above the threshold can be considered an outlier.
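The same idea can be sketched with base R alone; the data below are made up, and note that `mahalanobis()` returns the squared distance, which we compare against a χ² quantile:

```r
set.seed(42)
# hypothetical 3-dimensional data with two planted outliers (rows 101 and 102)
dat <- rbind(matrix(rnorm(300), ncol = 3),
             c(8, 8, 8), c(-9, 7, -8))

# squared Mahalanobis distance of each row from the sample centroid
D2 <- mahalanobis(dat, center = colMeans(dat), cov = cov(dat))

# under normality, D2 follows a chi-squared distribution with df = ncol(dat),
# so we can flag everything beyond, say, the 99th percentile
threshold <- qchisq(0.99, df = ncol(dat))
which(D2 > threshold)  # the planted outliers are flagged
```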

## Time Series Anomaly Detection in Azure ML

I like Microsoft Azure Machine Learning Studio. It contains a really powerful module for Time Series Anomaly Detection. It can measure:

- the magnitude of upward and downward changes
- direction and duration of trends: positive vs. negative changes

The module learns the pattern from the data, and adds two columns (*Anomaly score* and *Alert indicator*) to indicate values that are potentially anomalous:
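The module itself is a black box, but the flavor of its output (a continuous anomaly score plus a binary alert) can be imitated with a simple rolling z-score in base R. This is my own illustrative stand-in, not the module’s actual algorithm:

```r
# hypothetical series: a repeating pattern with a level shift at t = 61
y <- c(rep(c(9, 10, 11), 20), rep(c(15, 16, 17), 14))

k <- 12  # rolling window length (my choice, not a module parameter)
score <- rep(NA_real_, length(y))
for (t in (k + 1):length(y)) {
  win <- y[(t - k):(t - 1)]
  score[t] <- abs(y[t] - mean(win)) / sd(win)  # analog of the Anomaly score
}
alert <- as.integer(!is.na(score) & score > 3)  # analog of the Alert indicator
which(alert == 1)  # flags the level shift at t = 61
```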

## One-Class Support Vector Machine in Azure ML

This Azure ML module can be used when we have plenty of data labeled as “normal” and not too many anomalous instances. A one-class SVM learns a discriminative boundary around the normal instances, and everything outside the boundary is considered anomalous. Our responsibility is to tune the model parameters and train it.

Running the experiment scores the data. The scored output adds two more columns to the dataset: *Scored Labels* and *Score Probabilities*. The *Scored Label* is a 1 or a 0, where 1 represents an outlier:
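Outside Azure, the same technique is available in R, for example via the e1071 package (assuming it is installed). A toy sketch trained on made-up “normal” points:

```r
library(e1071)

set.seed(3)
normal_train <- matrix(rnorm(200 * 2), ncol = 2)  # "normal" instances only

# one-class SVM: nu (roughly) bounds the fraction of training points
# allowed to fall outside the learned boundary
model <- svm(normal_train, type = "one-classification",
             kernel = "radial", nu = 0.05)

test_pts <- rbind(c(0, 0),   # near the bulk of the training data
                  c(6, 6))   # far outside it
predict(model, test_pts)     # TRUE = normal, FALSE = anomalous
```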

## PCA-Based Anomaly Detection in Azure ML

As with the *One-class SVM*, the PCA-Based Anomaly Detection model is trained on normal data. The scored dataset contains *Scored Labels* and *Score Probabilities*. But mind you that for the PCA-based model, a *Scored Label* of 1 means normal data:
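The Azure module’s internals are not public, but the PCA idea itself can be sketched in base R: learn the principal components from normal data, reconstruct each point from the top components, and treat a large reconstruction error as anomalous. Everything below (the data, `k`, the 99th-percentile threshold) is my own choice for illustration:

```r
set.seed(7)
# hypothetical "normal" training data: 5 dimensions, two of them redundant
normal <- matrix(rnorm(200 * 5), ncol = 5)
normal[, 3] <- normal[, 1] + 0.1 * rnorm(200)  # column 3 ~ column 1
normal[, 4] <- normal[, 2] + 0.1 * rnorm(200)  # column 4 ~ column 2

pca <- prcomp(normal, center = TRUE, scale. = FALSE)
k <- 2  # number of principal components kept

# squared reconstruction error of each row after projecting on k components
recon_error <- function(x, pca, k) {
  centered <- sweep(x, 2, pca$center)
  scores   <- centered %*% pca$rotation[, 1:k, drop = FALSE]
  recon    <- scores %*% t(pca$rotation[, 1:k, drop = FALSE])
  rowSums((centered - recon)^2)
}

threshold <- quantile(recon_error(normal, pca, k), 0.99)

# a point that breaks the learned correlations scores far above the threshold
test_point <- matrix(c(5, -5, -5, 5, 0), nrow = 1)
recon_error(test_point, pca, k) > threshold  # TRUE -> anomalous
```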

## rxOneClassSvm in R

If we cannot use cloud-based solutions (and Azure ML in particular) for some reason, we can use the rxOneClassSvm function, included in the MicrosoftML R package. MicrosoftML is a package for Microsoft R Server, Microsoft R Client, and SQL Server Machine Learning Services.

The training set contains only examples from the normal class. In order to train a model we have to specify an R formula:

```r
svmModel <- rxOneClassSvm(
  formula = ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = trainIris)
```

Scoring results include a variable `Score`:

```r
scoreDF <- rxPredict(svmModel, data = testIris, extraVarsToWrite = "isIris")
tail(scoreDF)
```

```
   isIris      Score
57      1 -0.3131609
58      1 -0.3095322
59      1 -0.1532502
60      1 -0.3937540
61      0  0.5537572
62      0  0.4861979
```

The R documentation asserts:

“This algorithm will not attempt to load the entire dataset into memory.”

Hmm, quite a useful feature indeed!

## What else?

In fact, there are many more packages and approaches for anomaly detection. We can use any binary or multi-class classifier, cluster analysis, neural networks, kNN, and many others. But this is my First Aid Kit.
