Anomaly detection
- In this project, I implemented an algorithm to detect anomalous behaviour in server computers. It uses two features of each server response: latency (in ms) and throughput (in mb/s). A Gaussian model was used to detect the anomalous examples in the dataset.
The dataset consisted of unlabelled examples describing the servers' behaviour while they were operating. Most of the examples were expected to correspond to normal (non-anomalous) behaviour, while a few of them would be anomalous.
As a first step, I visualised the dataset to get an initial idea of how the server examples are distributed.
The vast majority of the examples fall in the same area, apart from 6-7 points that lie far away from this cluster. The next step was to fit a Gaussian distribution to the dataset. The fitted distribution is shown together with the initial visualisation in Figure 2.
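The fitting step can be sketched as follows. This is a minimal illustration in Python, not the Matlab code from the repository. It assumes an independent Gaussian per feature, so fitting reduces to estimating a mean and a variance for latency and for throughput. The function name `fit_gaussian` and the sample measurements are made up for the example.

```python
from statistics import fmean, pvariance

def fit_gaussian(examples):
    """Estimate the mean and variance of each feature.

    `examples` is a list of (latency, throughput) rows. One Gaussian per
    feature is an assumption of this sketch, not a detail from the project.
    """
    columns = list(zip(*examples))  # one tuple of values per feature
    means = [fmean(col) for col in columns]
    # Population variance (divide by the number of examples).
    variances = [pvariance(col, mu) for col, mu in zip(columns, means)]
    return means, variances

# Made-up measurements: (latency in ms, throughput in mb/s)
examples = [(13.0, 15.0), (14.0, 14.0), (15.0, 16.0), (14.0, 15.0)]
means, variances = fit_gaussian(examples)
print(means, variances)
```

The estimated means and variances are the only parameters this simplified model needs in order to assign a probability density to any new example.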
Afterwards, I calculated the probability of each example under the fitted distribution and selected a threshold below which an example was considered an outlier. The detected anomalies are marked with red circles in Figure 3.
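The thresholding step can be sketched in the same spirit. Again, this is an illustrative Python example rather than the project's Matlab code. It assumes independent Gaussian features, and the parameter values, the sample points, and the threshold `epsilon` are all hypothetical. How the threshold was actually chosen in the project is not shown here.

```python
import math

def gaussian_density(x, means, variances):
    """Density of example x under independent per-feature Gaussians."""
    p = 1.0
    for value, mu, var in zip(x, means, variances):
        p *= math.exp(-((value - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p

def find_anomalies(examples, means, variances, epsilon):
    """Return the indices of examples whose density falls below epsilon."""
    return [
        i for i, x in enumerate(examples)
        if gaussian_density(x, means, variances) < epsilon
    ]

# Hypothetical fitted parameters: [latency, throughput]
means, variances = [14.0, 15.0], [0.5, 0.5]
# Made-up examples; the last one lies far from the main cluster.
examples = [(14.0, 15.0), (13.5, 15.2), (20.0, 5.0)]
epsilon = 1e-3  # hypothetical threshold
anomalies = find_anomalies(examples, means, variances, epsilon)
print(anomalies)
```

Examples close to the mean receive a high density and are kept, while the distant point receives a density far below `epsilon` and is flagged.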
The last part of this project was to test the algorithm on a more complex dataset.
A more detailed description of this project's implementation in Matlab can be found in this GitHub repository: Link to GitHub repository