Post

Anomaly detection

In this project, I implemented an algortihm which aim was to detect anomalous behaviour in server computers. This was done by taking into account latency (in ms) and throughput (mb/s) of each server response. A Gaussian model was utilised in order to detect anomalous examples in this dataset. The dataset consisted of unlabeled examples expressing servers behaviour while they were operating. We expected that most of the examples correspond to normal (non-anomalous) values, whereas some of them perform in an anomalous behaviour.

Post

Collaborating Filtering

The aim of this project was to utilise collaborating filtering in order to create recommender system for movies ratings. First, I loaded a dataset which consists of movies ratings on a scale of 1 to 5. These ratings are illustrated in Figure 1. Figure 1: Movies ratings Based on these ratings, several statistics can be calculated, such as average rating of random movie. Figure 2: Movie average rating As a next step, several new movies ratings were imported.

Post

Principal Component Analysis

In the context of this project, I applied Principal Component Analysis (PCA) in order to create a model that reduces data dimensions and resises images. Initially, PCA was tested in a 2D dataset, which is shown in Figure1. Figure 1: 2-dimensional dataset The first step is to compute the principal components of the input data. These components can be seen in Figure 2. Figure 2: Principal components of 2D dataset Now, input 2D dataset can be projected to the principal components in one dimension, as Figure 3 illustrates.

Post

Support Vector Machines

The goal of this project is to create a model that classifies various types of data. Support Vector Machines consist a well-established technique of machine learning applied in classification tasks. The first dataset can be seen in Figure 1. Figure 1: Input dataset 1 Data of two classes can be seperated by a linear decision boundary. Thus, by aplying SVM, data points are linearly seperated as you can see in Figure 2.

Post

Regularised Logistic Regression

In this project, I predicted whether microchips from a fabrication plant pass quality assurance process (QA). During QA, each microchip goes through various tests to ensure it functions correctly. Such problems belong to classification machine learning approaches. I had the test results for some microchips on two different tests results on past microchips. A part of the utilised microchips tests as well as the label for each test can be shown in Figure 1.

Post

K-means clustering

The main purpose of this project was to compress an image via the clustering technique K-means. This was done by taking into account only colours that occur the most in an image. K-means is a known unsupervised machine learning technique for clustering similar data. The algorithm was first tested in a 2-dimensional dataset, which is visualised in Figure 1. Figure 1: 2-dimensional dataset After that, K-means clustering is applied and after 10 iterations, all data points are assigned to three centroids.

Post

Neural Networks Learning

The aim of this project was to utilise Neural Networks in order to create a model that predicts handwritten digits from images. The dimensions of the images used in this project were 40x40 pixels and images contain digits from 0 to 9. A small sample of this dataset is illustrated below. Figure 1: First 100 digits of dataset A typical structure of Neural Networks is shown in Figure 2, where information of each images is propagated through units and layers.

Post

Logistic Regression

In this project I utilised Logistic Regression in order to create a model that predicts handwritten digits from images. The dimensions of the images used in this project were 40x40 pixels and images contain digits from 0 to 9. A small sample of this dataset is illustrated below. Figure 1: Input Dataset Since there are 10 different digits, a multi-class classifier must be trained. That means that 10 different logistic regression models have to be trained.

Post

Regularised Linear Regression

The goal of this project was to create a model that predicts the amount of water flowing out of a dam. This was done based on the changes of water level in a reservoir. Visualising input dataset always helps to understand the geometry of data points. Figure 1: Input Dataset As a first step, I applied linear fit, which is shown in Figure 2. Nevertheless, we see that the fit is not a good match due to the non-linear pattern of data.

Post

Logistic Regression

This project intented to create a model that predicts whether a university applicant is going to be accepted or not. This was done by taking into account his/her exams results. In cases like this one, classification machine learning algorithms are utilised. Initially, input data were plotted in order to decide which decision boundary can be developed. Figure 1: Input Dataset I saw that most points regarding admitted and not admitted applicants are linearly seperable, therefore a linear decision boundary could be applied.

Post

Linear regression with multiple variables

The aim of this project was to predict the prices of houses via linear regression with multiple variables. This was done by taking into account information on recent houses sold and construct a predictive model of housing prices. The input dataset contains information about the size of each house in square feet, the number of bedrooms, as well as the price of the house. Figure 1: Input Dataset Creating the plot of computed cost in each iteration enables us to realise if the training process converges to optimal values.

Post

Linear Regression

*In the context of this project, I created a model which aim was to predict profits of a food company considering of branching out to a new city. This was done based on populations from various cities and the profits of outlets established there. In cases like this one, regression machine learning algorithms are utilised. A visualisation of the input data can be seen in Figure 1. Figure 1: Input Dataset visualisation We see that there is a linear relationship between the population of cities and the profits that a restaurant make.