Regularised Logistic Regression
In this project, I predicted whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it functions correctly. This is a binary classification problem: each chip is either accepted or rejected.
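For reference, standard logistic regression models the probability that a chip passes as the sigmoid of a linear score and predicts "accept" when that probability is at least 0.5. The notation below (θ, h_θ, g) is the usual textbook form and is assumed here rather than taken from the project's code:

```latex
h_\theta(x) = g(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad
\hat{y} =
\begin{cases}
1 \;(\text{accept}) & \text{if } h_\theta(x) \ge 0.5 \iff \theta^{T} x \ge 0 \\
0 \;(\text{reject}) & \text{otherwise}
\end{cases}
```

With only the two raw test scores as inputs, x = (1, x_1, x_2), the decision boundary θ_0 + θ_1 x_1 + θ_2 x_2 = 0 is a straight line in the test-score plane.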
I had the results of two different tests on a set of past microchips, along with whether each chip was accepted or rejected. A sample of the test results and the corresponding labels is shown in Figure 1.
First, I plotted the test results to see whether there was any visible relationship between the two test scores and the label. The plot is shown in Figure 2.
Based on this dataset, I wanted to determine whether each microchip should be accepted or rejected. As Figure 2 shows, the accepted and rejected points cannot be separated by a straight line, so plain logistic regression on the two raw test scores, whose decision boundary is linear, cannot classify them well. A more complex decision boundary requires more features. I therefore mapped the two test scores to all polynomial terms of up to the sixth power, which yields a 28-dimensional feature vector (including the constant intercept term).
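A minimal sketch, in Python rather than the project's Matlab, of one common way to build the 28 features: every product x1^(i-j) * x2^j with 0 <= j <= i <= 6. The function name `map_feature`, the term ordering and the sample scores are assumptions for illustration only.

```python
def map_feature(x1, x2, degree=6):
    """Map two test scores to every monomial x1**(i - j) * x2**j of total degree i <= degree.

    The i = 0 term is the constant 1, which acts as the intercept feature.
    """
    features = []
    for i in range(degree + 1):
        for j in range(i + 1):
            features.append((x1 ** (i - j)) * (x2 ** j))
    return features


# Hypothetical test scores for a single microchip
row = map_feature(0.5, -0.25)
print(len(row))  # 28 = 1 + 2 + 3 + 4 + 5 + 6 + 7
print(row[:6])   # [1.0, 0.5, -0.25, 0.25, -0.125, 0.0625]
```

Applying `map_feature` to every chip gives a 28-column design matrix; because the first column is the constant 1, the intercept needs no separate handling.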
However, such a flexible decision boundary can overfit the training data. To counter this, I added regularisation and tried several values of the regularisation hyperparameter lambda. A lambda of 1 gave a good decision boundary, as you can see in Figure 3 below.
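To make the role of lambda concrete, here is a minimal sketch of the standard regularised logistic-regression cost, J(θ) = (1/m) Σ [-y log h - (1 - y) log(1 - h)] + (λ / 2m) Σ_{j≥1} θ_j², and its gradient, with the intercept θ_0 left unpenalised. It is plain Python, not the project's Matlab; the names `cost_and_gradient` and `train`, the learning rate `alpha`, the iteration count and the toy data are hypothetical, and batch gradient descent is used only for simplicity, so it may differ from the optimiser in the actual implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(a):
    """Stable log(1 + exp(a)), used for the log-loss terms."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def cost_and_gradient(theta, X, y, lam):
    """Regularised cost J(theta) and its gradient.

    X: mapped feature rows whose first entry is the constant 1.
    y: 0/1 labels. lam: regularisation strength lambda.
    theta[0] (the intercept) is deliberately excluded from the penalty.
    """
    m, n = len(X), len(theta)
    cost = 0.0
    grad = [0.0] * n
    for row, label in zip(X, y):
        z = sum(t * x for t, x in zip(theta, row))
        # -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z)
        cost += label * softplus(-z) + (1 - label) * softplus(z)
        err = sigmoid(z) - label
        for j in range(n):
            grad[j] += err * row[j]
    cost = cost / m + lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [g / m for g in grad]
    for j in range(1, n):
        grad[j] += lam / m * theta[j]
    return cost, grad


def train(X, y, lam, alpha=1.0, iterations=400):
    """Batch gradient descent from theta = 0 (hypothetical step size and iteration count)."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        _, grad = cost_and_gradient(theta, X, y, lam)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: three already-mapped rows (intercept, x1, x2)
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, -1.0, 2.0]]
y = [1, 0, 1]
J0, _ = cost_and_gradient([0.0, 0.0, 0.0], X, y, lam=1.0)
print(round(J0, 4))  # 0.6931, i.e. ln 2: the cost of predicting 0.5 for every chip
```

Trying several values of lambda amounts to calling `train(X, y, lam)` for each candidate and comparing the resulting boundaries: lambda = 0 lets the high-degree terms fit the training points too closely, while a very large lambda shrinks the weights towards zero and underfits.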
Even so, some points are still misclassified. To get a clearer view of the model's performance, I computed its accuracy on the training set:
Training accuracy = 83.05%
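Training accuracy is simply the percentage of training chips whose predicted label matches the true label. A minimal Python sketch of that calculation follows; the names `predict` and `training_accuracy`, and the parameters and data in the example, are hypothetical and are not the project's values.

```python
def predict(theta, X):
    """Predict 1 (accept) when sigmoid(theta . x) >= 0.5, which holds exactly when theta . x >= 0."""
    return [1 if sum(t * x for t, x in zip(theta, row)) >= 0 else 0 for row in X]


def training_accuracy(theta, X, y):
    """Percentage of training examples whose predicted label equals the true label."""
    correct = sum(1 for p, label in zip(predict(theta, X), y) if p == label)
    return 100.0 * correct / len(y)


# Hypothetical parameters and data, only to show the calculation
theta = [0.0, 1.0, -1.0]
X = [[1.0, 2.0, 1.0], [1.0, 0.0, 3.0], [1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
y = [1, 0, 1, 1]
print(training_accuracy(theta, X, y))  # 75.0 (3 of 4 labels match)
```

Thresholding the linear score at 0 avoids evaluating the sigmoid at all, since the sigmoid crosses 0.5 exactly at z = 0.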
These wrong predictions are a reminder of a saying often quoted in machine learning:
All models are wrong, but some are useful!
The Matlab implementation of this project is available in this GitHub repository: Link to Github repository