April 13, 2021

What is Principal Component Analysis (PCA) – A Simple Tutorial

In this simple tutorial, I will explain the concept of Principal Component Analysis (PCA) in machine learning. I will try to be as simple and clear as possible.
Then, in Tutorial 2, we will use Python to get hands-on and actually perform principal component analysis.

What is Principal Components Analysis?
Principal Components Analysis is an unsupervised learning technique, a class of statistical methods used to explain high-dimensional data using a smaller number of variables called the principal components.
In PCA, we compute the principal components and use them to explain the data.

How Does PCA Work?
Assume we have a dataset X made up of n measurements, each represented by a set of p features X1, X2, …, Xp. If we want to plot this data in a 2-dimensional plane, we can plot the n measurements using two features at a time. But if there are more than three or four features, plotting the data this way becomes a challenge: the number of pairwise plots is p(p-1)/2, which quickly becomes too many to examine (with p = 10 features there are already 45 plots).
We would like to visualize this data in two dimensions without losing the information contained in the data. This is what PCA allows us to do.
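As a preview of the hands-on part, here is a minimal sketch of that idea using scikit-learn's PCA class. The dataset, its size, and the random seed here are made-up illustrations, not part of this tutorial's example.

```python
# A minimal sketch of using PCA to visualize high-dimensional data in 2D.
# Assumes scikit-learn is installed; the data below are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # n = 100 measurements, p = 10 features

pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # n x 2 matrix of principal component scores

print(Z.shape)                      # (100, 2): one 2-D point per measurement
```

Each row of Z can then be plotted as a single point, giving one scatterplot instead of the 45 pairwise plots mentioned above.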

 

How to Compute Principal Components?
Given a dataset X of dimension n x p, how do we compute the first principal component?
To do this, we look for the linear combination of the feature values of the form:

Z1 = φ11 X1 + φ21 X2 + … + φp1 Xp

that has the largest sample variance, subject to the constraint that:

φ11² + φ21² + … + φp1² = 1
This means that the first principal component loading vector φ1 = (φ11, φ21, …, φp1) solves an optimization problem: we maximize an objective function subject to a constraint.
The objective function is given by:

(1/n) · Σi (φ11 xi1 + φ21 xi2 + … + φp1 xip)²,   summing over i = 1, …, n

and this is subject to the constraint:

φ11² + φ21² + … + φp1² = 1

The objective function (the function to maximize) can be rewritten as:

(1/n) · Σi zi1²,   where zi1 = φ11 xi1 + φ21 xi2 + … + φp1 xip

Since we assume each feature has been centered to have mean zero, this also holds:

(1/n) · Σi xij = 0   for every feature j

Therefore the average of z11, …, zn1 will also be zero, and the objective function being maximized is simply the sample variance of the n values zi1.
z11, z21, …, zn1 are referred to as the scores of the first principal component.
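The short check below makes this concrete on made-up data, with an arbitrary unit-length loading vector chosen only for illustration: once the features are centered, the candidate scores average to zero, so the quantity being maximized is exactly their sample variance.

```python
# A quick numeric check on synthetic data: when each feature is centered to
# mean zero, any linear combination of the features also has mean zero, so
# maximizing (1/n) * sum(z_i1 ** 2) is maximizing the sample variance of the
# scores. The loading vector here is an arbitrary unit-length vector.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(100, 4))        # raw data with non-zero feature means
Xc = X - X.mean(axis=0)                       # center each feature at zero

phi = np.array([0.5, 0.5, 0.5, 0.5])          # unit-length candidate loading vector
z = Xc @ phi                                  # candidate scores z_11, ..., z_n1

print(np.isclose(z.mean(), 0.0))              # True: the scores average to zero
print(np.isclose((z ** 2).mean(), z.var()))   # True: objective = sample variance
```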


How then do we maximize the given objective function? 
We do this by performing an eigendecomposition of the covariance matrix of the data. Details of how to perform an eigendecomposition are explained here.
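As a rough sketch of that recipe (NumPy only, on synthetic data, not the worked example of Tutorial 2): the first loading vector is the eigenvector of the sample covariance matrix with the largest eigenvalue, and the variance of the resulting scores equals that eigenvalue.

```python
# A minimal sketch: compute the first principal component loading vector as
# the top eigenvector of the sample covariance matrix. Data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                 # n = 100, p = 5, illustrative data

Xc = X - X.mean(axis=0)                       # center each feature at zero
cov = np.cov(Xc, rowvar=False)                # p x p sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)        # eigh handles symmetric matrices
phi1 = eigvecs[:, np.argmax(eigvals)]         # eigenvector with largest eigenvalue

z1 = Xc @ phi1                                # scores of the first principal component
print(z1.var(ddof=1), eigvals.max())          # the two numbers agree
```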

Explaining the Principal Components
The loading vector φ1, with elements φ11, φ21, …, φp1, defines a direction in the feature space along which there is maximum variance in the data.
Thus, if we project the n data points x1, x2, …, xn onto this direction, the projected values are exactly the principal component scores z11, z21, …, zn1.
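The small check below illustrates this projection view (scikit-learn and NumPy on synthetic data, not the tutorial's example): projecting the centered points onto the first loading vector reproduces the scores returned by PCA's transform method.

```python
# Projecting centered data onto the first loading vector gives the same
# scores that scikit-learn's PCA.transform returns. Data are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)                       # center the data first

pca = PCA(n_components=1).fit(Xc)
phi1 = pca.components_[0]                     # first loading vector (unit length)

scores_by_projection = Xc @ phi1              # project points onto the phi1 direction
scores_by_transform = pca.transform(Xc)[:, 0]

print(np.allclose(scores_by_projection, scores_by_transform))   # True
```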

After the first principal component Z1 of the features has been determined, the second principal component is the linear combination of X1, X2, …, Xp that has the highest variance out of all the linear combinations that are uncorrelated with Z1. The second principal component scores z12, z22, …, zn2 take the form

zi2 = φ12 xi1 + φ22 xi2 + … + φp2 xip

where φ2 is the second principal component loading vector, with elements φ12, φ22, …, φp2. It turns out that constraining Z2 to be uncorrelated with Z1 is the same as constraining the direction of φ2 to be orthogonal to the direction of φ1.
We will now take an example to see how PCA works.
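Before that example, here is a minimal NumPy sketch (illustrative random data, not the tutorial's worked example) that verifies the two facts just stated: the loading vectors φ1 and φ2 are orthogonal, and the score vectors Z1 and Z2 are uncorrelated.

```python
# Compute the first two principal components via eigendecomposition and
# verify that the loadings are orthogonal and the scores uncorrelated.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))                 # synthetic data, n = 200, p = 6
Xc = X - X.mean(axis=0)                       # center the features

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
phi1, phi2 = eigvecs[:, order[0]], eigvecs[:, order[1]]

z1, z2 = Xc @ phi1, Xc @ phi2                 # first and second PC scores

print(np.isclose(phi1 @ phi2, 0.0))                 # True: orthogonal loadings
print(np.isclose(np.corrcoef(z1, z2)[0, 1], 0.0))   # True: uncorrelated scores
```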

 

 
