In this article, we will explain the concept of dimensionality reduction in a very simple way. In line with the principle of my articles, I will try to be as clear as possible.
In this lesson, we will focus on explaining the concept, and in the next lesson we will look at the underlying derivation of the technique.
- Problem with High-Dimensional Data
- What is Dimensionality Reduction
- Two Types of Dimensionality Reduction
- What is Principal Component Analysis
- Methods of Dimensionality Reduction
- How PCA Works
- Demonstration of PCA
  - Obtain the Covariance Matrix
  - Obtain the Eigen Pairs
  - Obtain the Scores and Loadings
  - Extract the Principal Components
The Simple Explanation
You already know that if you are given data in two dimensions, say x and y, you could probably plot the graph and see the relationship. What if you are given data in three dimensions? You could still try to create the plot, but if the data is large enough, visualizing it would be difficult.
Now what if the data is in 10, 20, or even 100 or more dimensions? How could you plot it? Even if you could, you would find that the plot may not make much sense. This is where dimensionality reduction (also called dimension reduction) comes in.
Formal Definition of Dimensionality Reduction
“…is the process of reducing the number of random variables under consideration by obtaining a set of principal variables” – Wikipedia
“… is the process of reducing the number of variables or features in review” – Big Data University
Problems of High-Dimensional Data
- training a model with high-dimensional data incurs high time and space complexity
- not all the features of the data are relevant to the problem being solved
- data in a lower dimension has less noise (unnecessary parts of the data)
Types of Dimensionality Reduction
The two types of dimensionality reduction are:
1. Feature Extraction: This technique involves deriving new features from the data after it has been transformed from a high-dimensional space to a low-dimensional space.
2. Feature Selection: This involves finding the features most relevant to a problem. It is done by obtaining a subset of the original variables, keeping only the key features.
An eigenvector, in linear algebra, is a vector that does not change its direction under the associated linear transformation. If we have a non-zero vector v, then it is an eigenvector of a square matrix A if Av is a scalar multiple of v.
The eigenvalue is the scalar characteristic value associated with the eigenvector v.
Eigenvalues are the coefficients attached to the eigenvectors, and they are what give the axes their magnitude.
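The eigenvector definition above can be checked numerically. This is a minimal sketch using NumPy (the matrix A is a made-up example, not from the article): for each eigenpair, multiplying A by the eigenvector v gives the same result as scaling v by its eigenvalue.

```python
import numpy as np

# A simple made-up square matrix for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues and eigenvectors of A
eigenvalues, eigenvectors = np.linalg.eig(A)

# For each eigenpair (lam, v), A v should equal lam * v:
# the direction of v is unchanged, only its magnitude is scaled
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    print(np.allclose(A @ v, lam * v))  # True for every eigenpair
```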
Dimensionality reduction transforms data from a high dimension to a lower dimension by obtaining the principal components.
PCA is performed by:
- constructing a covariance matrix of the data
- performing an eigen-decomposition of that matrix to obtain a set of eigenvectors (W)
- ordering the columns of W by the size of their corresponding eigenvalues
- choosing the first n columns of W and using them to describe your data
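The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's MATLAB/R demonstration; the data matrix X is made up for the example, and `eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

# Toy data: 6 samples, 3 features (hypothetical values for illustration)
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.3],
              [2.3, 2.7, 0.6]])

# Step 1: center the data and construct the covariance matrix
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Step 2: eigen-decomposition of the covariance matrix
eigenvalues, W = np.linalg.eigh(cov)

# Step 3: order the columns of W by decreasing eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues, W = eigenvalues[order], W[:, order]

# Step 4: keep the first n columns of W and project the data onto them
n = 2
scores = X_centered @ W[:, :n]  # the principal component scores
print(scores.shape)  # (6, 2): 6 samples reduced from 3 features to 2
```

The columns of `W` are the loadings, and `scores` is the data expressed in the reduced space.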
In the next lesson (which will be a web video), we will actually work through some of the derivations behind Principal Component Analysis.
We will also perform PCA on real data using MATLAB and R.
So you can follow this course to get updates (just click on the follow button under the author's name) and also subscribe to the video channel here.