Magic of Inner Product and Orthogonal Complement Subspaces
In the last article I covered linear algebra concepts: vector spaces, row space, column space, eigenvector decomposition, singular value decomposition and more.
This article continues from that one. Here we will look at a real application of linear algebra, one that most of us know about and have probably used: Principal Component Analysis (PCA) for dimensionality reduction.
Before jumping straight into the algorithm, it's worth understanding some geometric properties of vectors. One of the fundamental properties we use is the inner product between vectors, which many of us know in the form of the dot product. The dot product is a special case of the inner product.
An inner product is a function that takes two vectors and returns a number, and that is symmetric, positive definite (⟨x, x⟩ > 0 for every x ≠ 0) and bilinear (linear in each argument).
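To make the three properties concrete, here is a minimal NumPy sketch. It assumes the common matrix form of a real inner product, ⟨x, y⟩ = xᵀAy with A symmetric positive definite; the particular A, x and y are arbitrary illustration values, and `inner` is a hypothetical helper, not a library function.

```python
import numpy as np

def inner(x, y, A):
    """Inner product <x, y> = x^T A y; valid when A is symmetric positive definite."""
    return float(x @ A @ y)

# Arbitrary illustration values; A is symmetric with eigenvalues (5 ± sqrt(5)) / 2 > 0
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Symmetric: <x, y> == <y, x>
symmetric = np.isclose(inner(x, y, A), inner(y, x, A))
# Positive definite: all eigenvalues of A are positive, so <v, v> > 0 for every v != 0
positive_definite = bool(np.all(np.linalg.eigvalsh(A) > 0)) and inner(x, x, A) > 0
# Linear (in the first argument): <2x + y, y> == 2<x, y> + <y, y>
linear = np.isclose(inner(2 * x + y, y, A), 2 * inner(x, y, A) + inner(y, y, A))
# With A = I the inner product reduces to the ordinary dot product
dot_product_case = np.isclose(inner(x, y, np.eye(2)), x @ y)

print(inner(x, y, A), symmetric, positive_definite, linear, dot_product_case)
```

Changing A changes the geometry (lengths and angles), which is the "transformation" view of inner products; A = I recovers the familiar dot product.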
The dot product, ⟨x, y⟩ = xᵀy, is the inner product we get when no transformation is applied to x or y (remember, I talked about transformations in the last article). It is also called the Euclidean inner product. The Euclidean inner product tells you how much of vector a lies along vector b: dividing a · b by the length of b gives the length of the projection of a onto b, just like the shadow of a man projected onto the road. How would the man look if he were part of the road? In the same way, how would vector a look if it were part of vector b?
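A small NumPy sketch of the "shadow" idea, using arbitrary example vectors. Note that a · b equals the length of the projection only when b has unit length; in general we divide by ‖b‖.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([2.0, 0.0])

dot = a @ b                              # Euclidean inner product: 3*2 + 4*0 = 6
scalar_proj = dot / np.linalg.norm(b)    # length of a's "shadow" along b
vector_proj = (dot / (b @ b)) * b        # the shadow itself, a vector lying along b
residual = a - vector_proj               # what is left over is perpendicular to b

print(dot, scalar_proj, vector_proj, residual @ b)
```

The residual being orthogonal to b is the same idea PCA relies on later: a vector splits into a part inside a subspace and a part in its orthogonal complement.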
Another useful application of the inner product is finding the length of a vector. The norm of a vector is a notion of its magnitude, or length, in a vector space. A norm induced by an inner product is the square root of the inner product of the vector with itself; the Euclidean norm, also called the L2 norm, is the square root of the dot product of a vector with itself. We can stretch this definition to measure the distance between two vectors: the Euclidean distance is the norm of their difference, d(x, y) = ‖x − y‖ = √⟨x − y, x − y⟩.
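A quick NumPy check of both definitions, with arbitrary example vectors; the hand-written versions match `np.linalg.norm`.

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([6.0, 8.0])

norm_x = np.sqrt(x @ x)          # L2 norm: square root of <x, x>
diff = x - y
distance = np.sqrt(diff @ diff)  # Euclidean distance: norm of the difference x - y

print(norm_x, distance, np.linalg.norm(x), np.linalg.norm(x - y))
```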
SPLITTING VECTOR SPACES
- Overall, we take input data living in a D-dimensional vector space, say ℝᴰ, and split that space into a principal subspace and its orthogonal complement subspace.
- The two subspaces are perpendicular to each other: every vector in the principal subspace is orthogonal to every vector in the orthogonal complement. If the principal subspace is M-dimensional, the orthogonal complement is (D − M)-dimensional.
- We project X (the input data) onto the M-dimensional principal subspace and minimize the average squared error between the projected points and the original data points.
- Having defined inner products and orthogonal complements, let's define the objective function of PCA.
- We want to find a subspace, or more precisely a set of orthonormal basis vectors, onto which we can project our original dataset with minimal loss of information.
- The figure above shows the distance we measure: the mean squared error between each original data point and its projection onto the lower-dimensional subspace.
- To find the projection of x onto the orthogonal complement, we take the inner product of x with each basis vector b_j of the orthogonal complement; b_jᵀx is the coordinate of x along b_j.
- What you see on the left is how we rewrite the objective function in terms of two components: the projection matrix (the sum of b_j b_jᵀ over the basis vectors of the orthogonal complement) and the data covariance matrix S.
- The projection matrix takes the data covariance matrix and projects it onto the orthogonal complement subspace. That is, we can formulate the loss function as the variance of the data projected onto the subspace that we ignore.
- Minimizing the loss is therefore equivalent to minimizing the variance of the data that lies in the orthogonal complement subspace. In other words, we want to retain as much variance as possible in the principal subspace. Reformulating the average squared reconstruction error in terms of the data covariance gives us an easy way to find the basis vectors of the principal subspace.
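The reformulation in the list above can be written out as follows. This assumes centered data x_1, …, x_N ∈ ℝᴰ and an orthonormal basis b_1, …, b_D in which b_1, …, b_M span the principal subspace; the notation is chosen here for illustration.

```latex
\begin{aligned}
J &= \frac{1}{N}\sum_{n=1}^{N} \lVert \mathbf{x}_n - \tilde{\mathbf{x}}_n \rVert^2,
  \qquad \tilde{\mathbf{x}}_n = \sum_{j=1}^{M} (\mathbf{b}_j^\top \mathbf{x}_n)\,\mathbf{b}_j \\
\mathbf{x}_n - \tilde{\mathbf{x}}_n &= \sum_{j=M+1}^{D} (\mathbf{b}_j^\top \mathbf{x}_n)\,\mathbf{b}_j \\
J &= \frac{1}{N}\sum_{n=1}^{N}\sum_{j=M+1}^{D} (\mathbf{b}_j^\top \mathbf{x}_n)^2
   = \sum_{j=M+1}^{D} \mathbf{b}_j^\top \mathbf{S}\,\mathbf{b}_j
   = \operatorname{tr}\!\Big(\Big(\sum_{j=M+1}^{D} \mathbf{b}_j \mathbf{b}_j^\top\Big)\mathbf{S}\Big),
  \qquad \mathbf{S} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^\top \\
\text{if } \mathbf{S}\mathbf{b}_j = \lambda_j \mathbf{b}_j:\quad
J &= \sum_{j=M+1}^{D} \lambda_j
\end{aligned}
```

The second line uses orthonormality (the squared norm of a sum of orthonormal components is the sum of squared coordinates), and the last line shows why discarding the directions with the smallest eigenvalues gives the smallest error.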
Therefore, the average error is minimized if we choose the basis vectors that span the ignored subspace to be the eigenvectors of the data covariance matrix belonging to the smallest eigenvalues, or, equivalently, if the principal subspace is spanned by the eigenvectors belonging to the M largest eigenvalues of the data covariance matrix.
The closing statement would be: the orthonormal basis vectors of the principal subspace are the eigenvectors of the data covariance matrix associated with the largest eigenvalues.
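As a rough numerical check of the closing statement, here is a minimal NumPy sketch. It assumes data stored as an (N, D) array with one point per row, the 1/N covariance convention, and hypothetical helper names `pca_basis` and `reconstruction_error` (they are not from any library). It compares the reconstruction error with the sum of the discarded eigenvalues, and with the error from a randomly chosen orthonormal basis.

```python
import numpy as np

def pca_basis(X, M):
    """Return the top-M eigenvectors (as columns) of the covariance of X,
    plus all eigenvalues sorted in descending order. X has shape (N, D)."""
    Xc = X - X.mean(axis=0)               # center the data
    S = Xc.T @ Xc / X.shape[0]            # data covariance matrix (1/N convention)
    eigvals, eigvecs = np.linalg.eigh(S)  # S is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs[:, :M], eigvals

def reconstruction_error(X, B):
    """Average squared distance between centered points and their projections onto span(B).
    Assumes B has orthonormal columns, so B @ B.T is the projection matrix."""
    Xc = X - X.mean(axis=0)
    X_tilde = Xc @ B @ B.T
    return np.mean(np.sum((Xc - X_tilde) ** 2, axis=1))

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points in R^3 with very different spread along each axis
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

B, eigvals = pca_basis(X, M=2)
err = reconstruction_error(X, B)

# Any other 2-dimensional orthonormal basis should do no better
Q, _ = np.linalg.qr(rng.normal(size=(3, 2)))
err_random = reconstruction_error(X, Q)

print(err, eigvals[2:].sum(), err_random)
```

The first two printed numbers agree, since the error equals the variance left in the ignored direction. Using `np.linalg.eigh` rather than `np.linalg.eig` is a deliberate choice: the covariance matrix is symmetric, so `eigh` returns real eigenvalues and orthonormal eigenvectors.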
Thus, we see how inner products, eigenvalues and orthogonal projections help us find a low-dimensional projection of high-dimensional data with minimum information loss.
In the next article, we will move on to multivariate calculus concepts before getting to traditional ML.
Stay tuned :)