Let’s Revise High Dimensional Calculus
Finally, let’s brush up on the last set of mathematical concepts before we dive into implementing these tools, the so-called Machine Learning or, to use a more glamorous term, “Artificial Intelligence”.
First, we all know what functions are. They are basically a transformation of one variable into another: you give a function an input value and it spits out another value according to a relation between them.
F(x) = y, where x is the input, y is the output and F() is the relation between the two.
- 3-D functions: the input is 2-dimensional and the output is represented by the third dimension.
- The input can be a vector and so can the output; for now, let’s assume the output is a scalar. We will see vector-valued functions shortly.
- Contour plots are a nice way to represent 3-D functions: each line connects all the inputs that produce the same output value.
In the diagram, the pink lines are spaced far apart compared to the blue and green lines. Widely spaced lines represent a flatter region, whereas the closely packed blue and green lines represent a steep surface.
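To make the contour idea concrete, here is a minimal Python sketch. It assumes an illustrative bowl-shaped surface f(x, y) = x^2 + y^2, which is not the function in the diagram. For this bowl the contour line f = c is a circle of radius sqrt(c). Measuring the gap between neighbouring contour levels shows that the lines crowd together where the surface gets steeper.

```python
import math

# Illustrative surface (an assumption, not the function in the diagram):
# a bowl whose walls get steeper the further we move from the origin.
def f(x, y):
    return x**2 + y**2

def contour_radius(c):
    # For this bowl, the contour line f(x, y) = c is a circle of radius sqrt(c).
    return math.sqrt(c)

levels = [1, 2, 3, 4, 5]
radii = [contour_radius(c) for c in levels]

# Every point on a contour line has the same value: check one point per circle.
for c, r in zip(levels, radii):
    assert math.isclose(f(r, 0.0), c)

# Distance between neighbouring contour lines. A small gap means the value
# changes a lot over a short distance, i.e. a steep region.
gaps = [outer - inner for inner, outer in zip(radii, radii[1:])]
for c, gap in zip(levels, gaps):
    print(f"gap between f = {c} and f = {c + 1}: {gap:.3f}")
```

As we move outward, the printed gaps shrink from about 0.414 to about 0.236. This matches the bowl's walls getting steeper.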
VECTOR FIELDS:
- Vector fields are functions that take a point in space (a vector) as input and spit out a vector of the same dimension n. For the sake of visual illustration, here we have an example of a 2-D vector field.
- In physics we use vector fields to describe fluid flow, electric and magnetic fields, and so on. Properties such as divergence and curl are defined on them. You could say vector fields are the eyes of physics.
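A vector field takes only a few lines of Python to sketch. The field F(x, y) = (-y, x) below is an illustrative assumption, not the one in the figure. It maps each 2-D input point to a 2-D vector. Sampling it on a grid gives the arrows you would draw in a quiver plot.

```python
# Illustrative 2-D vector field (an assumption, not the one in the figure):
# F(x, y) = (-y, x), which swirls counter-clockwise around the origin.
def field(x, y):
    """Take a 2-D input point and return the 2-D vector attached to it."""
    return (-y, x)

# Sample the field on a small grid, as a quiver plot would before drawing arrows.
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
samples = {point: field(*point) for point in grid}

for point, vector in samples.items():
    print(point, "->", vector)
```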
FUNCTIONS AS TRANSFORMATIONS:
- We emphasize viewing functions as transformations because dimensional changes and derivatives are much more intuitive when pictured as physical transformations. For example, the given function takes the 2-D (t, s) plane as input and transforms it into a 3-D circular pipe. That picture is far easier to grasp than a bare statement that the derivative of the function’s first output component with respect to s is 2.3, a number that carries no intuitive meaning on its own.
“Transformations give better understanding of the dimensional changes”.
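As a rough sketch of this kind of transformation, the Python snippet below maps points of the (t, s) plane to points in 3-D space. It assumes the pipe is a torus. The function name `pipe` and the radii `R` and `r` are hypothetical choices for illustration and are not taken from the diagram.

```python
import math

# Hypothetical radii: R is the distance from the centre of the ring to the
# centre of the tube, r is the radius of the tube itself.
R, r = 3.0, 1.0

def pipe(t, s):
    """Transform a point (t, s) of the plane into a point on a torus in 3-D.

    t goes around the tube's circular cross-section, s goes around the ring.
    """
    x = (R + r * math.cos(t)) * math.cos(s)
    y = (R + r * math.cos(t)) * math.sin(s)
    z = r * math.sin(t)
    return (x, y, z)

# A straight line in the (t, s) plane (t fixed at 0) becomes a circle of
# radius R + r around the z-axis.
for s in (0.0, math.pi / 2, math.pi):
    print(f"(t, s) = (0, {s:.3f}) -> {pipe(0.0, s)}")
```

Holding t fixed and sweeping s moves a point around the ring. Sweeping t instead moves it around the tube.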
- We understand a derivative as the rate of change of one variable with respect to another.
- For high-dimensional functions with multiple inputs and multiple outputs, we have the notion of partial derivatives.
- A partial derivative is very similar to the vanilla derivative, with one small change in the definition: we keep all the other input variables constant and differentiate the function with respect to only one variable at a time.
- As you can see in the picture, we first take a small nudge h in the x direction while keeping y constant, and then a nudge in the y direction while keeping x constant. This notion extends to n dimensions.
- For a multidimensional function we define the gradient operator (∇). Applying it to a scalar function f gives the gradient grad(f), the vector of partial derivatives of f with respect to each of its input dimensions.
- The directional derivative extends this to the derivative of a function along a vector, i.e. the rate of change of the function as we move in the direction of a particular vector.
- The formula is w · grad(f), where w is the vector, grad(f) is the gradient of the function f, and · is the dot product between the two. For a rate of change per unit distance, w is taken to be a unit vector.
- From the notion of the directional derivative, we can also prove that the gradient points in the direction of steepest ascent. This is why gradient descent steps in the opposite direction.
- The picture above illustrates why. The dot product w · grad(f) equals |w| |grad(f)| cos θ, where θ is the angle between w and grad(f). Among all unit vectors w, it is therefore largest when θ = 0, i.e. when w points in the same direction as grad(f). So the direction of steepest ascent has to be the direction of the gradient itself.
Hence the almost magical result: when we compute the gradient of a function at a point, it points in the direction of steepest ascent at that point.
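The ideas in this section fit into one short Python sketch that uses finite differences. It assumes the illustrative function f(x, y) = x^2 * y. The helper names (`partial_x`, `partial_y`, `grad`, `directional_derivative`, `rate_along_angle`) and the step size h are hypothetical. The sketch works through the section in order:

- It nudges x and y separately to get the partial derivatives.
- It collects those partial derivatives into the gradient.
- It computes directional derivatives as w · grad(f).
- It checks numerically that, among unit directions, the largest rate of change points along the gradient.

```python
import math

def f(x, y):
    # Illustrative function (an assumption): f(x, y) = x**2 * y
    return x**2 * y

def partial_x(func, x, y, h=1e-5):
    # Nudge x by a small h on both sides while keeping y constant.
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-5):
    # Nudge y by a small h on both sides while keeping x constant.
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

def grad(func, x, y):
    # The gradient collects the partial derivatives into one vector.
    return (partial_x(func, x, y), partial_y(func, x, y))

def directional_derivative(func, x, y, w):
    # Rate of change of func along w: the dot product w . grad(func).
    gx, gy = grad(func, x, y)
    return w[0] * gx + w[1] * gy

x0, y0 = 1.0, 2.0
g = grad(f, x0, y0)  # analytically (2*x*y, x**2) = (4, 1) at (1, 2)

def rate_along_angle(angle):
    # Directional derivative along the unit vector at the given angle.
    w = (math.cos(angle), math.sin(angle))
    return directional_derivative(f, x0, y0, w)

# Try 360 unit directions, one per degree, and keep the steepest one.
angles = [math.radians(k) for k in range(360)]
best = max(angles, key=rate_along_angle)
best_dir = (math.cos(best), math.sin(best))

norm = math.hypot(*g)
print("gradient:", g)
print("steepest sampled direction:", best_dir)
print("normalized gradient:", (g[0] / norm, g[1] / norm))
```

At (1, 2) the gradient is (4, 1). The best of the 360 sampled directions lies within a degree of the normalized gradient, as the steepest-ascent argument predicts.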
CURVATURES AND DIVERGENCE:
- Curvature measures how quickly the direction of a curve changes, i.e. the rate of change of the rate of change (closely related to the second derivative).
- Curvature is the reciprocal of the radius of curvature: a smaller radius means a faster change of direction, and a larger radius means a slower one.
- Divergence is one of the physical properties of the vector fields we discussed above. It matters here because, applied to the gradient field, it helps us understand maxima and minima, i.e. hills and valleys.
- Divergence at a point is positive when the vectors pointing outward are larger in magnitude than those pointing inward, so more flow leaves the neighbourhood than enters it (physical analogy: a source). Similarly, divergence is negative when the outward vectors are smaller and the inward ones larger, so the flow accumulates (physical analogy: a sink).
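Both ideas can be checked numerically with a minimal Python sketch, assuming central finite differences with a small step H. `curvature` uses the standard formula |g''| / (1 + g'^2)^(3/2) for a curve y = g(x), applied to the top of circles of different radii. `divergence` sums dP/dx + dQ/dy. The source, sink and swirl fields are illustrative assumptions.

```python
import math

H = 1e-4  # small nudge used for the central finite differences

def curvature(g, x):
    """Curvature of the curve y = g(x): |g''| / (1 + g'**2) ** 1.5."""
    d1 = (g(x + H) - g(x - H)) / (2 * H)
    d2 = (g(x + H) - 2 * g(x) + g(x - H)) / H**2
    return abs(d2) / (1 + d1**2) ** 1.5

def divergence(F, x, y):
    """div F = dP/dx + dQ/dy for a 2-D field F(x, y) = (P, Q)."""
    dP_dx = (F(x + H, y)[0] - F(x - H, y)[0]) / (2 * H)
    dQ_dy = (F(x, y + H)[1] - F(x, y - H)[1]) / (2 * H)
    return dP_dx + dQ_dy

def circle_top(radius):
    # Upper half of a circle of the given radius, written as y = g(x).
    return lambda x: math.sqrt(radius**2 - x**2)

for radius in (0.5, 1.0, 4.0):
    k = curvature(circle_top(radius), 0.0)
    print(f"radius {radius}: curvature {k:.4f}")

def source(x, y):
    return (x, y)    # vectors point outwards and grow: positive divergence

def sink(x, y):
    return (-x, -y)  # vectors point inwards: negative divergence

def swirl(x, y):
    return (-y, x)   # pure rotation: zero divergence

for name, F in (("source", source), ("sink", sink), ("swirl", swirl)):
    print(name, round(divergence(F, 1.0, 1.0), 6))
```

The curvature of each circle comes out as 1 / radius. The divergence is positive for the source, negative for the sink and zero for the pure rotation.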
LAPLACIAN AND HARMONIC FUNCTIONS:
- The Laplacian tells us how much a point behaves like a maximum or a minimum, by comparing the value at the point with the values around it.
- Connecting this to the divergence we saw above: a point where grad(f) has high (positive) divergence is one where the nearby values are bigger than the value at the point itself, so it is a valley.
- Low (negative) divergence at a point means the values around it are smaller than the value at the point, so it is a hill.
- The Laplacian is divergence(grad(f)).
- Harmonic functions are functions whose Laplacian is zero everywhere. At every point, the value equals the average of the values around it, so no point stands out as a hill or a valley. Locally, such a graph can look like a plateau or a saddle point (we will get to saddle points shortly).
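The following minimal Python sketch estimates the Laplacian with finite differences, as the sum of the second partial derivatives. That sum is what divergence(grad(f)) reduces to. The three test surfaces are illustrative assumptions: a valley, a hill, and the saddle x^2 - y^2, which is harmonic.

```python
H = 1e-3  # small nudge used for the finite differences

def laplacian(f, x, y):
    """div(grad(f)) in 2-D, which works out to d2f/dx2 + d2f/dy2."""
    d2x = (f(x + H, y) - 2 * f(x, y) + f(x - H, y)) / H**2
    d2y = (f(x, y + H) - 2 * f(x, y) + f(x, y - H)) / H**2
    return d2x + d2y

def valley(x, y):
    return x**2 + y**2       # every neighbour is higher than the centre

def hill(x, y):
    return -(x**2 + y**2)    # every neighbour is lower than the centre

def saddle(x, y):
    return x**2 - y**2       # harmonic: x-curvature cancels y-curvature

for name, f in (("valley", valley), ("hill", hill), ("saddle", saddle)):
    print(name, round(laplacian(f, 0.0, 0.0), 6))
```

The valley gives a positive Laplacian (4), the hill a negative one (-4), and the saddle gives 0.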
LOCAL LINEARITY AND QUADRATIC APPROXIMATION
- Local linearity: even if a transformation is nonlinear, it looks approximately linear once we zoom in far enough around a point and consider only small nudges.
- The Jacobian matrix represents what a transformation looks like when you zoom in around a specific point. It is the linear transformation made up of the partial derivatives of each output with respect to each input.
- Going beyond linear approximations, we can get better approximations of multivariate functions with quadratic approximations.
- A linear approximation uses a tangent plane, while a quadratic approximation uses a quadratic surface to approximate function values around a point. The Jacobian matrix gives us the linear approximation. In the same way, the Hessian matrix (the matrix of second partial derivatives) gives us a sense of these quadratic approximations around a point.
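To see local linearity and quadratic approximation side by side, here is a minimal Python sketch built on finite differences. The helper names `jacobian`, `gradient` and `hessian`, the step size H, and both example functions are assumptions for illustration. The Hessian is computed as the Jacobian of the gradient. The sketch then compares the tangent-plane and quadratic approximations of f(x, y) = e^x * cos(y) around (0, 0) with the true value a small step away.

```python
import math

H = 1e-4  # small nudge used for the central finite differences

def jacobian(F, p):
    """J[i][j] = dF_i/dp_j for a vector-valued F, via central differences."""
    columns = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += H
        minus[j] -= H
        columns.append([(a - b) / (2 * H) for a, b in zip(F(plus), F(minus))])
    return [list(row) for row in zip(*columns)]  # columns -> rows

def gradient(f, p):
    # A scalar function's Jacobian has a single row: its gradient.
    return jacobian(lambda q: [f(q)], p)[0]

def hessian(f, p):
    # The Hessian is the Jacobian of the gradient: all second partials.
    return jacobian(lambda q: gradient(f, q), p)

def F(p):
    # Illustrative vector-valued function: (x, y) -> (x**2 - y, x*y)
    x, y = p
    return [x**2 - y, x * y]

print("Jacobian of F at (1, 2):", jacobian(F, [1.0, 2.0]))

def f(p):
    # Illustrative scalar function: f(x, y) = exp(x) * cos(y)
    x, y = p
    return math.exp(x) * math.cos(y)

p0, step = [0.0, 0.0], [0.1, 0.2]
f0, g, hess = f(p0), gradient(f, p0), hessian(f, p0)

# Tangent-plane (linear) and quadratic approximations around p0.
linear = f0 + sum(g[i] * step[i] for i in range(2))
quadratic = linear + 0.5 * sum(
    step[i] * hess[i][j] * step[j] for i in range(2) for j in range(2)
)
exact = f([p0[0] + step[0], p0[1] + step[1]])

print("exact:    ", exact)
print("linear:   ", linear)
print("quadratic:", quadratic)
```

For a step of (0.1, 0.2), the tangent plane is off by roughly 0.017, while the quadratic approximation is off by roughly 0.002.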
FINDING POINTS OF MAXIMA AND MINIMA
- First, find all the critical points, i.e. the points where the gradient is zero.
- Then perform the second derivative test. The single-variable version is not sufficient for high-dimensional functions because of saddle points. To understand what saddle points are, you can have a look at the article on the dilemma of High Dimensional Calculus.
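- The sufficient condition for deciding whether a critical point is a maximum, a minimum or a saddle point is shown in the picture above. For a function f(x, y), compute D = f_xx·f_yy - (f_xy)². The cases are:
  - if D > 0 and f_xx > 0, the point is a minimum;
  - if D > 0 and f_xx < 0, it is a maximum;
  - if D < 0, it is a saddle point;
  - if D = 0, the test is inconclusive.
The mixed partial derivative of f with respect to x and y, f_xy, is the significant term in determining the status of the point.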
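The second derivative test can be sketched in Python as follows. This is a minimal version under stated assumptions: the second partials are estimated by finite differences with step H, the test statistic is D = f_xx·f_yy - f_xy², and a small tolerance decides when D counts as zero. The function names are hypothetical. The last example, x^2 + 3xy + y^2, shows why the mixed partial matters: f_xx and f_yy are both positive, yet the point is a saddle.

```python
H = 1e-4  # small nudge used for the finite differences

def second_partials(f, x, y):
    """Finite-difference estimates of f_xx, f_yy and the mixed partial f_xy."""
    fxx = (f(x + H, y) - 2 * f(x, y) + f(x - H, y)) / H**2
    fyy = (f(x, y + H) - 2 * f(x, y) + f(x, y - H)) / H**2
    fxy = (f(x + H, y + H) - f(x + H, y - H)
           - f(x - H, y + H) + f(x - H, y - H)) / (4 * H**2)
    return fxx, fyy, fxy

def classify(f, x, y, tol=1e-6):
    """Second partial derivative test at a critical point (x, y) of f."""
    fxx, fyy, fxy = second_partials(f, x, y)
    D = fxx * fyy - fxy**2
    if D > tol:
        return "minimum" if fxx > 0 else "maximum"
    if D < -tol:
        return "saddle point"
    return "inconclusive"  # D is (numerically) zero: the test cannot decide

examples = {
    "x^2 + y^2": lambda x, y: x**2 + y**2,
    "-x^2 - y^2": lambda x, y: -x**2 - y**2,
    "x^2 - y^2": lambda x, y: x**2 - y**2,
    "x^2 + 3xy + y^2": lambda x, y: x**2 + 3 * x * y + y**2,
}
for name, f in examples.items():
    # (0, 0) is a critical point of every example above.
    print(f"{name}: {classify(f, 0.0, 0.0)}")
```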
This finishes the crash-course tour of Multivariate Calculus. I have touched upon the important topics here. One of the most important topics I did not cover is the multivariate chain rule, which is at the heart of the backpropagation algorithm in neural networks. I will get to it when I derive the networks for you.
For the rest, I have given you a good amount of material to dig into and expand your horizons.
Stay tuned :)