A non-separable dataset like the one used previously is always tough to deal with, but there are ways to handle it. One option is to project the vectors into a higher-dimensional space through a transformation. But can we really do that when there are millions of data points to reckon with? It would take a great deal of computation and time. That's where the kernel saves the day.
We have already seen the dual formulation of the SVM. In it, only the dot products of the training examples are responsible for making the model learn.
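For reference, the dual objective in its standard soft-margin form (the notation used earlier in the book may differ slightly) is:

maximize over α:   Σᵢ αᵢ − ½ ΣᵢΣⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ · xⱼ)
subject to:        0 ≤ αᵢ ≤ C   and   Σᵢ αᵢ yᵢ = 0

The training vectors appear only inside the dot products xᵢ · xⱼ, and that is exactly what the kernel trick exploits. Let's try a small exercise to see this in action.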
Let's take two vectors here:
x1 = [4, 8]
x2 = [20, 30]
Now, let's build a transformation function that maps these 2D vectors into 3D.
The function we will use for the transformation is the following:
t(x1, x2) = (x1², √2·x1·x2, x2²)
import numpy as np

# transformation from a 2-D to a 3-D vector
def t(x):
    return [x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2]
Now let's use this function:
x1_3D = t(x1)
x2_3D = t(x2)
print(np.dot(x1_3D, x2_3D))  # the result is 102400 (up to floating-point rounding)
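Working this out by hand confirms the number:

t(x1) = (16, 32√2, 64)
t(x2) = (400, 600√2, 900)
t(x1) · t(x2) = 16·400 + (32√2)·(600√2) + 64·900 = 6400 + 38400 + 57600 = 102400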
But can't we do this without transforming the values? A kernel can help us do exactly that:
# the same dot product, computed directly from the original 2-D vectors
def kernel(a, b):
    return a[0]**2 * b[0]**2 + 2*a[0]*b[0]*a[1]*b[1] + a[1]**2 * b[1]**2
It's time to use this kernel now:
print(kernel(x1, x2))  # the result is 102400
It is quite striking to get the same result as before without performing any transformation. So, a kernel is a function that, evaluated in the original space, gives the same result as a dot product in another (higher-dimensional) space.
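In fact, this particular kernel is nothing more than the squared dot product of the original 2D vectors, kernel(a, b) = (a · b)², i.e. a degree-2 polynomial kernel. The sketch below (the array size and variable names are chosen purely for illustration) checks the equivalence on a larger batch of points by computing the full Gram matrix both ways:

import numpy as np

# 1,000 random 2-D points; the size is only for illustration
X = np.random.rand(1000, 2)

# kernel route: all pairwise squared dot products, computed directly in 2-D
K = (X @ X.T) ** 2

# explicit route: materialize the 3-D vectors first, then take their dot products
X3 = np.column_stack([X[:, 0]**2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1]**2])
K_explicit = X3 @ X3.T

print(np.allclose(K, K_explicit))  # True: both routes give the same Gram matrix

The kernel route never builds the higher-dimensional vectors at all, which is exactly the saving that matters when the transformed space is much larger than the original one.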