Many of you will have guessed right: in more than three dimensions, we use hyperplanes. Let's define one using a bit of mathematics.
A linear equation such as y = ax + b has two variables, x and y, and a y-intercept, b. If we rename y as x2 and x as x1, the equation becomes x2 = ax1 + b, which implies ax1 - x2 + b = 0. If we define the 2D vectors x = (x1, x2) and w = (a, -1) and use the dot product, the equation becomes w.x + b = 0.
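As a quick check of this rewriting, the following minimal sketch evaluates w.x + b for one point on the line and one point off it. The slope a = 2 and intercept b = 1 are arbitrary values chosen for illustration, and the helper names dot and hyperplane_value are hypothetical, not taken from any library.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Assumed example line: y = 2x + 1 (values chosen only for illustration)
a, b = 2.0, 1.0
w = (a, -1.0)  # w = (a, -1), as defined in the text

def hyperplane_value(x):
    """Evaluate w.x + b for a 2D point x = (x1, x2)."""
    return dot(w, x) + b

on_line = (3.0, a * 3.0 + b)  # a point that satisfies y = ax + b
off_line = (3.0, 0.0)         # a point that does not

print(hyperplane_value(on_line))   # zero: the point lies on the line
print(hyperplane_value(off_line))  # non-zero: the point lies off the line
```

A point that satisfies y = ax + b gives exactly zero, while any other point gives a non-zero value whose sign tells us which side of the line it falls on.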
So, a hyperplane is the set of points x that satisfy w.x + b = 0. But how do we classify with the help of a hyperplane?
We define a hypothesis function h:
h(xi) = +1 if w.xi + b ≥ 0
h(xi) = -1 if w.xi + b < 0
This is equivalent to the following, if we take sign(0) to be +1:
h(xi) = sign(w.xi + b)
If we add a component x0 = 1 to each xi and a component w0 = b to w, it can be written even more compactly:
h(xi) = sign(w.xi)
In other words, h uses the position of x relative to the hyperplane to predict a value for y. A data point on one side of the hyperplane is assigned one class, and a data point on the other side is assigned the other class.
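The hypothesis function can be sketched in a few lines of Python. This is only an illustration: the weights w = (2, -1), the offset b = 1, and the sample points are assumed values, and the names h and h_augmented are hypothetical. The second function uses the form with x0 = 1 and w0 = b to show that both versions give the same labels.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def h(w, b, x):
    """Return +1 if w.x + b >= 0, otherwise -1."""
    return 1 if dot(w, x) + b >= 0 else -1

def h_augmented(w_aug, x):
    """Same rule, with b folded into the weights as w0 and x0 = 1 prepended to x."""
    x_aug = (1.0,) + tuple(x)
    return 1 if dot(w_aug, x_aug) >= 0 else -1

# Assumed example values, chosen only for illustration
w, b = (2.0, -1.0), 1.0
w_aug = (b,) + w  # (w0, w1, w2) with w0 = b

points = [(0.0, 0.0), (0.0, 5.0), (3.0, 7.0)]
labels = [h(w, b, p) for p in points]
labels_aug = [h_augmented(w_aug, p) for p in points]

print(labels)      # one label per point
print(labels_aug)  # identical labels from the augmented form
```

Note that the last point, (3, 7), lies exactly on the hyperplane. Because the rule uses ≥ 0, such points receive the label +1.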
Because it uses the equation of a hyperplane, which is a linear combination of the input values, it is called a linear classifier. The hyperplane is determined by w: with w0 = b included, w contains both a and b, which fix the hyperplane's orientation and position.