How to Minimize Misclassification Rate in Classification (Machine Learning)

Recall that classification is a supervised learning task concerned with determining which class a new input belongs to.
When assigning inputs to classes, there is a possibility that we assign an input to the wrong class. This is called misclassification.

By definition, misclassification occurs when an input variable is assigned to the wrong class.

One of the goals of classification is to minimize the number of misclassifications. This is done by defining a rule that assigns input x to one of the available classes.
The approach is to divide the input space into regions Rk called decision regions, one region for each class.
Every point that falls in region Rk is assigned to class Ck.

Consider the case of two classes C1 and C2. A mistake occurs when an input vector x that belongs to class C1 falls in region R2 and is therefore assigned to C2, or when an x that belongs to class C2 falls in region R1 and is assigned to C1.
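We can represent the probability of a mistake as follows:

p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_R1 p(x, C2) dx + ∫_R2 p(x, C1) dx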

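To minimize the probability of a mistake, each x must be assigned to the class whose assignment contributes the smaller error term: assigning x to C1 contributes p(x, C2) to the error, while assigning x to C2 contributes p(x, C1).

So if p(x, C1) is greater than p(x, C2), then x is assigned to C1; otherwise it is assigned to C2.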
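
The two-class rule above can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes one-dimensional inputs, Gaussian class-conditional densities with means 0 and 2 and unit standard deviation, equal priors, and a single threshold separating R1 from R2. All of these values and names (`p_mistake`, `normal_cdf`) are illustrative assumptions that do not come from the text.

```python
import math

# Illustrative assumptions: equal priors, unit-variance Gaussians.
PRIOR_C1, PRIOR_C2 = 0.5, 0.5
MEAN_C1, MEAN_C2, STD = 0.0, 2.0, 1.0


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def p_mistake(threshold):
    """Probability of a mistake when R1 = {x < threshold}, R2 = {x >= threshold}."""
    # Integral over R1 of p(x, C2): class C2 inputs that land in R1.
    wrong_in_r1 = PRIOR_C2 * normal_cdf(threshold, MEAN_C2, STD)
    # Integral over R2 of p(x, C1): class C1 inputs that land in R2.
    wrong_in_r2 = PRIOR_C1 * (1.0 - normal_cdf(threshold, MEAN_C1, STD))
    return wrong_in_r1 + wrong_in_r2


# Scan candidate thresholds and keep the one with the smallest error.
thresholds = [i / 100.0 for i in range(-200, 401)]
best_threshold = min(thresholds, key=p_mistake)
print(best_threshold, p_mistake(best_threshold))
```

Under these assumptions the best threshold lands at the point where p(x, C1) and p(x, C2) are equal, which is the midpoint between the two means. Moving the threshold away from that point in either direction increases the probability of a mistake.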

Using the product rule, we can write the joint probability in terms of the posterior probability:

p(x, Ck) = p(Ck | x)p(x)

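Note that the term p(Ck | x) is known as the posterior probability. Because the factor p(x) is the same for every class, comparing the joint probabilities p(x, Ck) is equivalent to comparing the posteriors, so x should be assigned to the class having the largest posterior probability p(Ck | x).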
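
As a small illustration of this equivalence, the sketch below normalizes a set of joint probabilities at one fixed x into posteriors and picks the largest. The function names `posteriors` and `assign` and the numbers in `joint_at_x` are hypothetical, chosen only to make the example concrete.

```python
def posteriors(joint):
    """Turn joint probabilities p(x, Ck) at a fixed x into posteriors p(Ck | x)."""
    evidence = sum(joint.values())  # p(x) = sum over k of p(x, Ck)
    return {label: value / evidence for label, value in joint.items()}


def assign(joint):
    """Assign x to the class with the largest posterior probability."""
    post = posteriors(joint)
    return max(post, key=post.get)


# Hypothetical joint probabilities for one input x.
joint_at_x = {"C1": 0.03, "C2": 0.01}
post = posteriors(joint_at_x)
print(post, assign(joint_at_x))
```

Dividing by p(x) rescales every class by the same positive number, so the class that wins on the joint probability also wins on the posterior.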

For the more general case of K classes, it is slightly easier to maximize the probability of being correct, which is given by:

p(correct) = Σ_k p(x ∈ Rk, Ck) = Σ_k ∫_Rk p(x, Ck) dx,  where k runs from 1 to K

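This means that to minimize misclassification, we need to choose the regions Rk so that this probability is maximized, which happens when each x is placed in the region Rk for which p(x, Ck) is largest.
Recall the product rule, which states that: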

p(x, Ck) = p(Ck | x)p(x)

Since p(x) is common to all classes, we can see that each x has to be assigned to the class that has the highest posterior probability p(Ck | x).
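
The K-class result can also be sketched numerically. The example below assumes three classes with one-dimensional Gaussian class-conditional densities and made-up priors, and approximates the probability of being correct with a midpoint Riemann sum. It compares the rule that picks the largest p(x, Ck) against a nearest-mean rule that ignores the priors. Every name and number here is an assumption for illustration only.

```python
import math

# Illustrative assumptions: three classes, unit-variance Gaussians.
PRIORS = [0.3, 0.4, 0.3]
MEANS = [-2.0, 0.0, 2.0]
STD = 1.0


def joint(x, k):
    """Joint probability density p(x, Ck) = p(x | Ck) p(Ck)."""
    z = (x - MEANS[k]) / STD
    density = math.exp(-0.5 * z * z) / (STD * math.sqrt(2.0 * math.pi))
    return PRIORS[k] * density


def bayes_rule(x):
    """Assign x to the class with the largest joint (equivalently, posterior)."""
    return max(range(len(PRIORS)), key=lambda k: joint(x, k))


def nearest_mean_rule(x):
    """A comparison rule that ignores the priors."""
    return min(range(len(MEANS)), key=lambda k: abs(x - MEANS[k]))


def p_correct(rule, lo=-10.0, hi=10.0, steps=20000):
    """Approximate the sum over k of the integral over Rk of p(x, Ck)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += joint(x, rule(x)) * dx  # x lies in the region chosen by the rule
    return total


p_bayes = p_correct(bayes_rule)
p_nearest = p_correct(nearest_mean_rule)
print(p_bayes, p_nearest)
```

At every x the largest-joint rule picks the biggest available integrand, so its probability of being correct cannot be lower than that of any other rule evaluated on the same grid.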