# 3 Approaches to Classification in Machine Learning

In this lesson, we would examine 3 approaches to classification. The first 2 would be based on the a priori knowledge of the probabilities. The third one would be based on the formulation of a discriminant function.

1. Determination of the Class-Conditional Densities p(x | Ck)
2. Determination of the a Posteriori Class Probabilities p(Ck | x)
3. Use of a Discriminative Function

Let’s begin with the first one.

1. Determination of the Class-Conditional Densities p(x | Ck)
The first step is to solve the inference problem of determining the class-conditional probability densities p(x|Ck) for each class Ck individually. Also separately infer the prior class probabilities p(Ck).
Then use Bayes theory of the form

to find the posterior class probabilities p(Ck | x).
We not that the expression p(x) in the numerator can be found by the formula:

Having found the posterior probability p(Ck | x), we can use decision theory to determine the class membership. Such approaches that model the inputs as well as the outputs are called generative models, because they can be used to generate synthetic data points in the input space.

2. First Determine the Posterior Class Probabilities, p(Ck | x)
This approach first solves the inference problem of determining the posterior class probabilities p(Ck | x), and then subsequently use decision theory to assign each new x to one of the available classes. Such approaches that model the posterior probabilities directly are called discriminative models.

3. Using a Discriminant Function
The third approach is to find a function f(x), called a discriminant function. This function maps each input x directly to a class label.
For example, in the case of the two-class problem, the function could have a binary output such that f(x) = 0 represents class C1 while f(x) = 1 represents class C2. In the use of discriminant function, probability is not used. The binary output discriminant function is shown:

Final Notes
The first approach is the most demanding  of the three. This is because it involves finding the joint distribution over both x and Ck. But this approach have the advantage of allowing for marginal dencity of data to be determined.
The simplest approach is the last one as it does not require probability functions.