Feature Selection and its importance

Feature selection, also known as attribute selection, is the task of selecting the relevant features from a given set of features.

Knowing machine learning algorithms doesn’t mean that you know machine learning. Machine learning is not all about selecting the best algorithm. In many machine learning competitions and real-world problems, many people use the same algorithm but get different results. What sets the winner apart from the others is how they create, extract, and select features from the given set of features. So feature selection is one of the most important tasks in machine learning.

Feature selection is used to:
  • Reduce training time: If we have n features, it is possible that not all of them contribute much to model learning, so it is better to remove the ones that don’t.
  • Avoid the curse of dimensionality: As the number of features grows, the amount of data required for the model to generalize accurately grows exponentially.
  • Minimize the risk of overfitting: A more complex model, such as one trained on many features, has a greater tendency to overfit.



A feature set is comprised of relevant, irrelevant, and redundant features. Irrelevant and redundant are different notions: a relevant feature may still be redundant, i.e. highly correlated with one or more other features. In that case we can keep just one feature out of each group of correlated features. Example: the purchase price of a product and the sales tax on that product.
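The price and sales tax example can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy is available; the function name drop_redundant, the 0.95 threshold, and the toy numbers are all made up for this post.

```python
import numpy as np

def drop_redundant(X, names, threshold=0.95):
    """Keep the first feature of every group of highly correlated features."""
    # Absolute Pearson correlation between every pair of feature columns.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not strongly correlated with a kept column.
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Hypothetical data: sales tax is a fixed 8% of the purchase price.
price = np.array([10.0, 25.0, 40.0, 55.0, 80.0])
sales_tax = 0.08 * price
weight = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
X = np.column_stack([price, sales_tax, weight])

selected = drop_redundant(X, ["price", "sales_tax", "weight"])
print(selected)
```

Here sales_tax is dropped because it carries the same information as price, while weight is kept.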

So the goal of feature selection is to select a subset of features that represents our data well and reduces noise, which leads to better predictions. Correlated features do not contribute much to learning and may act as noise.

To remove irrelevant features, we need algorithms that can calculate the relevance of each feature to the output classes. Removing irrelevant features is different from dimensionality reduction techniques (like PCA), which build new features out of the original ones instead of keeping a subset of them. To remove redundant features, we need algorithms that can calculate the correlation between the features.

Methods of Feature Selection:

Filter Methods:

Filter methods use various statistical measures to score the relationship between each feature and the outcome variable, and select features on the basis of that score. A threshold on the score decides which features are kept. Here feature selection is independent of the algorithm used to learn the classifier.

Filter methods are computationally efficient and robust to overfitting, but they don’t consider the correlation among the features, so they might select redundant features.
Various filter methods are:
  • Chi-Square
  • LDA (Linear Discriminant Analysis)
  • Pearson’s Correlation
  • ANOVA (Analysis of Variance)
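As a rough sketch of the filter idea, the snippet below scores each feature by its absolute Pearson correlation with the outcome and keeps those above a threshold. It assumes NumPy; the helper filter_select, the 0.5 threshold, and the toy data are hypothetical and only meant for illustration.

```python
import numpy as np

def filter_select(X, y, names, threshold=0.5):
    """Score each feature by |Pearson correlation| with y; keep scores >= threshold."""
    scores = {}
    for j, name in enumerate(names):
        scores[name] = abs(np.corrcoef(X[:, j], y)[0, 1])
    selected = [name for name in names if scores[name] >= threshold]
    return selected, scores

# Hypothetical toy data: did a student pass (1) or fail (0)?
hours_studied = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
shoe_size = np.array([9.0, 7.0, 10.0, 8.0, 9.0, 8.0])
passed = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

X = np.column_stack([hours_studied, shoe_size])
selected, scores = filter_select(X, passed, ["hours_studied", "shoe_size"])
print(selected)
```

Note that no model is trained here: each feature is scored on its own, which is why filter methods are cheap but can let redundant features through.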


Wrapper Methods:

This is basically a search problem: the best feature subset is found by searching among candidate subsets, using model performance on each subset as the objective function.
Here we select a subset of features, train the model, and then add or remove features on the basis of the previous model’s result.




Wrapper methods are computationally costly when the number of features is large, since they require searching.
Search techniques include:
  • Random hill climbing methods
  • Heuristic searches
  • Forward Selection
  • Backward Elimination
  • Recursive Feature Elimination
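Forward selection, for example, could look roughly like the sketch below. It assumes NumPy, uses ordinary least squares as the model and held-out mean squared error as the objective function, and stops after a fixed number of features; these are all choices made for this illustration, and the synthetic data is made up.

```python
import numpy as np

def validation_mse(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares on the chosen columns; return mean squared error on held-out data."""
    A_tr = np.column_stack([X_tr[:, cols], np.ones(len(X_tr))])  # add intercept column
    A_va = np.column_stack([X_va[:, cols], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.mean((A_va @ coef - y_va) ** 2))

def forward_selection(X_tr, y_tr, X_va, y_va, n_features):
    """Greedily add the feature that lowers the validation error the most."""
    selected = []
    remaining = list(range(X_tr.shape[1]))
    while remaining and len(selected) < n_features:
        # Train one model per candidate feature added to the current subset.
        errors = {j: validation_mse(X_tr, y_tr, X_va, y_va, selected + [j])
                  for j in remaining}
        best = min(errors, key=errors.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: only features 0 and 2 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

chosen = forward_selection(X[:150], y[:150], X[150:], y[150:], n_features=2)
print(chosen)
```

Every step trains one model per remaining feature, which shows why wrapper methods get expensive as the number of features grows.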


Embedded Methods:

Embedded methods combine the advantages of both of the above methods.
They select the features that give the best accuracy while the model is being created, i.e. they perform feature selection while training the classifier. So they are computationally cheaper than wrapper methods.

The most common embedded methods are regularization methods. Regularization methods have a built-in penalty that reduces overfitting.
Examples of regularization methods:
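  • LASSO: uses L1 regularization, which can shrink some coefficients exactly to zero
  • Ridge: uses L2 regularization, which shrinks coefficients but does not set them exactly to zero
  • Elastic Net: uses both L1 and L2 regularization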
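To see how an L1 penalty selects features during training, here is a small LASSO sketch solved with plain coordinate descent. It assumes NumPy; the solver, the penalty strength alpha=0.5, and the synthetic data are choices made for this illustration, not a reference implementation.

```python
import numpy as np

def soft_threshold(value, alpha):
    """Shrink value towards zero by alpha; return zero if |value| <= alpha."""
    return np.sign(value) * max(abs(value) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, one coordinate at a time."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # Residual with the contribution of feature j removed.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Synthetic data: only features 0 and 2 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Center the data so that no intercept term is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j in range(X.shape[1]) if w[j] != 0]
print(selected)
```

The features whose coefficients end up exactly zero are effectively removed from the model, so selection happens as a side effect of training.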

