In the first blog, we discussed some important metrics used in regression, their pros and cons, and use cases. This part will focus on commonly used metrics in classification and why we should prefer some over others, with context. Definitions Let’s

Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive and explanatory features of the collection is a

Tutorial Slides by Andrew Moore, computer scientist at Google, ex-CMU professor. Posted by Vincent Granville on April 24, 2011 at 8:07pm in Online Tutorials. Decision Trees. The Decision Tree is one of the most popular classification algorithms in current use in Data Mining and

Introduction If you have spent some time in machine learning and data science, you have almost certainly come across imbalanced class distribution. This is a scenario where the number of observations belonging to one class is significantly lower than those belonging

Introduction The need to extract data from the web is becoming increasingly clear. Every few weeks, I find myself in a situation where we need to extract data from the web. For example, last week we

By Jason Brownlee on November 25, 2013 in Machine Learning Algorithms. In this post, we take a tour of the most popular machine learning algorithms. It is useful to tour the main algorithms in the field to get a feeling of

Introduction This could help you in building your first project! Whether you are a fresher or an experienced professional in data science, doing voluntary projects always adds to your candidature. My sole reason behind writing this article is to get your

Bayes’ Rule Applied: Using Bayesian Inference on a real-world problem. The fundamental idea of Bayesian inference is to become “less wrong” with more data. The process is straightforward: we have an initial belief, known as a prior, which we update as we

