Machine Learning

Author

Carmen Gómez Valenzuela

Most of the problems that are solved today with Machine Learning can be divided into two categories: supervised and unsupervised learning.

The main difference between these two categories is that in supervised learning, for each observation of the predictor variables \((x_i)\), there is a measure of the response variable \((y_i)\). That is, we know what the response is for previous examples and we want to predict future observations based on previous learned data. Supervised learning can be applied to regression or classification problems. The difference is in the type of variable being predicted. In regression, it is a continuous numerical variable, and in classification, it is a categorical variable. It can be used to identify risk factors in diseases such as cancer or to predict heart problems based on diet, clinical measures, and demographics.

Unsupervised learning or clustering divides or segments the input data space into similar groups. Its goal is to find groups with similar characteristics, but we do not have prior knowledge. For example, it is used to segment clients into groups with similar patterns, detect anomalous behavior by identifying patterns that fall outside the usual clusters, or simplify or summarize very large datasets by grouping similar users.