= Machine Learning =

Machine learning is a technique in which an algorithm is given data and, over time, 'learns' to approximate the underlying relationship in that data.

== Types of learning ==

=== Supervised ===

Supervised learning is when an algorithm `F(x)` takes some non-target attributes `x` and attempts to approximate a known ground truth, `y`. It is called supervised because the expected result for each data point is known.

=== Unsupervised ===

Unsupervised learning is when no ground truth is provided to the algorithm. We instead focus on finding broad correlations.

* Clustering is when a data set is divided into groups based on similarities among the samples in each group. For example, a set of customer profiles can be clustered based on possible interests.
* Association is when an algorithm attempts to find which items tend to appear together in a dataset. For example, finding what products tend to be purchased together.

=== Semi-supervised ===

Semi-supervised learning is used when only a small portion of the dataset is labeled. We can generally either

* train the model on the limited set of labeled data points, then have it perform unsupervised training on the rest of the data
* cluster the unlabeled data, then use the labeled data to refine the clusters

== Output ==

Machine learning models are divided into two types

* Classification (discrete outputs)
* Regression (continuous outputs)

Classification is used to distinguish between two or more choices. For regression models, each sample is often a tuple of several attributes describing it. Each element of the tuple can be either categorical (discrete) or numeric (continuous). These elements are often called features.
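
To make the difference between the two output types concrete, here is a minimal sketch in Python. It is not a method prescribed by these notes: it uses k-nearest neighbours purely as an illustration, and the fruit data, the feature names (`colour`, `size_cm`), and the helper functions are all hypothetical. Each sample is a tuple with one categorical and one numeric feature; the categorical feature is one-hot encoded so that distances can be computed. The same samples are then used once with discrete labels (classification) and once with continuous targets (regression).

```python
import math
from collections import Counter

# Hypothetical toy data: each sample is a tuple (colour, size_cm).
# "colour" is a categorical feature, "size_cm" is a numeric feature.
SAMPLES = [
    ("red", 1.0),
    ("red", 1.2),
    ("blue", 3.0),
    ("blue", 3.3),
]
CLASS_LABELS = ["cherry", "cherry", "plum", "plum"]  # discrete targets
PRICE_TARGETS = [0.10, 0.12, 0.40, 0.44]             # continuous targets

COLOURS = ["red", "blue"]  # assumed fixed set of categories


def encode(sample):
    """One-hot encode the categorical feature and keep the numeric one."""
    colour, size = sample
    one_hot = [1.0 if colour == c else 0.0 for c in COLOURS]
    return one_hot + [size]


def k_nearest(query, k=2):
    """Return the indices of the k training samples closest to the query."""
    q = encode(query)
    order = sorted(
        range(len(SAMPLES)),
        key=lambda i: math.dist(q, encode(SAMPLES[i])),
    )
    return order[:k]


def classify(query, k=2):
    """Classification: majority vote over the neighbours' discrete labels."""
    votes = Counter(CLASS_LABELS[i] for i in k_nearest(query, k))
    return votes.most_common(1)[0][0]


def regress(query, k=2):
    """Regression: mean of the neighbours' continuous targets."""
    neighbours = k_nearest(query, k)
    return sum(PRICE_TARGETS[i] for i in neighbours) / len(neighbours)


if __name__ == "__main__":
    print(classify(("red", 1.1)))  # a label from CLASS_LABELS
    print(regress(("red", 1.1)))   # a floating-point number
```

Both functions share the same features and the same neighbour search; only the way the neighbours' targets are combined differs, which is what separates a classification output from a regression output in this sketch.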