
= Machine Learning =

Machine learning is a technique in which an algorithm is given data, and over
time 'learns' to approximate the underlying relationship in said data.

== Types of learning ==

=== Supervised ===

Supervised learning is when an algorithm `F(x)` takes some non-target attributes
`x` and attempts to approximate some known ground truth, `y`. It is called
supervised because the expected result for each sample is known.
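
As a minimal sketch of this idea, assume a one-dimensional `x` and a linear
`F(x) = a*x + b`. Both choices, the helper name `fit_line`, and the data are
illustrative assumptions, not part of the notes above. Ordinary least squares
then fits `F` to the known `(x, y)` pairs:

```python
def fit_line(xs, ys):
    """Fit F(x) = a*x + b to labeled pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Hypothetical training data: the ground truth y is known for every x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

a, b = fit_line(xs, ys)


def F(x):
    """The learned approximation of the relationship between x and y."""
    return a * x + b
```

Because every `x` comes with its `y`, the quality of `F` can be checked
directly by comparing `F(x)` against the known answers.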

=== Unsupervised ===

Unsupervised learning is when no ground truth is provided to the algorithm. We
instead focus on finding broad correlations.

* Clustering is when a data set is divided into groups based on similarities
  among the samples in each group. For example, a data set made up of customer
  profiles can be clustered based on likely interests.
* Association is when an algorithm attempts to find items that tend to occur
  together in a dataset. For example, finding what products tend to be
  purchased together.
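
Both ideas can be sketched in a few lines. The example below is only an
illustration under stated assumptions: clustering is shown as a tiny
one-dimensional k-means with hand-picked starting centers, and association is
reduced to counting product pairs that share a basket. The function names
`kmeans_1d` and `pair_counts` and all data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the nearest of the given starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (unchanged if empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


def pair_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: customer ages and shopping baskets.
ages = [18, 20, 22, 41, 43, 45]
centers, clusters = kmeans_1d(ages, centers=[10, 50])

baskets = [["bread", "butter"], ["bread", "butter", "jam"], ["jam", "tea"]]
counts = pair_counts(baskets)
```

Neither function is told what the "right" groups or pairs are; the structure
comes only from the data. Note that k-means is sensitive to its starting
centers, which is why they are passed in explicitly here.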

=== Semi-supervised ===

Semi-supervised learning is used when only a small portion of the dataset is
labeled. We can generally either:

* train the model on the limited set of labeled datapoints, then have it label
  the rest of the data itself and retrain on the result (self-training)
* cluster the unlabeled data, then use the labeled data to refine and label the
  clusters
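
The first approach can be sketched as self-training (also called
pseudo-labeling). This is a minimal illustration, assuming a
nearest-centroid classifier on one-dimensional points; the classifier choice,
the helper names `centroids` and `predict`, and the data are all assumptions
made for the example.

```python
def centroids(points, labels):
    """Mean of the 1-D points for each label."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        sums[lab] = sums.get(lab, 0.0) + p
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def predict(cents, p):
    """Label of the centroid closest to p."""
    return min(cents, key=lambda lab: abs(p - cents[lab]))


# Hypothetical data: two labeled points and four unlabeled ones.
labeled_x = [1.0, 9.0]
labeled_y = ["low", "high"]
unlabeled_x = [2.0, 3.0, 7.0, 8.0]

# Step 1: train on the small labeled set.
cents = centroids(labeled_x, labeled_y)
# Step 2: pseudo-label the unlabeled points with the current model.
pseudo_y = [predict(cents, p) for p in unlabeled_x]
# Step 3: retrain on the labeled and pseudo-labeled points together.
cents = centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
```

In practice, pseudo-labels are usually only kept when the model is confident
in them, since early mistakes are otherwise reinforced by the retraining step.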

== Output ==

Machine learning models are divided into two types by the kind of output they
produce:

* Classification (discrete outputs)
* Regression (continuous outputs)

Classification is used to distinguish between two or more choices.

For regression models, each sample is often a tuple of several attributes of
that sample. Each element of the tuple can be either categorical (discrete) or
numeric (continuous). These elements are often called features.
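
A sample with mixed feature types might look like the sketch below. The
feature names and values are hypothetical, and one-hot encoding is shown as one
common way (not the only one) to turn a categorical feature into numbers a
model can consume.

```python
# Hypothetical samples: (color, size_cm, weight_kg). The first feature is
# categorical; the other two are numeric.
samples = [("red", 10.0, 1.5), ("blue", 12.0, 2.0), ("red", 8.0, 1.0)]

# All categories seen for the categorical feature, in a fixed order.
categories = sorted({s[0] for s in samples})  # ['blue', 'red']


def one_hot(value, categories):
    """Encode a categorical value as a vector with a single 1.0 entry."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode(sample):
    """Turn a mixed-type sample into a purely numeric feature vector."""
    color, size_cm, weight_kg = sample
    return one_hot(color, categories) + [size_cm, weight_kg]


encoded = [encode(s) for s in samples]
```

Numeric features pass through unchanged, while the single categorical feature
expands into one column per category.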