= Machine Learning =

Machine learning is a technique in which an algorithm is given data, and over
time 'learns' to approximate the underlying relationship in said data.

== Types of learning ==

=== Supervised ===

Supervised learning is when an algorithm `F(x)` takes some non-target attributes
`x` and attempts to approximate some known ground truth, `y`. It is called
supervised because the expected result for each sample is known.

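
As a minimal sketch of the idea, the snippet below fits `F(x) = w * x + b` to a handful of `(x, y)` pairs by ordinary least squares. The data, the linear form of `F`, and the helper name `fit_line` are assumptions made for illustration; real problems usually involve many attributes and a library implementation.

```python
# Hypothetical training data: inputs x with known ground truth y (here y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Fit F(x) = w * x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


w, b = fit_line(xs, ys)


def F(x):
    """The learned approximation of the relationship between x and y."""
    return w * x + b


print(F(5.0))  # prediction for an input that was not in the training data
```

Because the ground truth `y` is available for every training sample, the fit can be checked directly against it, which is what makes the setting supervised.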
=== Unsupervised ===

Unsupervised learning is when no ground truth is provided to the algorithm. We
instead focus on finding broad correlations in the data.

* Clustering is when a data set is divided into groups based on similarities
  among the samples in each group. For example, a data set of customer
  profiles can be clustered by likely interests.
* Association is when an algorithm attempts to find which items tend to occur
  together in a dataset. For example, finding which products tend to be
  purchased together.

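
Both ideas can be sketched in a few lines of plain Python. The example below clusters one-dimensional values with a basic k-means loop and counts how often pairs of products share a basket. The customer ages, the baskets, the starting centroids, and the function names `k_means_1d` and `pair_counts` are all hypothetical, and k-means and pair counting are only one possible choice of technique for each task.

```python
from collections import Counter
from itertools import combinations


def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids (basic k-means)."""
    centroids = list(centroids)
    groups = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Clustering: hypothetical customer ages, no labels given.
ages = [18, 21, 19, 62, 65, 70]
groups, centroids = k_means_1d(ages, centroids=[0, 100])
print(groups)

# Association: hypothetical shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]
counts = pair_counts(baskets)
print(counts.most_common(1))
```

Neither function is told what the "right" groups or pairs are; the structure comes entirely from the data.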
=== Semi-supervised ===

Semi-supervised learning is used when only a small portion of the dataset is
labeled. We can generally either:

* train the model on the limited set of labeled datapoints, then have it
  perform unsupervised training on the rest of the data
* cluster the unlabeled data, then use the labeled data to home in on the
  clusters

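
One way to read the first option is self-training (pseudo-labeling): the model trained on the labeled points labels the remaining data itself and is then retrained on everything. The sketch below assumes that reading and uses a toy nearest-centroid classifier on one-dimensional values; the data and the helper names `centroid_per_label` and `predict` are hypothetical.

```python
def centroid_per_label(points):
    """Return the mean value for each label in a list of (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in points:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Nearest-centroid classifier: pick the label whose mean is closest."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Hypothetical data: a few labeled samples and a larger unlabeled pool.
labeled = [(1.0, "low"), (2.0, "low"), (9.0, "high")]
unlabeled = [1.5, 3.0, 7.0, 8.0, 10.0]

# Step 1: train on the small labeled set only.
centroids = centroid_per_label(labeled)

# Step 2: let the model label the unlabeled pool (pseudo-labels),
# then retrain on the labeled and pseudo-labeled samples together.
pseudo_labeled = [(v, predict(centroids, v)) for v in unlabeled]
centroids = centroid_per_label(labeled + pseudo_labeled)

print(pseudo_labeled)
print(centroids)
```

In practice only confident pseudo-labels are usually kept, since mistakes made in step 2 are otherwise fed back into the model.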
== Output ==

Machine learning models are divided into two types by the kind of output they
produce:

* Classification (discrete outputs)
* Regression (continuous outputs)

Classification is used to distinguish between two or more choices.

For regression models, each sample is often a tuple of several attributes of
that sample. Each element of the tuple can be either categorical (discrete) or
numeric (continuous). These elements are often called features.

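
To make the distinction concrete, the sketch below builds a sample as a tuple of one categorical and two numeric features, one-hot encodes the categorical one, and feeds the result to a toy regression function and a toy classifier. The `Sample` fields, the category list, the weights, and the threshold are all made up for illustration and are not learned from data.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """A hypothetical sample: one categorical and two numeric features."""
    color: str         # categorical (discrete)
    weight_kg: float   # numeric (continuous)
    age_years: float   # numeric (continuous)


COLORS = ["red", "green", "blue"]  # assumed set of categories


def encode(sample):
    """Turn a sample into a purely numeric vector via one-hot encoding."""
    one_hot = [1.0 if sample.color == c else 0.0 for c in COLORS]
    return one_hot + [sample.weight_kg, sample.age_years]


def regress(vector):
    """Toy regression model: a weighted sum gives a continuous output."""
    weights = [0.5, 1.0, 1.5, 2.0, 0.1]  # made-up weights, not learned
    return sum(w * v for w, v in zip(weights, vector))


def classify(vector):
    """Toy classifier: threshold the same score into one of two classes."""
    return "high" if regress(vector) >= 5.0 else "low"


sample = Sample(color="green", weight_kg=2.5, age_years=4.0)
vector = encode(sample)
print(vector)            # numeric feature vector
print(regress(vector))   # continuous output
print(classify(vector))  # discrete output
```

The same feature vector serves both models; only the type of output differs.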