Intro

What is ML¶

A computer program learns from XP (E) wrt task (T) and perf measure (P) if P on T improves with E.

No free lunch rule:¶

Training and testing data come from the same distribution
Some assumptions and biases

Factors affecting perf¶

quality of training data
Form and extent of initial background knowledge
type of feedback provided
learning algo used

Two important factors¶

Modelling
Optimisation

Types of ML¶

Based on Info available¶

Supervised (\(\{x_n \in \mathbb{R}^d, y_n \in \mathbb{R}\}^N_{n=1}\))
1. classification
2. regression
Unsupervised (\(\{x_n \in \mathbb{R}^b\}^N_{n=1}\))
1. clustering
2. probability distribution estimation
3. finding associations in features
4. dimension reduction
Semi Supervised
Reinforcement
1. Decision making (robots, games)

Based on learner's role¶

Passive - What most ML models are, use data to produce a model
Active - Query the environment, perform experiments

Different ML Problems¶

Classification Problem¶

Tumor classification: Find a good classifier - a mapping (function) from samples to labels normal/tumor

A sample is represented using a feature vector. For instance, the coordinates of this vector could be constructed from a measure of activeness of each gene in the tissue cells. The assumption here is that this info is sufficient to determine malignance.

It is important to follow the same protocol to construct feature vectors for training samples and any new samples to be classified.

Gender from images¶

How do we represent the feature vectors? We could just concat the pixel values of the high res image but this may not work very well, because this is an overwhelming amount of information that is left to the classifier to figure out. A better first step might be to classify the image based on easier-to-determine features such as skin colour, edge detection, eyes, hair using simpler classifiers. Then, we can use the above as a 2^nd feature vector to use for a gender classifier.