COMP 551
Applied Machine Learning
Lecture 1

Introduction to Machine Learning

What is machine learning? Well, as Shakir Mohamed, a researcher at DeepMind, put it: "Almost all of machine learning can be viewed in probabilistic terms, making probabilistic thinking fundamental. It is, of course, not the only view. But it is through this view that we can connect what we do in machine learning to every other computational science, whether that be in stochastic optimisation, control theory, operations research, econometrics, information theory, statistical physics or bio-statistics. For this reason alone, mastery of probabilistic thinking is essential."

We can broadly classify machine learning methods into four families:

1. Supervised Learning

Here, the task $T$ is to learn a mapping $f$ from inputs $x\in \mathcal{X}$ to outputs $y\in \mathcal{Y}$. The inputs $x$ are called the features, covariates, or predictors; this is often a fixed-dimensional vector of numbers, such as the height and weight of a person, or the pixels in an image. In this case, $\mathcal{X}=\mathbb{R}^D$, where $D$ is the dimensionality of the vector (i.e., the number of input features). The output $y$ is also known as the label, target, or response; in classification it is a categorical (nominal) variable. The experience $E$ is given in the form of a set of $N$ input-output pairs, known as the training set:

$$ \mathcal{D}=\left\{\left(x_n,y_n\right)\right\}_{n=1}^N $$

(Training Set Variables)

  • $\mathcal{D}$ : Training set
  • $x$ : $D-$dimensional vector
  • $y$ : Output label (a categorical or nominal variable in classification)
  • $N$ : Number of training instances
  • $n$ : Index of training instance $(n\in \{1,\dots,N\})$
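To make the notation concrete, here is a minimal sketch (with hypothetical data) of a training set $\mathcal{D}$ as $N$ input-output pairs, where each $x_n$ is a $D$-dimensional feature vector and $y_n$ a label:

```python
# A training set D = {(x_n, y_n)} for n = 1..N, with hypothetical data:
# each x_n is a D-dimensional vector (height in cm, weight in kg),
# and each y_n is a categorical label.
D_features = 2  # dimensionality D of each input vector
training_set = [
    ([170.0, 65.0], "medium"),  # (x_1, y_1)
    ([185.0, 90.0], "large"),   # (x_2, y_2)
    ([155.0, 50.0], "small"),   # (x_3, y_3)
]
N = len(training_set)  # number of training instances

# every input should have the same dimensionality D
for x_n, y_n in training_set:
    assert len(x_n) == D_features
```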

One common task is classification, where the output space is a set of $C$ unordered and mutually exclusive labels known as classes. The goal is to predict a class label given an input. Applications of supervised learning include image classification, machine translation, object recognition, and image captioning, among others.
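As an illustration of classification, here is a minimal sketch (with hypothetical toy data, not a method from this course) of a nearest-centroid classifier: it summarizes each class by the mean of its training inputs, then assigns a new input to the class with the nearest mean.

```python
# A toy nearest-centroid classifier: predict one of C mutually exclusive
# class labels for an input x. Data and class names are hypothetical.

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(train):
    """Compute one centroid per class from (x, y) training pairs."""
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}

def predict(model, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda y: dist2(model[y], x))

train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"),
         ([5.0, 5.0], "B"), ([4.8, 5.2], "B")]
model = fit(train)
print(predict(model, [1.1, 0.9]))  # -> A
```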

2. Unsupervised Learning

Here, in contrast to supervised learning, we only get observed inputs $\mathcal{D}=\left\{x_n\right\}_{n=1}^N$ without any corresponding outputs $y_n$. Rather than learning an input-output mapping, a common goal is to try to "make sense of" the data.

From a probabilistic perspective, we can view the task of unsupervised learning as fitting an unconditional model of the form $p(x)$, which can generate new data $x$, whereas supervised learning involves fitting a conditional model, $p(y|x)$, which specifies (a distribution over) outputs given inputs.

A simple example of unsupervised learning is the problem of finding clusters in data. The goal is to partition the input into regions that contain “similar” points.
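The clustering idea can be sketched with a small (hypothetical, 1-D) example using k-means, one standard clustering algorithm: alternately assign each point to its nearest center, then move each center to the mean of its assigned points.

```python
# A toy k-means clustering sketch on hypothetical 1-D data: partition the
# inputs into K groups of "similar" points, with no labels involved.

def kmeans(xs, centers, iters=10):
    for _ in range(iters):
        # assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for x in xs:
            k = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            clusters[k].append(x)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers, clusters

xs = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]
centers, clusters = kmeans(xs, centers=[0.0, 5.0])
print(sorted(centers))  # two centers, near 1.0 and 8.0
```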

Another popular example is self-supervised learning, where we create proxy supervised tasks from unlabeled data, expecting the model to "fill in" the gaps and then perform well on downstream tasks. This could involve reconstructing a partial image or filling in missing words in a sentence, for instance.
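Constructing such a proxy task can be sketched as follows (a toy example with a hypothetical sentence): hide one word so the "input" is the masked sentence and the "target" is the word that was removed, so the labels come from the data itself.

```python
# Building a self-supervised proxy task from unlabeled text: mask one word
# at random; the model's job would be to predict the masked word.
# The sentence and the "[MASK]" token are hypothetical choices.
import random

def make_masked_example(sentence, rng):
    words = sentence.split()
    i = rng.randrange(len(words))   # pick a random position to hide
    target = words[i]               # the word becomes the target y
    words[i] = "[MASK]"             # the masked sentence becomes the input x
    return " ".join(words), target

rng = random.Random(0)
x, y = make_masked_example("machine learning is fun", rng)
print(x, "->", y)
```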

3. Semisupervised Learning

As its name suggests, this is a mix of supervised and unsupervised learning, where we have a small subset of labelled data and a much larger amount of unlabelled data. A very common example of semisupervised learning is recommendation systems. For example, Netflix's model has only seen a small number of interactions with each user (in this case the labelled data), and from these Netflix would like to predict, and subsequently recommend, the type of content you would enjoy.
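One simple way to exploit the unlabelled pool is self-training, sketched below on hypothetical 1-D data (this is an illustration of the general idea, not Netflix's actual system): fit a classifier on the few labelled points, pseudo-label the unlabelled pool with its predictions, and refit on both.

```python
# A toy self-training sketch for semisupervised learning: a 1-D threshold
# classifier trained on two labelled points, then refit using pseudo-labels
# it assigns to the unlabelled pool. All data here is hypothetical.

def fit_threshold(labeled):
    """Midpoint between the two class means of a 1-D two-class problem."""
    lo = [x for x, y in labeled if y == 0]
    hi = [x for x, y in labeled if y == 1]
    return (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2

labeled = [(1.0, 0), (9.0, 1)]      # the small labelled subset
unlabeled = [0.8, 1.2, 8.7, 9.3]    # the (larger) unlabelled pool

t = fit_threshold(labeled)                      # fit on labelled data only
pseudo = [(x, int(x > t)) for x in unlabeled]   # pseudo-label the pool
t = fit_threshold(labeled + pseudo)             # refit on both
print(t)
```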

4. Reinforcement Learning

In this case, there is very weak supervision, and the system or agent has to learn via a reward signal. In essence, the system is not told which action is the best one to take (i.e., which output to produce for a given input). Instead, the system just receives an occasional reward (or punishment) signal in response to the actions that it takes. This is like learning with a critic, who gives an occasional thumbs up or thumbs down, as opposed to learning with a teacher, who tells you what to do at each step.
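The agent-critic loop above can be sketched with a tiny two-armed bandit (a standard minimal RL setting; the action names and reward probabilities here are hypothetical): the agent is never told which arm is best, it only sees a reward after acting, and it learns running value estimates from that feedback.

```python
# A toy reinforcement learning loop: an epsilon-greedy agent on a
# two-armed bandit. The agent only ever sees occasional rewards, never
# the correct action. Actions and reward probabilities are hypothetical.
import random

rng = random.Random(0)
true_reward = {"left": 0.2, "right": 0.8}  # hidden from the agent
values = {"left": 0.0, "right": 0.0}       # agent's running value estimates
counts = {"left": 0, "right": 0}

for step in range(1000):
    # epsilon-greedy: mostly exploit the best estimate, sometimes explore
    if rng.random() < 0.1:
        action = rng.choice(["left", "right"])
    else:
        action = max(values, key=values.get)
    # the "critic": an occasional thumbs up (1.0) or thumbs down (0.0)
    reward = 1.0 if rng.random() < true_reward[action] else 0.0
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # running mean

print(max(values, key=values.get))  # the agent should come to prefer "right"
```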