Machine Learning

An introduction to the fundamentals of machine learning for those unfamiliar with the topic.

Machine learning was the core topic of my master's thesis. For detailed information about the thesis, take a look at the projects section. Here, I'll give a short introduction to machine learning for those who are not familiar with the topic.

A Simple Example

Given: two different classes of sea creatures - whales and sharks. They differ in size and teeth characteristics.

If we transform this into a 2D vector space, with one axis representing size and the other representing teeth, whales and sharks end up in different regions of the space because of their different appearances.
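To make this concrete, here is a minimal sketch with made-up feature vectors (the sizes in metres and tooth lengths in centimetres are hypothetical numbers): each animal becomes a point in a 2D space, and a new animal is assigned to the class whose mean feature vector is closest.

```python
# Illustrative (made-up) feature vectors: (size in metres, tooth length in cm)
whales = [(25.0, 10.0), (30.0, 12.0), (22.0, 9.0)]
sharks = [(4.0, 5.0), (6.0, 6.5), (3.5, 4.0)]

def mean(points):
    """Component-wise mean of a list of 2D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def classify(animal):
    """Assign an animal to the class whose mean feature vector is closest."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    if dist2(animal, mean(whales)) < dist2(animal, mean(sharks)):
        return "whale"
    return "shark"

print(classify((28.0, 11.0)))  # -> whale
print(classify((5.0, 5.5)))    # -> shark
```

This nearest-mean rule is about the simplest classifier one can build, but it already shows the idea: learning means placing a decision boundary between the two regions of the feature space.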

The General Learning Model

The model consists of three components: a Generator (G), a Supervisor (S), and a Learning Machine (LM).

In machine learning, we only ever observe a finite set of events - you cannot observe all whales and sharks in the world. The aim is to find a function, or an approximation of one, that makes correct predictions given a mostly unknown underlying distribution.

The Learning Process:

  1. A Generator generates samples x.
  2. A Supervisor, who knows the true mapping function fp, produces the true output y = fp(x).
  3. The Learning Machine receives x together with the supervisor's output y.
  4. The LM learns a function that predicts the same output as the supervisor, minimizing false predictions.
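The steps above can be sketched in code. Everything here is hypothetical: a uniform generator, a threshold function at 6.0 standing in for the supervisor's true mapping fp, and a learning machine that picks the threshold with the fewest false predictions on the observed samples.

```python
import random

random.seed(0)

def generator(n):
    """G: draws n samples x from a distribution unknown to the learner."""
    return [random.uniform(0.0, 10.0) for _ in range(n)]

def supervisor(x):
    """S: knows the true mapping fp; here a hypothetical threshold at 6.0."""
    return 1 if x > 6.0 else 0

def learn(xs, ys):
    """LM: among threshold hypotheses, pick the one with the fewest
    false predictions on the observed samples."""
    def errors(t):
        return sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
    return min(sorted(xs), key=errors)

xs = generator(200)
ys = [supervisor(x) for x in xs]
t = learn(xs, ys)  # recovered threshold, close to the true 6.0
```

With enough samples the learned threshold converges toward the supervisor's true one, which is exactly the minimization of false predictions described in step 4.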

Key Concepts

Target Space (T)

The set of all functions including the true function fp used by the supervisor. In most cases, fp is unknown.

Hypothesis Space (H)

A subspace of T with a finite set of functions that we choose to work with, since fp is unknown.

Approximation Error (EA)

The difference between the true function fp and the best function fn we could choose in H.

Sample Error (ES)

Because we only ever have finitely many samples, the best approximation we can actually find is fz rather than fn. The true error is the sum of both parts: EA + ES.
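This decomposition can be illustrated numerically. A hypothetical setup: the true function is fp(x) = 2x on [0, 1], and the hypothesis space H contains only constant functions. The best constant fn still misses fp (that gap is EA), and the constant fz fitted on a small sample misses fn on top of that (ES).

```python
import random

random.seed(1)

def fp(x):
    """True function; in practice this would be unknown."""
    return 2.0 * x

# Dense grid standing in for the whole distribution (uniform on [0, 1]).
xs_big = [i / 100000 for i in range(100000)]

def risk(c, xs):
    """Mean squared error of the constant hypothesis h(x) = c."""
    return sum((fp(x) - c) ** 2 for x in xs) / len(xs)

# fn: best constant over (approximately) the whole distribution.
fn = sum(fp(x) for x in xs_big) / len(xs_big)
approx_error = risk(fn, xs_big)  # EA, close to 1/3 here

# fz: best constant fitted on a small finite sample.
sample = [random.random() for _ in range(10)]
fz = sum(fp(x) for x in sample) / len(sample)
sample_error = risk(fz, xs_big) - approx_error  # ES, always >= 0
```

Even with infinitely many samples the approximation error EA would remain, because no constant can match fp(x) = 2x; the sample error ES shrinks as the sample grows.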

VC Dimension

Key Insight: keep the function as simple as possible. Start with the simplest hypothesis class and increase its complexity step by step, up to the VC dimension, while minimizing the risk.

Consider a simple example: two points in a 2D space can always be separated by a line, no matter how they are labeled. The same holds for three points in general position. With four points this is no longer possible - for example, no line can separate the XOR labeling of the four corners of a square. If we transform the points into 3D, however, they can be separated linearly again.

The VC dimension of H is the largest number of points that H can shatter, i.e. separate correctly under every possible labeling. Lines in 2D can shatter at most three points, so their VC dimension is 3; in general, linear classifiers in d dimensions have VC dimension d + 1.
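This shattering argument can be checked by brute force. The sketch below (pure Python, with hypothetical point sets) trains a perceptron for every possible +1/-1 labeling; by the perceptron convergence theorem it converges if and only if a separating line exists, so a point set is shattered exactly when training succeeds for every labeling.

```python
import itertools

def separable(points, labels, epochs=1000):
    """Train a perceptron on 2D points with +1/-1 labels.
    Converges (and returns True) iff the labeling is linearly separable;
    the epoch cap is a practical stand-in for 'never converges'."""
    w = [0.0, 0.0, 0.0]  # (w1, w2, bias)
    for _ in range(epochs):
        updated = False
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + w[2]) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                w[2] += y
                updated = True
        if not updated:
            return True  # one full pass without mistakes
    return False

def shatters(points):
    """A set is shattered if every possible labeling is separable."""
    return all(separable(points, labs)
               for labs in itertools.product([1, -1], repeat=len(points)))

three = [(0, 0), (1, 0), (0, 1)]          # three points in general position
four = [(0, 0), (1, 1), (1, 0), (0, 1)]   # corners of a square

print(shatters(three))  # -> True: a line shatters 3 points in 2D
print(shatters(four))   # -> False: the XOR labeling defeats every line
```

The failing labeling for the four corners is exactly the XOR pattern: opposite corners share a label, and no single line can split them.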

Further Reading

This was only a brief, high-level introduction to the general machine learning problem and how to solve it. A deeper treatment can be found in "Pattern Recognition and Machine Learning (Information Science and Statistics)" by Christopher M. Bishop.

Related Project

Master Thesis

Pose-Invariant Face Detection

Based on Trees of Wavelet Approximated Vector Machines