I just finished up the Machine Learning course on Coursera and wanted to jot down some notes. First off, this was awesome material with a great instructor (it’s kind of *the* founding course of the whole MOOC idea, which is pretty cool).

My only real complaint is that the programming exercises were a bit too simple — everything was “fill-in-the-blank” Matlab functions. I don’t think I wrote more than 20 lines of Matlab code for any one assignment. While the parts I “did” were the meat of the algorithms, I think more of the work in applying this stuff to real problems is in the set-up. But oh well. Honestly if the exercises were really difficult I probably wouldn’t have had the time to work on them, and now I have several good examples of the “right” way to tackle certain problems with certain methods.

Oh, I guess I have a second complaint – a lot of the earlier video lectures frequently use an example problem involving detection of cancerous tumors; since I watched the lectures during my lunch break, this kind of turned my stomach. 😉

Anyway, here are the highlights, according to me:

- Linear regression – a “supervised” learning algorithm (most of these are), meaning that we have training examples x with known outputs y, which we want to use to make predictions about new data. Set up a linear function of the features x with a set of weights, theta, and use the cost function J to measure how far its predictions are from the training outputs. Then use the gradient of J to update the weights. Iterate. (Note that you should use fminunc or a similar library optimizer rather than cooking up your own gradient descent!)

- Logistic regression – similar to linear regression, but now the output of interest y is binary (1 or 0, yes or no), so the prediction function is now a sigmoid. To do multi-class classification for problems with more than 2 categories, you simply set up a different logistic classifier per category. Then each one answers “yes, it is in this category” or “no, it is not in this category.” Well, actually you can think of the answers as probabilities, and hopefully one category has a much higher probability than all the rest!

- Neural Networks – oooh, buzzword-y! Not really as complicated as it seems, however. Similar to logistic regression, but better suited to nonlinear problems with many features and/or interactions. You set up a network of multi-input, single-output units (“neurons”), then iterate to find the weights to apply to each connection between them.

- Support Vector Machines – also similar to logistic regression. A “large margin” classifier, meaning it picks the boundary that leaves the widest possible gap between categories. When using a kernel, you pick a set of landmarks, then compute how close each example is to each landmark (the “similarity”) and use those similarities as the features.

- K-means clustering – the one “unsupervised” learning algorithm in the bunch. Instead of trying to build a model to predict future outputs based on training data as in the supervised learning case, now we are just trying to group data into buckets. Randomly select K cluster centroids, assign each datapoint to its nearest centroid, then move each centroid to the mean of the datapoints assigned to it. Repeat until the assignments stop changing.

- Collaborative filtering – this is like multi-variable linear regression, but we are estimating our features x along with the weights theta. Depends on having some data to start with … e.g. Adam, Bob, and Charlie rate movies A and B highly while Dave, Ernie, and Fred rate A and B low but movie C high. The system infers that A and B belong to a different group from C. Further, when Greg rates movie A highly, the system infers that he would probably like movie B, too.
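
As a rough illustration of the K-means loop described above, here’s a minimal sketch in plain Python rather than the course’s Matlab. The names `nearest` and `kmeans`, the toy points, the fixed iteration count, and the rule that an empty cluster keeps its old centroid are all assumptions made for this sketch, not anything taken from the course materials.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return dists.index(min(dists))


def kmeans(points, centroids, iters=10):
    """Run `iters` rounds of assign-then-update from the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        # Assignment step: bucket each point under its nearest centroid.
        buckets = [[] for _ in centroids]
        for point in points:
            buckets[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its bucket.
        # An empty bucket keeps its old centroid (an assumption of this sketch).
        centroids = [
            tuple(sum(axis) / len(bucket) for axis in zip(*bucket)) if bucket else old
            for bucket, old in zip(buckets, centroids)
        ]
    return centroids


# Toy data: two obvious groups, with K = 2 starting centroids drawn at random.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = random.Random(0).sample(points, 2)  # seeded so runs are repeatable
print(kmeans(points, start))
```

A real implementation would usually stop once the assignments no longer change instead of running a fixed number of rounds, and would try several random starts, since a bad start can leave K-means stuck in a poor grouping.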

Programming exercises of note:

- Optical character recognition. Given an image of a single digit, classify it as 0-9. Done with either logistic regression or a neural network.
- Netflix-style movie recommender system using collaborative filtering.
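
For the digit classifier, the one-classifier-per-category idea boils down to “score the input with every classifier, then pick the most confident one.” Here’s a minimal sketch of just that prediction step in plain Python. The function names and the hand-picked weights for three made-up 2-D classes are hypothetical, chosen only for illustration; in the actual exercise the weights would be learned from the training images.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_one_vs_all(thetas, x):
    """Return (best_class, probabilities) for feature vector x.

    thetas[k] holds the weights of the "is it class k?" classifier,
    with the bias weight first.
    """
    features = [1.0] + list(x)  # prepend the constant bias feature
    probs = [
        sigmoid(sum(t * f for t, f in zip(theta, features)))
        for theta in thetas
    ]
    return probs.index(max(probs)), probs


# Hypothetical weights for three classes in 2-D (not learned, illustration only).
thetas = [
    [2.5, -1.0, -1.0],  # class 0: points near the origin
    [-2.5, 1.0, 0.0],   # class 1: large first feature
    [-2.5, 0.0, 1.0],   # class 2: large second feature
]

label, probs = predict_one_vs_all(thetas, (5.0, 0.0))
print(label, [round(p, 3) for p in probs])
```

For digits there would be ten weight vectors, one per digit, and each input would be the pixel values of the image.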