I recently finished the Discrete Optimization course on Coursera, taught by Pascal Van Hentenryck. I thought it was really good! Van Hentenryck is pretty crazy in the first set of videos, pretending to be Indiana Jones. He gets quite a bit tamer later on, but still good material.
The course covers three main optimization techniques:
- Constraint Programming – “Constraint Programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it.” – Freuder, 1997
- Local Search – define a set of moves (neighborhood), start somewhere, then make a move. Usually move to decrease objective function but there is danger of local minima. So do stuff like simulated annealing or tabu search to break out.
- Linear and Integer Programming – Linear program solution is a vertex of polytope defined by intersection of constraints. Simplex algorithm is a way to explore vertices. The nice thing about linear programming is that it is easy to show optimality.
I was most intrigued by constraint programming due to some applications to a constrained scheduling problem at my job, so I focused on CP solutions to the problems … probably why I didn’t score as well as I would have liked; I think part of the point of the course is that no single hammer is sufficient to optimally drive all these nails.
The programming assignments were the best part of the course. For each one, a skeleton Python script is provided which reads in data files and performs some simple “dummy” approach. The student’s job is to implement something better…there’s no real rules on how you get there. You can even enlist the aid of external solver libraries, which is what I did (see below). There is an auto-grader which will test your solution against 6 or so data sets of increasing complexity.
The problems are fairly standard ones from the OR world. These kinds of problems are NP-hard, so determining solutions eventually take exponential time as the problem size increases, but as covered in the lectures, the goal is to push that curve out as far to the right as possible.
- Knapsack – given a set of items with a given value and given weight, select the set of items to place into a weight-limited knapsack which maximizes total value
- Graph Coloring – use as few colors as possible to color a map, such that no countries with a shared border have the same color
- Traveling Salesman – find the shortest distance route which visits each location once
- Facility Location – the are a set of possible factories, which can produce a given number of widgets but cost a certain amount to open (so some factories will likely remain closed). Then there is a set of customers, each with a given widget demand. These customers and factories are located in a 2D space. The goal is to minimize the total cost = set up cost of all open factories + distance for each customer and the factory it is served by.
- Vehicle Routing – a set of vehicles start at a warehouse. Each has a fixed capacity. There is a set of customers at various locations relative to the warehouse. Each has a fixed demand. Vehicle routes must be determined such that every customer is serviced and total vehicle distance traveled is minimized. (Kind of a combo of knapsack and traveling salesman.)
One bad part of this course is that there wasn’t a lot of guidance on actually implementing some of the ideas covered in the lectures. It’s pretty much a free-for-all. I did do as recommended and started out with a simple greedy algorithm to start each assignment. For instance, for knapsack “take greatest value density items first” actually turned out to be a good strategy. I was pretty surprised actually that these simple greedy algorithms, for most of the problems, came up with an 70% – 80% solution. (by my estimate; the scoring for the class was a bit tougher than that: 3 points for what I would call a 75% solution, 7 points for a 90% solution, and 10 points for 95%+ solution) But I guess that’s what separates the men from the boys in OR – a decent solution is ok (probably good enough for most purposes) but it isn’t OPTIMAL.
So, after my greedy solution attempts, I elected to integrate Google’s or-tools. I thought Google would be a sure thing, but I wasn’t totally satisfied in the end. All my issues have to do with the subpar documentation. There actually is what appears to be pretty good documentation, but this is an illusion — much of it is incomplete stubs and virtually nothing is specific to the Python wrapper (around a C++ implementation … so I couldn’t easily examine code.)
So I didn’t really have any good direction on how to use the library, except for a large selection of excellent examples, mainly from a guy called hakank, that are included in the or-tools distribution. Actually there are example implementations of several of the assigned problems available; I got decent scores by making these work with the course data formats. But not quite 10 point solutions, and not always 7 point solutions … I blame on lack of tuning, since, again, I did not have much visibility into what the or-tools algorithms were doing.
I was also probably hampered by some Python rustiness, such as forgetting about python’s mutable list behavior. I’m still using Pyscripter and think it is fine (good debugger is the key for me!) but I still haven’t got a great feel on the best Python data structures to use in different situations.
Finally, like I said above, I only focused on constraint programming which was probably not always the correct tool for the job. In the end I got 223 points out of 320; the requirement for a certificate was 224. Yes, I was one point off. (Reminds me of the time I missed 3rd place in the state Academic Decathlon meet by something like 30 points out of 10000 … but that’s another story.) I realize the certificates are kind of meaningless, but having a deadline and a goal gets me motivated and I was pretty bummed to miss out. I really want to continue poking at some of these (and related) problems – pretty interesting stuff.
I recently finished up my second course on Coursera – Computational Investing I by Tucker Balch. Learned a little bit more about how the stock market works, and (maybe somewhat) how the quants do their thing. A little bit of what they do could probably be called order book exploitation — looking for arbitrage opportunities between buy and sell orders at the microsecond level. This type of computational investing is pretty much out of reach for anyone outside of a big firm with direct trading floor access.
What’s left for the little guy is offline, day-to-day analysis of stock information. Dr. Balch has created a Python package, QSTK, which pulls downloaded stock price information into a data analysis friendly format within Python. I really liked how the assignments all fit together — I now have a basic, yet functional, “Market Sim” in Python. It is not really a predictive simulation for future market movement; rather it is an evaluation tool for how a certain metric-based investment strategy would have performed if executed during a past window of time. For instance, the final assignment involved setting up an “event study” looking for instances where a Bollinger band metric behaved in a fashion showing that the stock was falling while the overall market was rising. Whenever the event was observed for a given stock, the tool logs a buy order for the stock followed by a sell order 20 days later. Then, a separate piece of code reads in the buy/sell orders and computes what the overall rate of return (and other metrics like Sharpe ratio) would be given actual stock price historical data, then compares the strategy to the overall performance of the market during the same time period. Thus the Market Sim developed in this course would allow the user to try out different investment strategies, evaluating them against real historical data.
It’s a simple yet pretty cool project that I’m glad to have completed. Especially glad for the opportunity to get back into Python, which I still love (two timing now with Matlab, I guess…) and which I’ve got a pretty decent workflow going with PyScripter (nice tool).
But I don’t think I’ll be going into business anytime soon as a day trader. My first inclination was to somehow meld the Market Sim developed in this course with a neural network or other machine learning strategy — then you wouldn’t need to explicitly define any “events.” But a quick Google search shows that many have tried this and failed. Maybe it is due to my second reservation, which is that past performance is not a reliable indication of future performance. (But it’s the best we have to go with, so maybe that’s ok…) Third reservation is that many strategies would involve too many individual trades to be profitable for the little guy, both when trade commissions and when the time required to execute trades manually are taken into account. (Are there any systems where you can specify auto-trades upon a certain event? That would be cool, but scary too … if the perfect storm comes and your algorithm gets really fouled up, it could clear out your bank account real fast. Automated gambling, anyone?)
Not saying it is impossible for the little guy to be profitable using these methods, but it would be difficult. The course itself talks about the fundamental law, basically saying that if you aren’t able to execute a small number sure-fire big money trades, then your only recourse is to execute a very large number of more modest trades, each with a slim yet positive probabilistic chance of profitability. But lots and lots of trades are impractical for the little guy for reasons discussed above. Also, I tend to believe in the efficient market hypothesis, simply because there are so many people working on this problem, with so much more knowledge and resources than I have to devote to it. I think I’ll stick with index investing.
On a side note, shortly before taking this course I read an article on the Aladdin system used by Blackrock. (I can’t find the article now … it wasn’t this Economist piece, nor was it this one from FT.) A lot of money is controlled by algorithms now, for better or for worse. I couldn’t help thinking of the Asimov story from I, Robot, where artificial intelligences are directly controlling the economy . . . are we there yet?
I just finished up the Machine Learning course on Coursera and wanted to jot down some notes. First off, this was awesome material and great instructor (kind of the founding course of the whole MOOC idea; kind of cool).
My only real complaint is that the programming exercises were a bit too simple — everything was “fill-in-the-blank” Matlab functions. I don’t think I wrote more than 20 lines of Matlab code for any one assignment. While the parts I “did” were the meat of the algorithms, I think more of the work in applying this stuff to real problems is in the set-up. But oh well. Honestly if the exercises were really difficult I probably wouldn’t have had the time to work on them, and now I have several good examples of the “right” way to tackle certain problems with certain methods.
Oh, I guess I have a second complaint – a lot of the earlier video lectures frequently use an example problem involving detection of cancerous tumors; since I watched the lectures during my lunch break, this kind of turned my stomach. 😉
Anyway, here’s the highlights, according to me:
- Linear regression – a “supervised” learning algorithm (most of these are), meaning that we have a set of data f(x) = y which we want to use to make predictions about new data. Establish a linear function of features x, and test all of your training data against a set of weights, theta. Then use the cost function J to update the weights. Iterate. (note that you should use fminunc or similar algorithms rather than cooking up your own gradient descent or similar!)
- Logistic regression – similar to linear regression, but now the output of interest y are binary 1 or 0, yes or no — the prediction function is now a sigmoid. To do multi-classification for problems with more than 2 categories, you simply set up a different logistic classifier per category. Then each one answers “yes, it is in this category” or “no, it is not in this category.” Well, actually you can think of the answers as probabilities — hopefully one category has a much higher category than all the rest!
- Neural Networks – oooh, buzzword-y! Not really as complicated as it seems, however. Similar to logistic regression, but more suited for nonlinear problems with many features and/or interactions. You set up a network of multi-input, single-output units (“neurons”), then iterate to find the weights to apply to path.
- Support Vector Machines – also similar to logistic regression. A “large margin” classifier, meaning the boundaries between categories are as optimal as possible. You pick a set of landmarks, then compute how far features are from the landmarks (the “similarity”).
- K-means clustering – the one “unsupervised” learning algorithm in the bunch. Instead of trying to build a model to predict future outputs based on training data as in the supervised learning case, now we are just trying to group data into buckets. Randomly select K cluster centroids, find the nearest centroid for each datapoint, then reassign the centroids to the mean of all the closest datapoints. Repeat.
- Collaborative filtering – this is like multi-variable linear regression, but we are estimating our features x along with the weights theta. Depends on having some data to start with … eg Adam, Bob, and Charlie rate movie A and B highly while Dave, Ernie, and Fred rate A and B low but movie C high. The system infers that A and B belong to a different group from A. Further, when Greg rates movie A highly, the system infers than he would probably like movie B, too.
Programming exercises of note:
- Optical character recognition. Given an image of a number, classify it as a digit 0-9. Done with logistic regression or neural network.
- Netflix style movie recommender system using collaborative filtering.