I recently finished the Discrete Optimization course on Coursera, taught by Pascal Van Hentenryck. I thought it was really good! Van Hentenryck is pretty crazy in the first set of videos, pretending to be Indiana Jones. He gets quite a bit tamer later on, but still good material.
The course covers three main optimization techniques:
- Constraint Programming – “Constraint Programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it.” – Freuder, 1997
- Local Search – define a set of moves (a neighborhood), start somewhere, then make a move. Usually you move to decrease the objective function, but there's a danger of getting stuck in local minima, so you use techniques like simulated annealing or tabu search to break out.
- Linear and Integer Programming – the solution to a linear program is a vertex of the polytope defined by the intersection of the constraints. The simplex algorithm is a way to explore those vertices. The nice thing about linear programming is that it is easy to prove optimality.
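The local-search recipe above fits in a page of Python. Here's a toy sketch of my own (not course code, and all names are mine): minimize the Hamming distance to a hidden bit string, where a "move" flips one bit, and simulated annealing occasionally accepts uphill moves so the search can escape local minima.

```python
import math
import random

def simulated_annealing(objective, start, neighbors, temp=2.0, cooling=0.995, steps=5000):
    """Generic local search: always accept improving moves; accept uphill
    moves with probability exp(-delta / temp), where temp decays each step."""
    current = best = start
    cur_val = best_val = objective(start)
    for _ in range(steps):
        cand = random.choice(neighbors(current))
        cand_val = objective(cand)
        delta = cand_val - cur_val
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current, cur_val = cand, cand_val
            if cur_val < best_val:
                best, best_val = current, cur_val
        temp *= cooling  # cool down: uphill moves become ever less likely
    return best, best_val

# Toy objective: Hamming distance to a target bit string.
TARGET = [1, 0, 1, 1, 0, 1, 0, 0]

def obj(x):
    return sum(a != b for a, b in zip(x, TARGET))

def flip_neighbors(x):
    # The "neighborhood": every string reachable by flipping one bit.
    return [x[:i] + [1 - x[i]] + x[i + 1:] for i in range(len(x))]

random.seed(0)  # make the run reproducible
best, best_val = simulated_annealing(obj, [0] * len(TARGET), flip_neighbors)
```

The same loop works for any problem once you plug in a real objective and neighborhood (e.g., 2-opt moves for a tour); the hard part is choosing those, not the loop itself.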
I was most intrigued by constraint programming due to some applications to a constrained scheduling problem at my job, so I focused on CP solutions to the problems … which is probably why I didn't score as well as I would have liked; I think part of the point of the course is that no single hammer is sufficient to optimally drive all of these nails.
The programming assignments were the best part of the course. For each one, a skeleton Python script is provided which reads in data files and performs some simple "dummy" approach. The student's job is to implement something better … there are no real rules about how you get there. You can even enlist the aid of external solver libraries, which is what I did (see below). There is an auto-grader which tests your solution against 6 or so data sets of increasing complexity.
The problems are fairly standard ones from the OR world. These kinds of problems are NP-hard, so finding optimal solutions eventually takes exponential time as the problem size increases; but, as covered in the lectures, the goal is to push that curve as far to the right as possible.
- Knapsack – given a set of items, each with a value and a weight, select the subset of items to place into a weight-limited knapsack that maximizes total value
- Graph Coloring – use as few colors as possible to color a map, such that no countries with a shared border have the same color
- Traveling Salesman – find the shortest route that visits each location exactly once
- Facility Location – there is a set of possible factories, each of which can produce a given number of widgets but costs a certain amount to open (so some factories will likely remain closed). Then there is a set of customers, each with a given widget demand. The customers and factories are located in a 2D space. The goal is to minimize the total cost: the setup cost of all open factories plus the distance between each customer and the factory that serves it.
- Vehicle Routing – a set of vehicles start at a warehouse. Each has a fixed capacity. There is a set of customers at various locations relative to the warehouse. Each has a fixed demand. Vehicle routes must be determined such that every customer is serviced and total vehicle distance traveled is minimized. (Kind of a combo of knapsack and traveling salesman.)
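To make the exponential blow-up concrete, here's a toy brute-force traveling-salesman solver (my own sketch, not course code): it tries all (n-1)! orderings, so it's exact but hopeless beyond a dozen or so locations.

```python
import itertools
import math

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def brute_force_tsp(points):
    """Fix city 0 as the start and try every ordering of the rest:
    (n-1)! permutations, so only tiny instances are feasible."""
    n = len(points)
    best = min(itertools.permutations(range(1, n)),
               key=lambda perm: tour_length(points, (0,) + perm))
    return (0,) + best, tour_length(points, (0,) + best)

points = [(0, 0), (0, 1), (1, 1), (1, 0)]  # unit square; best tour = perimeter = 4
order, length = brute_force_tsp(points)
```

Everything in the course is about beating this baseline: pruning the search (CP, branch and bound), or giving up on exactness and exploring cleverly (local search).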
One bad part of this course is that there wasn't a lot of guidance on actually implementing the ideas covered in the lectures. It's pretty much a free-for-all. As recommended, I did start each assignment with a simple greedy algorithm. For instance, for knapsack, "take the greatest value density items first" actually turned out to be a good strategy. I was actually pretty surprised that, for most of the problems, these simple greedy algorithms came up with a 70–80% solution (by my estimate; the scoring for the class was a bit tougher than that: 3 points for what I would call a 75% solution, 7 points for a 90% solution, and 10 points for a 95%+ solution). But I guess that's what separates the men from the boys in OR – a decent solution is OK (probably good enough for most purposes), but it isn't OPTIMAL.
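That value-density greedy is only a few lines. Here's a sketch of the idea (my own, not the course's skeleton code), on a classic textbook instance that also shows exactly how greedy falls short of optimal:

```python
def greedy_knapsack(items, capacity):
    """items: list of (value, weight) pairs. Take items in decreasing order
    of value/weight density until nothing else fits. Fast, but not optimal."""
    chosen, total_value, remaining = [], 0, capacity
    for i, (value, weight) in sorted(enumerate(items),
                                     key=lambda p: p[1][0] / p[1][1],
                                     reverse=True):
        if weight <= remaining:
            chosen.append(i)
            total_value += value
            remaining -= weight
    return total_value, sorted(chosen)

items = [(60, 10), (100, 20), (120, 30)]  # (value, weight)
value, chosen = greedy_knapsack(items, capacity=50)
```

On this instance greedy picks items 0 and 1 for a value of 160, while the optimum is 220 (items 1 and 2 exactly fill the capacity) – the "decent but not optimal" gap in miniature.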
So, after my greedy solution attempts, I elected to integrate Google's or-tools. I thought Google would be a sure thing, but I wasn't totally satisfied in the end. All my issues have to do with the subpar documentation. There is what appears to be pretty good documentation, but this is an illusion: much of it consists of incomplete stubs, and virtually nothing is specific to the Python wrapper (which sits on top of a C++ implementation, so I couldn't easily examine the code).
So I didn't really have any good direction on how to use the library, except for a large selection of excellent examples, mainly from a guy called hakank, that are included in the or-tools distribution. There are actually example implementations of several of the assigned problems; I got decent scores by adapting these to the course data formats. But not quite 10-point solutions, and not always 7-point solutions … I blame that on lack of tuning, since, again, I did not have much visibility into what the or-tools algorithms were doing.
I was also probably hampered by some Python rustiness, such as forgetting about Python's mutable list behavior. I'm still using PyScripter and think it is fine (a good debugger is the key for me!), but I still don't have a great feel for which Python data structures to use in different situations.
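For the curious, the mutable-list behavior I mean is the classic aliasing gotcha:

```python
# Multiplying a list of lists copies references, not the inner lists:
grid = [[0] * 3] * 3
grid[0][0] = 99
# all three rows are the same list object, so grid is now
# [[99, 0, 0], [99, 0, 0], [99, 0, 0]]

# The fix: build a fresh inner list for each row.
grid = [[0] * 3 for _ in range(3)]
grid[0][0] = 99
# grid is now [[99, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Harmless in a toy, but in a solver that mutates candidate solutions in place, this kind of aliasing quietly corrupts your search state.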
Finally, as I said above, I focused only on constraint programming, which was probably not always the correct tool for the job. In the end I got 223 points out of 320; the requirement for a certificate was 224. Yes, I was one point off. (Reminds me of the time I missed 3rd place in the state Academic Decathlon meet by something like 30 points out of 10,000 … but that's another story.) I realize the certificates are kind of meaningless, but having a deadline and a goal gets me motivated, and I was pretty bummed to miss out. I really want to continue poking at some of these (and related) problems – pretty interesting stuff.