Tag Archives: Python

edX: Introduction to Computer Science and Programming Using Python (MIT)

Actually finished up this course a while ago, but haven’t written it up yet.  I felt a little vindicated by how easy this was, especially after getting hammered with the Discrete Optimization course I took last.  I barely watched any of the videos; really just did the programming assignments.

  • One on string manipulation, finding vowels or substrings
  • A debt/min payment equation solver
  • Hangman game
  • Scrabble-like game
  • Caesar cipher coding and decoding
  • Parsing an RSS feed for stories matching desired keywords, etc.

Everything was pretty much fill-in-the-blank.  Decent auto-grader, as seems to be the norm for these types of online programming classes.

Besides being a good Python review, I did learn more about properly working with dictionaries:

Remember that with a dictionary, the usual way to access a value is hand['a'], where 'a' is the key we want to find. However, this only works if the key is in the dictionary; otherwise, we get a KeyError. To avoid this, we can use the call hand.get('a',0). This is the “safe” way to access a value if we are not sure the key is in the dictionary. d.get(key,default) returns the value for key if key is in the dictionary d, else default.
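Here is a tiny sketch of the difference.  The contents of hand are made up for illustration; only the dictionary behavior itself comes from the course notes above.

```python
# A hypothetical Scrabble-style hand: letter -> how many of that letter we hold.
hand = {'a': 1, 'q': 2}

# Plain indexing works only when the key exists.
count_a = hand['a']

# Indexing a missing key raises KeyError.
try:
    hand['z']
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

# get() returns the default instead of raising.
count_z = hand.get('z', 0)

# A common use: counting letters without checking for the key first.
for letter in "aqua":
    hand[letter] = hand.get(letter, 0) + 1
```

The last loop is where get() really pays off: new letters start from the default of 0 and existing ones are simply incremented.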



Discrete Optimization on Coursera

I recently finished the Discrete Optimization course on Coursera, taught by Pascal Van Hentenryck.  I thought it was really good!  Van Hentenryck is pretty crazy in the first set of videos, pretending to be Indiana Jones.  He gets quite a bit tamer later on, but still good material.

The course covers three main optimization techniques:

  • Constraint Programming – “Constraint Programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it.” – Freuder, 1997
  • Local Search – define a set of moves (neighborhood), start somewhere, then make a move.  Usually move to decrease objective function but there is danger of local minima.  So do stuff like simulated annealing or tabu search to break out.
  • Linear and Integer Programming – Linear program solution is a vertex of polytope defined by intersection of constraints.  Simplex algorithm is a way to explore vertices.  The nice thing about linear programming is that it is easy to show optimality.

I was most intrigued by constraint programming due to some applications to a constrained scheduling problem at my job, so I focused on CP solutions to the problems … probably why I didn’t score as well as I would have liked; I think part of the point of the course is that no single hammer is sufficient to optimally drive all these nails.

The programming assignments were the best part of the course.  For each one, a skeleton Python script is provided which reads in data files and performs some simple “dummy” approach.  The student’s job is to implement something better…there’s no real rules on how you get there.  You can even enlist the aid of external solver libraries, which is what I did (see below).  There is an auto-grader which will test your solution against 6 or so data sets of increasing complexity.

The problems are fairly standard ones from the OR world.  These kinds of problems are NP-hard, so the time needed to find an optimal solution is expected to grow exponentially as the problem size increases, but as covered in the lectures, the goal is to push that curve out as far to the right as possible.

  1. Knapsack – given a set of items with a given value and given weight, select the set of items to place into a weight-limited knapsack which maximizes total value
  2. Graph Coloring – use as few colors as possible to color a map, such that no countries with a shared border have the same color
  3. Traveling Salesman – find the shortest distance route which visits each location once
  4. Facility Location – there is a set of possible factories, each of which can produce a given number of widgets but costs a certain amount to open (so some factories will likely remain closed).  Then there is a set of customers, each with a given widget demand.  These customers and factories are located in a 2D space.  The goal is to minimize the total cost = setup cost of all open factories + distance between each customer and the factory it is served by.
  5. Vehicle Routing – a set of vehicles start at a warehouse.  Each has a fixed capacity.  There is a set of customers at various locations relative to the warehouse.  Each has a fixed demand.  Vehicle routes must be determined such that every customer is serviced and total vehicle distance traveled is minimized.  (Kind of a combo of knapsack and traveling salesman.)

One bad part of this course is that there wasn’t a lot of guidance on actually implementing some of the ideas covered in the lectures.  It’s pretty much a free-for-all.  I did do as recommended and started each assignment with a simple greedy algorithm.  For instance, for knapsack “take greatest value density items first” actually turned out to be a good strategy.  I was pretty surprised that these simple greedy algorithms, for most of the problems, came up with a 70% – 80% solution (by my estimate; the scoring for the class was a bit tougher than that: 3 points for what I would call a 75% solution, 7 points for a 90% solution, and 10 points for a 95%+ solution).  But I guess that’s what separates the men from the boys in OR – a decent solution is ok (probably good enough for most purposes) but it isn’t OPTIMAL.
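To make the “value density first” idea concrete, here is a minimal sketch.  It is not my actual course submission; the function name and the three-item example are made up for illustration, and it assumes every weight is positive.

```python
def greedy_knapsack(items, capacity):
    """Greedy knapsack heuristic: take items in order of value/weight.

    items is a list of (value, weight) tuples with positive weights.
    Returns (total_value, sorted list of chosen item indices).
    """
    # Highest value density first.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1],
                   reverse=True)
    total_value = 0
    remaining = capacity
    chosen = []
    for i in order:
        value, weight = items[i]
        if weight <= remaining:  # take the item only if it still fits
            chosen.append(i)
            remaining -= weight
            total_value += value
    return total_value, sorted(chosen)


# Hypothetical data: (value, weight) pairs and a knapsack of capacity 50.
example_items = [(60, 10), (100, 20), (120, 30)]
greedy_result = greedy_knapsack(example_items, 50)
```

On this toy data the greedy pick is items 0 and 1 for a value of 160, while the true optimum is items 1 and 2 for 220.  That is roughly a 73% solution: decent, but not optimal.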

So, after my greedy solution attempts, I elected to integrate Google’s or-tools.  I thought Google would be a sure thing, but I wasn’t totally satisfied in the end.  All my issues have to do with the subpar documentation.  There actually is what appears to be pretty good documentation, but this is an illusion — much of it is incomplete stubs and virtually nothing is specific to the Python wrapper (around a C++ implementation … so I couldn’t easily examine code.)

So I didn’t really have any good direction on how to use the library, except for a large selection of excellent examples, mainly from a guy called hakank, that are included in the or-tools distribution.  Actually there are example implementations of several of the assigned problems available; I got decent scores by making these work with the course data formats.  But not quite 10 point solutions, and not always 7 point solutions … which I blame on lack of tuning, since, again, I did not have much visibility into what the or-tools algorithms were doing.

I was also probably hampered by some Python rustiness, such as forgetting about python’s mutable list behavior.  I’m still using Pyscripter and think it is fine (good debugger is the key for me!) but I still haven’t got a great feel on the best Python data structures to use in different situations.
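For the record, this is the sort of mutable list behavior I mean.  The variable names are just for illustration.

```python
# Assignment does not copy a list; both names point at the same object.
original = [1, 2]
alias = original
alias.append(3)          # also changes 'original'

# list() makes a (shallow) copy, so changes no longer leak back.
copy = list(original)
copy.append(4)

# Multiplying a nested list repeats references to the SAME inner list.
shared_rows = [[0, 0, 0]] * 3
shared_rows[0][0] = 9    # every "row" now shows the 9

# A list comprehension builds a separate inner list for each row.
separate_rows = [[0, 0, 0] for _ in range(3)]
separate_rows[0][0] = 9  # only the first row changes
```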

Finally, like I said above, I only focused on constraint programming which was probably not always the correct tool for the job.  In the end I got 223 points out of 320; the requirement for a certificate was 224.  Yes, I was one point off.  (Reminds me of the time I missed 3rd place in the state Academic Decathlon meet by something like 30 points out of 10000 … but that’s another story.)  I realize the certificates are kind of meaningless, but having a deadline and a goal gets me motivated and I was pretty bummed to miss out. I really want to continue poking at some of these (and related) problems – pretty interesting stuff.

Adding date photo taken to filename

Digital photo organization is a pain.  Everybody has their own scheme (although I suspect many have none at all).  Here’s mine:

  1. Keep all pictures in one directory.  Yup, no subdirectories at all.
  2. When downloading files from camera’s SD card, batch rename (Win XP – select group of files, then ‘F2’ or right-click and ‘Rename’; Picasa also has a batch rename function) files according to the event/place/whatever.  You’ll end up with ‘Grandpa’s Birthday.jpg’, ‘Grandpa’s Birthday (1).jpg’, ‘Grandpa’s Birthday (2).jpg’, …
  3. Append the date the picture was taken to the front of the filename.  That way, the files will display in chronological order.

My Python script jpg_batch_rename.py (uses EXIF.py) does #3 for me.  For all files in a chosen directory, it extracts the date a picture was taken from the jpg exif datetime field.  If the file is not a jpg or if exif datetime info is not present for some reason, then it will extract the date the file was last modified.  The resulting date is appended to the front of each filename.  I don’t want to change the “file last modified” date itself, so at the end I reset it to what it was before the file rename.

Besides EXIF.py, the rest of the magic is done primarily with the Python os module.
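Here is a rough sketch of just the os module part, covering the “file last modified” fallback case only.  It is not the actual jpg_batch_rename.py: the function name and the YYYY-MM-DD date format are assumptions for illustration, and the EXIF lookup is left out entirely.

```python
import os
import time


def add_date_prefix(path):
    """Rename a file to '<date> <original name>' using its modified date.

    The timestamps are read before the rename and restored afterwards,
    so the "file last modified" date stays what it was.
    Returns the new path.
    """
    info = os.stat(path)
    # Assumed format: 2010-06-15 sorts chronologically as plain text.
    date_str = time.strftime("%Y-%m-%d", time.localtime(info.st_mtime))
    directory, name = os.path.split(path)
    new_path = os.path.join(directory, date_str + " " + name)
    os.rename(path, new_path)
    # Put the original access/modified times back on the renamed file.
    os.utime(new_path, (info.st_atime, info.st_mtime))
    return new_path


# Usage (hypothetical directory):
# for name in os.listdir("C:/pictures"):
#     add_date_prefix(os.path.join("C:/pictures", name))
```

Year-month-day order is the key design choice here, since it makes an alphabetical sort equal to a chronological one.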

For added usability, jpg_batch_rename can be used in conjunction with image_viewer2_SRO.py.  I adapted a script from The Mouse vs. the Python blog that uses wxPython to put the jpg_batch_rename functionality in a GUI.

After loading a directory with the button in the upper left, the first picture will be displayed.  (If there aren’t any pictures in the directory then it will crash, sorry.)  “Previous” and “Next” allow you to scroll through the photos – left over from the script I stol- er, borrowed.  The “Add Date to Filename” calls jpg_batch_rename and will append the file’s date to the front of all the files in the directory – even non picture files.  Checking the “Make backup” button will copy the original files to a “backup” subdirectory before making any changes.  The field at the bottom allows you to add an optional descriptor to add to the file name, as well as the date.



Words of the Prophets

The General Conference of the Church of Jesus Christ of Latter-Day Saints (Mormons) is held twice a year, in April and October.  General Authorities of the Church, including Prophets and Apostles, speak to the church membership about doctrinal issues and give other counsel.

During the last conference, I wondered: Is there a pattern to what is taught in General Conference?  Maybe Python can help us find out….


This project involves screen scraping lds.org’s Ensign archives and then using a concordance (of sorts) to do some analysis for word counts and word usage frequency.  My thinking is, the more a certain word is used, the more the general authorities are giving counsel about a particular topic.  An index of all General Conference talks is also created.

The church magazine “Ensign” prints the full text of General Conference in the May and November issues each year.  Luckily, the Ensign is available online at lds.org.  So here’s what we’ll do:

  1. Screen scrape the General Conference Ensign articles
  2. Count up the words in each article and generate word count summaries for each General Conference
  3. Profit!  Er, I mean, make some charts and stuff.

Note: Each October (at least for the last several years) there is a General Relief Society meeting held the week prior to General Conference.  The proceedings of this meeting are included in the November Ensign.  So they are included in this project as well.  Whether the meeting is a part of General Conference is a matter of debate I guess, but the speakers are General Authorities, so they surely belong in this analysis.

Note #2: Only General Conferences back to 1974 are available online.  The online Ensign archives go back further, but prior to 1974 there was a separate “Conference Report” for the proceedings of General Conference, which is not available online as far as I can tell.  So all my results are 1974 – 2010.

Note #3: I didn’t include stuff in the Ensign that did not list an author.  This cuts out stuff like the “Sustaining of Church Officers” and the “Statistical Report”, etc.

Note #4: The Ensign articles use Unicode, which I had some headaches parsing.  So I ended up throwing out everything but the ASCII character set.  Therefore the resulting titles and words might occasionally be incorrect – mainly missing punctuation.  But it’s generally ok.
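For the curious, one simple way to throw out everything but ASCII looks like the sketch below.  This is not necessarily how scraper.py does it; the function name is made up, and it uses Python 3 string handling.

```python
def ascii_only(text):
    """Drop every character outside the ASCII set."""
    # 'ignore' silently discards anything that cannot be encoded as ASCII.
    return text.encode("ascii", "ignore").decode("ascii")


# The curly apostrophe (U+2019) is not ASCII, so it simply disappears.
cleaned = ascii_only("the Lord\u2019s work")
```

This shows where the missing punctuation comes from: curly quotes and apostrophes are removed rather than converted to their plain equivalents.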


It took about an hour and a half to download and parse all the General Conference articles!  There are two output .csv files:

GenConfArticleSummary1974to2010.csv – index of LDS General Conference talks, 1974 – 2010.  Lists speaker, title, Ensign month, year and page number, word count, unique word count, unique word ratio (unique count / word count), and top 100 words for each conference talk.

WordsOfProphets1974to2010.csv – lists all the unique words found across all General Conference talks.  Gives the total count for each word.  For each General Conference, the percentage contribution of each unique word to the Conference’s total is given.  I.e., “0.389497” for the word “church” for “May1974” means that in the April 1974 General Conference, about 0.39% of the words spoken (well, scraped from the Ensign webpage) were the word “church.”

Some stats (1974 – 2010 General Conference):

  • 2,713 talks
  • 397 different speakers (176 of those gave just a single talk)
  • 5,181,241 total words
  • 51,274 unique words (0.99% of total)

A note on the “unique word ratio” ( = 100 * unique words / total words) : I’ve noticed this generally tends to decrease the longer the body of text is.  So it probably is only meaningful (although what the meaning is I do not know) to compare for texts of about the same size.
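A quick sketch of the ratio, with a toy example of why longer texts tend to score lower.  The function name and the sample sentence are made up for illustration.

```python
def unique_word_ratio(words):
    """Return 100 * (number of unique words) / (total number of words)."""
    return 100.0 * len(set(words)) / len(words)


short_text = "the cat sat on the mat".split()
# Repeating the text doubles the total but adds no new unique words.
long_text = short_text * 2

short_ratio = unique_word_ratio(short_text)   # 5 unique out of 6 words
long_ratio = unique_word_ratio(long_text)     # 5 unique out of 12 words
```

Real text is not a pure repeat, of course, but the effect is similar: the longer someone talks, the more of their words have already been used.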

The next chart shows the top 20 General Conference speakers who gave the most talks (“Count of Title”).  Average total word count and the average unique word ratio are also shown.  Gordon B. Hinckley is tops, no surprise.  He was in the First Presidency (generally 2 talks per Conference) or was the Prophet (about 4 talks per Conference – usually gives a welcome and a goodbye talk in addition to 2 meatier ones) for much of the time period under consideration (1974 – 2010).  The same can be said about his successor and next most frequent General Conference speaker, Thomas S. Monson.  (This is only the top 20; see GenConfArticleSummary1974to2010.csv and sort in Excel or OpenOffice to get the full list.)

Speaker               Count of Title   Average of Word_Count   Average of Unique Ratio
Gordon B. Hinckley    207              2065.545894             34.51755811
Thomas S. Monson      162              2189.833333             36.73315815
James E. Faust        98               2286.030612             33.84414864
L. Tom Perry          75               2113.973333             32.77750744
Boyd K. Packer        75               2276.106667             31.69806946
Spencer W. Kimball    66               2264.893939             34.32113774
M. Russell Ballard    62               2113.903226             32.98976187
Ezra Taft Benson      57               2153.578947             31.79958897
Russell M. Nelson     56               2328.535714             34.46947778
David B. Haight       55               2022.8                  33.50270983
Dallin H. Oaks        54               2459.018519             30.85206586
Neal A. Maxwell       53               1965.773585             39.94132846
Joseph B. Wirthlin    53               2259.528302             33.46916535
Richard G. Scott      51               1934.647059             34.08122863
Marion G. Romney      51               2157.607843             30.11546943
Robert D. Hales       46               2255.086957             30.38011966
Henry B. Eyring       46               2437.673913             26.86417891
Howard W. Hunter      45               1676.977778             34.74803859
Marvin J. Ashton      38               2248.263158             34.31398562
N. Eldon Tanner       36               2428.833333             31.85897606

Something else interesting regarding the “unique word ratio”.  Neal A. Maxwell’s is particularly high at 39.9%.  This is not a big surprise; Elder Maxwell was renowned for eloquence and a large vocabulary.  Surprisingly, Henry B. Eyring’s unique word ratio is particularly low at 26.9%.  But I wouldn’t call his talks rudimentary or simplistic by any means; quite the opposite.  The relative average word count between Maxwell (1965.8) and Eyring (2437.7) may have something to do with these numbers; as I said before, longer data sources tend to have smaller unique word ratios.  But then again, N. Eldon Tanner’s average word count (2428.8) is close to Eyring’s, but Tanner’s unique word ratio is higher, 31.9%.

Now for some individual word analysis – using data in WordsOfProphets1974to2010.csv.  Probably lots of interesting stuff that could be done with this data, but for now we’ll just look at some Excel charts plotting the word usage frequency for some “interesting” words.

“Constitution” and “Pioneers”

I picked these first because I was fairly certain where there would be big spikes.  As expected, “Constitution” gets used more frequently in 1976 and 1987 — the bicentennial of the Declaration of Independence and the Constitution, respectively.  “Pioneers” gets a big spike in 1997 – the sesquicentennial of Mormon pioneers arriving in the Salt Lake valley in 1847.

I should note that the y-axis is a %.  For example, about 0.11% of the words scraped from talks in the May 1997 Ensign were the word “pioneers”.

“Internet” and “Pornography”

The internet isn’t mentioned at all until 1996, about the time it started becoming popular and mainstream.  The counsel from Church leaders about the evils of pornography seems to have increased, on average, in the years since the internet became more common and the problem of internet pornography more pervasive.

“Faith,” “Jesus,” “Christ,” “Savior,” “Lord”

Definite upward trend in the use of the word “faith”.  I guess that’s good.

I plotted both “Jesus” and “Christ” because while they are usually used together, when they are used separately it seems like “Christ” is used more often than “Jesus” … at least in recent years (see about 2004-2010).  During the 1970’s, the opposite appears slightly true – “Jesus” used more frequently than “Christ”.

Both “Jesus” and “Christ” steadily increase from 2006 to 2008, then abruptly plummet in November 2008 and stay at about the same level until 2010.  I have no good explanation for this; it seems intriguing.  President Hinckley passed away in early 2008 and President Monson became the new prophet.  Since the prophet speaks more in General Conference than anyone else, maybe Monson uses other words like “Savior” or “Lord” more frequently?

Perhaps.  There is an uptick for “Lord” and “Savior” starting in May 2010.

“Tithing” and “Prayer”

Hypothesis: more talk about prayer during economic hard times, and less about tithing?

Here’s a GDP per capita chart, from http://www.measuringworth.com/usgdp/.  We clearly see four recessions: early 1980’s, early 1990’s, early 2000’s (hey is there a ten year pattern here?) and 2008-present.

So how do “prayer” and “tithing” General Conference frequencies compare during the recessions?

Recession       prayer   tithing
Early 1980’s    down     up
Early 1990’s    up       down
Early 2000’s    up       down
2008 – now      up       down

Hypothesis confirmed?  Well, it’s kind of inconclusive – pretty shallow data set.  But this type of analysis would be very interesting to pursue, methinks.  Not necessarily only about economic issues (although that type of stuff is very easy to get historical #s for).


Here are the Python scripts used to extract the data in the .csv’s.

concordance.py – Upgraded somewhat from my previous post on concordances.  Now it is in a class and can be called with any block of text input, not just a file.

getEnsignData.py – Starts at the Ensign archive webpage and dives down into the year, month, and article pages.  Calls appropriate scraper routines from ldsscraper.py for each.  Pickles output to ensignData.txt and genConfData.txt.  This is so I can split the project up into two pieces – download the data (which takes a very long time) and save it; then assemble it for output later.

ldsscraper.py – Uses regular expressions and Beautiful Soup to parse each webpage.  Since each type of page (Archive, Year, Table of Contents (individual month issue), Article) has its own format, each has its own scraper.

scraper.py – Base class for scrapers in ldsscraper.py.  Uses mechanize to download a webpage.  Also (optionally) strips out anything but the ASCII character set.

wordsOfProphets.py – Run after getEnsignData.py.  Loads up ensignData.txt and genConfData.txt and creates the two .csv files, GenConfArticleSummary1974to2010.csv and WordsOfProphets1974to2010.csv.
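The “download once, assemble later” split relies on pickling.  Here is a minimal sketch of that pattern; the function names and the shape of the data are hypothetical, not copied from getEnsignData.py.

```python
import pickle


def save_data(data, filename):
    """Write any picklable Python object to a file."""
    with open(filename, "wb") as f:
        pickle.dump(data, f)


def load_data(filename):
    """Read back an object previously written by save_data()."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Usage (hypothetical record layout):
# talks = [{"speaker": "...", "title": "...", "text": "..."}]
# save_data(talks, "genConfData.txt")      # slow download step, run once
# talks = load_data("genConfData.txt")     # fast analysis step, run often
```

One caution: only unpickle files you created yourself, since loading a pickle can execute arbitrary code.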

Counting Unique Words with Python

The impetus for this project is an NPR story from earlier this year that I just now found.  An English literature researcher constructed a concordance of Agatha Christie novels to support the hypothesis that she suffered from Alzheimer’s in the late stages of her career.  He found a 20% drop in her 73rd novel’s vocabulary, as compared with previous novels.

A concordance is an index of all the instances of a particular word in a body of text.  A concordancer is a program that generates a concordance.  I don’t think I’ve created a true concordancer, because I don’t keep track of where (what page or section of text) I find each word.  I just keep track of a count of unique words.  Call it “Concordancer Junior.”

The concordance.py script will scan an input .txt file and count up the instances of each unique word.  Doing this is almost trivial with Python’s dictionary structure.

theIndex = {}  # maps each unique word to its count

def addword(word):
    if word in theIndex:
        theIndex[word] = theIndex[word] + 1  # increment count
    else:
        theIndex[word] = 1  # add word to dictionary with count = 1

Once all the words have been added to the dictionary, I want to display the most frequently found words.  However, Python dictionaries are not sorted by value, so just printing the first x entries of the dictionary won’t do.  Searching around found a Python wiki entry that has an easy solution using the sorted() function:

s1 = sorted(theIndex.items(),key=lambda item:item[0]) #secondary key: sort alphabetically
s2 = sorted(s1,key=lambda item:item[1], reverse=True) #primary key: sort by count

Ooohhh… lambda functions.  <shiver!>

I should note that sorted() returns a list of (key,value) tuples rather than a sorted dictionary.  But that’s fine since I’m not going to need to add anything further to the dictionary once I get to the sorting point.

For my first cut, the most common words were ‘the’, ‘a’, ‘and’, ‘he’, etc.  Not very interesting.  So, I revised the script to print out the top 100 words found in the file, EXCLUDING the 100 most common (English) words.
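Here is a compact sketch of that “top words, minus the common ones” step.  It is a rewrite for illustration, not the code in concordance.py: it uses collections.Counter instead of my hand-rolled dictionary, a simple regular expression to split words, and a tiny stand-in for the real list of 100 common English words.

```python
import re
from collections import Counter

# Stand-in only; the real script excludes the 100 most common English words.
COMMON_WORDS = {"the", "a", "and", "he", "of", "to", "in"}


def top_words(text, n=100, exclude=COMMON_WORDS):
    """Return the n most frequent (word, count) pairs, skipping common words."""
    # Lowercase everything and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in exclude)
    # Primary key: count (descending).  Secondary key: alphabetical.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


sample_top = top_words("The cat and the hat.  The cat sat.", n=3)
```

Sorting on the tuple (-count, word) does in one pass what the two chained sorted() calls above do in two.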

For test cases I downloaded plain text files from Project Gutenberg, removing their entry/exit boilerplate stuff.

Without further ado, some results please!

The King James Bible:

Alice in Wonderland:



Huckleberry Finn:

Screen Scraping with Python

I’ve done a few projects now involving screen scraping web data with Python.  Here’s my (for now) preferred method.

Define the Problem

Always a good first programming step!  In this case, the problem is “how do I extract <bit of data> from <website address>?”

For this example, we’ll extract the population density of Jackson County, Missouri, from its city-data webpage.

Download the Webpage

There are several Python libraries that allow you to download a webpage.  I’ve found mechanize to be the easiest to use.

import mechanize
mech = mechanize.Browser() #mechanize will mimic a web browser - web servers are none the wiser.
url = "http://www.city-data.com/county/Jackson_County-MO.html" #an example URL from my last project.  Contains county data.
response = mech.open(url)
page = response.read() #read() returns the url's html code as one big string

If you are behind a firewall or something and need to access the internet through a proxy server, then before calling mechanize’s open function, you need to set up the proxy.
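As an illustration of the idea, here is how a proxy can be set up with Python 3’s standard urllib.request rather than mechanize.  This is a swapped-in technique, not the mechanize call itself, and the proxy address is a made-up placeholder.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build a URL opener that sends http and https traffic via a proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url,
                                           "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; substitute your own.
opener = make_proxy_opener("http://proxy.example.com:8080")

# Nothing is downloaded until open() is called, e.g.:
# page = opener.open("http://www.city-data.com/county/Jackson_County-MO.html").read()
```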


Extract the Desired Data with Regular Expressions

Now we have the webpage in html format.  It’s just a big ol’ string.  Here’s a snippet:

<table border="0" cellpadding="0" cellspacing="0"><tr><td>Population density: 1167 people per square mile&nbsp;</td><td><div align="left"><table border="2" cellpadding="0" cellspacing="0" width="20" bordercolor="#DDDD00" bgcolor="#e8e8e8"><tr><td>&nbsp;</td></tr></table></div></td> <td>&nbsp;(very high).</td></tr></table>

You can view the html source for a webpage with pretty much any web browser by right-clicking and selecting “View source.”  Looks kind of confusing, eh?

We need to search through the string we downloaded with mechanize for our piece of data.  Luckily, Python has a built-in, well developed regular expression library that works great for this kind of problem.

import re
extractor = re.compile(r'Population density: (.+?) people')
data = re.findall(extractor,page)[0]

“compile()” will set up a regular expression for use in other functions.  The “r” in front of the search string makes it a raw string, so backslashes are passed through to the regular expression engine unchanged (not strictly needed for this pattern, but a good habit).  The parentheses ( ) define a capture group; when a pattern contains one group, findall() returns just the text matched inside it.  “.” matches any character (except a newline), “+” matches 1 or more of the preceding regular expression, and “?” makes the preceding “+” non-greedy, so it matches as few characters as possible.

“data” is a string that can be printed, converted to a number and used in calculations or whatever further along in your script.
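Putting the extraction step together in a form that runs without a network connection: the sketch below uses a shortened copy of the html snippet above as its input.  The function name is made up, and returning None when nothing matches is my own choice for illustration.

```python
import re

# Compile once; the capture group grabs whatever sits between the two phrases.
DENSITY_PATTERN = re.compile(r'Population density: (.+?) people')


def extract_density(page):
    """Return the population density as an int, or None if not found."""
    match = DENSITY_PATTERN.search(page)
    if match is None:
        return None
    # Strip thousands separators such as "1,167" before converting.
    return int(match.group(1).replace(",", ""))


snippet = ('<table border="0"><tr><td>Population density: 1167 people '
           'per square mile&nbsp;</td></tr></table>')
density = extract_density(snippet)
```

Using search() and checking for None avoids the IndexError that findall(...)[0] raises when the page layout changes and the pattern no longer matches.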

Congratulations!  You screen scraped some data with Python!

Picture Captionator – example of python screen scraping


Here’s an example of screen scraping with Python.  Give this script a list of words, and it will 1) find a picture of the object/person/whatever via Google Image Search, and also 2) generate a caption for the picture by selecting a random sentence from the object’s Wikipedia article.  Note that the captions may not relate directly to the selected picture.  In fact they often do not, to humorous effect.  So there you go, it just became a game.  Or a time waster.  Whatever it is, hopefully it can be a good example of how to do this kind of thing.

Requires Pygame to do the display.

Download: captionator_v0.1.zip

Smart Dots, version 0.2 – Orbit Point, Improved PID Control

In this update, I was confusing myself with (0,0) being in the upper left corner of the screen, so I changed it so that (0,0) is at the center.  No noticeable effect on the actual sim, but makes thinking about angles and things a little easier (for me).  I also implemented the orbit point functionality – pretty easy because it uses the already created MoveToPoint function.  I wanted to make the dots automatically space themselves out evenly along the orbit by speeding up if there is a dot close behind or slowing down if another dot is close in front, but couldn’t get it to work very well, so left that bit commented out.

I also redid MoveToPoint to be a better example of a PID controller.  (I tried to implement the diagram found here, fig 6.7.)  The gains are now tuned well enough for the default max velocity and acceleration; they would probably need retuning if those are changed.
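For anyone who hasn’t met a PID controller, here is a generic one-dimensional sketch.  It is not the Smart Dots MoveToPoint code: the class, the gains, and the simple “controller output = acceleration” dot model are all assumptions chosen for illustration.

```python
class PID:
    """Textbook PID: output = kp*error + ki*integral(error) + kd*d(error)/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative on the very first call (there is no previous error).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Simulate a dot on a line: the controller output is used as acceleration.
target = 100.0                    # where the dot should end up (pixels)
x, v = 0.0, 0.0                   # starting position and velocity
dt = 0.01                         # time step (seconds)
pid = PID(kp=4.0, ki=0.5, kd=4.0) # made-up gains, tuned for this toy model

for _ in range(6000):             # 60 simulated seconds
    accel = pid.update(target - x, dt)
    v += accel * dt
    x += v * dt

final_x = x
```

The proportional term pulls the dot toward the target, the derivative term acts like a brake so it doesn’t overshoot wildly, and the integral term cleans up any leftover offset.  Change the dot model or its limits and the gains need retuning.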

Download: smartdots_v0.2.zip.  Prerequisites are Python and Pygame.  Main script to run is smartdots.py.  Controls are listed in the status area above the dot box; also run with the -h option to see all other command line options.

I’m not really sure where smart dots will go from here.  “Avoid obstacles” is unimplemented, but after that I’m kind of out of ideas.  It’s not really a game… I guess the “game” portion of this project was for me getting the dots to do what I want them to – it was kind of fun watching them move and orbit a point.  (If you turn on orbit point, then move the action point around, there is some interesting behavior as the dots move while rotating…)  But for now “development” on Smart Dots will probably be on hold, unless I can think of somewhere to go with it.

Smart Dots, version 0.1

My latest project is not quite a game.  (Not yet, at least.)  But it is kind of fun to play around with.  Smart Dots is a simple, physics-based simulation of a multi-agent system – each agent is autonomous, has a limited view of the system, and is decentralized.  For Smart Dots, this means that each dot contains its own kinematics equations and behavioral algorithms, each dot only can “see” a limited distance away, and there is no higher level logic controlling what goes on.  (The only thing the higher level UI and “world” classes do is set dot behaviors.)

When the smartdots.py script is run, you will see several dots bouncing around in a window.  They each have their own velocity, assigned randomly at the start of the simulation.  The user can select a dot to see its position (pixel location), velocity (pixels/sec), and acceleration (pixels/sec^2).  There are several inputs the user may enter:

  • Left click on a dot to select it (the selected dot will turn green).  Then, by pressing F2, the “Search and Follow” behavior is enabled.  The dots will start to follow the selected dot, if they can “see” it (i.e., if it is in what I call in the program the “comm range”).  If the selected dot is traveling sufficiently slower than the others, then they kind of dance around it.
  • Right click on any spot of the window to set the “action point” (designated by a yellow point).  Nothing will happen unless F4 is pressed and the “Move to Action Point” behavior is enabled for the dots.  The dots will then move to the action point.
  • p will toggle pausing for the simulation
  • c will toggle a display of the comm range for each dot (displayed as a red circle)
  • s will “scatter” the dots, giving each one a new random velocity
  • the mouse wheel can be used to apply positive or negative gravity
  • Escape will cancel any gravity settings

I have placeholders for adding an “Avoid Obstacles” behavior (avoid other dots), and also an “Orbit Action Point” behavior…to be completed in a future version.

Rather than a single screenshot, check out this YouTube video I made of Smart Dots:

What is so cool about Smart Dots?  Well, it’s not much right now.  But I think it can be an interesting framework for testing out simple multi-agent algorithms…and just maybe it can be useful in a future game.  We shall see….

Download: smartdots_v0.1.zip.  Prerequisites are Python and Pygame.  Main script to run is smartdots.py.  Controls are listed in the status area above the dot box; also run with the -h option to see all other command line options.

Python Chess v. 0.7 – Lowering Pygame’s CPU Usage

Someone brought to my attention that Python Chess was using up a lot of CPU cycles, even when (actually, especially when) it was just sitting waiting for user input.  I checked on my system and found it was using 30-40% of the CPU time…way too much for such a simple game!

Some googling revealed that I was using pygame’s event loop in a reckless manner.  I was using pygame.event.get() to get user mouse clicks or other input; evidently, this will return a list of events that have queued up since the last time the get() function was called (not sure if there is a limit to the queue size….)  Since the call to get() sat inside what is basically a while(1) loop, it was being called just as fast as it could, which is pretty unnecessary for a game like chess.  Actually, probably for any game this is overkill.  Luckily, there are a few easy solutions.

The option which I used for Python Chess is to use pygame.event.wait() rather than pygame.event.get().  The wait() function will cause the program to go into an idle state (i.e., not hogging the processor) until it gets a single event.  This is much more suited for the game of chess, where the user needs time to ponder their move, anyway.  Replacing my event.get() calls with event.wait() makes the CPU usage of Python Chess go down to 0-1% on my system, with occasional spikes during the AI’s turn of up to 10% (these are just rough numbers seen via Windows Task Manager).  Quite an improvement!

Another option for limiting CPU usage is to create a pygame.time.Clock() object, then call the object’s tick(fps) function every time through your while(1) loop.  This will limit the loop’s execution to fps (frames per second) times per second, allowing it to go idle in between loop processing.  (Note that on slower systems, the desired fps may not be achievable, so it will still use up a healthy chunk of processing time if the value is too high.)  Using time.Clock seems to be more suited to “action” style games, where there is screen movement going on every time through the loop, for instance.
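To show the idea behind frame limiting without needing Pygame installed, here is a plain-Python stand-in.  It is not Pygame’s implementation of tick(fps); it is just a sketch of the same principle (sleep away whatever is left of each frame’s time slot), and the function name is made up.

```python
import time


def run_limited(n_frames, fps, work=lambda: None):
    """Run work() n_frames times, at most fps times per second.

    Returns the total elapsed time in seconds.
    """
    frame_time = 1.0 / fps
    start = time.perf_counter()
    next_deadline = start
    for _ in range(n_frames):
        work()                        # one pass of game logic / drawing
        next_deadline += frame_time
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)     # idle instead of spinning the CPU
    return time.perf_counter() - start


# Five frames at 50 fps should take about a tenth of a second.
elapsed = run_limited(n_frames=5, fps=50)
```

While the loop is sleeping the process uses essentially no CPU, which is the whole point.  If work() takes longer than one frame’s slot, nothing is slept and the loop just runs as fast as it can, which matches the note above about slower systems.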

Download: PythonChess_v0.7.zip