Words of the Prophets

The General Conference of the Church of Jesus Christ of Latter-Day Saints (Mormons) is held twice a year, in April and October.  General Authorities of the Church, including Prophets and Apostles, speak to the church membership about doctrinal issues and give other counsel.

During the last conference, I wondered: Is there a pattern to what is taught in General Conference?  Maybe Python can help us find out….

Methodology

This project involves screen scraping lds.org’s Ensign archives and then using a concordance (of sorts) to do some analysis for word counts and word usage frequency.  My thinking is, the more a certain word is used, the more the general authorities are giving counsel about a particular topic.  An index of all General Conference talks is also created.

The church magazine “Ensign” prints the full text of General Conference in the May and November issues each year.  Luckily, the Ensign is available online at lds.org.  So here’s what we’ll do:

  1. Screen scrape the General Conference Ensign articles
  2. Count up the words in each article and generate word count summaries for each General Conference
  3. Profit!  Er, I mean, make some charts and stuff.
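Step 2 above can be sketched roughly as follows. This is not the actual concordance.py, just a minimal Python 3 illustration; the class layout, the method names, and the simple regex tokenizer are all assumptions made for the example.

```python
import re
from collections import Counter


class Concordance:
    """Word counts for an arbitrary block of text (illustrative sketch only)."""

    def __init__(self, text):
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        self.counts = Counter(re.findall(r"[a-z']+", text.lower()))

    @property
    def total_words(self):
        # Every occurrence counts toward the total.
        return sum(self.counts.values())

    @property
    def unique_words(self):
        # Each distinct word counts once.
        return len(self.counts)

    def top(self, n=100):
        # Most frequent words first, as (word, count) pairs.
        return self.counts.most_common(n)


c = Concordance("pray often and pray always")
print(c.total_words, c.unique_words, c.top(1))
```

Running one of these per article, and then summing the counters for all the articles in an issue, would give the per-Conference summaries.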

Note: Each October (at least for the last several years) a General Relief Society meeting is held the week before General Conference. The proceedings of this meeting are included in the November Ensign, so they are included in this project as well. Whether the meeting is part of General Conference is a matter of debate, I guess, but the speakers are General Authorities, so they surely belong in this analysis.

Note #2: Only General Conferences back to 1974 are available online.  The online Ensign archives go back further, but prior to 1974 there was a separate “Conference Report” for the proceedings of General Conference, which is not available online as far as I can tell.  So all my results are 1974 – 2010.

Note #3: I didn’t include stuff in the Ensign that did not list an author.  This cuts out stuff like the “Sustaining of Church Officers” and the “Statistical Report”, etc.

Note #4: The Ensign articles use Unicode, which gave me some headaches when parsing. So I ended up throwing out everything but the ASCII character set. As a result, the titles and words might occasionally be incorrect, mainly because of missing punctuation. But it’s generally OK.
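To show what throwing out everything but ASCII does to a title, here is a small Python 3 sketch. The function name and the sample title are made up for illustration, and the original scripts may well have done this differently.

```python
def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    # errors="ignore" silently discards anything that cannot be encoded.
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical title containing a curly apostrophe (U+2019) and an en dash (U+2013).
title = "The Lord\u2019s Way \u2013 Faith"
print(strip_non_ascii(title))
```

The curly apostrophe and the dash simply disappear, which is exactly the kind of missing punctuation mentioned above.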

Results

It took about an hour and a half to download and parse all the General Conference articles!  There are two output .csv files:

GenConfArticleSummary1974to2010.csv – index of LDS General Conference talks, 1974 – 2010.  Lists speaker, title, Ensign month, year and page number, word count, unique word count, unique word ratio (unique count / word count), and top 100 words for each conference talk.

WordsOfProphets1974to2010.csv – lists all the unique words found across all General Conference talks. Gives the total count for each word. For each General Conference, the percentage contribution of each unique word to the Conference’s total is given. I.e., “0.389497” for the word “church” for “May1974” means that in the April 1974 General Conference, about 0.39% of the words spoken (well, scraped from the Ensign webpage) were the word “church.”
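The per-Conference percentage is just each word's count divided by the total word count, times 100. A minimal Python 3 sketch, assuming a simple regex tokenizer and a made-up sample sentence (neither is taken from the real scripts):

```python
import re
from collections import Counter


def word_percentages(text):
    """Return each word's share of the total word count, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: 100.0 * n / total for word, n in counts.items()}


# Toy text: "church" is 2 of the 8 words.
sample = "The church met today and the church sang"
shares = word_percentages(sample)
print(round(shares["church"], 2))
```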

Some stats (1974 – 2010 General Conference):

  • 2,713 talks
  • 397 different speakers (176 of those gave just a single talk)
  • 5,181,241 total words
  • 51,274 unique words (0.99% of total)

A note on the “unique word ratio” (= 100 * unique words / total words): I’ve noticed that it generally tends to decrease as the body of text gets longer. So it is probably only meaningful (although what the meaning is, I do not know) to compare texts of about the same size.
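Here is a small Python 3 sketch of the ratio, along with a toy demonstration of why it drops as text gets longer: repeated words add to the total but not to the unique count. The tokenizer and the sample strings are assumptions for illustration only.

```python
import re


def unique_word_ratio(text):
    """Return 100 * unique words / total words (0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * len(set(words)) / len(words)


short = "faith hope charity"
longer = short + " faith hope charity and faith again"

print(unique_word_ratio(short))   # every word is new
print(unique_word_ratio(longer))  # repeats pull the ratio down
```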

The next chart shows the top 20 General Conference speakers who gave the most talks (“Count of Title”).  Average total word count and the average unique word ratio are also shown.  Gordon B. Hinckley is tops, no surprise.  He was in the First Presidency (generally 2 talks per Conference) or was the Prophet (about 4 talks per Conference – usually gives a welcome and a goodbye talk in addition to 2 meatier ones) for much of the time period under consideration (1974 – 2010).  The same can be said about his successor and next most frequent General Conference speaker, Thomas S. Monson.  (This is only the top 20; see GenConfArticleSummary1974to2010.csv and sort in Excel or OpenOffice to get the full list.)

Speaker               Count of Title   Average of Word_Count   Average of Unique Ratio
Gordon B. Hinckley    207              2065.545894             34.51755811
Thomas S. Monson      162              2189.833333             36.73315815
James E. Faust        98               2286.030612             33.84414864
L. Tom Perry          75               2113.973333             32.77750744
Boyd K. Packer        75               2276.106667             31.69806946
Spencer W. Kimball    66               2264.893939             34.32113774
M. Russell Ballard    62               2113.903226             32.98976187
Ezra Taft Benson      57               2153.578947             31.79958897
Russell M. Nelson     56               2328.535714             34.46947778
David B. Haight       55               2022.8                  33.50270983
Dallin H. Oaks        54               2459.018519             30.85206586
Neal A. Maxwell       53               1965.773585             39.94132846
Joseph B. Wirthlin    53               2259.528302             33.46916535
Richard G. Scott      51               1934.647059             34.08122863
Marion G. Romney      51               2157.607843             30.11546943
Robert D. Hales       46               2255.086957             30.38011966
Henry B. Eyring       46               2437.673913             26.86417891
Howard W. Hunter      45               1676.977778             34.74803859
Marvin J. Ashton      38               2248.263158             34.31398562
N. Eldon Tanner       36               2428.833333             31.85897606
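I built this table with a spreadsheet pivot, but the same per-speaker summary could be computed in Python. The following is only a sketch: the column names (Speaker, Title, Word_Count, Unique_Ratio) are guesses at the .csv layout, and the three rows are toy placeholders, not real data.

```python
import csv
import io
from collections import defaultdict

# Toy rows standing in for the real file; the column names are assumptions.
SAMPLE_CSV = """Speaker,Title,Word_Count,Unique_Ratio
Speaker A,Talk 1,2000,35.0
Speaker A,Talk 2,2200,33.0
Speaker B,Talk 3,1800,40.0
"""


def summarize_speakers(csv_file):
    """Return (speaker, talks, mean word count, mean unique ratio), most talks first."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # talks, word count sum, ratio sum
    for row in csv.DictReader(csv_file):
        entry = totals[row["Speaker"]]
        entry[0] += 1
        entry[1] += float(row["Word_Count"])
        entry[2] += float(row["Unique_Ratio"])
    summary = [
        (speaker, n, words / n, ratio / n)
        for speaker, (n, words, ratio) in totals.items()
    ]
    return sorted(summary, key=lambda item: item[1], reverse=True)


for line in summarize_speakers(io.StringIO(SAMPLE_CSV)):
    print(line)
```

To run it against the real file, pass an open file handle instead of the io.StringIO object and adjust the column names to match the .csv header.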

Something else interesting regarding the “unique word ratio”: Neal A. Maxwell’s is particularly high at 39.9%. This is not too surprising; Elder Maxwell was renowned for his eloquence and large vocabulary. Surprisingly, Henry B. Eyring’s unique word ratio is particularly low at 26.9%. But I wouldn’t call his talks rudimentary or simplistic by any means; quite the opposite. The difference in average word count between Maxwell (1965.8) and Eyring (2437.7) may have something to do with these numbers; as I said before, longer texts tend to have smaller unique word ratios. But then again, N. Eldon Tanner’s average word count (2428.8) is close to Eyring’s, yet Tanner’s unique word ratio is higher, at 31.9%.

Now for some individual word analysis – using data in WordsOfProphets1974to2010.csv.  Probably lots of interesting stuff that could be done with this data, but for now we’ll just look at some Excel charts plotting the word usage frequency for some “interesting” words.

“Constitution” and “Pioneers”

I picked these first because I was fairly certain where there would be big spikes.  As expected, “Constitution” gets used more frequently in 1976 and 1987 — the bicentennial of the Declaration of Independence and the Constitution, respectively.  “Pioneers” gets a big spike in 1997 – the sesquicentennial of Mormon pioneers arriving in the Salt Lake valley in 1847.

I should note that the y-axis is a %.  For example, about 0.11% of the words scraped from talks in the May 1997 Ensign were the word “pioneers”.

“Internet” and “Pornography”

The internet isn’t mentioned at all til 1996, about the time it started becoming popular and mainstream.  The counsel from Church leaders about the evils of pornography seems to have increased, on average, in the years since the internet became more common and the problem of internet pornography more pervasive.

“Faith,” “Jesus,” “Christ,” “Savior,” “Lord”

Definite upward trend in the use of the word “faith”.  I guess that’s good.

I plotted both “Jesus” and “Christ” because while they are usually used together, when they are used separately it seems like “Christ” is used more often than “Jesus” … at least in recent years (see about 2004-2010).  During the 1970’s, the opposite appears slightly true – “Jesus” used more frequently than “Christ”.

Both “Jesus” and “Christ” steadily increase from 2006 to 2008, then abruptly plummet in November 2008 and stay at about the same level til 2010.  I have no good explanation for this; it seems intriguing.  President Hinckley passed away in early 2008 and President Monson became the new prophet — since the prophet speaks more in General Conference than anyone else, maybe Monson uses other words like “Savior” or “Lord” more frequently?

Perhaps. There is an uptick for “Lord” and “Savior” starting in May 2010.

“Tithing” and “Prayer”

Hypothesis: more talk about prayer during economic hard times, and less about tithing?

Here’s a GDP per capita chart, from http://www.measuringworth.com/usgdp/.  We clearly see four recessions: early 1980’s, early 1990’s, early 2000’s (hey is there a ten year pattern here?) and 2008-present.

So how do the “prayer” and “tithing” General Conference frequencies compare during the recessions?

Recession       prayer   tithing
Early 1980’s    down     up
Early 1990’s    up       down
Early 2000’s    up       down
2008 – now      up       down


Hypothesis confirmed?  Well, it’s kind of inconclusive – pretty shallow data set.  But this type of analysis would be very interesting to pursue, methinks.  Not necessarily only about economic issues (although that type of stuff is very easy to get historical #s for).

Files

Here are the Python scripts used to extract the data in the .csv’s.

concordance.py – Upgraded somewhat from my previous post on concordances.  Now it is in a class and can be called with any block of text input, not just a file.

getEnsignData.py – Starts at the Ensign archive webpage and dives down into the year, month, and article pages.  Calls appropriate scraper routines from ldsscraper.py for each.  Pickles output to ensignData.txt and genConfData.txt.  This is so I can split the project up into two pieces – download the data (which takes a very long time) and save it; then assemble it for output later.

ldsscraper.py – Uses regular expressions and Beautiful Soup to parse each webpage. Since each type of page (Archive, Year, Table of Contents (individual month issue), Article) has its own format, each has its own scraper.

scraper.py – Base class for scrapers in ldsscraper.py. Uses mechanize to download a webpage. Also (optionally) strips out anything but the ASCII character set.

wordsOfProphets.py – Run after getEnsignData.py.  Loads up ensignData.txt and genConfData.txt and creates the two .csv files, GenConfArticleSummary1974to2010.csv and WordsOfProphets1974to2010.csv.
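The download-then-assemble split between getEnsignData.py and wordsOfProphets.py comes down to pickling the scraped data once and reloading it later. A minimal Python 3 sketch of that idea follows; the helper names, the toy data structure, and the temporary file path are all hypothetical and not taken from the real scripts.

```python
import os
import pickle
import tempfile


def save_stage(data, path):
    """Write the scraped data to disk so the slow download runs only once."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load_stage(path):
    """Reload previously scraped data (only unpickle files you created yourself)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stage 1 (slow): pretend this dict came from the scraper. Placeholder values only.
scraped = {"May1974": [{"speaker": "Speaker A", "words": 2000}]}
cache_path = os.path.join(tempfile.mkdtemp(), "genConfData.pkl")
save_stage(scraped, cache_path)

# Stage 2 (fast, repeatable): reload without touching the network.
restored = load_stage(cache_path)
print(restored == scraped)
```

The payoff is that the analysis step can be rerun and tweaked as often as needed without repeating the long download.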


2 responses

  1. Well would you look at that … BYU has a corpus of all the General Conference talks back to 1851, and you can see word frequency and such. http://corpus.byu.edu/gc/

  2. For future reference:
    If you feel like being creative, you can use/abuse the http://scriptures.byu.edu/ AJAX/JavaScript interface to grab pretty much whatever you want as far as scriptures, conference talks, or the Journal of Discourses is concerned. You can even piggyback off their searching abilities. I’d have to dig up some old code I wrote to tell you exactly how, but the gist of it involves hand-making cookies and then accessing http://scriptures.byu.edu/citations.php. (For some reason they use cookies to determine your search settings. No, I don’t know why either.)
