The General Conference of the Church of Jesus Christ of Latter-Day Saints (Mormons) is held twice a year, in April and October. General Authorities of the Church, including Prophets and Apostles, speak to the church membership about doctrinal issues and give other counsel.
During the last conference, I wondered: Is there a pattern to what is taught in General Conference? Maybe Python can help us find out…
This project involves screen scraping lds.org’s Ensign archives and then using a concordance (of sorts) to do some analysis for word counts and word usage frequency. My thinking is, the more a certain word is used, the more the general authorities are giving counsel about a particular topic. An index of all General Conference talks is also created.
The church magazine “Ensign” prints the full text of General Conference in the May and November issues each year. Luckily, the Ensign is available online at lds.org. So here’s what we’ll do:
- Screen scrape the General Conference Ensign articles
- Count up the words in each article and generate word count summaries for each General Conference
- Profit! Er, I mean, make some charts and stuff.
Note: Each October (at least for the last several years) there is a General Relief Society meeting held the week prior to General Conference. The proceedings of this meeting are included in the November Ensign, so they are included in this project as well. Whether the meeting is a part of General Conference is a matter of debate, I guess, but the speakers are General Authorities, so they surely belong in this analysis.
Note #2: Only General Conferences back to 1974 are available online. The online Ensign archives go back further, but prior to 1974 there was a separate “Conference Report” for the proceedings of General Conference, which is not available online as far as I can tell. So all my results are 1974 – 2010.
Note #3: I didn’t include stuff in the Ensign that did not list an author. This cuts out stuff like the “Sustaining of Church Officers” and the “Statistical Report”, etc.
Note #4: The Ensign articles use Unicode, which I had some headaches parsing. So I ended up throwing out everything but the ASCII character set. Therefore the resulting titles and words might occasionally be incorrect – mainly missing punctuation. But it’s generally ok.
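Throwing out everything but ASCII can be done with an encode/decode round trip. This is only a minimal sketch of the idea in Python 3, not the code from the original scripts; the helper name `to_ascii` and the sample title are made up for illustration.

```python
def to_ascii(text):
    """Drop every character outside the ASCII range (hypothetical helper)."""
    # errors="ignore" silently discards anything that cannot be encoded.
    return text.encode("ascii", "ignore").decode("ascii")

# Curly quotes and the en dash disappear, which is exactly why
# punctuation sometimes goes missing in the output.
sample = "\u201cFaith\u201d \u2013 a talk"
print(repr(to_ascii(sample)))  # 'Faith  a talk'
```

Note the double space left behind where the dash used to be. A gentler approach would map curly quotes and dashes to their ASCII look-alikes before stripping, but that is a design choice beyond what the note above describes.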
It took about an hour and a half to download and parse all the General Conference articles! There are two output .csv files:
GenConfArticleSummary1974to2010.csv – index of LDS General Conference talks, 1974 – 2010. Lists speaker, title, Ensign month, year and page number, word count, unique word count, unique word ratio (100 * unique count / word count), and top 100 words for each conference talk.
WordsOfProphets1974to2010.csv – lists all the unique words found across all General Conference talks and gives the total count for each word. For each General Conference, it also gives each word’s percentage contribution to that Conference’s total word count. For example, “0.389497” for the word “church” under “May1974” means that in the April 1974 General Conference, about 0.39% of the words spoken (well, scraped from the Ensign webpage) were the word “church.”
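The columns in these two files come down to simple counting. Here is a minimal sketch of how the per-talk summary and the per-conference percentages could be computed. It is not the original script: the function names, the tokenizing rule, and the toy inputs are all assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.
    (Assumed rule; the original scripts may split words differently.)"""
    return re.findall(r"[a-z']+", text.lower())

def summarize_talk(text, top_n=100):
    """Per-talk numbers like those in the article summary file."""
    words = tokenize(text)
    counts = Counter(words)
    total, unique = len(words), len(counts)
    return {
        "word_count": total,
        "unique_count": unique,
        # Unique word ratio as a percentage; 0 for an empty talk.
        "unique_ratio": 100.0 * unique / total if total else 0.0,
        "top_words": [w for w, _ in counts.most_common(top_n)],
    }

def word_percentages(talk_texts):
    """Each word's share (in %) of all words in one conference."""
    counts = Counter()
    for text in talk_texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: 100.0 * n / total for word, n in counts.items()}

# Toy "conference" of two very short talks.
talks = ["The Lord is good. The Lord is kind.", "The church is true."]
summary = summarize_talk(talks[0])
shares = word_percentages(talks)
print(summary["word_count"], summary["unique_count"], summary["unique_ratio"])  # 8 5 62.5
print(round(shares["church"], 2))  # 8.33
```

In the toy data, "church" is 1 of 12 words, so its share is 8.33%. The real figures are far smaller because a whole conference runs to well over a hundred thousand words.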
Some stats (1974 – 2010 General Conference):
- 2,713 talks
- 397 different speakers (176 of those gave just a single talk)
- 5,181,241 total words
- 51,274 unique words (0.99% of total)
A note on the “unique word ratio” (= 100 * unique words / total words): I’ve noticed this generally tends to decrease the longer the body of text is. So it is probably only meaningful (although what the meaning is I do not know) to compare it across texts of about the same size.
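A tiny example shows why the ratio drops with length: common words keep repeating, so the vocabulary grows more slowly than the text. The sentence below is invented for illustration, and the ratio is measured over longer and longer prefixes of it.

```python
def unique_ratio(words):
    """Unique word ratio in percent: 100 * unique words / total words."""
    return 100.0 * len(set(words)) / len(words)

# Made-up sample text of 21 words with plenty of repetition.
text = ("we have faith in the lord and we pray to the lord with faith "
        "and we thank the lord for faith")
words = text.split()

# Ratio over the first 7, 14, and 21 words of the same text.
ratios = [unique_ratio(words[:n]) for n in (7, 14, 21)]
print([round(r, 1) for r in ratios])  # [100.0, 71.4, 57.1]
```

The first 7 words are all different, but by 21 words only 12 are distinct. Real talks behave the same way on a larger scale, which is why a 2,400-word talk and a 1,900-word talk are not quite a fair comparison.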
The next chart shows the top 20 General Conference speakers who gave the most talks (“Count of Title”). Average total word count and the average unique word ratio are also shown. Gordon B. Hinckley is tops, no surprise. He was in the First Presidency (generally 2 talks per Conference) or was the Prophet (about 4 talks per Conference – usually gives a welcome and a goodbye talk in addition to 2 meatier ones) for much of the time period under consideration (1974 – 2010). The same can be said about his successor and next most frequent General Conference speaker, Thomas S. Monson. (This is only the top 20; see GenConfArticleSummary1974to2010.csv and sort in Excel or OpenOffice to get the full list.)
|Speaker||Count of Title||Average of Word_Count||Average of Unique Ratio|
|Gordon B. Hinckley||207||2065.545894||34.51755811|
|Thomas S. Monson||162||2189.833333||36.73315815|
|James E. Faust||98||2286.030612||33.84414864|
|L. Tom Perry||75||2113.973333||32.77750744|
|Boyd K. Packer||75||2276.106667||31.69806946|
|Spencer W. Kimball||66||2264.893939||34.32113774|
|M. Russell Ballard||62||2113.903226||32.98976187|
|Ezra Taft Benson||57||2153.578947||31.79958897|
|Russell M. Nelson||56||2328.535714||34.46947778|
|David B. Haight||55||2022.8||33.50270983|
|Dallin H. Oaks||54||2459.018519||30.85206586|
|Neal A. Maxwell||53||1965.773585||39.94132846|
|Joseph B. Wirthlin||53||2259.528302||33.46916535|
|Richard G. Scott||51||1934.647059||34.08122863|
|Marion G. Romney||51||2157.607843||30.11546943|
|Robert D. Hales||46||2255.086957||30.38011966|
|Henry B. Eyring||46||2437.673913||26.86417891|
|Howard W. Hunter||45||1676.977778||34.74803859|
|Marvin J. Ashton||38||2248.263158||34.31398562|
|N. Eldon Tanner||36||2428.833333||31.85897606|
Something else interesting regarding the “unique word ratio”: Neal A. Maxwell’s is particularly high at 39.9%. This is somewhat expected; Elder Maxwell was renowned for eloquence and a large vocabulary. Surprisingly, Henry B. Eyring’s unique word ratio is particularly low at 26.9%. But I wouldn’t call his talks rudimentary or simplistic by any means; quite the opposite. The difference in average word count between Maxwell (1965.8) and Eyring (2437.7) may have something to do with these numbers; as I said before, longer texts tend to have smaller unique word ratios. But then again, N. Eldon Tanner’s average word count (2428.8) is close to Eyring’s, yet Tanner’s unique word ratio is higher, at 31.9%.
Now for some individual word analysis – using data in WordsOfProphets1974to2010.csv. Probably lots of interesting stuff that could be done with this data, but for now we’ll just look at some Excel charts plotting the word usage frequency for some “interesting” words.
“Constitution” and “Pioneers”
I picked these first because I was fairly certain where there would be big spikes. As expected, “Constitution” gets used more frequently in 1976 and 1987, the bicentennials of the Declaration of Independence and the Constitution, respectively. “Pioneers” gets a big spike in 1997, the sesquicentennial of the Mormon pioneers arriving in the Salt Lake Valley in 1847.
I should note that the y-axis is a %. For example, about 0.11% of the words scraped from talks in the May 1997 Ensign were the word “pioneers”.
“Internet” and “Pornography”
The internet isn’t mentioned at all until 1996, about the time it started becoming popular and mainstream. The counsel from Church leaders about the evils of pornography seems to have increased, on average, in the years since the internet became more common and the problem of internet pornography more pervasive.
“Faith,” “Jesus,” “Christ,” “Savior,” “Lord”
I plotted both “Jesus” and “Christ” because while they are usually used together, when they are used separately it seems like “Christ” is used more often than “Jesus” … at least in recent years (see about 2004-2010). During the 1970s, the opposite appears slightly true: “Jesus” is used more frequently than “Christ”.
Both “Jesus” and “Christ” steadily increase from 2006 to 2008, then abruptly plummet in November 2008 and stay at about the same level until 2010. I have no good explanation for this; it seems intriguing. President Hinckley passed away in early 2008 and President Monson became the new prophet. Since the prophet speaks more in General Conference than anyone else, maybe Monson uses other words like “Savior” or “Lord” more frequently?
“Tithing” and “Prayer”
Here’s a GDP per capita chart, from http://www.measuringworth.com/usgdp/. We clearly see four recessions: early 1980s, early 1990s, early 2000s (hey, is there a ten-year pattern here?) and 2008-present.
So how do “prayer” and “tithing” General Conference frequencies compare during the recessions?
|2008 – now||up||down|
Hypothesis confirmed? Well, it’s kind of inconclusive – pretty shallow data set. But this type of analysis would be very interesting to pursue, methinks. Not necessarily only about economic issues (although that type of stuff is very easy to get historical #s for).
Here are the Python scripts used to extract the data in the .csv files.
getEnsignData.py – Starts at the Ensign archive webpage and dives down into the year, month, and article pages. Calls appropriate scraper routines from ldsscraper.py for each. Pickles output to ensignData.txt and genConfData.txt. This is so I can split the project up into two pieces – download the data (which takes a very long time) and save it; then assemble it for output later.
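The download-once, analyze-later split is a natural fit for Python's `pickle` module. The sketch below shows the general pattern only; the helper names and the record layout are hypothetical, since the actual structure of the scraped data isn't shown here.

```python
import os
import pickle
import tempfile

def save_data(data, path):
    """Phase 1: write scraped results to disk so the slow download runs once."""
    with open(path, "wb") as f:
        pickle.dump(data, f)

def load_data(path):
    """Phase 2: read the saved results back for analysis and CSV output."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical record layout, just to have something to save.
scraped = [{"speaker": "Example Speaker", "title": "Example Talk",
            "year": 1997, "month": "May"}]

# Written to a temporary directory so the example leaves no clutter behind.
path = os.path.join(tempfile.mkdtemp(), "genConfData.txt")
save_data(scraped, path)    # slow step, done once
restored = load_data(path)  # fast step, repeat as often as needed
print(restored == scraped)  # True
```

With the data saved this way, the counting and CSV-writing code can be tweaked and rerun in seconds instead of waiting another hour and a half for the download.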
ldsscraper.py – Uses regular expressions and Beautiful Soup to parse each webpage. Since each type of page (Archive, Year, Table of Contents (individual month issue), Article) has its own format, each has its own scraper.
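To give a feel for what one of these scrapers does, here is a small sketch that pulls article links out of a table-of-contents page. To keep it dependency-free it uses the standard library's `html.parser` instead of Beautiful Soup, so it is a stand-in for the technique rather than the code in ldsscraper.py. The class name, the markup, and the URLs are invented and do not reflect the real lds.org page structure.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Hypothetical table-of-contents markup.
page = ('<ul><li><a href="/ensign/1997/05/talk-one">Talk One</a></li>'
        '<li><a href="/ensign/1997/05/talk-two">Talk Two</a></li></ul>')

collector = LinkCollector()
collector.feed(page)
for href, title in collector.links:
    print(href, "->", title)
```

Each page type would need its own variation on this, because the interesting links and text sit in different places on an archive page, a year page, an issue page, and an article page.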