How much data do I have and where did it all come from?

Where did the data come from?

The data presented in this project comes from about five different sources. Table 1 shows these five projects and how much data they’ve contributed to the overall dataset, in terms of number of wards, number of sacrament meetings, and number of hymns overall.

Table 1: Breakdown of the Frequency dataset by source

source	wards (n)	wards (%)	sacrament meetings (n)	sacrament meetings (%)	hymns (n)	hymns (%)
joey2015	21	0.9%	2,041	6.67%	7,003	6.75%
joey2023	1,257	56.1%	14,764	48.26%	48,183	46.42%
kjerste2015	59	2.6%	401	1.31%	3,753	3.62%
samuel2015	212	9.5%	2,325	7.60%	7,795	7.51%
samuel2017	691	30.8%	11,064	36.16%	37,055	35.70%
Reflects the dataset as of January 2025.
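As a rough sketch (not the actual code behind this blog), the percentage columns in Table 1 can be recomputed from the raw counts. The dictionary layout and the `shares` helper are illustrative names, not part of the real pipeline; only the counts themselves come from Table 1.

```python
# Counts copied from Table 1 (dataset as of January 2025).
# The structure and names here are illustrative assumptions.
TABLE_1 = {
    "joey2015": {"wards": 21, "meetings": 2041, "hymns": 7003},
    "joey2023": {"wards": 1257, "meetings": 14764, "hymns": 48183},
    "kjerste2015": {"wards": 59, "meetings": 401, "hymns": 3753},
    "samuel2015": {"wards": 212, "meetings": 2325, "hymns": 7795},
    "samuel2017": {"wards": 691, "meetings": 11064, "hymns": 37055},
}


def shares(table, column):
    """Return each source's percentage of the column total."""
    total = sum(row[column] for row in table.values())
    return {source: 100 * row[column] / total for source, row in table.items()}


hymn_shares = shares(TABLE_1, "hymns")
for source, pct in hymn_shares.items():
    print(f"{source:<12} {pct:5.2f}%")
```

Swapping `"hymns"` for `"wards"` or `"meetings"` gives the other two percentage columns.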

Let me go through each one individually.

Joey’s 2015 collection

I first began this project sometime around 2013. I had the idea to collect as much data as I could from as many wards as I could to answer the question of what hymn is most common and what hymns are the least common.

In my project planning phase, I considered setting up a survey that people could quickly take every week to report their hymns. However, I didn’t think I could get very many people to commit to that kind of consistency. So, I decided that rather than collecting hymns from the next 52 weeks, what if I collected hymns from the past 52 weeks? I figured many wards have a spreadsheet of some sort that they use to keep track of hymns. If not a spreadsheet, then at least copies of old sacrament meeting programs or notes that the Bishopric uses when conducting. All I’d need then is for someone from each ward to send me their data once, rather than many times.

So I started asking around in online spaces where LDS music people might gather, but I soon found out it was going to be harder than I expected to get people to send me their data. After a few weeks I was able to get data from 21 wards, ranging from a few months to several years. My own ward’s clerk was cleaning out old files and sent me years of sacrament meeting programs. Because this data collection happened through 2015, I call this chunk of the overall sample the joey2015 data.

Table 1 shows that I collected data from 2,041 sacrament meetings from 21 wards. So, not too many wards, but I did get a fair amount of data from each one. However, the issue I ran across was that the characteristics of any one ward would sometimes overwhelm the overall findings because there were so few wards in total. For example, a ward that sent several years’ worth of data had two quirks: they would systematically cycle through all the sacrament hymns, and they would sing I Know That My Redeemer Lives (#136) every fast Sunday. With the larger dataset I now have, those idiosyncrasies are washed out. But, it was enough data to start to see some of the trends I report elsewhere in this blog.

Figure 1 shows how this Joey2015 sample is distributed over time. In this plot, the height of each bar represents how many sacrament meetings I had data from within each month. You can tell that most of the recruitment efforts were done in 2014 since that’s when the most data comes from. There’s a recency effect here: if I put a call out to people in June 2014, I’ll get a lot of data from the weeks and months leading up to June 2014, and less data the further back I go because fewer wards keep records for that long.

Figure 1: Sacrament meetings per month in the joey2015 sample
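The bar heights in a plot like this are just counts of sacrament meetings per calendar month. Here is a minimal sketch of that binning, assuming one date per sacrament meeting; the dates below are made up for illustration and `meetings_per_month` is a hypothetical helper, not the code used to draw the figure.

```python
from collections import Counter
from datetime import date

# Made-up meeting dates, one entry per sacrament meeting (illustrative only).
meeting_dates = [
    date(2013, 12, 29),
    date(2014, 5, 4),
    date(2014, 5, 11),
    date(2014, 6, 1),
]


def meetings_per_month(dates):
    """Count meetings in each (year, month) bin."""
    return Counter((d.year, d.month) for d in dates)


per_month = meetings_per_month(meeting_dates)
for (year, month), n in sorted(per_month.items()):
    print(f"{year}-{month:02d}: {n}")
```

With real data, a recency effect would show up as larger counts in the months just before each call for data.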

Joey’s 2023 collection

My interest in the project waned after about 2017, partly because I was in graduate school and was starting a family, and partly because I was having a hard time collecting more data. In 2023 though, my interest was revitalized, coincidentally right before the church announced the new hymnal. I figured since I had been working on this for over a decade and since I’ve collected so much data, I might as well get the results out before it all becomes irrelevant. It’s probably too late to send these findings to the church in case they want to use them to help make decisions about the new hymnal.

As I began this blog, I started sharing it and results from it in new online spaces, primarily social media. As I do so, I encourage people to send me their data if they have it. Through these efforts, I actually collected more data than I had in 2015. More wards, more sacrament meetings, and more hymns.

Figure 2 shows the distribution of dates over time for the joey2023 sample. There are a few interesting things to note. One is the large dip in 2020 because of COVID. Through the many spreadsheets I’ve been sent, it’s been interesting to see how and when wards returned to normal meetings. Another is the general increase over time, where the recency effect is quite visible. As I continue advertising the project, more and more people send stuff to me. But, I have gotten some data from as far back as 2009, which is pretty cool.

Figure 2: Sacrament meetings per month in the joey2023 sample

Of course, the biggest thing to notice in this plot is that I really increased my data solicitation efforts around the time the first batch of new hymns was released, so there’s a huge spike in May and June 2024. I joined music calling–related Facebook groups and started weekly posts on Twitter asking for hymn data. I also realized around then that there are hundreds of wards that still broadcast their sacrament meetings on YouTube, so I have a constant source of new data. Some wards keep those videos up forever while others keep them up for only a few days, so I have to check back every week to get the latest batch.

Kjerste’s 2015 survey

Around the same time I was beginning my project, but completely independently of me, another hymn stats fan, Kjerste Christensen, began her own project. She ran a weekly hymn survey for a little over a year. Her thinking was that it was important to get a full year in order to get all the holiday and seasonal variation (and I agree with that). The project was mainly for her own curiosity and she never did much with it beyond personal use, like figuring out what hymns would be common to sing in church. I’ll call this dataset the kjerste2015 collection.

On average, Kjerste had roughly 20 people per week fill out her survey. It’s not clear how many wards contributed to the survey total because many people did not include information about their ward, but there were at least 60 and likely 2–3 times that many. Figure 3 shows the distribution of the kjerste2015 data across time.

Figure 3: Number of hymns per week in the `kjerste2015` dataset

Samuel’s 2015 survey

There must have been something going around in 2014–2015 because, coincidentally, just as I was getting interested in my project and just as Kjerste was too, Samuel Bradshaw, yet another curious LDS musician, wanted to collect some data for a hymn stats project. Samuel had the same idea as Kjerste and wanted to get people to fill out a quick survey every week saying what hymns they sang.

Fortunately, Samuel runs SingPraises.net and appears to have many more connections and resources than Kjerste and I do. So, he was much more successful in advertising the project and getting people to submit data. Basically, he did what I was not able to do and actually got people to submit week after week. After a year of data collection, Samuel ended up with data from 212 unique wards and 2,325 sacrament meetings, as seen in Table 1. Samuel ended up publishing the results of his survey on his website.

Figure 4: Number of hymns per week in the `samuel2015` dataset

I got my hands on the samuel2015 dataset early on when Samuel and I agreed to share our data.

Samuel’s 2017 survey

After a successful project in 2015, Samuel Bradshaw revamped the survey and distributed it again in 2017. Again, he has more resources, a wider network of musicians, and his SingPraises.net website to help advertise. He put a link to the survey at the top of every page of SingPraises.net, so everyone who visited the site during that year saw it. He created a Google Groups mailing list for people to join and get updates about the project. People got automatic emails reminding them to fill out the survey and to spread the word. He asked people with music callings to submit the spreadsheets they used for planning their hymns (like I did for my joey2015 dataset). He created Spanish and Portuguese versions of the survey to hopefully reach a wider audience. And he followed up with wards if they were missing a few weeks.

Through these efforts, Samuel collected an enormous amount of data: 11,068 sacrament meetings from 692 wards from around the world! He averaged 112 responses every week through 2017. Plus, for a few wards he was able to get a long history of hymns sung: two wards submitted over 14 years’ worth of data! At this point, Samuel already had access to the kjerste2015, joey2015, and of course samuel2015 datasets, so when he published the results on his website, it was by far the largest hymn stats project to date.

Samuel has again graciously agreed to share his data with me for the purposes of this project. At the time of writing, the samuel2017 collection comprises roughly 36% of the total dataset (Table 1), which makes it the second-largest single source of data I have.

Comparing the data collection methods

I think it’s important to pause and compare the two approaches to data collection. The joey2015 dataset has 2,041 sacrament meetings from 21 wards. The median number of sacrament meetings per ward in that dataset is 40, and the average is 97. So, a lot of data from a few wards. The samuel2015 collection is a little larger, at 2,325 sacrament meetings, but it comes from 212 wards, which is ten times as many. The average number of weeks submitted per ward is 11, and the median is just two. So while many, many more wards are represented, we only get a snapshot of what each ward is like.

In the samuel2017 collection, the numbers were the same: the average number of weeks each ward submitted was 11 and the median was still 2. So, though the attrition rate was about the same as in his 2015 survey, just the sheer volume of data that he collected was astounding. The kjerste2015 collection is similar to Samuel’s and has a lot of data from a few wards and a little bit of data from many wards. One method gets depth and the other gets breadth, and I think the two methods complement each other nicely.
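To make the depth-versus-breadth comparison concrete, here is a small sketch of how the mean and median number of meetings per ward could be computed. It assumes the data is a list of (ward, date) records; the ward names, dates, and the `depth_summary` helper are all made up for illustration and are not the real data or code.

```python
from collections import Counter
from statistics import mean, median

# Made-up (ward, date) records; each pair stands for one sacrament meeting.
records = [
    ("Ward A", "2014-01-05"), ("Ward A", "2014-01-12"), ("Ward A", "2014-01-19"),
    ("Ward A", "2014-01-26"), ("Ward A", "2014-02-02"), ("Ward A", "2014-02-09"),
    ("Ward B", "2014-01-05"),
    ("Ward C", "2014-01-05"), ("Ward C", "2014-01-12"),
]


def depth_summary(records):
    """Summarize how many meetings each ward contributed."""
    # set() guards against the same meeting being listed twice.
    per_ward = Counter(ward for ward, _ in set(records))
    counts = list(per_ward.values())
    return {
        "wards": len(counts),
        "meetings": sum(counts),
        "mean": mean(counts),
        "median": median(counts),
    }


summary = depth_summary(records)
print(summary)
```

When a few wards contribute a lot and most contribute a little, the mean sits well above the median, which is the pattern described for Samuel’s surveys.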

Figure 5 and Figure 6 illustrate these differences. For both figures, we have time represented on the x-axis, with older dates on the left and newer dates on the right. Along the y-axis, we have each ward, anonymized. Wards are arranged from top to bottom based on the oldest date they submitted data from. Each contribution is a single dot on the plot.

Figure 5: Contributions by date and ward in the `joey2015` dataset.

Figure 5 above is from the joey2015 data. Again, fewer wards, but many contributions from each ward because I was specifically seeking out spreadsheets that music coordinators were already using. The smallest contribution from a ward was a single week’s worth of data from when I was visiting there. This plot spans over a decade because some wards had many years’ worth of data.

Compare this to Figure 6 below. The plot is organized in the exact same way. Again, ten times as many wards. However, you can see that the vast majority of these wards only contributed one or two weeks’ worth of data. Some did more, but even the most dedicated people had gaps in their submissions (even after Samuel’s dutiful efforts of following up with people).

Figure 6: Contributions by date and ward in the `samuel2015` dataset.

These plots only show the joey2015 and the samuel2015 datasets. Since the joey2023 dataset mostly follows the same methods as the joey2015 one, it looks the same, just with more wards. The kjerste2015 dataset looks very similar to the samuel2015 one. The samuel2017 one mostly does too, except the plot is much larger because there are so many more wards.

I say that these two methods complement each other because they can be used to answer different questions. The sheer number of contributions from so many wards in Samuel’s and Kjerste’s collections means we can see what happens at a macro level. What hymns are the most common, least common, popular around holidays, etc. And since the bulk of the data came from the same year, we get a really nice snapshot of what the church was doing at that time.

However, having data from many years means I can answer other questions that would not be possible no matter how much data is collected from a single year. For example, I’ve shown that people usually sing Thanksgiving hymns the Sunday after Thanksgiving, unless that day falls on December 1st, in which case we get a surge of Christmas hymns, most notably Joy to the World (#201) and Oh, Come, All Ye Faithful (#202), suggesting that the Christmas season truly starts on December 1st and not just the Sunday after Thanksgiving. Similarly, I’ve shown that Christmas hymns wane the further you get from Christmas, and that New Year’s hymns peak not on New Year’s Day, but on New Year’s Eve. Again, the data I have spans many years, so I have a lot of data from every calendar day of the year.

I was also able to do a pretty cool (but rather technical) analysis (part 1 and part 2) on how many hymns a ward sings and how long it takes for them to level out. It seems like wards sing about 105 unique hymns per year, and average about 3.47 per week. Most wards level out at around 240 hymns, give or take a couple dozen, and it takes about five years to get to that point. This kind of analysis is only possible if you’ve got many wards contributing many years of data.
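The basic idea behind a "leveling out" curve can be sketched as a running count of distinct hymns a ward has sung so far. This is only a simplified illustration, not the analysis from part 1 and part 2; the function name and the three sample meetings are assumptions made up for the example.

```python
def cumulative_unique(meetings):
    """Given meetings in chronological order (each a list of hymn numbers),
    return the running count of distinct hymns sung so far."""
    seen = set()
    curve = []
    for hymns in meetings:
        seen.update(hymns)
        curve.append(len(seen))
    return curve


# Three made-up meetings for one hypothetical ward.
sample_meetings = [
    [6, 193, 218],
    [2, 193, 85],
    [6, 169, 2],
]
curve = cumulative_unique(sample_meetings)
print(curve)  # [3, 5, 6]
```

With several years of data from one ward, a curve like this rises quickly at first and then flattens as fewer new hymns show up each week.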

How much data do I have?

Now that we’ve talked about where this data came from, let’s talk about how much data I actually have. I have pooled the data from all five sources together, cleaned them up, made them compatible with each other, and now have a pretty hefty spreadsheet of hymn stats data, the Frequency data.

How many sacrament meetings?

The Frequency dataset currently has information from 30,593 sacrament meetings. Assuming a rate of one sacrament meeting a week, and 48 meetings a year (52 minus two for ward conference and two for stake conference), it would take a person 637.35 years to experience that many sacrament meetings. So, this collection represents far more than what any one person can experience in a lifetime.
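The arithmetic behind that figure, written out as a quick check (the variable names are just for illustration):

```python
meetings_in_dataset = 30_593

# 52 Sundays, minus two for ward conference and two for stake conference.
meetings_per_year = 52 - 2 - 2

years_to_attend = meetings_in_dataset / meetings_per_year
print(f"{years_to_attend:.2f} years")  # 637.35 years
```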

How many wards?

This data comes from 2,101 unique wards and branches. We’ll see below that most of them are in the United States: 1,155, to be specific. At the time of writing, the church reports that there are 14,614 congregations in the United States. That means that this sample has information from 7.90% of the wards in the United States, which is not too shabby. We can certainly thank Samuel Bradshaw for much of this, since his two surveys account for about 40% of the ward counts in Table 1.

How many hymns?

The whole spreadsheet has 103,789 rows in it, each representing a congregational hymn sung in some ward or branch sometime in the past 22 years somewhere in the world. Table 2 shows just a snapshot of what the raw dataset looks like under the hood.

Table 2: A sample of the raw dataset

id	ward	type	hymn_num	hymn_name
8787	Bristol 2nd Ward	Opening	6	Redeemer of Israel
13955	Champaign Ward	Closing	218	We Give Thee But Thine Own
17763	Leiden Wijk	Intermediate	228	You Can Make the Pathway Bright
29119	Trailridge Ward	Intermediate	259	Hope of Israel
29910	Scatter Creek Ward	Closing	260	Who’s on the Lord’s Side?
30049	Canandaigua Ward	Sacrament	193	I Stand All Amazed
37526	Пловдив Клон	Intermediate	166	Abide with Me!
48444	American Fork 38th Ward	Closing	113	Our Savior’s Love
65655	Herndon Ward	Sacrament	178	O Lord of Hosts
76232	Sharon 8th Ward	Sacrament	180	Father in Heaven, We Do Believe

As you can see, it’s pretty straightforward data. I have columns for the ward name, what I’m calling the “type” (which is just whether it’s the opening, sacrament, intermediate, or closing hymn), the hymn number and the hymn name. The first column, “id”, is simply a unique identifier for each row in the spreadsheet.

For the most part, there are no partial rows in this spreadsheet. However, a few wards have organized their spreadsheets in such a way that I can recover the dates, but not the “type.” For such wards, I have put an NA in the “type” column and they are excluded from any sort of analysis that uses those types. In a few instances, I’m sent data that has partial information about a sacrament meeting, such as just the opening hymn but not the others. In such cases, I toss the data because I only want to include complete sacrament meetings.
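Here is a minimal sketch of those two cleaning rules. The row layout (including the `date` field), the helper names, and especially the definition of a "complete" meeting (at least an opening, sacrament, and closing hymn) are assumptions for illustration; the actual cleaning steps may differ.

```python
from collections import defaultdict

# Assumption: a typed meeting counts as complete if it has at least these.
REQUIRED_TYPES = {"Opening", "Sacrament", "Closing"}


def keep_complete_meetings(rows):
    """Drop meetings with partial information; keep untyped meetings whole."""
    by_meeting = defaultdict(list)
    for row in rows:
        by_meeting[(row["ward"], row["date"])].append(row)

    kept = []
    for meeting_rows in by_meeting.values():
        types = {r["type"] for r in meeting_rows}
        untyped = types == {None}  # date recovered, but type is NA
        if untyped or REQUIRED_TYPES <= types:
            kept.extend(meeting_rows)
    return kept


def rows_with_type(rows):
    """Rows usable in analyses that depend on the hymn type."""
    return [r for r in rows if r["type"] is not None]


# Made-up rows: one complete meeting, one partial, one untyped.
rows = [
    {"ward": "Ward A", "date": "2024-05-05", "type": "Opening", "hymn_num": 6},
    {"ward": "Ward A", "date": "2024-05-05", "type": "Sacrament", "hymn_num": 193},
    {"ward": "Ward A", "date": "2024-05-05", "type": "Closing", "hymn_num": 218},
    {"ward": "Ward B", "date": "2024-05-05", "type": "Opening", "hymn_num": 259},
    {"ward": "Ward C", "date": "2024-05-05", "type": None, "hymn_num": 166},
    {"ward": "Ward C", "date": "2024-05-05", "type": None, "hymn_num": 113},
]

kept = keep_complete_meetings(rows)
typed = rows_with_type(kept)
print(len(kept), len(typed))
```

In this example, Ward B’s lone opening hymn is tossed, while Ward C’s untyped hymns stay in the dataset but are left out of anything that depends on the type.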

Geographic Distribution

For each ward/branch I have a separate spreadsheet that includes basic metadata about them. Table 3 shows that I have the name (for simplicity, I call it “ward” even though branches are included), and the city, state, and country.

Table 3: A sample of the ward metadata

ward	city	state	country
Maple Meadows 1st Ward	Maple Mountain	Utah	United States
Cedar Hollow 6th Ward	Highland	Utah	United States
Gallatin Ward	Gallatin	Missouri	United States
Palo Alto 2nd Ward	Palo Alto	California	United States
Jordan Ridge YSA Ward	West Jordan	Utah	United States
Yorktown Branch	Yorktown	Texas	United States
Las Vegas Branch	Las Vegas	New Mexico	United States
St. George 7th Ward	St. George	Utah	United States
Grover’s Hill Ward	St. John’s	Arizona	United States
Muroc Ward	North Edwards	California	United States

Countries

I currently have data from 60 countries; however, 87.82% of the data comes from the United States. A fair amount also comes from Canada, the United Kingdom, and Australia. Table 4 shows the full breakdown of how much data I have from each country, in terms of how many hymns, how many sacrament meetings, and how many wards. So, the US is over-represented in this dataset compared to how members of the church are distributed around the world. This makes sense given that recruitment efforts were based in the United States and were in English. Therefore, while I can’t say that the patterns here do not represent wards outside of the US, any extrapolation of these patterns to other countries should be taken with a grain of salt.

Table 4: Countries in the dataset

country	hymns	meetings	wards	percent of full dataset
United States	85500	25229	1155	87.82%
Canada	3723	1049	35	3.82%
United Kingdom	1828	501	14	1.88%
Netherlands	916	260	9	0.94%
France	865	245	9	0.89%
Australia	436	118	8	0.45%
Denmark	413	144	2	0.42%
Northern Mariana Islands	354	101	1	0.36%
Mexico	318	89	6	0.33%
Taiwan	254	75	3	0.26%
Spain	251	77	4	0.26%
England	248	71	5	0.25%
Brazil	201	57	4	0.21%
Argentina	180	63	4	0.18%
Democratic Republic of the Congo	164	53	3	0.17%
New Zealand	164	45	5	0.17%
Germany	156	47	5	0.16%
Chile	146	46	4	0.15%
Ghana	145	45	6	0.15%
Philippines	96	29	4	0.10%
Japan	92	27	2	0.09%
Indonesia	79	26	7	0.08%
Finland	74	21	4	0.08%
United Arab Emirates	69	19	2	0.07%
Liberia	68	21	1	0.07%
Trinidad and Tobago	68	19	8	0.07%
Uganda	64	19	1	0.07%
Italy	52	15	4	0.05%
Samoa	46	14	3	0.05%
Sweden	46	16	2	0.05%
Bulgaria	39	11	2	0.04%
Ecuador	37	10	2	0.04%
Colombia	34	9	2	0.03%
Guyana	29	8	6	0.03%
Paraguay	29	8	1	0.03%
Israel	25	6	1	0.03%
Vanuatu	19	6	1	0.02%
Peru	18	6	3	0.02%
French Polynesia	16	5	2	0.02%
Poland	12	4	1	0.01%
China	11	3	1	0.01%
New Caledonia	11	4	2	0.01%
Uruguay	8	4	1	0.01%
Belgium	7	2	1	0.01%
Puerto Rico	7	2	2	0.01%
Zimbabwe	6	2	1	0.01%
India	4	1	1	0.00%
Suriname	4	1	1	0.00%
Aruba	3	1	1	0.00%
Dominican Republic	3	1	1	0.00%
Guam	3	2	1	0.00%
Guatemala	3	1	1	0.00%
Nigeria	3	1	1	0.00%
Panama	3	1	1	0.00%
South Africa	3	1	1	0.00%
American Samoa	2	1	1	0.00%
Dominica	2	1	1	0.00%
Switzerland	1	1	1	0.00%
Thailand	1	1	1	0.00%
kathyUnited States	1	1	1	0.00%

States and Provinces

Within the United States, I have data from 48 states and the District of Columbia, as seen in Table 5. The only states I don’t have data from are Arkansas and Delaware. Unsurprisingly, the bulk of the data (43.59%) comes from Utah. Other western states are well-represented, but so are Texas, North Carolina, and Georgia. Georgia is probably so high on this list because I lived there and contributed many years’ worth of data from my own wards. Again, this is not representative of the distribution of church members in the US. Especially for the states that have less data, a large contribution from a single ward can have an overwhelming influence on the overall results for that state. For that reason, I do very little geographic analysis in this blog.

Table 5: US states in the dataset

state	hymns	meetings	wards	percent of US data
Utah	37190	11132	471	43.59%
Idaho	5136	1540	103	6.02%
Colorado	4349	1277	26	5.10%
California	3477	1010	77	4.08%
North Carolina	3375	1037	14	3.96%
Texas	3278	981	52	3.84%
Arizona	2646	810	55	3.10%
Washington	2407	746	41	2.82%
Georgia	2386	691	19	2.80%
Virginia	2140	677	14	2.51%
Maryland	1658	494	11	1.94%
Florida	1427	433	22	1.67%
Wisconsin	1426	417	10	1.67%
Ohio	1415	419	13	1.66%
Iowa	1262	374	9	1.48%
Nevada	1203	361	16	1.41%
Missouri	1063	328	17	1.25%
Illinois	1027	312	16	1.20%
New Mexico	993	322	7	1.16%
Oregon	967	296	21	1.13%
Michigan	836	250	9	0.98%
Tennessee	659	199	19	0.77%
Wyoming	652	191	13	0.76%
New York	527	155	16	0.62%
Maine	419	121	4	0.49%
Pennsylvania	412	123	9	0.48%
Louisiana	407	113	3	0.48%
Indiana	394	110	9	0.46%
Hawaii	378	109	11	0.44%
Connecticut	263	77	5	0.31%
South Carolina	223	69	7	0.26%
Oklahoma	185	55	4	0.22%
Kentucky	159	44	5	0.19%
Nebraska	136	40	2	0.16%
Minnesota	133	43	4	0.16%
New Hampshire	121	33	1	0.14%
Alabama	106	31	3	0.12%
Kansas	105	35	5	0.12%
New Jersey	61	18	3	0.07%
Massachusetts	58	17	2	0.07%
Montana	57	18	6	0.07%
North Dakota	56	17	2	0.07%
South Dakota	55	16	3	0.06%
Alaska	47	17	2	0.06%
District of Columbia	12	4	3	0.01%
Mississippi	10	3	1	0.01%
Rhode Island	8	2	2	0.01%
Vermont	6	2	1	0.01%
West Virginia	3	1	1	0.00%

Since Canada is the second-largest country in this dataset, it’s worth a look to see how that data is broken down (Table 6). Only six provinces are represented in this sample, and they definitely don’t reflect the distribution of members there. About half the Canadian data comes from a single contribution of over eight years of data from a ward in New Brunswick. So even though there are many more members in Alberta, that one ward overwhelms the rest of the Canadian sample.

Table 6: Canadian provinces in the dataset

province	hymns	meetings	wards	percent of Canadian data
New Brunswick	1802	493	2	48.40%
Alberta	1697	495	26	45.58%
Quebec	90	23	2	2.42%
Saskatchewan	71	20	2	1.91%
Nova Scotia	59	17	2	1.58%
Prince Edward Island	4	1	1	0.11%

Cities in Utah

Within Utah, we can even break it down by city. Table 7 shows that Provo has the largest representation. A big chunk of this comes from YSA and Married Student wards. This again makes sense since many of the recruitment efforts took place in Provo. Spanish Fork is high on the list because that’s where I currently live.

Table 7: Utah cities in the dataset

city	hymns	meetings	wards	percent of Utah data
Provo	6750	2051	70	18.77%
Spanish Fork	2307	705	28	6.42%
Salt Lake City	2215	680	25	6.16%
American Fork	2099	667	6	5.84%
Orem	1770	550	14	4.92%
Draper	1423	468	7	3.96%
Lehi	1321	410	13	3.67%
Saratoga Springs	1231	367	4	3.42%
West Jordan	1162	362	11	3.23%
Taylorsville	794	237	2	2.21%
Hyrum	710	237	1	1.97%
Hurricane	690	227	7	1.92%
West Valley City	675	198	8	1.88%
Payson	638	197	9	1.77%
Bountiful	610	192	13	1.70%
Springville	589	175	19	1.64%
Roy	509	150	5	1.42%
Sandy	507	155	14	1.41%
Mount Pleasant	502	143	5	1.40%
Highland	495	152	12	1.38%
Cedar City	419	126	5	1.17%
St. George	407	126	15	1.13%
South Jordan	388	130	7	1.08%
Riverton	339	108	5	0.94%
Washington Terrace	337	98	1	0.94%
Layton	325	95	7	0.90%
Logan	318	93	8	0.88%
Herriman	303	91	3	0.84%
Pleasant Grove	294	92	9	0.82%
Centerville	266	79	5	0.74%
Farr West	255	93	1	0.71%
Holladay	240	74	4	0.67%
North Logan	234	64	3	0.65%
Kamas	232	70	1	0.65%
Huntsville	223	71	1	0.62%
Coalville	221	71	2	0.61%
Sunset	217	65	2	0.60%
Providence	203	68	1	0.56%
Eagle Mountain	199	59	4	0.55%
West Haven	193	61	3	0.54%
Bluffdale	175	54	2	0.49%
Ogden	168	49	5	0.47%
Parowan	166	52	3	0.46%
Washington	157	48	2	0.44%
Moab	154	44	3	0.43%
Harrisville	153	49	2	0.43%
Murray	144	43	5	0.40%
Heber City	141	47	1	0.39%
Manti	136	44	5	0.38%
West Valley	128	38	2	0.36%
Morgan	116	32	5	0.32%
Salem	106	33	3	0.29%
Farmington	102	33	3	0.28%
Indianola	98	29	1	0.27%
Woodland Hills	98	29	2	0.27%
Kanab	95	27	2	0.26%
Woods Cross	90	27	3	0.25%
Oakley	89	30	1	0.25%
Erda	83	25	1	0.23%
Monticello	80	25	1	0.22%
Jordan River	79	23	1	0.22%
Tremonton	70	22	2	0.19%
Elk Ridge	63	19	1	0.18%
Unknown	60	19	1	0.17%
Smithfield	54	17	4	0.15%
Syracuse	51	15	2	0.14%
Mapleton	50	16	3	0.14%
Woodruff	44	12	1	0.12%
Alpine	42	13	4	0.12%
Clearfield	38	11	2	0.11%
Richmond	35	11	1	0.10%
Eden	32	10	2	0.09%
Midvale	32	6	1	0.09%
Santaquin	29	9	2	0.08%
Kaysville	22	7	1	0.06%
Brigham City	14	4	2	0.04%
Fruit Heights	14	4	1	0.04%
Clinton	13	4	3	0.04%
West Point	12	4	4	0.03%
Fountain Green	9	3	3	0.03%
Grantsville	9	3	1	0.03%
Hoytsville	9	3	1	0.03%
Saint George	9	3	1	0.03%
Sun City	9	3	1	0.03%
Plain City	7	2	2	0.02%
North Salt Lake	6	2	1	0.02%
West Bountiful	6	2	1	0.02%
Hyde Park	4	1	1	0.01%
Lindon	4	1	1	0.01%
Maple Mountain	4	1	1	0.01%
Paradise	4	1	1	0.01%
Stansbury Park	4	1	1	0.01%
Wallsburg	4	1	1	0.01%
Alton	3	1	1	0.01%
Bennion	3	1	1	0.01%
Elwood	3	1	1	0.01%
Enoch	3	1	1	0.01%
Ephraim	3	1	1	0.01%
Kearns	3	1	1	0.01%
Pleasant View	3	1	1	0.01%
Riverdale	3	1	1	0.01%
Sterling	3	1	1	0.01%
Tooele	3	1	1	0.01%
Marion	1	1	1	0.00%

So, at all levels of geography, the sample is biased, mostly towards where the recruitment has happened. There’s not much I can do about that, but it’s still useful to see where the data comes from.

Distribution across time

The oldest datapoint in this sample comes from April 28, 2002 and the newest is currently from May 25, 2025 (some wards’ spreadsheets have future meetings planned already). So, the data spans over twenty years. Considering I started data collection in 2013, it’s amazing that I have data from 2002 and 2003 at all—amazing that someone held on to those records for so long, that they happened to hear about the project, and that they were willing to contribute it all! If we ignore a few gaps in 2002 and 2003 and the second quarter of 2020 when in-person sacrament meetings were suspended, I have a nearly unbroken line of data spanning two decades.

Of course, like the geographic distribution, the temporal distribution of this dataset is not even. Figure 7 shows that there is a major spike in data in the mid-2010s. The project started in 2013, so anything from before then is from old spreadsheets that people sent me, and unsurprisingly, the further back you go the less data I have. In 2013 and 2014 it picks up as I began my recruitment efforts. Samuel’s first survey explains the increase in 2015. And his 2017 survey, which began in the second half of 2016, explains the monstrous spike in 2017. Samuel stopped collecting data in early 2018, and I only started getting more in late 2023, so anything from 2018 on is mostly from spreadsheets collected since 2023. Data from 2024 is from my increased efforts around the time of the new batch of hymns coming out.

So, like the geographic data, it’s a wide sample, but very highly biased towards when and where data collection was happening. For that reason, I am hesitant to make any claims about changes over time, especially because the early years are only represented by a few wards.

Conclusion

On this page, I’ve explained where the data came from and how much of it I have. It is a conglomeration of five different projects by three people, all of whom started to get interested in hymn stats around 2014 completely independently of each other. Samuel Bradshaw’s and Kjerste Christensen’s surveys got data from many wards, while mine got lots of data from a smaller number of wards. There are over 600 years’ worth of sacrament meetings represented in this sample. It mostly comes from the United States, with the bulk of that being from Utah, but many states and countries are represented in this sample.

Hopefully that answers any questions you might have about the data used in this blog. If you would like to contribute your own data, you may certainly do so here!