The methodology for the Forenames of Ireland 1911 and the Surnames of Ireland 1911.
This section describes the methodology used to produce the statistics presented in the Forenames of Ireland 1911 series and the Surnames of Ireland 1911 series. Both series use the same experimental dataset, which originates from transcriptions of the 1911 census of Ireland made available by the National Archives of Ireland. For each question asked, the aim was to group the transcribed answers into categories. The questions asked in 1911 which have been transcribed were as follows:
Unfortunately, in a document such as a census, there are errors in what was written, such as writing “English” in the Irish Language section when the form stated not to. Furthermore, there are questions such as Occupation and Birthplace where multiple answers could mean the same thing. For example, if one person in the census claimed to be from “Westmeath” and another from “Co. Westmeath”, or even used a shorthand such as “W Meath”, these would all be grouped into a single item, “Westmeath”. As a result, where possible, the dataset generated for the two series groups census entries into items for questions such as religion and birthplace. The transcriptions are also not perfect: for some questions it was not possible to compile data because not all the information was transcribed. For example, not all disabilities were transcribed, so it would be quite difficult to obtain statistics without looking through every census document. The following sections go through each of the questions asked above and the statistics obtained from the data.
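The Westmeath example above can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not the code used for the series: the alias table, the function name `normalise_birthplace` and the handling of a leading “Co.” or “County” are assumptions, and only the three Westmeath spellings come from the text.

```python
import re

# Hypothetical alias table: only the Westmeath variants come from the text.
COUNTY_ALIASES = {
    "westmeath": "Westmeath",
    "w meath": "Westmeath",
}

def normalise_birthplace(raw):
    """Map a transcribed birthplace to a county name, or None if unmatched."""
    # Lowercase and turn anything that is not a letter or space into a space.
    text = re.sub(r"[^a-z ]", " ", raw.lower())
    # Drop a leading "co" or "county" (assumed convention, e.g. "Co. Westmeath").
    text = re.sub(r"^\s*(co|county)\s+", "", text)
    # Collapse repeated spaces before looking the key up.
    key = " ".join(text.split())
    return COUNTY_ALIASES.get(key)
```

Entries that match no alias return `None`, so they can be counted as unknown rather than forced into a county.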
The forenames, or first names, have had slight corrections. Firstly, all non-letters (such as full stops and apostrophes) have been removed, and the names have been grouped separately for males and females. Then, to avoid mistakes with spelling and transcription, only first names which matched the CSO first name database, which covers 1964 to the present day, were kept. This database contains names where more than three were registered each year, for both males and females. It is estimated that the majority of actual names from the 1911 census are contained in this database.
In some instances, multiple names were given for a person, such as the inclusion of a middle name, as in “Michael Patrick”. Because it is uncertain whether such an entry is a first name made up of several names, like “Mary-Anne”, the decision was made to remove every name but the first unless the entire name is contained in the CSO database. For example, Mary Rose existed in the database as Maryrose, and thus the second name Rose was retained.
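The forename rules described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the series: `CSO_NAMES` is a tiny hypothetical stand-in for the CSO database, `clean_forename` is an invented name, and returning `None` for unmatched names is an assumption about how unmatched entries are treated.

```python
import re

# Hypothetical stand-in for the CSO first-name database (the real one is far larger).
CSO_NAMES = {"mary", "maryrose", "michael", "patrick"}

def clean_forename(raw, known_names=CSO_NAMES):
    """Return the cleaned forename, or None if it matches no known name."""
    # Split on whitespace, then strip everything that is not a letter
    # (accented letters are kept) from each part.
    parts = [re.sub(r"[\W\d_]+", "", p) for p in raw.lower().split()]
    parts = [p for p in parts if p]
    if not parts:
        return None
    # Keep the whole entry if the joined name exists, e.g. "Mary Rose" -> "maryrose".
    joined = "".join(parts)
    if joined in known_names:
        return joined
    # Otherwise keep only the first name, and only if it is a known name.
    first = parts[0]
    return first if first in known_names else None
```

Under these assumptions, “Michael Patrick” is reduced to “michael”, while “Mary Rose” is kept whole as “maryrose” because the joined form is in the name list.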
For the Forenames of Ireland 1911 series, each of the detailed statistics below is produced for all first names held by more than 1,000 people. For some forenames held by fewer than 1,000 people, due to the smaller population size, only the proportions of the population by electoral division and district are provided, along with the religious breakdown and birthplace. For the Surnames of Ireland 1911 series, a table of the top first names is compiled for each surname on the list. Importantly, the order of the most popular forenames is highly dependent on the methodology described here. As a result, it may differ from other lists which have used different approaches or methodologies.
Like the forenames, the surnames have also had some slight corrections to help group surnames which are closely linked, such as those with an O prefix and those without. Firstly, as with the first names, all non-letters have been removed. Then, for any surname starting with Mac, Mc or O, the prefix has been removed. For example, O’Brien and Brien are considered the same name, so the O is removed from the front and both are counted as Brien. Given the number of different surnames and variations, no further changes are made to surnames other than small character changes that were noticed.
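A minimal sketch of the surname normalisation is given below. It is an illustration only: the function name `normalise_surname` is invented, and the rule that a prefix is stripped only when a capital letter follows it (so that a name like Owens is not cut down to “wens”) is an assumption, since the text does not say how such cases were handled.

```python
import re

# Assumption: a prefix counts only when followed by a capital letter,
# as in "OBrien" or "McCarthy" once the non-letters are gone.
PREFIX = re.compile(r"^(Mac|Mc|O)(?=[A-Z])")

def normalise_surname(raw):
    """Remove non-letters, then a leading Mac, Mc or O prefix."""
    # Strip apostrophes, spaces, digits and other non-letters.
    letters = re.sub(r"[\W\d_]+", "", raw)
    # Remove at most one leading prefix.
    return PREFIX.sub("", letters, count=1)
```

With this rule, “O’Brien” and “Brien” both become “Brien”, and “McCarthy” and “MacCarthy” both become “Carthy”.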
For the Forenames of Ireland 1911 series, the top surnames which correspond to each name are shown. For the Surnames of Ireland 1911 series, detailed statistics are provided for surnames held by more than 1,000 people. Some surnames held by fewer than 1,000 people will have statistics on the proportion of the population by DED/district, religious breakdown, and birthplace. Importantly, the order of the most popular surnames is highly dependent on the methodology described here. As a result, it may differ from other lists which have used different approaches or methodologies.
For both series, the proportion of the population with a particular first name or last name is shown by district (also known as poor law unions) and by district electoral division (DED). Other than townlands, DEDs are one of the smallest statistical units still in use today. Quite simply, the people with a given first name or surname are counted within each district or DED, and the count is divided by the total population of that district or DED. Note that for first names, the proportion is taken from the total number of males or females in each district/DED.
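The calculation for a first name can be sketched as below, assuming that the denominator is the number of people of the same sex in each area. The record layout (dictionaries with `forename`, `sex` and `area` keys) and the function name are hypothetical.

```python
from collections import Counter

def name_proportion_by_area(records, forename, sex):
    """Proportion of people of one sex with a given forename, per area.

    records: iterable of dicts with 'forename', 'sex' and 'area' keys
    (a hypothetical layout chosen for this sketch).
    """
    totals = Counter()   # people of this sex in each area
    matches = Counter()  # of those, people with the forename
    for r in records:
        if r["sex"] != sex:
            continue
        totals[r["area"]] += 1
        if r["forename"] == forename:
            matches[r["area"]] += 1
    return {area: matches[area] / totals[area] for area in totals}
```

For a surname, the same idea would apply without the filter on sex, so that the denominator is everyone in the district or DED.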
The maps shown for both district and DED use what is called a logarithmic norm when displaying the proportion. The only difference from a normal percentage display is that a logarithmic norm better shows the contrast between areas, and not just where names are most prominent. A colour bar displayed next to each map uses scientific notation to display the proportion. The following should be known when interpreting the notation for the graphs:
As mentioned in the introduction, there were four separate parts to the marital status section of the census. The specific question on marital status should have been filled out for every member of the household. The options were “Married”, “Single”, “Widow” and “Widower”. When the census was filled out, different words were used to describe the marital situation of each person. For example, single people were also written in as “Not Married”, “Spinster” or “Bachelor”. Furthermore, in some cases, widow and widower were used interchangeably; hence, when categorising, widows and widowers were placed into a single group. The categories which each entry was assigned to were (if possible):
Only people older than 15 years were considered when counting the people with a certain name and their marital statuses. A pie chart was chosen to illustrate this.
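A sketch of the marital status grouping and the age filter follows. The lookup table uses only the words quoted in the text; the category labels (“married”, “single”, “widowed”) and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Words quoted in the text, mapped to assumed category labels.
MARITAL_CATEGORY = {
    "married": "married",
    "single": "single",
    "not married": "single",
    "spinster": "single",
    "bachelor": "single",
    "widow": "widowed",
    "widower": "widowed",
}

def categorise_marital_status(raw):
    """Return the category for a transcribed entry, or None if unmatched."""
    # Lowercase, replace non-letters with spaces, collapse whitespace.
    key = " ".join(re.sub(r"[^a-z ]", " ", raw.lower()).split())
    return MARITAL_CATEGORY.get(key)

def marital_counts(people):
    """Count categories for people older than 15.

    people: iterable of (age, raw_status) pairs.
    """
    counts = Counter()
    for age, raw in people:
        if age > 15:
            category = categorise_marital_status(raw)
            if category is not None:
                counts[category] += 1
    return counts
```

The resulting counts are the kind of input a pie chart would need; entries that match no known word are left out rather than guessed.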
The other questions, which were supposed to be answered in the entry for the married woman, were not always filled in as intended. In some cases, they were filled in for both the husband and the wife, or for the husband only. It would have been difficult to correct for this in the statistics, and as a result these questions are not presented in either series.
For the religious breakdown, a task similar to that for marital status was carried out, grouping each census entry into a religion. For example, those who were Catholic were also recorded as “RC”, “Roman Catholic” or “Cath”, to name but a few. For each of the grouped religions, a set of matching words was created to handle these variations. The religions which were grouped were:
For each name, the proportion of people in each religion is calculated. These percentages are rounded to the nearest whole number (e.g. 45.2% would be rounded to 45%). Any religion under 3% is grouped into an “Other” category.
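The rounding and the “Other” grouping can be sketched as follows. This is an illustration under assumptions: the function name is invented, and applying the 3% threshold before rounding (rather than after) is a choice made for this sketch that the text does not specify.

```python
from collections import Counter

def religion_breakdown(religions, threshold=3.0):
    """Whole-number percentages per religion, with small groups merged.

    religions: iterable of already-grouped religion labels for one name.
    """
    counts = Counter(religions)
    total = sum(counts.values())
    breakdown = {}
    other = 0.0
    for religion, n in counts.items():
        pct = 100 * n / total
        if pct < threshold:
            # Assumption: the threshold is applied to the unrounded percentage.
            other += pct
        else:
            breakdown[religion] = round(pct)
    if other:
        breakdown["Other"] = round(other)
    return breakdown
```

Note that Python's `round` sends exact halves to the nearest even number, and that rounded percentages may not sum to exactly 100.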
The aim here was to get statistics on the occupations of each member of the household, excluding scholars. Due to the large variety of occupations and of the names used for them, it was decided to use the occupation as reported rather than try to group occupations together. As a result, for each forename/surname, a table is provided with the top 5 occupations.
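A short sketch of the occupation table is shown below. The function name is hypothetical, and matching “scholar” case-insensitively while otherwise keeping the text exactly as reported is an assumption.

```python
from collections import Counter

def top_occupations(occupations, n=5):
    """Return the n most common reported occupations, excluding scholars."""
    counts = Counter(
        o.strip()
        for o in occupations
        # Skip blank entries and scholars; keep everything else as reported.
        if o.strip() and o.strip().lower() != "scholar"
    )
    return counts.most_common(n)
```

Because the reported text is kept as written, “Farmer” and “farmer” would be counted separately in this sketch, which reflects the decision not to group occupations.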
The aim here was to group each birthplace entry on the census form into the county in which each person was reported to have been born. The data was mapped as the number of people with each forename/surname who were born in each county, as a proportion of the total population in it (for forenames, this is done separately for each gender). Note that there are some differences to the historic counties from 1911:
As with the proportions by DED and district, a logarithmic norm was chosen as the best approach to display the data. It provides greater contrast in areas with a lower proportion, which makes regional differences easier to see. Furthermore, any births outside the island of Ireland are not displayed on the map, but are included in the total from which the proportions were generated.
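To show why a logarithmic norm gives more contrast at low proportions, the sketch below compares a linear scaling with a logarithmic one. It is written in plain Python for illustration; the function names are invented, and the text does not state which plotting library was used. The logarithmic mapping is the same kind of scaling that matplotlib's `LogNorm` applies, where the smallest value maps to 0 and the largest to 1 on a log scale.

```python
import math

def linear_norm(value, vmin, vmax):
    """Position of value between vmin and vmax on a linear scale (0 to 1)."""
    return (value - vmin) / (vmax - vmin)

def log_norm(value, vmin, vmax):
    """Position of value between vmin and vmax on a log10 scale (0 to 1).

    All arguments must be positive; zero proportions would need separate
    handling (an assumption, since the text does not cover this case).
    """
    low, high = math.log10(vmin), math.log10(vmax)
    return (math.log10(value) - low) / (high - low)
```

With a colour range from 0.0001 to 0.1, proportions of 0.0001, 0.001 and 0.01 sit at roughly 0, 0.01 and 0.1 on the linear scale, so they are almost indistinguishable, while the log scale places them at 0, 1/3 and 2/3. On a colour bar in scientific notation, a label of 10^-3 corresponds to a proportion of 0.001, or 0.1%.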
As described in the introduction, there was a question on the literacy status of each member of the household. For this statistic, the aim was to group the returns into either:
This is carried out by matching common words used to describe each category. A proportion is calculated for each of the three categories, excluding unknowns, for people aged 9 and above.
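The proportion step can be sketched as below, assuming the word matching has already assigned each person a category, with `None` standing for unknown. The function name, the input layout and the category labels used in the example are hypothetical.

```python
from collections import Counter

def literacy_proportions(people, min_age=9):
    """Proportion in each literacy category, excluding unknowns.

    people: iterable of (age, category) pairs, where category is None
    when the return could not be matched (a hypothetical layout).
    """
    counts = Counter(
        category
        for age, category in people
        if age >= min_age and category is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Because unknowns are removed before dividing, the proportions of the remaining categories sum to 1.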
The last statistic presented for each forename/surname is the proportion of people who claimed to speak Irish. As mentioned in the introduction, people were to fill out “Irish” if they only spoke Irish, “Irish and English” if they spoke both, or nothing otherwise. Due to the way the form was filled out, it was decided to calculate the proportion of Irish speakers, whether they spoke English or not. Furthermore, because people were asked to leave this section blank if they did not speak Irish, the proportion of Irish speakers is taken out of the total number of people with that forename/surname. Thus, the two categories shown are: