The de facto standardisation of Scots
In which I present new data to support a controversial view
I disagree with the common assertion that there is no standard / generic / common / agreed written form of Scots.
It is asserted so often that Scots isn’t a standardised language that we might be tempted to accept it as fact. But after I compiled first a corpus of 3,000,000 tokens from around 600 contemporary Scots writers, and then a frequency dictionary that groups together the spelling variants of each word by usage frequency, it seems there is a surprising amount of agreement among writers about spelling.
Suppose, hypothetically, we were learning English for the first time, as a second or third language, and were told by our trusted teacher that “it’s not a standardised language”. For example, there’s no standard spelling of COUGH: some places spell it COUGH, some spell it COFF, in some places KOPH, and each town has its own way of spelling the word.
Here is an example, actually printed and used in marketing.
Having not read very much English writing, we might be a little intimidated. We’ve mostly seen COUGH, occasionally COFF, but not the other spellings, and who knows how many other local spellings there are out there?
It would only be years later, after we’ve mastered the language and read lots and lots of literature, that we could disagree with our teacher and state with confidence that COUGH is the standard spelling and the other suggested spellings are just fluff.
I would argue that with Scots we are in a similar position: a standard does exist, but the quantitative data needed to make that judgement is lacking.
In the Luath Scots Learner, Lindsay Colin Wilson discusses different spelling schemes and states:-
“Note that hoose, hous and hus would all be pronounced identically within the contexts of their respective spelling systems. Spelling systems of the ‘radical’ kind are not widely taken seriously and, if a generally accepted way of spelling Scots emerges, it will probably involve a combination of elements from ‘modified English’ and ‘traditional’ spelling. It is unlikely that there will be a generally accepted spelling for Scots without some form of official support for it but, so far, this has not been forthcoming.”
I would like to argue that this “generally accepted spelling” for Scots does exist and has already emerged.
Corpus and frequencies
For the past three years I have exhaustively sought out 21st-century Scots writing from as many writers as I could find. Anything that looks a bit Scots-ish I have harvested and compiled into a corpus, from which statistical information can be read. The corpus contains more than three million tokens, around 127,000 unique words / spellings, written by around 600 writers from all across Scotland.
For comparison, I estimate that a total of about eleven million words of Scots writing have been published in the 21st century in literature, news media and social media, so the corpus represents a sample of more than a quarter of the total.
The 2011 Scottish census reported there were 1,225,622 people who could write Scots; this corpus is a representation of what that writing looks like.
The corpus isn’t very well balanced between different writers’ contributions. Because there are so few published writers, the ten most prolific writers make up about 25% of the total mass of words. Scots is a low-resource language, and this is the best balance I can strike between quantity and quality.
The academic register of contemporary Scots writing has only around three writers; there is no detectable legal register of contemporary Scots writing, and no instructional register (you don’t get a Scots translation in the instructions for new mobile phones).
Poetry, memoir and children’s literature are present in the corpus; these forms make up most of the extant contemporary Scots writing.
A frequency dictionary is being compiled from the corpus, grouping together the spelling variants of each word and then summing and ranking their frequencies. The top 1,000 most common word groups are listed in a draft PDF here (https://chrisgilmour.co.uk/shop/freqdict-latest.pdf), which includes about 3,000 spelling variants.
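The grouping, summing and ranking steps can be sketched in a few lines of Python. This is a minimal sketch, not the tooling actually used for the dictionary: it assumes a hypothetical hand-made mapping, VARIANT_TO_HEADWORD, from each spelling to its headword, and a corpus already split into lowercase tokens. It counts occurrences only, not writers.

```python
from collections import Counter, defaultdict

# Hypothetical variant-to-headword mapping (illustrative only). In the real
# dictionary, deciding which spellings belong to which headword is a manual
# judgement (e.g. HOOSE as a house vs. HOSE as a hosepipe).
VARIANT_TO_HEADWORD = {
    "hoose": "HOOSE", "house": "HOOSE", "hous": "HOOSE",
    "hoos": "HOOSE", "hus": "HOOSE",
    "school": "SCHOOL", "schuil": "SCHOOL", "schule": "SCHOOL",
    "skweel": "SCHOOL",
}


def build_frequency_dictionary(tokens):
    """Group spelling variants under headwords, then sum and rank them.

    Returns a list of (headword, total, variants) tuples, most frequent
    headword first; `variants` lists (spelling, count) pairs, most common first.
    """
    variant_counts = Counter(tokens)
    groups = defaultdict(Counter)
    for spelling, count in variant_counts.items():
        headword = VARIANT_TO_HEADWORD.get(spelling)
        if headword is not None:  # tokens with no known headword are skipped
            groups[headword][spelling] += count
    entries = [
        (headword, sum(counter.values()), counter.most_common())
        for headword, counter in groups.items()
    ]
    # Rank word groups by their summed frequency, most common first.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


print(build_frequency_dictionary(["hoose", "hoose", "hous", "schuil", "house"]))
```

The counting itself is mechanical; the substantive editorial work lies in the variant-to-headword mapping.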
If we take up Mr Wilson’s suggestion and look at the various spelling variants of HOOSE:-
hoose - 2,660 occurrences (276 writers)
house - 132 occurrences (60 writers)
hous - 117 occurrences (13 writers)
hoos - 21 occurrences (12 writers)
hus - 1 occurrence (1 writer)
We can see that the vast majority of people using that word chose the same generally accepted spelling, HOOSE: perhaps 76% of writers. Other spelling variants exist, but they are less popular by a significant margin. If we had to select one spelling as being “generally accepted”, there is a clear winner.
(Some writers might use more than one spelling at different points in their writing career; it’s difficult to control for this.)
Even if some official government source started using HOUS, that would still only be one more writer. For HOUS to become more popular than HOOSE, we would need to find more than twenty times as many new or undiscovered writers using it as use it now. This would be more of a multi-generational spelling change than something controlled and resolved overnight.
Arguably, if a popular brand used HOUS in its ubiquitous advertising to fix it in people’s minds, that might work, in a similar way to McDonald’s popularising the progressive form of a stative verb with “I’m loving it”, where a previous generation would simply say “I love it”. This sort of thing only works for a few individual words, which are adopted into the language first as advertising clichés and memes, and then as part of people’s general vocabulary.
(In theory, HOOSE could itself be a spelling variant of HOSE (as in hosepipe), and could also act as a verb: TO HOSE SOMETHING DOWN. There are no instances of this usage in the Scots writing in the corpus, but in context it wouldn’t be remarkable if we did see this spelling / usage. In that case these instances would be included as variants in the entry for HOSE, not the entry for HOOSE.)
For further examples of words and the popularity of their spelling variants, here’s a page taken at random from my frequency dictionary, listing the 133rd to the 141st most common words, along with the number of occurrences of each spelling variant in my corpus.
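The HOOSE percentages can be reproduced from the counts listed above. In this minimal sketch the writer share is divided by the sum of the per-variant writer counts. That sum slightly overcounts distinct writers if some writers use more than one spelling, which is the caveat just mentioned.

```python
# Counts for the HOOSE entry, copied from the listing above.
HOOSE_VARIANTS = {
    "hoose": {"occurrences": 2660, "writers": 276},
    "house": {"occurrences": 132, "writers": 60},
    "hous": {"occurrences": 117, "writers": 13},
    "hoos": {"occurrences": 21, "writers": 12},
    "hus": {"occurrences": 1, "writers": 1},
}


def share_of_top_spelling(variants, field):
    """Return (spelling, share) for the most common variant by `field`.

    The denominator is the sum of `field` over all variants, so for
    writer counts each variant's writers are treated as distinct people.
    """
    total = sum(v[field] for v in variants.values())
    top = max(variants, key=lambda s: variants[s][field])
    return top, variants[top][field] / total


for field in ("writers", "occurrences"):
    spelling, share = share_of_top_spelling(HOOSE_VARIANTS, field)
    print(f"{spelling}: {share:.1%} of {field}")
```

This prints "hoose: 76.2% of writers" and "hoose: 90.8% of occurrences". By occurrences, the lead is even larger than by writers.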
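The twenty-times figure follows from the writer counts for HOOSE (276) and HOUS (13). This is a minimal sketch. It assumes that HOOSE's writer count stays fixed and that every newly found writer uses HOUS exclusively.

```python
def writers_needed_to_overtake(leader_writers, challenger_writers):
    """Smallest number of new writers the challenger spelling needs in
    order to have strictly more writers than the (fixed) leader."""
    return max(0, leader_writers - challenger_writers + 1)


needed = writers_needed_to_overtake(leader_writers=276, challenger_writers=13)
print(needed, round(needed / 13, 1))  # new writers, and as a multiple of HOUS's 13
```

This prints "264 20.3": 264 new HOUS writers, a little over twenty times the 13 who use it now.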
We can look at each word in turn to see:-
THON - only one spelling, virtually no one uses any alternative spellings
ROON - 1,522 occurrences of the most common spelling, 941 occurrences of the second most common - 59% for the winner
GEY - 1,296 occurrences of the most common, 666 of the second most - 50% for the winner
THING - 1,977 for the most common, 572 for the second most - 77% winner
EVEN - 2,519 for the most common, 25 for second place - 99% winner
SYNE - 1,856 for the most common, 540 for the second - 73% winner
STILL - 2,472 for the winner, 37 for the second - 99% winner
ONLY - 1,813 for the most common, 307 for the second - 74% winner
GO - 1,936 for the most common, 495 for the second - 80% winner
There is majority agreement on the most popular spelling for all nine of these words.
If we go through the top 100 most common words in Scots writing, we find the same sort of general agreement: for more than half of the top 100 words, around 81% or more of writers pick the most common spelling.
We might classify each word where a majority of writers agree on the spelling as “General Agreement”. Other spellings exist, but none of them is used by a majority of writers.
Furthermore, some words have only one spelling variant listed. There may be other spellings, but these are used by such a small minority that they couldn’t be candidates for general agreement. These words we could call “Unanimous Agreement”.
These “Unanimous Agreement” words make up about 24% of the total, and include common words such as AT, UP, MUCKLE, TRYIN and STORY.
Whilst there are some words with no majority agreement on spelling, with regional preferences and non-regional variation, these words make up only thirteen of the top 100 words.
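The three classes can be expressed as a small decision rule. This sketch rests on a few assumptions. It looks only at the variants actually listed in the dictionary, and treats a word with a single listed variant as “Unanimous Agreement”. It applies the 50% threshold used for “Minority Agreement” in the Minority spellings section, and because that threshold is stated for both writers and occurrences, the function accepts either kind of count.

```python
from enum import Enum


class Agreement(Enum):
    UNANIMOUS = "Unanimous Agreement"
    GENERAL = "General Agreement"
    MINORITY = "Minority Agreement"


def classify(variant_counts):
    """Classify a headword from its listed spelling-variant counts.

    variant_counts maps spelling -> count (writers or occurrences) and
    should contain only the variants listed in the frequency dictionary.
    """
    if len(variant_counts) == 1:
        return Agreement.UNANIMOUS
    top_share = max(variant_counts.values()) / sum(variant_counts.values())
    # A most popular form mustering less than 50% makes a Minority word.
    return Agreement.MINORITY if top_share < 0.5 else Agreement.GENERAL


hoose_writers = {"hoose": 276, "house": 60, "hous": 13, "hoos": 12, "hus": 1}
print(classify(hoose_writers).value)
```

With the HOOSE writer counts this prints "General Agreement". A word where exactly half of usage takes the top spelling, like GEY, is treated here as General rather than Minority Agreement.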
Minority spellings
Naturally there is always a plurality of writers using the most common spelling, but the numbers are often so close to the second and third most popular spellings that some thought must be given to selecting which one to use.
An example of this is SCHOOL.
There are 1,448 occurrences of all spelling variants in the corpus in total: 629 of SCHOOL (43%), 498 of SCHUIL (34%), 94 of SCHULE (6%) and 86 of SKWEEL (6%).
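The SCHOOL percentages are each variant’s count divided by the total of 1,448 occurrences. The four listed spellings account for 1,307 of these, so 141 occurrences belong to other variants not listed here. A minimal sketch of the arithmetic:

```python
TOTAL_OCCURRENCES = 1448  # all spelling variants of SCHOOL in the corpus
LISTED = {"school": 629, "schuil": 498, "schule": 94, "skweel": 86}

shares = {spelling: count / TOTAL_OCCURRENCES for spelling, count in LISTED.items()}
for spelling, share in shares.items():
    print(f"{spelling}: {share:.0%}")

unlisted = TOTAL_OCCURRENCES - sum(LISTED.values())
print("occurrences of other variants:", unlisted)

# The leading spelling has a plurality, but not a majority.
print("majority?", max(shares.values()) > 0.5)
```

This prints 43%, 34%, 6% and 6%, then 141 unlisted occurrences, then "majority? False".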
At this point I would like to wheel out eminent English linguist Professor David Crystal. In his 2017 book “Making Sense: The Glamorous Story of English Grammar”, he has some views about how language variations should be taught.
He recommends that standard supra-regional forms and regional non-standard forms should both be taught and both be respected, and that the contrasts between the two should be studied.
He treats the different forms as stylistic choices that allow the writer to convey subtext, for example by expressing regional or cultural identities. This is in addition to the idea of different registers, so there might be a regional formal style of writing, or regional styles of academic or legal writing.
In the context of SCHOOL, we might see the most popular spelling as the vanilla supra-regional form that everyone will recognise and understand. The SCHUIL spelling asserts “this is distinctly written in Scots”, and the SKWEEL spelling further asserts “this is written in my regional variety”.
We might consider how the Great Vowel Shift has affected different languages and regional variation.
We can actually observe the side-by-side co-existence of supra-regional and local non-standard spellings in SCHOOL.
SKWEEL is clearly a distinctly Doric / North Eastern spelling of the word; almost nowhere else uses it.
The SCHOOL spelling is used in all regions, and in the North Eastern region it has slightly more writers than the SKWEEL spelling.
SCHUIL is also widely used, but more often in the Central and Ulster-Scots regions.
Not all spellings fall neatly into the supra-regional / non-standard paradigm. Consider DINNAE.
Its two most popular spellings, DINNAE and DINNA, represent 27% and 26% of all spellings respectively, so it is harder to select a supra-regional form. DINNAE is characteristically a Central and Ulster Scots spelling, while DINNA is a distinctly North Eastern spelling.
Words where the most popular form can only muster less than 50% of writers and occurrences we might classify as “Minority Agreement” words, and it’s on words like these that a Scots Language Board would want to make rulings.
Scaling up
The relative proportions of each class of words seem somewhat consistent throughout the entire lexicon.
Out of the Top 1,000 words the average preference for the most popular spelling is 79.4% with a median of 83.6%.
12.9% are “Minority Agreement”
23.8% are “Unanimous Agreement”
In the Top 100 words the average preference for the most popular spelling is 76.9%, with a median of 81.6%.
14% are “Minority Agreement”
9% are “Unanimous Agreement”
In the Top 500 words the average preference for the most popular spelling is 78.6%, with a median of 83.1%.
13.3% are “Minority Agreement”
21% are “Unanimous Agreement”
In the 501 to 1,000 bracket, the average preference for the most popular spelling is 82.1%, with a median of 91.6%.
11% are “Minority Agreement”
31% are “Unanimous Agreement”
About half of the top-preference spellings are shared with English. I take a fairly ambivalent view of this matter, but it is important to some people’s conception of the Scots language, so we can divide the entire lexicon into words shared with English and distinctly Scots words.
For distinctly Scots words, the average preference for the most popular spelling is 73.5%, with a median of 77%.
18% are “Minority Agreement”
13% are “Unanimous Agreement”
For words shared with standard English, the average preference for the most popular spelling is 85%, with a median of 88%.
10% are “Minority Agreement”
37% are “Unanimous Agreement”
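Each bracket above is summarised by the mean and median of the per-word share taken by the most popular spelling, plus the percentage of words in the Minority and Unanimous classes. Below is a minimal sketch of such a summary. It assumes each word is represented as a pair (top-spelling share, number of listed variants) and uses the same class rules as above. The example input is a toy list for illustration, not corpus data.

```python
from statistics import mean, median


def summarise(words):
    """Summarise a bracket of words.

    words: list of (top_share, n_listed_variants) tuples, where top_share
    is the fraction (0-1) of usage taken by the most popular spelling.
    """
    shares = [share for share, _ in words]
    minority = sum(1 for share, n in words if n > 1 and share < 0.5)
    unanimous = sum(1 for _, n in words if n == 1)
    return {
        "mean": mean(shares),
        "median": median(shares),
        "minority_pct": 100 * minority / len(words),
        "unanimous_pct": 100 * unanimous / len(words),
    }


# Toy bracket of five words (illustrative values only, not from the corpus).
toy = [(1.0, 1), (0.9, 3), (0.76, 5), (0.59, 4), (0.43, 6)]
print(summarise(toy))
```

A median above the mean, as in every bracket reported above, indicates that a minority of low-agreement words pulls the average down.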
Unlocking Scots Standards
In Clive Young’s “Unlocking Scots: The Secret Life of the Scots Language” (2023), the issue of standardisation and “no standard” is discussed quite early on.
1915 - Sir James Wilson - “The Lowland dialects are not at all sharply divided from each other”, “the use of a more or less standard form for literary purposes”
2019 - Open University’s Scots Language and Culture course - “As opposed to English and Gaelic, Scots is a non-standard language”
2021 - Scots Language Centre’s Scots Warks - “Scots has no standardisation”
2017 - James Costa - “While there is no Scots standard de jure, numerous debates have come to shape sets of expectations, if not of norms, as to what Scots should de facto look like.”
2023 - Clive Young - “after hundreds of years of printing Scots texts a soft standard has emerged.”
In February I attended the Future of Scots Symposium and made a conscious effort not to participate in the standardisation discussion; I’m just a useful but annoying English fan-boy. My view at the time was that if there were a Scots Language Board formalising the standard, on whatever terms and guidelines it decided were appropriate, then my corpus might help it reach decisions, but other than that, it’s not for an Englishman to be involved with.
It wasn’t until five months later, in early July, that I read the words “frequency dictionary” and started crunching the numbers on different spelling variants. And it wasn’t until about two weeks ago, in late September, that I had enough data to conclude that there already is a “generally agreed” de facto standard of spellings.
It’s only now that I feel I have the data to express views. A standard has emerged; the Wild West of spelling variation is an illusion based on personal experience rather than on quantitative data.
Conclusion
In summary, rather than there being no standard form of Scots, with each place having its own distinct variety, there is clearly a generally agreed supra-regional form which covers 87% of the top 1,000 words, in the sense that more than 50% of writers use the same spelling. For the remaining 13%, around one word in eight, there is no majority agreement and no agreed standard form.
Unanimous Agreement - 24%
General Agreement - 63%
Minority Agreement - 13%
Some commentators suggest that a regional variety of the language should be selected and adopted as a new standard. This is a common approach for the spoken form of languages, but in the context of a standard written language, and given the data presented here, it seems unnecessary.
The existence of a generally agreed standard used by 84% of Scots writers suggests the next question for language planners: should non-standard writers be compelled to adhere to the standard?