Networks in Ulster-Scots orthography
In which I use my corpus of Ulster-Scots writing to analyse different orthographies in Ulster-Scots writing
After about five years of collating texts written in various regional dialects of Scots and Ulster-Scots, my corpus of 21st Century Scots Texts contains the work of about 38 Ulster-Scots writers and eight people who posted on Twitter. It’s not a huge sample size, and several named authors are missing because I haven’t pursued obtaining copies of their work. We should also note that some of the writers listed are “named authors” and some are nameless bureaucrats who write but don’t put their names to their work.
There is some overlap between the writers list and the Twitter users list.
The corpus website contains a utility that compares the writing of different authors and different dialects. It works by compiling a word frequency list, usually the top 200 words, for each person / dialect, and then displays a diagram showing how the two lists overlap.
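A minimal sketch of this kind of comparison, in Python. It is illustrative only and is not the corpus website's actual code: the function names, the tokenising rule and the overlap formula (shared words divided by list length) are all assumptions, although that formula is consistent with the percentages reported in this post.

```python
import re
from collections import Counter

# Assumed tokeniser: runs of letters, allowing apostrophes inside a word.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")

def top_words(text, n=200):
    """Return the n most frequent word forms in text, upper-cased."""
    counts = Counter(WORD_RE.findall(text.upper()))
    return [word for word, _ in counts.most_common(n)]

def overlap_percent(list_a, list_b):
    """Shared words as a percentage of the longer of the two lists."""
    if not list_a or not list_b:
        return 0.0
    shared = set(list_a) & set(list_b)
    return 100.0 * len(shared) / max(len(list_a), len(list_b))

# Two toy lists; a real comparison would call top_words() on each writer's texts.
print(overlap_percent(["THE", "TAE", "SHE", "WUS"],
                      ["THE", "TAE", "SHAE", "WIS"]))  # 50.0
```

Comparing sets of spellings rather than raw counts means a long novel and a short report can still be compared, which suits a corpus where writers contribute very different amounts of text.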
For the purpose of identifying a writer’s regional dialect it’s pretty neat. Instead of saying “This person writes in the Glasgow dialect”, it will show that a writer is “most similar to Glasgow dialect, and less similar to other regions”.
Generally the most common words in writing are function words, like THE, TAE, IN, HE, SHE, rather than subject related words like BAGGAGE or MANUFACTURED. This means that it is often satisfactory to compare writers who might be writing about completely different subjects.
I’m going to compare the listed Ulster-Scots writers with each other. This will lead to more than 1,000 pairs, and will take ages. But we are assuming that Ulster-Scots linguistics is important and valuable, and whilst I’m no being paid for this, there are organisations out there with a great deal of political and financial resources who could be paying for this sort of research, but choose not to.
Comparison pairs
We will start by comparing pairs of writers, beginning with Angeline King, whose texts in the corpus are mostly from her 2022 novel “Dusty Bluebells”.
Another prolific Ulster-Scots writer in the corpus is Anne Morrison-Smyth, with her 2013 translation of Lewis Carroll’s “Alice's Carrants in Wunnerlan” published by EverType.
The dialect comparison utility compiles lists of the 200 most common words from each writer and displays how these lists overlap (feel free to play around with the utility yourself).
Here we see that the two lists overlap by 42.0%. The two writers use different spellings for SHAE/SHE, WUS/WIS, YE/YAE, SED/SAED and so on.
We can adjust the settings so that the top 50 words are compared, or even the top 500 words. These overlap by 50.0% and 37.0% respectively.
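The effect of changing the cut-off can be sketched as below. This assumes the overlap is the number of shared words divided by the cut-off; under that reading, 50.0% at 50 words would be 25 shared words, 42.0% at 200 would be 84, and 37.0% at 500 would be 185. The function name and the toy word lists are hypothetical.

```python
def overlap_at_cutoffs(ranked_a, ranked_b, cutoffs=(50, 200, 500)):
    """Percentage overlap of the top-n words of two ranked lists, per cut-off n.

    Assumes both lists are sorted most-frequent-first and hold at least
    max(cutoffs) words; the percentage is shared words divided by n.
    """
    results = {}
    for n in cutoffs:
        shared = set(ranked_a[:n]) & set(ranked_b[:n])
        results[n] = 100.0 * len(shared) / n
    return results

# Toy ranked lists (hypothetical), far shorter than a real top-200 list.
ranked_a = ["THE", "AN", "TAE", "SHE", "WUS", "YE"]
ranked_b = ["THE", "AN", "TAE", "SHAE", "WIS", "YAE"]
print(overlap_at_cutoffs(ranked_a, ranked_b, cutoffs=(2, 6)))  # {2: 100.0, 6: 50.0}
```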
The next most prolific Ulster-Scots writer in the corpus is Ian Crozier, the CEO of the Ulster-Scots Agency. Although he admits to not speaking Ulster-Scots, various reports from the Northern Ireland Executive are published under his authorship. The corpus contains a document called “Recomendations fer an Ulster-Scots plen fer oor leid, heirskeip an’ culture” published in 2022. At this point I can’t seem to find it on the communities-ni.gov.uk website, though I swear it was there three years ago.
Angeline King and Ian Crozier’s orthographies overlap by only 16% when we look at the top 200 words.
When we look at the top 50 and top 500 words the overlaps are 18% and 14.4%.
If we complete the triangle and compare Anne Morrison-Smyth and Ian Crozier, the 200 word overlap is 19.0%.
From looking at the writing of these three people, we can see that Angeline King and Anne Morrison-Smyth have orthographies that are more similar to each other than either is to Ian Crozier’s.
Moving down the list of Ulster-Scots writers, we come to Stephen Dornan, represented in the corpus largely by his “The Jaa Banes” poetry collection, published in 2020.
His top 200 words overlap with other writers as follows:-
Angeline King - 41.5%
Anne Morrison-Smyth - 30.5%
Ian Crozier - 17.5%
Again we can say that Stephen Dornan’s Ulster-Scots orthography is more similar to Angeline King and Anne Morrison-Smyth’s orthography than it is to Ian Crozier’s.
We can next consider Philip Robinson, represented in the corpus by his “Oul Licht” poetry collection, published in 2017. We should also note that within the Ulster-Scots writing community he has been described as follows:-
Philip Robinson is one of the most important writers in the contemporary Ulster-Scots language movement ... His work is immersed in the Ulster-Scots literary and cultural traditions
There is no question as to whether his personal orthography represents authentic Ulster-Scots.
Using our comparison engine the top 200 words overlap as follows:-
Angeline King - 42.5%
Anne Morrison-Smyth - 40.5%
Ian Crozier - 20.5%
Stephen Dornan - 42.0%
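Here we see that Ian Crozier’s orthography remains an outlier, never overlapping with another writer by more than 20.5%. The other writers we’ve looked at are mostly around 40% similar to each other, the lowest pairing being Stephen Dornan and Anne Morrison-Smyth at 30.5%.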
Every combination
If we compare every combination of 18 of the most prolific Ulster-Scots writers in the corpus, we get this next matrix, where you can read off each value at the cell where one writer’s row meets another writer’s column.
Does that make sense?
The cell shading makes it easy to identify groups with similar orthographies.
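A matrix like this can be sketched by running the same pairwise comparison over every combination of writers. The code below is an illustration under the assumption that overlap is shared words divided by list length; the function name and the three toy writers are hypothetical, not taken from the corpus.

```python
from itertools import combinations

def overlap_matrix(top_lists):
    """Build a symmetric writer-by-writer table of percentage overlaps.

    top_lists maps each writer's name to their list of most common words.
    """
    names = list(top_lists)
    matrix = {name: {} for name in names}
    for a, b in combinations(names, 2):
        shared = set(top_lists[a]) & set(top_lists[b])
        size = max(len(top_lists[a]), len(top_lists[b]))
        pct = 100.0 * len(shared) / size if size else 0.0
        matrix[a][b] = pct
        matrix[b][a] = pct  # the comparison is symmetric
    return matrix

# Hypothetical three-writer example with four-word lists.
toy = {
    "Writer A": ["THE", "TAE", "SHE", "WUS"],
    "Writer B": ["THE", "TAE", "SHAE", "WIS"],
    "Writer C": ["THA", "TAE", "SHAE", "WIS"],
}
m = overlap_matrix(toy)
print(m["Writer A"]["Writer B"])  # 50.0
print(len(list(combinations(range(18), 2))))  # 153 pairs for 18 writers
```

Because each pair is only computed once and mirrored, 18 writers need 153 comparisons rather than 324.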
In the top right we see that Carál Ní Chuilín and the North South Ministerial Body are most similar to each other.
Carál Ní Chuilín is represented in the corpus by the document “Roadin furtae Brïng Forrits an Graith tha ULSTÈR-SCOTCH Leid, Heirskip an Cultùr 20an15 – 20an35”. This is a translation of a document written in English, so we might assume that a translation service was used rather than this being original writing from the Minister of Culture, Arts and Leisure.
Additionally there’s the “Noarth/Sooth Cooncil o Männystèrs” report, whose orthography bears a striking resemblance to Carál Ní Chuilín’s - a 46% overlap. It might even be the case that the same nameless bureaucrat worked on both translated documents.
Below that on the chart is a block of Ian Crozier, HRConnect and Belfast City Council, who arguably share an orthography.
The “HRConnect” column represents the Ulster-Scots Commissioner Candidate Information booklet, created for the current competition to select a commissioner. The row of red cells with a single green cell aligned with Ian Crozier suggests that whoever writes Ian Crozier’s Ulster-Scots also translated this booklet.
Furthermore there’s Belfast City Council’s “Leid Strategy - Draft Ection Plen”, although this is a bit more distinct than the other two writers in this block.
All the other writers hang together in a single green block of similar orthography. We might consider them to be “grassroots writers” who freely choose to write in Ulster-Scots, either as poets or novelists, or as academics who study Ulster-Scots writing.
We might note that Fiona McDonald and Philip Robinson are the pair of writers whose orthographies are most similar to each other, at 55.5%. It is also possible that the actual person behind the Ulster-Scots Language Society written output is either Fiona or Philip. Perhaps they share an office and compare spellings.
We could argue that along with Roy Ferguson, these writers make up another cohesive orthographic block.
We might also consider that while we started with Angeline King and considered other writers’ similarity in relation to her, it is Fiona McDonald whose orthography is the most similar to all other writers on average - the spellings that she uses are the most common spellings used by other writers.
By contrast, Belfast City Council’s Leid Strategy document uses the fewest spellings shared with other writers.
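The “most similar on average” ranking amounts to averaging each writer's row of the matrix. The full 18-writer matrix isn't reproduced in the text, so the sketch below uses only the ten top-200 overlaps quoted earlier for five writers; the function name is hypothetical, and the ranking it gives applies to those five only.

```python
from statistics import mean

# Top-200 overlaps (percent) quoted earlier in this post.
PAIRS = {
    ("Angeline King", "Anne Morrison-Smyth"): 42.0,
    ("Angeline King", "Ian Crozier"): 16.0,
    ("Anne Morrison-Smyth", "Ian Crozier"): 19.0,
    ("Stephen Dornan", "Angeline King"): 41.5,
    ("Stephen Dornan", "Anne Morrison-Smyth"): 30.5,
    ("Stephen Dornan", "Ian Crozier"): 17.5,
    ("Philip Robinson", "Angeline King"): 42.5,
    ("Philip Robinson", "Anne Morrison-Smyth"): 40.5,
    ("Philip Robinson", "Ian Crozier"): 20.5,
    ("Philip Robinson", "Stephen Dornan"): 42.0,
}

def mean_overlap(pairs):
    """Average each writer's overlap with every other writer they are paired with."""
    per_writer = {}
    for (a, b), pct in pairs.items():
        per_writer.setdefault(a, []).append(pct)
        per_writer.setdefault(b, []).append(pct)
    return {name: mean(values) for name, values in per_writer.items()}

# Sort from most to least similar to the others on average.
ranking = sorted(mean_overlap(PAIRS).items(), key=lambda item: item[1], reverse=True)
for name, avg in ranking:
    print(name, avg)
```

Among these five, Philip Robinson has the highest mean overlap (36.375%) and Ian Crozier the lowest (18.25%). Fiona McDonald does not appear because her pairwise figures, apart from the 55.5% with Philip Robinson, are not given in the text.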
Forensic analysis
A more forensic comparison of “Ian Crozier”’s writing and HRConnect reveals some differences:-
From looking at the left and right wings of this Euler diagram, we might find several contrasting word pairs between the two writers:-
YEIR / YER
ES / AS
BES / BAES
HES / HAES
ALLOOS / ALOO (note the double L)
These are relatively common words, and we might assume that a single writer would spell them consistently.
Instead we have two different writers who are trying their best, but who aren’t familiar with authentic Ulster-Scots (or Scottish Scots) writing.
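The wings and centre of a two-set Euler diagram correspond to simple set operations, sketched below. The spellings in the wings are the contrasting pairs listed above; the text doesn't say which writer uses which spelling, so the sides are just labelled “left” and “right”. OOR and LEID are placeholder shared words added for illustration, and the function name is hypothetical.

```python
def euler_regions(left_words, right_words):
    """Split two word lists into left-only, shared and right-only regions."""
    left, right = set(left_words), set(right_words)
    return {
        "left_only": sorted(left - right),
        "shared": sorted(left & right),
        "right_only": sorted(right - left),
    }

# Contrasting spellings from the list above, plus two placeholder shared words.
left_list = ["YEIR", "ES", "BES", "HES", "ALLOOS", "OOR", "LEID"]
right_list = ["YER", "AS", "BAES", "HAES", "ALOO", "OOR", "LEID"]

regions = euler_regions(left_list, right_list)
print(regions["left_only"])   # ['ALLOOS', 'BES', 'ES', 'HES', 'YEIR']
print(regions["right_only"])  # ['ALOO', 'AS', 'BAES', 'HAES', 'YER']
print(regions["shared"])      # ['LEID', 'OOR']
```

Reading the two wings side by side is what surfaces pairs like BES / BAES, where both writers plainly mean the same word but spell it differently.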
The Governmental Orthography
In theory the huge difference between the “grassroots” Ulster-Scots orthography and the governmental “faceless bureaucrat” orthography could be explained as a difference of register and subject matter.
Except that the various governmental documents (the plans, reports and booklets) use different orthographies from each other. Even where similar orthographies are found, like Ian Crozier and HRConnect above, there are small differences in orthography and spellings that wouldn’t be there if they were a shared orthography in the same way that the poets, novelists and academics share an orthography.
Solutions
Years ago, in the early 2000s the North / South Language Body was funding the creation of a new Ulster-Scots dictionary. This still hasn’t come to fruition, twenty years later.
Perhaps what is needed, instead of a comprehensive, all-encompassing dictionary, is merely a wordlist for state Ulster-Scots writers, containing the conventional spellings for three hundred or so words, such as HEIRSKIP and MEENISTRY, with their English translations where necessary. Maybe verb conjugation tables for BES, MAK, and TAK.
For Scottish Scots, I have compiled a frequency dictionary, available via print on demand from here, which lists all the spelling variants of the 2,500 most common words, so people can choose for themselves which spellings to use.
I’m not going to do a similar exercise for Ulster-Scots, because it’s a load of work, and the Ulster-Scots Agency and the North South Language Body are very well funded and should be doing this sort of thing themselves in conjunction with friendly universities in Northern Ireland. English chumps like me shouldn’t be involved.
Conclusions
From looking at the spreadsheet with its green smears and red columns, we can see that authentic Ulster-Scots writing from named poets, novelists and academics is all quite similar to itself; it’s somewhat cohesive. Whereas the government documents from nameless writers are all a bit weird, a bit different, with unconventional spelling choices being made.
There’s internal logic within each document: “Leelang lear bes a heid pairt o tha spaesicht o this Roadin” might mean “Life long learning is an important part of the vision in the strategy”, but even well-read Ulster-Scots readers will be scratching their heids and reaching for a dictionary.
Official documents shouldn’t be weird, even if they’re written in minority languages.
When there is government expenditure on Ulster-Scots, government documents are held up as examples of Ulster-Scots writing to people who aren’t literate in the language, but they are the least representative styles of Ulster-Scots writing.
In some respects I have created a “Torment Nexus”, using computer wizardry to analyse Ulster-Scots texts and to find writers with similar styles. Just to be clear, I’m pretty sure that no AI has been used; this is all hard labour and rattling through spreadsheets.
PS - I have found another document written in the governmental orthography of Ulster-Scots - “Ettlins fur the Ulster-Scotch plon fur oor leid, heirskip an cultur”. On first scanse it strikes me that no one else in the entire Scots corpus has used the spelling PLON as the Scots version of PLAN; in other Ulster-Scots documents it had been spelt PLEN.