Identifying when parts of contemporary Scots were established
Or, Introducing Gilmour's minor corpus of Modern Scots Poetry
The other day on that Twitter some chap using the name Neil Swinnerton claimed that the writings of Robert Burns represented the “pure Scots language”.
If we briefly set aside Sonya’s point about linguistic purity, contemporary Scots as written today is what it is. It’s neither pure nor bastardised, and we can’t put the genie back in the bottle; it’s merely the written form of the language as written by 1.2 million people.
My Corpus of 21st Century Scots Texts represents an accurate snapshot of Scots as it is currently written, in all its dialect forms. We might assume that the works of Robert Burns represent the Scots language as it was written in 1786 (although, since his editors advised him to make it more English-like for the English poetry-buying public, we might consider that his Scots is not quite the same as that of other Scots writers in 1786).
There are some differences between 21st century Scots and the Scots of 1786.
There is also a statistics page for checking how balanced the corpus is across the centuries:-
This corpus of Scots poetry is a bit smaller than the 21st century corpus, at only around 900,000 words from just over 350 poets. It’s a bit overweighted towards the 21st century, but there are a fair number of poets and words from the other centuries:-
1700 to 1799 - 16 poets - 67,557 words
1800 to 1899 - 43 poets - 254,179 words
1900 to 1999 - 66 poets - 88,788 words
2000 to date - 227 poets - 492,078 words
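As a quick check on the balance described above, the figures from the list can be summed and turned into percentage shares of the total. This is a minimal Python sketch using only the numbers listed; the names `periods` and `shares` are illustrative and not part of the website.

```python
# Word and poet counts per century, copied from the list above.
periods = [
    ("1700-1799", 16, 67_557),
    ("1800-1899", 43, 254_179),
    ("1900-1999", 66, 88_788),
    ("2000-date", 227, 492_078),
]

total_poets = sum(poets for _, poets, _ in periods)
total_words = sum(words for _, _, words in periods)


def shares(rows, total):
    """Return each period's percentage share of the total word count."""
    return {label: 100 * words / total for label, _, words in rows}


print(f"{total_poets} poets, {total_words:,} words")  # 352 poets, 902,602 words
for label, share in shares(periods, total_words).items():
    print(f"{label}: {share:.1f}% of words")
```

On these figures just over half the words (about 54.5%) date from 2000 onwards, while the twentieth century contributes under 10% despite having 66 poets.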
I’m not sure how many words are required for a corpus to be representative, but the graphs look right.
The website isn’t as comprehensive or as polished as the Corpus of 21st Century Scots Texts, but it’s functional for the purpose of comparing spellings.
Here we compare NICHT and NIGHT
This graph shows that whilst NIGHT was the most popular spelling in each fifty-year period up to 1849, NICHT became the most popular spelling after 1850. This suggests that the idea of the -ICHT spelling as typically Scots is only around 200 years old.
Comparing DEAD and DEID shows a similar pattern
The DEID spelling became more popular than DEAD between 1800 and 1850. If we look at the poetry of the 1700 to 1850 period, we can see that the DEID pronunciation has always existed in Scots; it is merely the orthography, the written form, that changed.
Whilst the DEAD to DEID and NIGHT to NICHT changes both occurred at about the same time, between 1800 and 1850, the Scots language, as an entirely normal living language, has been changing constantly over the centuries.
On the site, if you hover your cursor over a data point on the graph, it displays the data behind that point: the number of occurrences, the normalised rate, and the number of writers.
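The site doesn’t say exactly how its normalised rate is calculated, so the following Python sketch assumes a common convention: occurrences per million words of text in each period. The data format (one tuple of poet, period and lower-case tokens per poem), the function name `spelling_stats` and the toy poems are all hypothetical, intended only to show how the three figures on each data point relate to one another.

```python
from collections import defaultdict

# Hypothetical input: one (poet, period, tokens) tuple per poem, with the
# poem's text already split into lower-case word tokens.
poems = [
    ("Poet A", "1800-1849", ["the", "nicht", "is", "lang"]),
    ("Poet B", "1800-1849", ["at", "nicht", "an", "nicht"]),
    ("Poet C", "1850-1899", ["guid", "night"]),
]


def spelling_stats(poems, spelling, per=1_000_000):
    """For each period, return (occurrences, rate per `per` words, writers).

    `writers` is the number of distinct poets who used the spelling at least once.
    """
    occurrences = defaultdict(int)
    total_words = defaultdict(int)
    writers = defaultdict(set)
    for poet, period, tokens in poems:
        total_words[period] += len(tokens)
        hits = tokens.count(spelling)
        if hits:
            occurrences[period] += hits
            writers[period].add(poet)
    return {
        period: (
            occurrences[period],
            per * occurrences[period] / words,
            len(writers[period]),
        )
        for period, words in total_words.items()
        if words  # skip periods with no text, avoiding division by zero
    }


print(spelling_stats(poems, "nicht", per=1000))
# {'1800-1849': (3, 375.0, 2), '1850-1899': (0, 0.0, 0)}
```

Normalising by the total words in each period matters here because the periods differ so much in size: a raw count from 2000 onwards would be inflated simply by that period’s larger share of the corpus.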
We can examine many pairs (or triples) and pin down roughly when the changes occurred (click on each link to go to the graph on the website).
Between 1900 and 1950
INTAE replaces INTO
WHIT replaces WHAT
TAE replaces TO
YERSEL replaces YOURSEL
WIS replaces WAS
Between 1850 and 1900
JIST replaces JUST (also look at JUIST)
Between 1800 and 1850
OOR replaces OUR
OOT replaces OUT
ABOOT replaces ABOUT
NOO replaces NOW
HOO replaces HOW
TOON and TOUN replace TOWN
DOON and DOUN replace DOWN
GOON and GOUN replace GOWN
RICHT replaces RIGHT
LICHT replaces LIGHT
NICHT replaces NIGHT
Between 1750 and 1800
Between 1700 and 1750
WEEL replaces WELL
WI replaces WITH
PIT replaces PUT
HAE replaces HAVE
GUID replaces GOOD
MYSEL replaces MYSELF
Pre-1700
BAITH rather than BOTH
ONY rather than ANY
MONY rather than MANY
MAIST rather than MOST
MAIR rather than MORE
ANE rather than ONE
WAD rather than WOULD
AULD rather than OLD
TWA rather than TWO
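One way to pin down roughly when a change occurred, given per-period rates for an older and a newer spelling, is to find the first period from which the newer spelling stays ahead for good. The Python sketch below implements that heuristic; it is an assumption about how such dates could be read off the graphs, not necessarily how the periods above were chosen, and the rates in the example are invented.

```python
def crossover_period(periods, old_rates, new_rates):
    """Return the first period from which the new spelling's rate exceeds the
    old spelling's rate in every later period, or None if that never happens."""
    start = None
    for period, old, new in zip(periods, old_rates, new_rates):
        if new > old:
            if start is None:
                start = period  # new spelling pulls ahead here
        else:
            start = None  # old spelling regains the lead, so reset
    return start


periods = ["1700-1749", "1750-1799", "1800-1849", "1850-1899", "1900-1949"]
old_rates = [50, 40, 30, 10, 5]  # invented rates for an older spelling
new_rates = [5, 10, 35, 60, 70]  # invented rates for a newer spelling
print(crossover_period(periods, old_rates, new_rates))  # 1800-1849
```

Requiring the newer spelling to stay ahead, rather than just pulling ahead once, stops a single noisy period in a thinly covered stretch of the corpus from being read as the date of the change. For triples such as TOON, TOUN and TOWN, the rates of the two newer variants could be added together before comparing them with the older one.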
We might note that most of the changes listed above are from spellings that Scots shares with English to spellings that are distinct from English, and that the words are taken from the 200 most common words. I haven’t looked at spellings that have changed in the other direction, from distinctly Scots to shared with English.
In some respects the written language has been becoming more distinct from English since Burns’s time, even though the spoken pronunciations have remained consistently distinct. These changes didn’t happen all at once: assembling the Scots language as it is written today has been a long process, word by word, with new spellings being taken up by more writers and older spellings being left behind.
It’s possible to change the resolution of the graph. By default it’s set to 50-year steps, but 100-year steps might be better for less common words, or 25-year steps for more frequently used terms.
Using decade steps makes the graphs look dreadful, but it does highlight how smoothly the corpus covers each decade.
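Changing the resolution amounts to grouping dated counts into bins of a different width. A minimal Python sketch, assuming bins counted from 1700 (the start of the corpus) and counts keyed by year; the function names `period_label` and `rebin` are hypothetical and not taken from the website’s code.

```python
from collections import Counter


def period_label(year, step=50, origin=1700):
    """Label the `step`-year bin containing `year`, with bins counted from `origin`."""
    start = origin + (year - origin) // step * step
    return f"{start}-{start + step - 1}"


def rebin(counts_by_year, step=50, origin=1700):
    """Sum per-year counts into bins of `step` years."""
    binned = Counter()
    for year, count in counts_by_year.items():
        binned[period_label(year, step, origin)] += count
    return dict(binned)


counts = {1786: 2, 1790: 3, 1801: 4}  # invented per-year counts
print(rebin(counts, step=50))   # {'1750-1799': 5, '1800-1849': 4}
print(rebin(counts, step=100))  # {'1700-1799': 5, '1800-1899': 4}
```

Wider bins pool more words into each data point, so the rates for rarer words are less affected by one or two poems, at the cost of blurring exactly when a change happened; narrower bins show the timing more finely but only work when each bin has enough occurrences.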
Going forward, I intend to seek out poets and poetry to fill specific gaps in the graphs, and to try not to be tempted into gathering texts from other genres of writing. It’s mostly copyrighted material from 1925 to 2000, and out-of-copyright material from before 1775, that I’m short of.
It would be kind of neat to be able to identify which specific poems in the corpus are out of copyright and then make those searchable and displayable, but the research overhead of checking the copyright status of each poem is beyond me.
(Suppose a talented 10-year-old had their work published in 1884 and then went on to live to the ripe old age of 80: their work would only just be coming out of copyright today.)
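The arithmetic behind that example can be spelled out. This Python sketch assumes the UK term for literary works of the author’s lifetime plus 70 years, running to the end of the calendar year of death, and it ignores birthdays, so the result could be a year out either way. The function name `public_domain_year` is hypothetical.

```python
def public_domain_year(birth_year, age_at_death, term_after_death=70):
    """First calendar year in which the work is out of copyright, assuming
    protection lasts until the end of the year `term_after_death` years after
    the author's death."""
    death_year = birth_year + age_at_death
    return death_year + term_after_death + 1


published, age_when_published, age_at_death = 1884, 10, 80
birth_year = published - age_when_published  # 1874
print(public_domain_year(birth_year, age_at_death))  # 2025 (died 1954, protected to end of 2024)
```

This is why even a poem published in the 1880s can’t be assumed to be out of copyright without knowing when its author died.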