by Edda Frankot
An important project milestone was reached last month when the last words of volumes 1-7 were transcribed on the afternoon of 18 January. The transcription of the first seven volumes, up to the year 1501, is now complete. Over the past eighteen months or so, the project’s two research assistants, Claire Hawes and William Hepburn, with a small amount of assistance from yours truly, have transcribed 4027 pages. No mean feat!
This does not mean, of course, that the project as a whole is now finished. The checking of the transcription and annotations is still in full flow. Once that is completed, a final phase of getting the corpus ready to go online will commence. In the meantime, thanks to generous additional support from Aberdeen City Council to enhance the project, Claire and William have begun the transcription of volume 8. This volume will at least partly be transcribed traditionally, but there are also ongoing investigations into the possibility of having this book machine-transcribed for us by a project called READ. Watch this space for updates on that! Overall, part of our final corpus will carry a level of annotation enhanced beyond our original specification.
Now that the transcription of volumes 1-7 is complete, it has been possible to do a word count. This count confirms our suspicions that volume 6 includes a relatively large amount of material, but also brings up some other fascinating facts. The total count as it stands now (this number will most likely change slightly during the final stages of the checking process) is 1,391,217 words. To put this in perspective: Shakespeare’s complete works total 884,421 words. A significant chunk of our corpus of nearly 1.4 million words (so far) is taken up by volume 6: 539,254 words (39%). By contrast, volume 7, which has 137 pages more than volume 6, contains ‘only’ 332,392 words (24%). On average, then, there are about 547 words on every page of volume 6, but only 296 on those of volume 7. The average across all volumes is about 300 words per page. The scribe of a large part of volume 6 used more of each page (he left only one of the margins blank, rather than both), placed his text lines closer together and appears to have written in a smaller hand. The volume with the lowest number of words per page is volume 2, at only 189. This results from the many blank spaces left between court entries, and from blank pages.
Above: An illustration of different page word densities and layouts: ACR, 6, p. 752 (left) and ACR, 7, p. 508 (right).
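For readers who would like to check the arithmetic behind the word counts quoted above, here is a minimal sketch in Python. It uses only the figures given in this post; the function names `share` and `implied_pages` are made up for illustration, and the page counts it derives are estimates worked back from the rounded words-per-page averages, not official page totals.

```python
# Figures quoted in this post (provisional until checking is complete).
TOTAL_WORDS = 1_391_217
SHAKESPEARE_WORDS = 884_421

# volume number -> (word count, average words per page as quoted)
VOLUMES = {
    6: (539_254, 547),
    7: (332_392, 296),
}


def share(words, total=TOTAL_WORDS):
    """Percentage of the whole corpus taken up by `words`."""
    return 100 * words / total


def implied_pages(words, words_per_page):
    """Estimated page count, worked back from a rounded per-page average."""
    return words / words_per_page


if __name__ == "__main__":
    for volume, (words, per_page) in VOLUMES.items():
        print(
            f"Volume {volume}: {share(words):.1f}% of the corpus, "
            f"roughly {implied_pages(words, per_page):.0f} pages"
        )
    ratio = TOTAL_WORDS / SHAKESPEARE_WORDS
    print(f"Corpus size relative to Shakespeare's complete works: {ratio:.2f}x")
```

Rounded to whole numbers, the shares come out at 39% and 24%, and the difference between the two implied page counts is 137, which agrees with the figures given in the text.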
It has also been possible to differentiate between words in Latin and in Scots (and those from entries in ‘multiple languages’, that is to say entries with a lot of switches between Latin and Scots, which typically occur in lists of names). Overall 58% of the corpus is in Latin, 41.1% is in Scots and 0.9% in multiple languages. Two entries are in Dutch. In volumes 1 and 2 (1398-1414) only slightly more than 1% of the words are in Scots. In volume 4 (1433-1447) this rises to nearly 9%. By volume 6 (1468-1486) the division between the two languages is almost exactly 50-50, whereas in volume 7 (1487-1501) more than 68% is in Scots. Much more detailed research into this phenomenon is of course being undertaken by our former text enrichment research fellow, Anna Havinga. Anna not only distinguishes between words and entries in Scots and Latin, but also analyses the development of the language shift by year. But even the very coarse overview given here already throws up some fascinating first indications, which future research will hopefully be able to elaborate upon.