
Wednesday, June 15, 2016

Under the hood

It's been too long since I last wrote here. But that's not for want of progress: it's just that the swan's legs are thrashing around frantically while the bird itself (in this cygnal metaphor, the swan represents the progress of the next #WVGTbk) seems to share the vitality of the proverbial Norwegian Blue (a strange convention – it has nothing to do with a proverb; it's just a sketch).

But one review of the first book singled out the percentages in the section heads (e.g. “Sounds that represent the sound /e/ – N%”) as particularly interesting; other reviews may have commented on them too, but less explicitly. This made me reflect on how I had arrived at those numbers, and I wanted the process to be more reliable and repeatable. (That's a hangover from various brushes with ISO 9001 and the CMMI model during my last three or four years at Compaq/HP. A memorable catchphrase from that time was “If you can’t find the time to do it right, how’re you going to find the time to do it over?” I say over rather than again because that’s how I first heard it, and its muscularity strikes me as worth preserving, although later speakers often felt it necessary to translate the American English.)

So this time I'm showing something of what goes on under the hood, as they say (ignoring the British English preference for 'bonnet' when naming car parts: under the bonnet might suggest Lizzie Bennet keeping a secret).

In this spreadsheet I've tried to quantify the figures I've used in calculating which proportion of *OL* words each sound represents:



In the top three rows I calculate how many *UOL* words there are in my chosen source; the third row carries the search string I use to calculate the general exclusions. The phoneme-specific sections follow, with either four or two rows each, depending on whether I could think of a phoneme-specific exclusion for my dictionary search. For example, the string &!*ology excludes the words ending in -ology, in which the l is always preceded by the sound /ɒ/.
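To make the row logic concrete, here is a minimal sketch of "count the candidate words, then subtract an exclusion". It assumes shell-style wildcards (via Python's `fnmatch`) as a stand-in for the dictionary's own search syntax, and reads &! as "and not"; both are assumptions, as are the function name `count_matches`, the pattern `*ol*` and the sample word list.

```python
from fnmatch import fnmatchcase


def count_matches(words, include, exclude=()):
    """Count words that match `include` but none of the `exclude` patterns.

    Shell-style wildcards (* = any run of letters) stand in for the
    dictionary's search syntax; this is an assumption, not its real grammar.
    """
    return sum(
        1
        for w in words
        if fnmatchcase(w, include)
        and not any(fnmatchcase(w, x) for x in exclude)
    )


# Hypothetical sample, not data from the spreadsheet.
words = ["doll", "solve", "biology", "geology", "cold", "holiday"]
print(count_matches(words, "*ol*"))              # all candidate words
print(count_matches(words, "*ol*", ["*ology"]))  # with the -ology exclusion
```

With the exclusion pattern, `biology` and `geology` drop out, leaving four of the six sample words.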

The last ten rows are all about what I think of as my cosmological constant: everything not so far accounted for. There's no rhyme or reason for it; it just makes things work. I call it the balancing factor. In the last seven rows I share out the balancing factor in proportion to the figures I do know about. (A fair bit of approximation goes on here.)
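Sharing out a remainder in proportion to known counts can be sketched as follows. This is one reading of the description, under stated assumptions: the balancing factor is taken to be the total minus the sum of the known counts, and each category gets a share proportional to its known count. The function name `share_out`, the phoneme labels and the numbers are all hypothetical.

```python
def share_out(total, known):
    """Spread the unexplained remainder (the 'balancing factor') across
    categories in proportion to their known counts.

    Assumes balancing factor = total - sum(known counts).
    """
    known_sum = sum(known.values())
    if known_sum == 0:
        raise ValueError("need at least one non-zero known count")
    balance = total - known_sum
    return {k: v + balance * v / known_sum for k, v in known.items()}


# Hypothetical figures for illustration only.
adjusted = share_out(1000, {"/ɒ/": 300, "/əʊ/": 400, "/ʌ/": 100})
percentages = {k: 100 * v / 1000 for k, v in adjusted.items()}
print(adjusted)     # adjusted counts, which again sum to the total
print(percentages)  # the kind of figure that ends up in a section head
```

Here the known counts sum to 800, so a balancing factor of 200 is split 75/100/25, giving 37.5%, 50% and 12.5%. Because every share scales with its known count, the relative proportions are unchanged and only the absolute figures grow to match the total.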

Well, there you have it. Back to the grindstone...
