So this is going to take some explaining. If you follow the Walmington-on-Line Twitter stream @walmingtonon, you may have seen us trailing something called the Dad’s Army Genome #dadsarmygenome.
What is the Dad’s Army Genome?
If you’ve seen the BBC Genome project*, you’ll know the Beeb have been busy scanning all back copies of the Radio Times and digitising those historic listings. If you’ve plenty of time on your hands you can find 55 pages of Radio Times listings for Dad’s Army! And that gave us what, in another programme, might be called a cunning plan.
Suppose we take every single Dad’s Army script, scan them, break them down into their component parts and put them all into a database? Then we can answer all sorts of geeky questions that could otherwise only be answered by hours or days of patient reading and cross-referencing. Want to know which catchphrase is used most often? Or when Pike first said ‘I’ll tell mum’? Ever had a burning desire to know whether Frazer or Godfrey had the most lines, or which character swore the most? Well, this is your best chance of finding out unless you have some serious time on your hands.
Who on earth would do that, and why?
I would – and really only because (a) I’m slightly sad, and (b) I’ve got a background in data management and analysis so it’s the sort of thing that interests me (see a). And actually, I think it is quite an interesting project – to see if it is even possible. In recent years the technology to process and analyse text information has become really cheap and sophisticated – so let’s see how it gets on drilling for comedy gold in one of the biggest and best-loved sets of sitcom scripts available.
How are you going to do it?
Well it’s early days, so here is how it works at the moment (but this could change):
- Slice the pages of each episode out of a pristine copy of Dad’s Army The Complete Scripts with a razor blade – trying to keep the pages a nice regular shape for scanning.
- Run the pages of the script through a flatbed scanner and turn them into PDFs. These can then be loaded into Google Docs which, miraculously, converts them into readable text that can be copied and pasted line by line (and it’s free too – amazing).
- Now it gets dull – each line of dialogue has to be copied individually, pasted into a database, and associated with the correct character. In a similar way filming instructions and scene blocking are copied and flagged – so we can distinguish spoken lines from instructions.
- Alongside that we are compiling information that applies to each episode: dates and filming locations. Also biographical information for each character, and each actor (some of whom played more than one character, which makes life complex).
- Once all the information has been logged we can start to analyse it using simple tools (e.g. line counts) and more complex techniques (e.g. topic tagging). How else could you find out that in Museum Piece, George Jones (Jonesy’s dad) has more lines than any of the platoon privates, though Walker runs him close? I told you it was sad.
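For the technically curious, the steps above can be sketched in a few lines of Python using its built-in sqlite3 module. This is only an illustration, not the actual database behind the project: the table names, column names and helper functions (`build_db`, `line_counts`, `first_use`) are all assumptions, and the sample rows are invented placeholder text rather than quotations from the scripts.

```python
import sqlite3

# Hypothetical schema: one row per episode, one row per script line.
# "kind" flags spoken dialogue versus filming instructions / scene blocking.
SCHEMA = """
CREATE TABLE episode (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE script_line (
    id         INTEGER PRIMARY KEY,
    episode_id INTEGER NOT NULL REFERENCES episode(id),
    position   INTEGER NOT NULL,  -- order of the line within the episode
    kind       TEXT NOT NULL CHECK (kind IN ('dialogue', 'direction')),
    speaker    TEXT,              -- NULL for directions
    text       TEXT NOT NULL
);
"""

# Invented placeholder rows (episode_id, position, kind, speaker, text).
SAMPLE_LINES = [
    (1, 1, "direction", None, "Placeholder stage direction."),
    (1, 2, "dialogue", "FRAZER", "Placeholder line."),
    (1, 3, "dialogue", "GODFREY", "Placeholder line."),
    (1, 4, "dialogue", "FRAZER", "Another placeholder line."),
    (2, 1, "dialogue", "PIKE", "Placeholder text. I'll tell mum!"),
    (2, 2, "dialogue", "GODFREY", "Placeholder line."),
    (2, 3, "dialogue", "FRAZER", "Placeholder line."),
]


def build_db():
    """Create an in-memory database and load the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO episode (id, title) VALUES (?, ?)",
        [(1, "Sample Episode 1"), (2, "Sample Episode 2")],
    )
    conn.executemany(
        "INSERT INTO script_line (episode_id, position, kind, speaker, text) "
        "VALUES (?, ?, ?, ?, ?)",
        SAMPLE_LINES,
    )
    return conn


def line_counts(conn):
    """Count spoken lines per character, most talkative first."""
    return conn.execute(
        "SELECT speaker, COUNT(*) AS n FROM script_line "
        "WHERE kind = 'dialogue' GROUP BY speaker ORDER BY n DESC, speaker"
    ).fetchall()


def first_use(conn, speaker, phrase):
    """Return (episode title, position) of a speaker's first use of a phrase,
    or None if the speaker never says it. Matching ignores ASCII case."""
    return conn.execute(
        "SELECT e.title, l.position FROM script_line AS l "
        "JOIN episode AS e ON e.id = l.episode_id "
        "WHERE l.kind = 'dialogue' AND l.speaker = ? "
        "AND instr(lower(l.text), lower(?)) > 0 "
        "ORDER BY e.id, l.position LIMIT 1",
        (speaker, phrase),
    ).fetchone()


if __name__ == "__main__":
    conn = build_db()
    for speaker, n in line_counts(conn):
        print(speaker, n)
    print(first_use(conn, "PIKE", "I'll tell mum"))
```

Keeping dialogue and directions in the same table, with a flag to tell them apart, preserves the running order of each episode while still letting a line count ignore the instructions. One thing to watch with scanned text is that curly and straight apostrophes are different characters, so a phrase search only works if both sides use the same one.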
When will it be finished?
How should I know, I’ve only just started! At the time of writing I’ve completed two and a half episodes of the first series. I originally estimated each one would take half an hour – it’s taking closer to 90 minutes and needs plenty of breaks to avoid eye-strain (wish I’d done this when I was twenty and could see). Anyway, once the scripts are scanned that is just the start – the analysis could take years. And then I have to think about all the material that is less easy to get hold of – the scripts of the specials, the radio series, the musical. What about the films – one of them hasn’t even been released yet. This could go on forever!
Aren’t you breaching copyright?
I’m no expert but I don’t think so – I’ve got no intention of publishing the scripts or any other material other than short snippets which I believe are allowed as ‘fair dealing’. But I have no wish to tread on anyone’s toes so if you think I am infringing your rights in any way then let me know and I’ll stop it at once!
Well, if it ever gets that far I’ve got some wild and improbable ideas for this: how about an artificially intelligent Captain Mainwaring you can chat to online? Or entirely new Dad’s Army scripts made up from lines of the current ones? All right, it probably won’t happen and if it does nobody will want it – but that’s what they said about the automobile and the computer – and Betamax video.
How can I find out more about it?
Keep visiting Walmington-on-Line. Or keep an eye on our Twitter stream or Facebook page. If you’ve got any (sensible) questions you can reach me at any of those pages and any distraction from scanning pages of Dad’s Army scripts will be welcome.
Can I help?
Yes, of course you can – do you really think I want to do all of this myself? Just get in touch (see above) and let’s talk about it.
That’s it – PLATOON, stand at EASE!
* Just to be clear – the Dad’s Army Genome project has absolutely no connection with the BBC or their BBC Genome project.