Saturday, January 2, 2010

I'm Pretty Sure I Just Did Something Awesome


I haven't done anything today, unless you count the hundreds of words I've tagged on Wordnik.com in an attempt to create a foundation for ... something.

Basically, I go to an entry for a word. I look at how the word is spelled. I tag the word based on the pattern it has of consonants and vowels. Consonant is c, vowel is v. Panda is "cvccv". Panther is "cvcccvc". That's it.
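For anyone who would rather see the rule spelled out, here is a minimal sketch of it in Python. The function name `cv_pattern` is made up for illustration, and the sketch assumes plain a-z spellings, counts y as a consonant, and skips anything that is not a letter (none of which the tagging itself settles).

```python
# Assumption: only a, e, i, o, u count as vowels; y is treated as a consonant.
VOWELS = set("aeiou")

def cv_pattern(word):
    """Return the consonant/vowel pattern of a word's spelling."""
    pattern = []
    for letter in word.lower():
        if not letter.isalpha():
            # Assumption: hyphens, apostrophes, etc. are simply skipped.
            continue
        pattern.append("v" if letter in VOWELS else "c")
    return "".join(pattern)

print(cv_pattern("panda"))    # cvccv
print(cv_pattern("panther"))  # cvcccvc
```

This works on letters only, so it matches the spelling-based tags here and not any sound-based notation.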

The beauty of it is that when you spend your whole day doing this, your job is basically to 'think of all the words in English' and then tag them with their consonant/vowel pattern. Some of the combinations point out obvious connections between fight, right, light, might, etc., but other words in the 'cvccc' pattern include birth, waltz, world, and sixth. Unexpected, neh?

I'm sure that a program could be written fairly easily to accomplish this task in minutes, but I had kind of a blast doing it manually. Actually, this blog post is taking forever because every word I type here I want to go and tag on Wordnik.
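Something like the following rough sketch is what I imagine such a program looking like. It is not anything Wordnik provides: the names `cv_pattern` and `group_by_pattern` are hypothetical, the word list is just a handful of words from this post, and y is assumed to be a consonant.

```python
from collections import defaultdict

def cv_pattern(word):
    # Assumption: a, e, i, o, u are vowels; every other character is a consonant.
    return "".join("v" if ch in "aeiou" else "c" for ch in word.lower())

def group_by_pattern(words):
    """Map each consonant/vowel pattern to the words that share it."""
    groups = defaultdict(list)
    for word in words:
        groups[cv_pattern(word)].append(word)
    return dict(groups)

# A tiny sample list; a real run would read a full word list from a file.
words = ["fight", "right", "birth", "waltz", "world", "sixth", "deer", "bead", "apple"]
groups = group_by_pattern(words)

print(groups["cvccc"])  # ['fight', 'right', 'birth', 'waltz', 'world', 'sixth']
print(groups["cvvc"])   # ['deer', 'bead']
```

Feeding it a full dictionary file would do in seconds what took me all day, though it would not have been nearly as much fun.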

As I said, I'm not sure of the practical application of this. Making the connections may be the entire purpose. I mean, you can deduce that 'deer' and 'bead' have the same pattern of cvvc (one of the most popular ones, and therefore one I have not been tagging as much). It's the strange combinations that seem worth making, and I'm happy to build this strong foundation. My favorite series of words is 'vccvcvcv', which currently has: advocate, antelope, envelope, escalate, and omnivore.

A while ago, I brainstormed about what online dictionaries could offer that paper ones cannot, specifically because of Erin McKean's very smart but understandable TED Talk where she explains the "ham butt problem." There could be so many more features than we are used to settling for. Wordnik already links to Flickr, and provides etymologies as well as popularity charts on the right side, explaining when a word has come in and out of the language. Pretty sweet stuff.

For some reason, I made an example page of what I thought an online definition page should look like, and included this function. In my example, the word was apple, which it turns out has the very strange combination of 'vcccv.' It took a long time to match up any other words with that pattern, but now there are 5.

Anyway... YOU TOO CAN ADD TO THIS NEW TAGGING EXPERIMENT!

Some caveats when tagging and searching for tagged words on Wordnik:
1. I spent a lot of time on this today but I'm only one person. Anyone can add tags by creating their own free little account. Please add them if you think they should be there.
2. I started out just doing 4-, 5-, and 6-letter words, eventually expanding out to 3-letter and 7-12 letter words. Therefore, if a word is 4, 5, or 6 letters long, it's more likely there will be a larger set of tagged words to compare it to.
3. For some reason today I'm having trouble deciding whether the Y in KEY should be a vowel or a consonant, so I tagged it for both 'cvv' and 'cvc.'
4. So far I haven't done any hyphenated words, but I'm sure they will fall into place easily.
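The Y question in caveat 3 could be handled mechanically by producing every pattern a word might get. This is only a sketch of one possible approach, with a hypothetical function name `cv_patterns`; it assumes y is the only ambiguous letter and that every other letter is either one of a, e, i, o, u or a consonant.

```python
from itertools import product

VOWELS = set("aeiou")

def cv_patterns(word):
    """Return every consonant/vowel pattern for a word, treating y as either."""
    options = []
    for letter in word.lower():
        if letter == "y":
            options.append("vc")  # y may be tagged as a vowel or a consonant
        elif letter in VOWELS:
            options.append("v")
        else:
            options.append("c")
    # product() picks one choice per letter, covering all combinations.
    return sorted("".join(combo) for combo in product(*options))

print(cv_patterns("key"))    # ['cvc', 'cvv']
print(cv_patterns("panda"))  # ['cvccv']
```

A word with no y gets a single pattern, and each y doubles the number of tags the word would need.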

6 comments:

connal said...

That sounds awesome. If you knew any programming nerds, this would be something that wouldn't be too hard to flesh out into a dynamic program pretty easily.

I think it could be awesome to search for different combinations and see what words fit each one (what fits ccvvcc for example?).

Ed Cormany said...

fun, but potentially maddening for linguists, who already use the C and V notation for consonant and vowel sounds, not letters.

literalminded said...

What Ed said. This could have been a cool resource for linguists, but its functionality is ruined by having e.g., "with" as cvcc instead of cvc, or "ate" as vcv instead of vc.

THiNK TaNK said...

Conn - you can find the ones that have already been tagged by going to wordnik.com/tags/ccvvcc.
Just input whatever combination you're looking for in the last part of the URL.

Ed & LiteralMinded - I see your point, and I've learned phonological rules using that system, but I don't know what I could have chosen instead. If someone else wants to implement something that represents the IPA versions of words, with 'th' counting as one phoneme, maybe the tag could be ipa_cvc

The whole point of this, as I wrote, was to find other words with the same pattern as apple; it was the spelling similarities I was after, not the sound.

crammer said...

how are you managing spelling variations in UK English, for example, color vs colour?

THiNK TaNK said...

Crammer - Wordnik has a separate entry for each spelling of a word. If you misspell a word, an entry will still come up with examples of people misspelling it the same way in documents. Anyway, what I mean to say is that color is cvcvc and colour is cvcvvc, and there's no issue because they each live on a page of their own.