Thursday, May 20, 2004

Preliminary Observations

Admittedly, I'm dealing with a small and grievously flawed input set, but I'm not optimistic about this new system's preliminary results. Here are ten of the better examples:


fix successful alleys of think!

bad tar!

really all the universities behind mean of we and behind somewhat waved ready-made cs around as well, so computerdom hook somewhat other bore spend yesterday saw up of try or something.
maybe wave instead sweet.

sometimes the symbols at the end bending knew universal.

but anymore would probably bright think past go of glowing funniness and summers with high badness.

grunt humor of ford.

but anymore were too ill single past winter of risen staffs and urger into big badness.

coolness missed the tar whom thought in me behind the note silent with the confusing hull hull that put much ford.

keep up my city?

bitter-ender sought the humor you come behind whom past the man electric with the confused friend beck that set backward humor.


Admitting that there were strong grammatical flaws in the original posts... confessing that I haven't even begun to reconstruct the old topicality system...

It's still flunking... The sentences of my samples at least avoid absolute non-sequitur. The idea behind this system was to find a way to evade the teaching of grammar without losing that basic coherence... but even using informal linguistic structures, we're not hitting anything near the mark of basic intelligibility.

I still think "charity" is the key to this project. The real challenge isn't to build a robot that is convincingly human. Too many humans aren't convincingly human. The idea is to build something that people WANT TO BE HUMAN and thus are willing to grant the benefit of the doubt when it fails to make sense...

Anyhow, rather than throwing this whole version out, I'm suspecting that there's a hybrid solution available...

Specifically...

Presently, we go to the dictionary and assign each word a "Part of Speech" value... we've designated 18 "parts of speech"... nouns, transitive verbs, intransitive verbs, adjectives, adverbs, conjunctions, interjections, prepositions, pronouns, articles, abbreviations, typos, geographical names, biographical names, conjugations/declensions, attributive nouns, trademarks/service marks, and verbal auxiliaries...

The idea behind this was that each word could be boiled down to a binary coding indicating its structural role in the communication... Certain unique words (such as "the") would acquire very distinctive markers (there are precious few words that function as articles, so add in the ancillary functions of the word and it's likely unique)... meanwhile "generic words" - exclusive nouns or words that are either simple nouns or simple verbs, would be largely interchangeable. Then it would just be up to the topical association system to create the illusion of coherence...
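A minimal sketch of that binary coding, in Python. It assumes the bit order simply follows the order in which the 18 categories are listed above; the names `PARTS_OF_SPEECH`, `grammeme`, and `decode` are made up for illustration and are not the system's actual code.

```python
# Assumption: bit positions follow the order of the 18 categories listed in the post.
PARTS_OF_SPEECH = [
    "noun", "transitive verb", "intransitive verb", "adjective", "adverb",
    "conjunction", "interjection", "preposition", "pronoun", "article",
    "abbreviation", "typo", "geographical name", "biographical name",
    "conjugation/declension", "attributive noun", "trademark/service mark",
    "verbal auxiliary",
]


def grammeme(roles):
    """Return an 18-character string of 0s and 1s, one per part of speech."""
    unknown = set(roles) - set(PARTS_OF_SPEECH)
    if unknown:
        raise ValueError(f"unknown parts of speech: {sorted(unknown)}")
    return "".join("1" if pos in roles else "0" for pos in PARTS_OF_SPEECH)


def decode(code):
    """Return the list of parts of speech whose bit is set in the code."""
    return [pos for pos, bit in zip(PARTS_OF_SPEECH, code) if bit == "1"]


# A word that is only ever a noun or a verb gets the first three bits set.
print(grammeme({"noun", "transitive verb", "intransitive verb"}))
```

Under this scheme two words are "interchangeable" only when their strings match exactly, which is why a rare combination of roles ends up being nearly unique to one word.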

But instead, we find that even within the "odd categories" there are words that, in any given sentence location, are not semantically interchangeable...

Let's take a sample case at random from the valid sentence structures. (I excluded all sentences containing words that either had an indeterminable Part of Speech, like "Tarzan" (remember, I'm using an existing dictionary), or lacked a definition, like "argh".)

One sentence begins with a word that is coded 000111000000000000. That probably doesn't make any sense to you. But what that would tell me is that the word is either an adjective, an adverb, or a conjunction and that it has no other legitimate functions in the English language. In this case, among the word set that I currently have (vocab = 753 words) there is only one word that meets these criteria - "only." Thus, any time this sentence structure is deployed, it will begin with the word "only."

The second word in the sentence has a value of 111000000000000000. This means that it can be either a noun, a transitive verb, or an intransitive verb. In this case there are 69 legitimate words among my present vocabulary. This original sentence must be from the Tarzan post. And it was "only talk in grunts and computer code."

I suppose as an isolate, it doesn't make much sense either.

But various options for the first two words of this nascent sentence are now "only wait" - "only know" - "only bark".
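The lookup described here can be sketched as an index from each code to the words that carry it. The toy vocabulary below holds only the handful of words quoted so far (the real one has 753 words and 69 matches for the second slot); `TOY_VOCAB` and `index_by_grammeme` are hypothetical names.

```python
from collections import defaultdict

# Toy vocabulary: just the words quoted in the post, with the codes it gives.
TOY_VOCAB = {
    "only": "000111000000000000",  # adjective / adverb / conjunction
    "talk": "111000000000000000",  # noun / transitive verb / intransitive verb
    "wait": "111000000000000000",
    "know": "111000000000000000",
    "bark": "111000000000000000",
}


def index_by_grammeme(vocab):
    """Group words by their exact code; only exact matches count as candidates."""
    index = defaultdict(list)
    for word, code in vocab.items():
        index[code].append(word)
    return dict(index)


index = index_by_grammeme(TOY_VOCAB)
print(index["000111000000000000"])  # the only candidate for the first slot
print(index["111000000000000000"])  # every candidate for the second slot
```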

Let's add in the third word. In computer-ese this word would be described as "100110010000000000". That means that it could be a noun, it could be an adjective, it could be an adverb, or it could be a preposition. I find three words in my database that meet those criteria: "in", "behind", and "past" (??? dunno why 'past').

So. Let's take the original sentence fragment. "Only talk in". "Only wait in..." - that's sensible. "Only know behind..." That's a non-sequitur. "Only wait behind..." That would make sense... "Only wait past..." "Only bark past..." "Only know past..." I guess the third would make sense... let's go visit the dictionary and double-check "past". Maybe I'm still not parsing dictionary entries properly (though "know" simply isn't going to fit with any of these prepositions...) Yup. "past" is a preposition meaning "beyond the age of" which puts it in the same category as "in" and "behind." Which is not a category based upon prepositional function, but upon overlap with other parts of speech... all three are also adverbs and also adjectives and also nouns. "My behind." "I got an in." "He's got a seedy past." "He's behind us." "I'm past that." "I'm in luck."
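To see the whole candidate set at once rather than trying combinations by hand, here is a small sketch that expands the three slots into every possible phrase. The slot contents are just the words quoted above, not the full vocabulary, and `SLOTS` and `expand` are illustrative names.

```python
from itertools import product

# Candidate words per slot, taken from the examples quoted in this post.
SLOTS = [
    ["only"],
    ["talk", "wait", "know", "bark"],
    ["in", "behind", "past"],
]


def expand(slots):
    """Return every phrase obtained by picking one word per slot."""
    return [" ".join(choice) for choice in product(*slots)]


phrases = expand(SLOTS)
for phrase in phrases:
    print(phrase)
```

Even with this tiny word list the expansion gives 1 x 4 x 3 = 12 openings, and nothing in the codes separates "only wait in" from "only know behind". With 69 words in the second slot the same expansion gives 207.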

Meanwhile, "I have quite a wait ahead of me." "I'm in the know." "I need to give him a good talk." "The tree has bark."

Grr...

This isn't going to work, is it?

A category like "only" is so restrictive that nothing more than a few largely interchangeable words could ever hope to fill it.

But a category like "talk" is so large... there are so many verbs with both a transitive and intransitive sense that are also used as nouns... and you can't just drop any one into a sentence structure interchangeably...

I was thinking that the solution might be to somehow "fix" prepositional and other "high-function" words and only use the interchangeable "grammemes" (that's what I call those binary strings) for the noun/verb/adjective/adverb simples...
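A rough sketch of that hybrid idea, under one big assumption: a word counts as a "simple" when every set bit in its code falls in the first five positions (noun, transitive verb, intransitive verb, adjective, adverb). That rule is a guess at what "simple" should mean, and `is_simple` and `to_hybrid_template` are made-up names.

```python
SIMPLE_BITS = 5  # noun, transitive verb, intransitive verb, adjective, adverb

# Codes as quoted in the post for the opening of the sample sentence.
VOCAB = {
    "only": "000111000000000000",
    "talk": "111000000000000000",
    "in": "100110010000000000",
}


def is_simple(code):
    """True if the code has at least one bit set and none outside the first five."""
    return "1" in code[:SIMPLE_BITS] and "1" not in code[SIMPLE_BITS:]


def to_hybrid_template(sentence, vocab):
    """Keep high-function words literally; turn simples into swappable slots."""
    template = []
    for word in sentence.split():
        code = vocab[word]
        if is_simple(code):
            template.append(("slot", code))
        else:
            template.append(("fixed", word))
    return template


print(to_hybrid_template("only talk in", VOCAB))
```

Here "only" and "in" stay put because each has a bit set outside the simple range (conjunction and preposition respectively), and only "talk" becomes a slot to be refilled.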

But it looks like it's the SIMPLES themselves that will knock the whole project off the track. The cognitively functional words are far more semantically vacuous than the primary signifiers... they're operating on some other plane yet. And unless topic can be tied directly into structure, I don't think I'm anywhere near it as of yet...
