Thursday, July 08, 2004

Categories

What are our objects?

Essays --> Paragraphs --> Sentences --> Words --> Syllables --> Letters

Now what are potential rules between various objects?

Object --> Relation --> Object

Essay --> Paragraph.Count
Paragraph --> Sentences.Count
Sentence --> Words.Count
Word --> Len(Word)
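
As a rough illustration of the four relations above, here is a minimal Python sketch. The function name `structure_counts`, the blank-line paragraph split, and the punctuation-based sentence split are all assumptions made for the example; nothing here is Seductotron's actual code.

```python
import re

def structure_counts(essay):
    """Count each level of the Essay -> Paragraph -> Sentence -> Word chain."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", essay) if p.strip()]
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [
        [s for s in re.split(r"(?<=[.!?])\s+", p) if s]
        for p in paragraphs
    ]
    # Assumption: a word is a run of letters and apostrophes.
    words = [[re.findall(r"[A-Za-z']+", s) for s in para] for para in sentences]
    return {
        "paragraph_count": len(paragraphs),                       # Essay --> Paragraph.Count
        "sentences_per_paragraph": [len(p) for p in sentences],   # Paragraph --> Sentences.Count
        "words_per_sentence": [[len(w) for w in p] for p in words],  # Sentence --> Words.Count
        "word_lengths": [[[len(x) for x in w] for w in p] for p in words],  # Word --> Len(Word)
    }

if __name__ == "__main__":
    sample = "Pudding tastes best in the morning. Who knew?\n\nEmeralds are green."
    print(structure_counts(sample))
```

Syllables and letters are left out on purpose, since they need something smarter than counting.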

Syllables and letters are more complicated... I'm not treating syllables strictly as units of pronunciation... I'm thinking more along the lines of "internal repetition of structure"... spotting frequent prefixes ("an", "de") and suffixes ("ed", "ing").


Sample

Not much to look at, but this is what we get from a letter-based attempt to define rules... for any letter what is the likelihood of any given letter following it? It's a very rough prototype, but it does seem to yield reliably "realistic" patterns of letter correspondence...

instha t d realize nd wh An outhIoupe of tatss laci wat nchous, onthathepons Als. fond f Medacarartyomec Athous, Cioremend din Gous Ateneen Are r o trofas tre bily fath S. lp oonstit An Ph iS tinobuso

Con; thergean Aldge ina tithr ils Aiasstat S ounthnceco e ofin junthashaly Ses e An ond Angof thor wind At ty Pessiofuctialinitiofone oy iomar Gamictalerve wesucourelurd of onde Aly oregrsimplac o ioupon Abutsinf toflangure ofor. Peen Pins in Ase taripicenthegrty tilighedet-gl-hamelieingere rs Pre tl Plex Alan. Prysurutene berl SthIch Prt Pprea ntit Cany M Spoun Pre foust Stesofof of Co izag
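
For reference, the letter-following-letter idea can be sketched in a few lines of Python. This is not the prototype that produced the sample above, just a minimal version of the same technique (a first-order character chain); the names `train` and `generate` are made up for the example.

```python
import random
from collections import Counter, defaultdict

def train(text):
    """For every character, tally which characters follow it and how often."""
    follows = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, length, rng=None):
    """Walk the tallies, picking each next character in proportion to its count."""
    if rng is None:
        rng = random.Random()
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:
            break  # dead end: nothing was ever seen after this character
        letters, weights = zip(*options.items())
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)

if __name__ == "__main__":
    corpus = "Pudding tastes best in the morning. All emeralds are green."
    table = train(corpus)
    print(generate(table, "P", 60))
```

Fed a large enough corpus, output from a chain like this tends to look like the sample: word-shaped, but not words.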

Rules.

Hmm.

Hmmm....

I've been trying to capture enormous amounts of data, then to scour the world looking for the rules to process it. This is hard, and this isn't really how people think.

Take induction. "All emeralds are green." Now, we assume this is true because every emerald we've seen has been green. We don't necessarily hold it to be a law. We may just assume it's a "rule". So, if we find an anomalous instance of a blue emerald, we keep the rule, but adjust our confidence in it.

Now, a logician argues that the rule is derived from the observations of emeralds. We observe one emerald, it is green. We observe another emerald, it is green. And so on, and so forth, and since we know that all emeralds we have seen were green, we assume that the rule "all emeralds are green" is reasonably valid.

BUT, we don't REALLY remember our encounters with MOST individual emeralds. We see an emerald, increment our confidence in the proposition "all emeralds are green" and then promptly forget the particulars of the instance. I can only specifically recall a very few encounters with emeralds, and yet I am confident that they've all been green. My confidence in the rule is held apart from my recollection of the instances which have validated it.

So, is there some way to consider rules apart from the particular components that constitute them?

Let's take the case of an essay Seductotron were reading. The essay would be an object. It would have component pieces called sentences. Those pieces would have component pieces called words. Words would have lateral relations (to one another) and vertical relations (to higher levels (sentences) and lower levels (letters) of structure). We could say of any observed truism that it was potentially a rule. Say it occurred that all sentences beginning with "who" ended with question marks. That would be a true observed proposition. We could then forget the sentence but remember the proposition, and, if the proposition were subsequently validated with some frequency, we could then keep it as a rule. If it never recurred, we would want to toss it in the rubbish heap.
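
One way to picture "forget the sentence but remember the proposition" is a rule that stores only two tallies. The Python sketch below is an assumption-heavy illustration: the `Rule` class, its field names, and the keep/toss thresholds are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """A proposition kept as two counters; observed sentences are never stored."""
    name: str
    applies: Callable[[str], bool]  # does this sentence fall under the rule?
    holds: Callable[[str], bool]    # if so, is the proposition true of it?
    confirmed: int = 0
    violated: int = 0

    def observe(self, sentence):
        if not self.applies(sentence):
            return  # irrelevant sentence: confidence is unchanged
        if self.holds(sentence):
            self.confirmed += 1
        else:
            self.violated += 1

    @property
    def confidence(self):
        seen = self.confirmed + self.violated
        return self.confirmed / seen if seen else 0.0

    def worth_keeping(self, min_seen=3, min_confidence=0.6):
        # Hypothetical thresholds; a rule that never recurs gets tossed.
        seen = self.confirmed + self.violated
        return seen >= min_seen and self.confidence >= min_confidence

if __name__ == "__main__":
    who_rule = Rule(
        "sentences beginning with 'who' end with '?'",
        applies=lambda s: s.lower().startswith("who "),
        holds=lambda s: s.rstrip().endswith("?"),
    )
    for s in ["Who ate the pudding?", "Pudding tastes best in the morning.",
              "Who knows.", "Who is there?"]:
        who_rule.observe(s)
    print(who_rule.confirmed, who_rule.violated, who_rule.worth_keeping())
```

A counterexample (the blue emerald) lowers the confidence without deleting the rule, which matches the emerald case above.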

Given any complex observed object, the sum total of potential rules to be derived from the observation is exactly equal to the sum total of true propositions deducible from the observed object. The utility of any potential rule would be derived from the probability of its corresponding proposition's truth.

Recently, a friend brought up the instance of Helen Keller. She was a woman who could neither see nor hear, yet she learned how to speak. From this, it seems reasonable to conclude that communication can occur without any familiarity with the particulars. But in order to learn how to speak, a non-random sequence of experiences had to be forced upon her again and again until she began to observe patternistic relations within that sequence. Same as how anyone else learns to speak. You keep repeating the same damn thing at the baby until the baby begins to understand the categorical relations of the things you've said....

This is all rather inchoate. I'm trying to articulate it for myself.

Let's go with a sentence - "Pudding tastes best in the morning."

Now, for the word "pudding" we can observe several true propositions.

"Pudding" occurs at the start of a sentence.
"Pudding" occurs directly before the word "tastes".
"Pudding" occurs two words before the word "best".
"Pudding" occurs in the same sentence as the word "tastes".
"Pudding" occurs in the same sentence as the word "best" (and so on).
hmmm....
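
The list above can be generated mechanically. Here is a small Python sketch, assuming a hypothetical `propositions` helper that only looks at position and co-occurrence within a single sentence; the wording of each proposition is invented for the example.

```python
import re

def propositions(sentence, target):
    """List simple true statements about where `target` sits in `sentence`."""
    words = re.findall(r"[a-z']+", sentence.lower())
    target = target.lower()
    i = words.index(target)  # first occurrence; raises ValueError if absent
    facts = []
    if i == 0:
        facts.append(f'"{target}" occurs at the start of a sentence')
    if i == len(words) - 1:
        facts.append(f'"{target}" occurs at the end of a sentence')
    for j, other in enumerate(words):
        if j == i:
            continue
        distance = abs(j - i)
        where = "before" if j > i else "after"
        # Lateral relation: relative position of the two words.
        facts.append(f'"{target}" occurs {distance} word(s) {where} "{other}"')
        # Weaker relation: plain co-occurrence in the same sentence.
        facts.append(f'"{target}" occurs in the same sentence as "{other}"')
    return facts

if __name__ == "__main__":
    for fact in propositions("Pudding tastes best in the morning.", "pudding"):
        print(fact)
```

Even a six-word sentence yields eleven propositions for one word this way, which is why most of them would need to be tossed unless they recur.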

This needs fleshing out. There's a thought under all this junk...

Friday, July 02, 2004

WTF?!?!?

shhh...

PartOfSpeechList Property Example

This example checks to see whether the thesaurus found any meanings for the selection. If so, the meanings and their corresponding parts of speech are displayed in a series of message boxes.

One potato, two potato, three potato, four. Five potato, six potato, seven potato, more. Here we go…

Set mySynInfo = Selection.Range.SynonymInfo
If mySynInfo.MeaningCount <> 0 Then
    myList = mySynInfo.MeaningList
    myPos = mySynInfo.PartOfSpeechList
    For i = 1 To UBound(myPos)
        Select Case myPos(i)
            Case wdAdjective
                pos = "adjective"
            Case wdNoun
                pos = "noun"
            Case wdAdverb
                pos = "adverb"
            Case wdVerb
                pos = "verb"
            Case Else
                pos = "other"
        End Select
        MsgBox myList(i) & " found as " & pos
    Next i
Else
    MsgBox "There were no meanings found."
End If