Let's Make Robots!

MDIBs - Metadata Driven Internet Brains

I'm always asking myself, "What is the logical way to build a smarter bot with today's tech and the tech soon to come?"

I've come to the following conclusions.  I realize that each point I make is hugely debatable; I'm just putting out some opinions, not trying to prove anything.  This is the course I am currently on, so I thought it might stimulate a fun discussion.

1.  A bot can't know everything, so at some point a bot will need to "look up something" on the internet.  Likely, a bot will need to look up many things at the same time, or do many things that involve internet resources.

2.  I believe the main "brain" of smarter bots should be physically "off bot" and "on the web" for small and medium sized household bots that have internet connectivity.  I used to really want everything to be on the bot, but I came to this conclusion for performance, economic, and reuse reasons.

 Performance:  A bot can call its "Internet Brain" once, and the "Internet Brain" or IB, can call many other web services/resources as needed, in separate threads, before figuring out what to do.

 Economics:  Bots that have to carry the power and weight of "big brains" will be bigger and more expensive than most people would like.  I’d personally like to have 3 or more bots per household, so they need to be affordable, and smart.

 Reuse:  Should bot brains be custom builds?  I don't think so.  I believe brains should be reused.  Until we figure out how to better share/leverage software agents and develop some common concepts/interfaces/etc, we will all be building bots that aren't as smart and useful as they could be.

3.  Bots should not wait for, or expect, an immediate answer about what to do in any given circumstance.  Basically, things should be asynchronous.  This means bots should make a call to an IB with something like "Is it going to rain on Tuesday?" and then call again a fraction of a second later to see if an answer is waiting.  A mechanism for the server to call the bot when the answer is ready would obviously be better.
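
A minimal sketch of this ask-then-poll pattern.  All names here (the ticket mechanism, the worker thread standing in for slow web lookups) are invented for illustration, not an actual IB API:

```python
# Illustrative ask-then-poll sketch: the bot submits a question,
# gets a ticket immediately, and polls until an answer is ready.
import threading
import time

_answers = {}        # ticket -> answer, filled in by worker threads
_next_ticket = [0]

def ask(question):
    """Submit a question; return a ticket immediately (non-blocking)."""
    _next_ticket[0] += 1
    ticket = _next_ticket[0]

    def worker():
        time.sleep(0.05)  # stand-in for the IB calling out to web services
        _answers[ticket] = f"answer to: {question}"

    threading.Thread(target=worker, daemon=True).start()
    return ticket

def poll(ticket):
    """Return the answer if ready, else None."""
    return _answers.pop(ticket, None)

t = ask("Is it going to rain on Tuesday?")
result = poll(t)             # likely None on the first check
while result is None:        # the bot would do other work between polls
    time.sleep(0.01)
    result = poll(t)
```

A server-initiated callback (push) would replace the polling loop entirely, which is why it would be the better mechanism.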

4.  Bots will have different sensors, actuators, behavior, etc.  This means Internet Brains (IBs) will need to support many different configurations.  I will refer to this as "Metadata Driven IBs", or MDIBs.  It is logical for this metadata to exist on the internet and be maintainable by robot builders through an app of some kind.  It would be very helpful (but exceedingly unlikely) if standard concepts and structure could emerge for this metadata.  There would be a huge amount of this metadata and many different substructures.  (Instead of waiting for these standards which will never happen, I will probably just come up with some.  Why not?)
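
One possible shape for this per-robot metadata, sketched as a plain Python structure.  Every field name here is invented; the point is only that an MDIB could read a robot's configuration rather than having it hard-coded:

```python
# Hypothetical per-robot metadata an MDIB might store and an
# app might maintain.  All field names are illustrative.
robot_meta = {
    "robot_id": "anna-01",
    "sensors": [
        {"name": "front_sonar", "type": "sonar", "unit": "cm"},
        {"name": "imu", "type": "gyro", "axes": 3},
    ],
    "actuators": [
        {"name": "drive", "type": "differential_motor"},
    ],
    "behaviors": ["obstacle_avoidance", "chat"],
}

def supports(meta, sensor_type):
    """Ask the metadata whether a robot carries a given sensor type."""
    return any(s["type"] == sensor_type for s in meta["sensors"])
```

An MDIB agent could call `supports(robot_meta, "sonar")` before deciding whether a behavior even applies to a given bot.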

5.  People will want to interface with their bots through various devices while not physically around them.  They may want to see an avatar of their bot, onsite images, video, maps, or sensor data through phone, web, tablet, etc.  These maps might be consolidated “views” on multiple bots/sensor data, like home automation data / internet of things stuff.

6.  Bots that are owned by the same person should be able to share data so as to increase their “situational awareness” of a given location.  The internet of things should be tied into as well.  This should be a function of the MDIB.  Any bot in your house should know whether the front door is locked, what the thermostat is set to, whether there is motion at your back door, or a flood in your basement.

7.  Complex rules should be able to be built on the MDIB coordinating the home, its sensors, and one or more bots.

8.  If a MDIB is a “black box” that is configurable, useful, and interoperable, then robot developers do not really need to know or care what technology was used to build it.

9.  While MDIBs should run “on the internet”, they should also be able to be extended and customized back “into the home” by supporting some common interfaces and being able to call internet resources that homeowners put on their own computers.  This means developers/homeowners should be able to build intelligent agents, register them with the MDIB (Metadata Driven Internet Brain), configure them through the MDIB app, write and publish their code on their PC or other device, and then have the MDIB start using their custom agents when appropriate to do so.

10.  What is the role of microcontrollers in this model of robot building?  Robots still need an on-board brain.  This brain needs to directly handle timing-sensitive sensors (like sonars, gyros, etc.), actuators, motors, and so on.  It will need to handle "reflex actions" like obstacle avoidance, and be able to call an MDIB for higher-level, less time-sensitive brain functions.  A unified "Dictionary of Commands" will need to be devised so robots can communicate with MDIBs and implement commands given to them.
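
A hedged sketch of what such a "Dictionary of Commands" might look like on the robot side: the MDIB sends command names, and the on-board brain maps them to local handlers.  The command names and handlers are invented for illustration:

```python
# Illustrative command dictionary: shared command names map to
# on-board handlers.  Unknown commands fall through safely so the
# reflex layer stays in control.
def drive(direction, speed=1.0):
    return f"driving {direction} at {speed}"

def stop():
    return "stopped"

COMMANDS = {
    "DRIVE_FORWARD": lambda args: drive("forward", **args),
    "STOP": lambda args: stop(),
}

def execute(command, args=None):
    handler = COMMANDS.get(command)
    if handler is None:
        return "unknown command"
    return handler(args or {})
```

The value of standardizing the names is that any MDIB can drive any robot that implements them, without knowing how the handlers work.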

11.  How should data-intensive sensor processing (like video processing) be handled in this model?  That is an open question.  I suspect a hybrid approach, with most of it being done onboard and some "interesting" frames being sent to an MDIB for additional processing (object recognition, localization, face recognition, etc.).

The next question is “How should a brain work?”  

To me, that is an unsolved problem.  I ran into a quote again today that reminded me of my own efforts and deserves repeating:

What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle. – Marvin Minsky, The Society of Mind, p. 308

My efforts at a brain thus far can basically be summed up as a collection of services and software agents that use different techniques to accomplish general and specific tasks.  A service figures out which agents are applicable to the circumstances and executes them.  When all these agents are done, the service arbitrates any conflicts to determine what desired behavior gets sent back to a robot. 
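
The agents-plus-arbitration idea above can be sketched in a few lines.  Each agent either proposes a behavior with a score or declines; the service runs them and arbitrates.  The agents, scores, and arbitration rule (highest score wins) are all invented placeholders:

```python
# Sketch of a "collection of agents plus an arbitrating service".
# Each agent returns (proposed_behavior, score) or None.
def chat_agent(event):
    if event["kind"] == "speech":
        return ("reply", 0.4)

def avoid_agent(event):
    if event.get("obstacle_cm", 999) < 20:
        return ("stop_and_turn", 0.9)

AGENTS = [chat_agent, avoid_agent]

def decide(event):
    # Run every agent; keep only the ones that made a proposal.
    proposals = [p for agent in AGENTS if (p := agent(event))]
    if not proposals:
        return "idle"
    # Arbitration: highest score wins.  Real conflict-resolution
    # rules would replace this single max().
    return max(proposals, key=lambda p: p[1])[0]
```

With this shape, adding a new capability is just adding a new agent function to the list, which is what makes the shared-MDIB idea later in the post plausible.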

Given this concept of a brain (which might not be a good one, but let's run with it for the sake of this point), I think it is quite easy to visualize the "Society of Mind" concept as an MDIB.  If a configurable brain is built as a collection of agents running largely independently of one another, with all the configuration stored as metadata and maintainable through an app, many robots would be able to easily share agents/code/metadata/knowledge.

As new agents or new versions of existing agents are written and introduced into the collective MDIB DNA, some robots might use them, others not.  I can only guess that robots would get a lot smarter, a lot faster, at a much lower cost, with much less code.


What do you folks think? (other than the obvious, that I just wrote a manifesto)


Made some progress on NLP: someone ported OpenNLP to C#.  I have the source for it and its version of WordNet installed and running.

It has a lot of basic functions that are far better than mine, which I plan on incorporating.  I am really hoping that none of these pieces will slow the overall brain down too much.

Split paragraph into Sentences

Split sentence into tokens (words and symbols)

Determine Part of Speech of each word using stats and a maximum entropy algo.

Find people, places, and dates in the sentences.  (I'll still need to do mine, I think, so I'll do both)

Determine structure of sentence(s) with a full NLP parse into a tree.  Even if I don't use the full parse for meaning or for mapping to agents/actions, this parse is a very useful annotator for determining tense, plurality, etc.
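
The steps above, sketched with naive pure-Python stand-ins.  A real pipeline would use the OpenNLP/WordNet models; these toy rules (a regex sentence splitter, a tiny hand-made lexicon for POS tags) only illustrate the data flow from paragraph to tagged tokens:

```python
# Toy pipeline sketch: paragraph -> sentences -> tokens -> POS tags.
# The statistical models are replaced by trivial stand-ins.
import re

def split_sentences(paragraph):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]

def tokenize(sentence):
    # Words and individual punctuation symbols become tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag_pos(tokens):
    # Toy tagger: a real one uses statistics / maximum entropy.
    tiny_lexicon = {"the": "DT", "fox": "NN", "jumped": "VBD", "quick": "JJ"}
    return [(t, tiny_lexicon.get(t.lower(), "UNK")) for t in tokens]

para = "The quick fox jumped. It was fast!"
sentences = split_sentences(para)
tags = tag_pos(tokenize(sentences[0]))
```

Full constituency parsing into a tree (the last step in the list) is the one piece that has no cheap stand-in, which is exactly why OpenNLP is worth the integration effort.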

It would also appear that it can interpret the WordNet data and get all kinds of "Is A" and "Has A" relationships, many thousands of them, that I had previously taught Anna through speech or web pages.

There are a lot of other features here; I'm just getting going on real NLP.

The full parse trees are impressive; now I'm trying to figure out how to use them to do something useful.  My regular expression stuff was simple and worked, determining structure and meaning in one simple step.  Should I start doing a new type of pattern recognition based on NLP parse trees?  I would think yes, but it seems like it would be complex and slow, so my thought was to do it only if I didn't get a sufficient match first using my existing techniques.  I am trying to figure out the types of statements that would be useful to a robot and that would lend themselves to recognition with "trees" rather than direct sentence matches or regular expressions.  I would have to come up with a way to match parse trees based on search criteria... which might circle back around to regex.

I'll worry about that later I suppose, uses for this will emerge.  Happy to have some new tools.

I don't know how fast OpenNLP is compared to your algorithm. I would try getting a parse tree to see if that can get you to the core of the sentence. It might be more accurate than your routines, especially if you can find a way to keep the modifiers (numbers, etc) as annotations or whatnot.

Perhaps each word could be an atom, or maybe just the core of the word. A sentence would be a graph that connects the words in their proper order (and also be an atom). Perhaps the proper order is as a parsed tree, I don't know.

Thanks for thinking about this, Jay.  I believe we are generally on the same page.

This is my latest over-simplified pseudo-code of how to fit my prior code and OpenNLP together:

Sentence Comes In as Input:

1)  Calculate "Normal Sentence" with "Annotations hanging off of it".  There are a few versions of the sentence held in different structures for various agents to use as they best see fit.

1.1)  Determine Tokens - Open NLP

1.2)  Do OpenNLP Parse - The first sentence takes a second to warm up OpenNLP and WordNet, but after that I barely notice a speed difference, if at all.

1.3)  Group Words Into Phrases - My stuff

OpenNLP tokenizes into individual words, but my own routines recognize multi-word phrases (private people's names, learned phrases, countries, states like New Jersey) as single concepts, which is good.  Maybe the OpenNLP tokenizing could be rectified by modifying the WordNet db that OpenNLP uses when tokenizing.

1.4)  Annotate Sentence with singulars, plurals, tense, adjectives and adverbs from NLP parse, and REMOVE all these words from the "Normalized Version" of the sentence that will be used for pattern matching.  This leaves me with the "Core of the Sentence" as you said.  This removal will have the HUGE initial benefit of making all my existing processes "match" when the input sentence contains adjectives, adverbs, etc. that the regular expression didn't expect.  I can make the agents where this "additional color" is relevant look for this color in the annotations when they do their individual processing functions.  i.e.  The difference between "Go Forward" and "Go Forward Slowly" is an adverb that would be an annotation.
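
Step 1.4 can be sketched directly.  In the real flow the POS tags come from the OpenNLP parse; here they are supplied by hand, and the tag set (JJ/RB) is the standard Penn-style adjective/adverb naming:

```python
# Sketch of step 1.4: strip adjectives and adverbs out of the
# sentence used for pattern matching, keeping them as annotations.
def normalize(tagged):
    core, annotations = [], []
    for word, tag in tagged:
        if tag in ("JJ", "RB"):       # adjective / adverb
            annotations.append(word)  # kept as "color" for agents
        else:
            core.append(word)
    return " ".join(core), annotations

# "Go Forward Slowly" -> core "Go Forward" plus annotation "Slowly"
tagged = [("Go", "VB"), ("Forward", "NN"), ("Slowly", "RB")]
core, notes = normalize(tagged)
```

The payoff is exactly as described: an existing pattern written for "Go Forward" now matches the adverb-bearing input, and the agent that cares about speed can still find "Slowly" in the annotations.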

2)  Find Agents that Match "Normal Sentence" in Some Way.  The following basically represents a prioritizing from "Exact Matches" that take priority over less exact "Pattern Matches".

2.1)  Look for Sentence in List of Commands (AtomList) and return commands to caller.  "Drive Forward"

2.2)  Look for Sentence in List of Chat Requests (AtomList) and return one of the specified chat responses to caller - this is a random routine that checks responses against recent history so as not to repeat.  "What's up?"

2.3)  Look for 2nd Person Sentence in List of Questions (AtomList) and process accordingly if found (this is a topic unto itself).  "How old is Fred?" translates to "How old be you?"; the question is found, and Fred's response to it is returned.

2.4)  Find Regular Expressions that Match Sentence and execute corresponding agents.  "Where is London?"

2.5)  Find "NLP Patterns" that match Sentence - must invent this if necessary.  I need a way to define a bunch of NLP parse expressions with wildcards, and then a way to determine which of those "NLP patterns" match the input sentence's NLP Parse Tree.  "How would you describe the fox?" - from the prior sentence "The quick brown fox jumped over the lazy dog" ...the answer "The fox was brown and quick."
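
The 2.1-2.4 cascade can be sketched as a prioritized router: exact command matches first, then chat requests, then regex-to-agent patterns, with the not-yet-invented NLP-tree matching as the final fallback.  All the data (commands, chat responses, patterns) is illustrative, and the random chat selection is replaced by picking the first response:

```python
# Sketch of the priority cascade: exact matches beat pattern matches.
import re

COMMANDS = {"drive forward": "CMD_DRIVE_FORWARD"}
CHAT = {"what's up": ["Not much.", "All good."]}
REGEX_AGENTS = [(re.compile(r"where is (\w+)\??", re.I), "location_agent")]

def route(sentence):
    s = sentence.lower().rstrip("?!. ")
    if s in COMMANDS:                        # 2.1 exact command match
        return ("command", COMMANDS[s])
    if s in CHAT:                            # 2.2 chat request
        return ("chat", CHAT[s][0])          # real version randomizes
    for pattern, agent in REGEX_AGENTS:      # 2.4 regex -> agent
        m = pattern.match(sentence)
        if m:
            return ("agent", agent, m.group(1))
    return ("no_match",)                     # 2.5 NLP-tree match would go here
```

The ordering is the whole point: a cheap exact hit short-circuits the cascade, so the expensive tree matching in 2.5 only runs when everything else fails.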

A bit on atom storage.  I am proceeding with data conversion.  Words, phrases, and sentences are atoms.  The Keep It Simple Stupid part of me stores sentences as strings rather than some ordered list of IDs to other word atoms.  Maybe when I finally get the graph/hypergraph concept in my gut I will change my mind.  Is a string/word/phrase/sentence just a graph of ASCII characters anyway?  I don't see storing sentences as strings as a problem right now, and there are many advantages, as I can always look up a word atom based on its string key or its int key.  It's a whole lot easier to comprehend what's going on in the system if I can see words, phrases, sentences, etc. as strings that I can personally relate to.  I have been thinking about storing a full NLP parse of each sentence in another column, thinking that maybe I could then use SQL and LIKE expressions to do the "NLP Pattern Matching" process described in 2.5 in some way.
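
The dual-keyed storage described above can be sketched as a tiny store where every atom is reachable by its string or its int id.  The structure and field names are invented:

```python
# Sketch of string-keyed atom storage: each word/phrase/sentence
# atom can be looked up by its text or by its integer id, and
# adding the same text twice returns the existing atom.
class AtomStore:
    def __init__(self):
        self.by_id, self.by_string = {}, {}

    def add(self, text, atom_type):
        if text in self.by_string:
            return self.by_string[text]
        atom_id = len(self.by_id) + 1
        atom = {"id": atom_id, "text": text, "type": atom_type}
        self.by_id[atom_id] = atom
        self.by_string[text] = atom
        return atom

store = AtomStore()
a = store.add("fox", "word")
b = store.add("The quick brown fox", "sentence")
```

In a SQL-backed version, `by_string` and `by_id` would just be two indexes on the same atom table, which is what makes the LIKE-expression idea for 2.5 tempting.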

Try as I might, I can never write short posts on this topic.  I hope this train of thought helps some future hobbyists out there.


Anna was already impressive. Now she will be more impressive. I can't wait to see what you develop from all these pieces.

I really like this idea, though there should be two types of them: robot gestures and human gestures (or meatbeing gestures, because cats and dogs make such things also).

Location and approximate location could be two more types of atoms or one atom with appropriate truth values. And then there is time, which might be an atom, but I think it fits better as an attribute of atoms. Arrrgghhh! Now I'm seeing atoms everywhere!

Thinking about human gestures makes me wonder how best a robot should think abstractly about humans. I think some sort of tree structure that could be easily assembled into a stick figure might be best. I remember reading a long time ago that stick figures are one of the better candidates for how humans think abstractly of humans.

Allowing gestures, which are collections of atoms, as a single atom, means that AtomSpace needs to be a hypergraph.

I can see recognizing human gestures with stick figures using Kinect.  One of the middleware libraries for it outputs stick-figure representations of some kind.  We do some of the Xbox dance stuff for fun during family gatherings; the newer ones can recognize at least 4 people/stick figures at the same time.

It will probably be a long time before I can take that one on; I like to crawl before I run.  My thought for crawling or walking would be to tackle "robot gestures".  I would start by mapping emotional context and gestures to responses.  An example: when the robot says "No", a simple less-happy emotion and facial expression, looking slightly down, and a slight horizontal head shake would seem good to synchronize.

I was thinking about the more prose-like areas of the web, and blogs, along with some forums, came to the top of my brain.  This is something I wouldn't try on the default brain, but on a copy.  There are way too many untrustworthy facts in blogs and too many opinions masquerading as facts.

I will help you with your brain as much as possible. I can test, I can code, I can build. As long as Lee doesn't need the time first. I look forward to seeing it.

Be careful about having too many types of atoms. It's probably very easy to do now. On the other hand, most of the math behind AI assumes typing, so you need different types of atoms. I wish I knew how many were ideal.

I also feel strongly that you should look at the OpenCog TruthValues. They have three different types (though one is just a list of other TruthValues). Reading their glossary and blogs (even though the math/philosophy is beyond me) I'm getting an idea of some of the ways these can be used. The TruthValues are very useful in filtering out crap. And they are essential to making inferences from old data to new data.

There are always going to be problem areas when you're dealing with the real world and natural language.

The whole "birds can fly; penguins are birds; therefore penguins can fly" problem.

A real artificial general intelligence (AGI) has to deal with exceptions and also must have some non-Boolean TruthValues in it. For example, the statement "birds can fly" is a bad statement if told as 100% truth. Think of baby birds, think of birds with broken wings, and yes, think of penguins.
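
A minimal sketch of a non-Boolean truth value, loosely in the spirit of OpenCog's simple (strength, confidence) pairs.  The numbers and the `likely` threshold rule are invented placeholders, not OpenCog's actual semantics:

```python
# Sketch of a two-component truth value: how true, and how sure.
class TruthValue:
    def __init__(self, strength, confidence):
        self.strength = strength        # how true (0..1)
        self.confidence = confidence    # how sure (0..1)

# "Birds can fly" is mostly true; "penguins can fly" is almost
# entirely false, and we are quite sure about it.
birds_fly = TruthValue(strength=0.9, confidence=0.8)
penguins_fly = TruthValue(strength=0.02, confidence=0.95)

def likely(tv, threshold=0.5):
    # Toy decision rule: believe it only if it is both probably
    # true and we have reasonable evidence.
    return tv.strength >= threshold and tv.confidence >= threshold
```

Representing "birds can fly" as strength 0.9 instead of 100% truth is exactly what lets the system survive baby birds, broken wings, and penguins.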

As for hard drives, they are inexpensive nowadays. I have a 3 TB USB 3 drive that I have free at the moment and may use for something like this.

I love thought experiments.  Einstein thought they were useful, and he turned out ok.

I will look into the truth values and see what they are doing.  I haven't gotten that far.  If you figure it out and don't mind summarizing, that would be great too. 

I dealt with the exceptions thing when writing the initial algos for Anna.  Anna doesn't assume something is true just because she has one truth statement saying it is always so.  She does look for exceptions if it seems relevant.  I wouldn't be surprised if she could be tripped up, though.  "Can birds fly?" can be answered by a bot in the general sense with "Yes" or "Most can", even though the bot knows of specific exceptions.  If you asked "Can all birds fly?" or "Can penguins fly?", that's getting a bit more specific and deserves a more precise answer.  Anna can handle some of these currently; the others are fairly easily doable with a little more coding.

I had a concept of quantifiers...where words implied a value or probability.  It seemed useful at the start, but I haven't touched it since I wrote it (maybe it is working).  The words mostly, some, a few, probably, seldom, never, always, etc. strongly imply probabilities.  Ordinals were also quantifiers, so "First" implied 1, and so on.  These could be used to sort things or to predict a truth confidence.
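
The quantifier idea above can be sketched as a simple word-to-probability table.  The specific numbers are invented placeholders; the point is only that quantifier words imply a truth confidence:

```python
# Sketch of quantifier words implying probabilities.
QUANTIFIERS = {
    "always": 1.0, "mostly": 0.8, "probably": 0.7,
    "some": 0.4, "a few": 0.2, "seldom": 0.1, "never": 0.0,
}

def implied_probability(sentence, default=0.5):
    """Return the probability implied by the first quantifier found."""
    s = sentence.lower()
    for word, p in QUANTIFIERS.items():
        if word in s:
            return p
    return default
```

Fed into a (strength, confidence) truth value, "birds can mostly fly" would land near 0.8 instead of an absolute 1.0.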

The number of atom types might seem artificially large, as anything that has a limited set of values (something that would have a table with an int key and a varchar description), basically a list, could be an atom type.  Not all lists, just lists relevant to the brain's internal functions, like my list of software agents or a list of emotions.  Each of these could have additional data added to them in the future, though.  Setting these up in the meta will allow the UI to show English-readable descriptions of the info that is in each atom, and allow data to be changed using dropdown lists.  Otherwise, looking at an atom would mostly mean looking at a bunch of integers and a few strings... a confusing matrix of gibberish.

The biggest foundational "weakness" between what I have thus far and OpenCog and other natural language systems is that I'm not truly using any of the academic NLP theory that parses sentences into NP, VP (noun phrase, verb phrase), and all the other symbols, building a tree using probabilities.  It may be possible to add this later, but it could be a real mess if I don't figure out how to fit it in now.  I mostly use a set of home-grown techniques (as I have previously described) that amount to "feature extraction" or "question-answer translation", where some degree of implied meaning and structure is derived at the same time through regular expressions or data.  A regular expression like "Where is _____?" (and other similar expressions) can be mapped to an agent that determines location.  Normalizing the sentences first allows the system to handle some variations in the text, like "please" being in the sentence.  The system is very practical and easy to understand, but it is NOT NLP.  The huge weakness of this is exposed by trying to use anything other than the simple sentence structures that the bot understands.  Anything with multiple "phrases" or several adjectives or adverbs will not be understood by this system.  I have read about NLP, but the literature gets so into the trees and symbols that I never get how to extract practical "meaning" from the tree so I can map it to an agent that can do something.  If I could figure out how this might be done, it would be a huge leap forward.

A note on gender differences... Sadly, as soon as humans start forming sentences with multiple phrases, I often lose the point of what they are saying.  Sometimes it is because the speaker lost the point and forgot to include it in the words (hinting, implying, etc.), telling me later that they told me something that was not in the parse.  Some would say it's a Mars/Venus thing.  As a guy I get the Mars thing, but I'm not sure how a robot is ever going to parse the Venus thing.  Will I need a separate set of "Venusian Parse Algorithms"?  Food for thought.  Maybe they figured it out with Jibo.

One of the things that cognition algorithms are supposed to do in OpenCog is to reward the atoms that "helped" the algorithm come to its conclusion. It does this by adding to some attention value associated with each atom. A cognition algorithm has a limited amount of "attention" it can give to atoms, so it's a form of currency.

Truth values seem to get stronger the more they are used. There are different ideas on how TruthValues should be used.

Both of these are covered in the BrainWave blog that can be accessed from the main OpenCog page.  Unfortunately, without more experience with the system, this is as far as I can get.  The programming I understand, but once it starts toward philosophy I can barely follow along, and if I lose my guide I certainly can't find my way back. :(

I just installed Julius on Groucho.  Julius is a speech-to-text processor for Linux and also Windows.  There is a partial English word model that can be used, and as soon as I can find a microphone I'll give it a whirl.  There is also Sphinx.