Let's Make Robots!

MDIBs - Metadata Driven Internet Brains

I'm always asking myself, "What is the logical way to build a smarter bot with today's tech and the tech soon to come?"  

Currently, I come to the following conclusions.  I realize that each point I make is hugely debatable; I'm just putting out some opinions, not trying to prove anything.  This is the course I am currently on, so I thought it might stimulate some fun discussion.

1.  A bot can't know everything, so at some point a bot will need to "look up something" on the internet.  Likely, a bot will need to look up many things at the same time, or do many things that involve internet resources.

2.  I believe the main "brain" of smarter bots should be physically "off bot" and "on the web" for small and medium-sized household bots that have internet connectivity.  I used to really want everything to be on the bot, but I have come to this conclusion for performance, economic, and reuse reasons.  

 Performance:  A bot can call its "Internet Brain" once, and the "Internet Brain," or IB, can call many other web services/resources as needed, in separate threads, before figuring out what to do.

 Economics:  Bots that have to carry the power and weight of "big brains" will be bigger and more expensive than most people would like.  I'd personally like to have 3 or more bots per household, so they need to be affordable and smart.

 Reuse:  Should bot brains be custom builds?  I don't think so.  I believe brains should be reused.  Until we figure out how to better share/leverage software agents and develop some common concepts/interfaces/etc, we will all be building bots that aren't as smart and useful as they could be.

3.  Bots should not wait for or expect to get an immediate answer about what to do in any given circumstance.  Basically, things should be asynchronous.  This means a bot should make a call to an IB with something like "Is it going to rain on Tuesday?" and then call again a fraction of a second later to see if an answer is waiting.  A mechanism for a server to call the bot when the answer is ready would obviously be better.  (A rough sketch of this ask-then-poll pattern follows after these points.)

4.  Bots will have different sensors, actuators, behaviors, etc.  This means Internet Brains (IBs) will need to support many different configurations.  I will refer to this as "Metadata Driven IBs", or MDIBs.  It is logical for this metadata to live on the internet and be maintainable by robot builders through an app of some kind.  It would be very helpful (but exceedingly unlikely) if standard concepts and structure could emerge for this metadata.  There would be a huge amount of this metadata and many different substructures.  (Instead of waiting for standards that will never happen, I will probably just come up with some.  Why not?)  A sample of what a metadata fragment might look like follows after these points.

5.  People will want to interface with their bots through various devices while not physically around them.  They may want to see an avatar of their bot, onsite images, video, maps, or sensor data through phone, web, tablet, etc.  These maps might be consolidated "views" of multiple bots' and sensors' data, like home automation / internet of things data.

6.  Bots that are owned by the same person should be able to share data so as to increase their "situational awareness" of a given location.  The internet of things should be tied in as well.  This should be a function of the MDIB.  Any bot in your house should know whether the front door is locked, what the thermostat is set to, whether there is motion at your back door, or whether there is a flood in your basement.

7.  Complex rules should be able to be built in the MDIB, coordinating the home, its sensors, and one or more bots.

8.  If an MDIB is a "black box" that is configurable, useful, and interoperable, then robot developers do not really need to know or care what technology was used to build it.

9.  While MDIBs should run "on the internet", they should also be able to be extended and customized back "into the home" by supporting some common interfaces and being able to call internet resources that homeowners put on their own computers.  This means developers/homeowners should be able to build intelligent agents, register them with the MDIB, configure them through the MDIB app, write and publish their code on their PC or other device, and then have the MDIB start using their custom agents when appropriate.  (A sketch of what such an agent interface and registration might look like follows after these points.)

10.  What is the role of microcontrollers in this model of robot building?  Robots still need an on-board brain.  This brain needs to directly handle timing-sensitive sensors (sonars, gyros, etc.), actuators, motors, and so on.  This brain will need to handle "reflex actions" like obstacle avoidance, and be able to call an MDIB for higher-level, less time-sensitive brain functions.  A unified "Dictionary of Commands" will need to be devised so robots can communicate with MDIBs and implement commands given to them.  (A made-up fragment of such a dictionary follows after these points.)

11.  How should data-intensive sensor processing (like video processing) be handled in this model?  That is an open question.  I suspect a hybrid approach, with most of it being done onboard and some "interesting" frames being sent to an MDIB for additional processing (object recognition, localization, face recognition, etc.).
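To make point 3 concrete, here is a rough sketch of the ask-then-poll pattern.  The endpoints, field names, and URL are made up for illustration; this is not an actual IB API.

```python
# Rough sketch of "ask now, poll for the answer later" (point 3).
# The /ask and /answer endpoints and the field names are invented.
import time
import requests

IB_URL = "http://my-internet-brain.example.com"   # hypothetical IB address

def ask(question):
    """Submit a question; get a ticket ID back immediately."""
    resp = requests.post(f"{IB_URL}/ask", json={"question": question})
    return resp.json()["ticket_id"]

def poll(ticket_id, interval=0.25, timeout=10.0):
    """Check back every fraction of a second until an answer is waiting."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        body = requests.get(f"{IB_URL}/answer/{ticket_id}").json()
        if body.get("status") == "ready":
            return body["answer"]
        time.sleep(interval)    # keep driving; check again shortly
    return None                 # no answer yet; the bot carries on without it

ticket = ask("Is it going to rain on Tuesday?")
print(poll(ticket))
```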
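For point 4, here is one guess at what a fragment of that metadata might look like for a single bot.  Every field name here is invented; the real structure would need to be far richer.

```python
# Hypothetical metadata record for one bot (point 4).  All names invented.
bot_metadata = {
    "bot_id": "kitchen-bot-01",
    "sensors": [
        {"type": "sonar",      "count": 3, "orientation_deg": [-45, 0, 45]},
        {"type": "thermopile", "resolution": [16, 4]},
    ],
    "actuators": [
        {"type": "drive", "style": "differential"},
    ],
    "behaviors": {
        "obstacle_avoidance": {"runs": "on_bot"},    # reflex, stays local
        "weather_lookup":     {"runs": "on_mdib"},   # higher level, off-bot
        "face_recognition":   {"runs": "on_mdib", "enabled": False},
    },
}
```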
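For point 9, a sketch of what a homeowner-written agent and its registration might look like.  The interface, the flood example, and the callback URL are all hypothetical.

```python
# Hypothetical shape of a custom agent (point 9).  The MDIB would call
# handles() to decide whether the agent applies, then run() to get a
# proposed behavior.  Everything here is invented for illustration.
class Agent:
    def handles(self, situation: dict) -> bool:
        raise NotImplementedError
    def run(self, situation: dict) -> dict:
        raise NotImplementedError

class BasementFloodAgent(Agent):
    """Runs on the homeowner's own PC, called by the MDIB once registered."""
    def handles(self, situation):
        return situation.get("basement_water_level_cm", 0) > 1
    def run(self, situation):
        return {"say": "Water detected in the basement!", "priority": 9}

# Registration could be little more than telling the MDIB where to reach it:
registration = {
    "agent_name": "BasementFloodAgent",
    "callback_url": "http://homeowner-pc.local:8080/agents/flood",  # hypothetical
    "triggers": ["sensor_update"],
}
```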
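And for point 10, a made-up fragment of what a shared "Dictionary of Commands" could look like: a vocabulary of verbs and parameters that both the on-board brain and the MDIB agree on.

```python
# Made-up fragment of a shared "Dictionary of Commands" (point 10).
# The MDIB sends commands in this vocabulary; the on-board brain maps each
# verb to the low-level routine that actually drives motors or reads sensors.
COMMANDS = {
    "DRIVE":   {"params": ["speed", "heading_deg"], "handled": "on_bot_reflex"},
    "STOP":    {"params": [],                       "handled": "on_bot_reflex"},
    "SPEAK":   {"params": ["text"],                 "handled": "relaxed"},
    "PAN_CAM": {"params": ["angle_deg"],            "handled": "relaxed"},
    "REPORT":  {"params": ["sensor_id"],            "handled": "relaxed"},
}

# One command as it might travel over the wire:
example = {"cmd": "DRIVE", "speed": 0.3, "heading_deg": 90}
```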

The next question is “How should a brain work?”  

To me, that is an unsolved problem.  I ran into a quote again today that reminded me of my own efforts and deserves repeating: 

What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle. – Marvin Minsky, The Society of Mind, p. 308

My efforts at a brain thus far can basically be summed up as a collection of services and software agents that use different techniques to accomplish general and specific tasks.  A service figures out which agents are applicable to the circumstances and executes them.  When all these agents are done, the service arbitrates any conflicts to determine what desired behavior gets sent back to a robot. 

Given this concept of a brain (which might not be a good one, but let's run with it for the sake of this point), I think it is quite easy to visualize the "Society of Mind" concept as an MDIB.  If a configurable brain is built as a collection of agents running largely independently of one another, with all the configuration stored as metadata and maintainable through an app, many robots would be able to easily share agents/code/metadata/knowledge.
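Here is a minimal sketch of that agents-plus-arbitration loop.  The agents, the situation fields, and the priority rule are all invented to show the shape; this is not actual code from Anna's brain.

```python
# Minimal sketch of the "collection of agents + arbitration" idea:
# agents run independently, the service picks the applicable ones,
# gathers their proposals, and resolves conflicts by priority.
from concurrent.futures import ThreadPoolExecutor

class GreetAgent:
    def handles(self, situation): return situation.get("person_seen", False)
    def run(self, situation):     return {"say": "Hello!", "priority": 2}

class LowBatteryAgent:
    def handles(self, situation): return situation.get("battery_pct", 100) < 15
    def run(self, situation):     return {"say": "Heading to charger.", "priority": 8}

def think(agents, situation):
    applicable = [a for a in agents if a.handles(situation)]
    with ThreadPoolExecutor() as pool:                  # agents run in parallel
        proposals = list(pool.map(lambda a: a.run(situation), applicable))
    if not proposals:
        return {"say": None, "priority": 0}
    # Crude arbitration: highest priority wins; a real brain would do more.
    return max(proposals, key=lambda p: p["priority"])

print(think([GreetAgent(), LowBatteryAgent()],
            {"person_seen": True, "battery_pct": 12}))
```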

As new agents or new versions of existing agents are written and introduced into the collective MDIB DNA, some robots might use them, others not.  I can only guess that robots would get a lot smarter, a lot faster, at a much lower cost, with much less code.


What do you folks think? (other than the obvious, that I just wrote a manifesto)


I agree. Just do it! Me, I'd probably spend too much time studying whereas you went and built and programmed Anna within a year! Doing is better most of the time, once you have some sort of plan. I look forward to seeing what you build.

Will you create a human-level intelligence? I rather doubt it, but that's not what we need. We need something that can pretend to be human in a conversation, as well as providing a bunch of other services. This is a wonderful idea.

However, I do see some problems.

Mainly these are web connection problems. For instance, Verizon seems to randomly drop my connection or slow it down. Also, despite having the wifi in a central part of the house, some parts of the house seem to lose the connection to my own in-house signal. At least that one has some fixes.

I would like to propose a slightly different emphasis on the layering scheme:

Layer 1: The robot

Layer 2: The home MDIB server

Layer 3: The main MDIB server

I propose that the connection between Layer 2 and Layer 3 be considered intermittent and probably slow.

This would put more of the load on Layer 2, assuming that Layer 2 can handle the load. This would mean that Layer 2 would probably have to handle the speech to text part, as well as at least a mid-sized dictionary.

In addition there is a possible problem if I took a robot outside with me, away from my wifi connection.

One solution for connection problems, at the cost of a slower connection speed, is to use a cell phone modem inside the robot. For example, I have one of the Adafruit Fona breakouts to experiment with in that direction.

In Groucho, I can put a full server and a 3 TB USB disk if I want to. And I can use him for the home server for the smaller bots.

Though to be honest, I might do better by using a separate server.

I have to run now; sleep is catching up to me.

I put forth where I was headed with Version 1 of this idea; what you are describing is what I had in mind for Version 2.

I tend to agree with your assessment of the layering (having a home server), as that's the approach I'm taking at home.  Incidentally, I have used the bot successfully outside the home as well.  

I'd like to put forth a brain model and API where the majority of robot builders could get some experience with other brain functions without having to house their own hardware or understand the code, by getting familiar with the various metadata that controls behavior, through a website.  There will be some learning curve.

I will say I made some huge progress yesterday towards a "Universal Memory Model" that consolidates every table and structure I currently use (or can foresee using) for "Robot Memory" into a single table.  Every memory is some type of "Atom" or some association between "Atoms".  Each Atom (or Association of Atoms, which is an Atom as well) can have some data that rides along with it.  It allows the creation of new memory types (called Atom Types) within the same store.  I am really excited about this for three reasons:

1.  I can build a relatively small number of forms that will "adapt" and let me view and maintain anything that the robot knows or any behavior that is defined, basically any aspect of anything.

2.  This should facilitate "syncing" memories/behavior fairly easily from one server to another, thus setting up for Version 2, the home server you have described.  It could even be a CSV file exchange or something.  Syncing code will be another matter, but I believe I could store the code for the agents inside the brain as well...and then create the classes "on the fly" from the code at runtime.  I have some C# colleagues who have written books on the subject, so I think it's doable.

3.  Because there will be little cost in time for changing the DB or creating UIs to build out new brain structures/ideas, I will probably build a lot more, a lot faster.  The holdup will be the one true holdup...thinking of good ideas.

While all this might sound undisciplined (for the record, it is), as I am breaking decades of data modelling conventions, it is my intention to run each "Atom Type" as a separate "In Memory Database" and use lots of threads/processors in my new brain.  The new brain could end up being a lot faster than the old one.  This means SQL Server (or something else) will essentially only be used to store the permanent record, so I might kick SQL Server out as soon as the idea matures.
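A stripped-down illustration of the single-table Atom idea.  The field names and atom types here are invented; the real model surely has more to it.

```python
# Stripped-down illustration of "everything is an Atom or an association
# of Atoms, in one store".  Field names and atom types are invented.
import itertools

_next_id = itertools.count(1)
atoms = {}   # the one "table": atom_id -> atom record

def add_atom(atom_type, data=None, members=()):
    """An association is itself an Atom whose 'members' point at other Atoms."""
    atom_id = next(_next_id)
    atoms[atom_id] = {"type": atom_type, "data": data or {}, "members": list(members)}
    return atom_id

# A word, a room, and an association tying them together:
faucet  = add_atom("Word", {"text": "faucet"})
kitchen = add_atom("Room", {"name": "kitchen"})
add_atom("FoundIn", members=[faucet, kitchen])

def associated(atom_id, assoc_type):
    """Follow associations of one type out of a given Atom."""
    return [a for a in atoms.values()
            if a["type"] == assoc_type and atom_id in a["members"]]

print(associated(faucet, "FoundIn"))
```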

Gotta run, wife calls.

And yes, it does run under Windows and runs very quickly with large databases.

It sounds like you've taken something from OpenCog. I think I would use a bit more structure, with separate tables for the data of different atom types. On the other hand, I have a button that says "Why make something simple and elegant when you can make it complex and wonderful!" (Seriously, I generally try for simple code, but during the middle part of a project, when creature feep is at its maximum, things get complex before I can fold them nicely and refactor the code.)

I look forward to seeing this.

You raise an important issue.  I definitely understand the concern about tables and structure.  As a software architect for many years, I have often wrestled with these issues.  

Take a situation such as this, with 30+ atom types already, and I'm thinking of more almost every day.  Then think about associating one type of atom with another; there are many different types of associations, and these are also likely to grow as people think of more and more.  Then imagine 100 different robot makers using it and wanting to "override" default behavior with their own.  The number of tables and the amount of code to write grows exponentially, and yet the result is still rigid and difficult to maintain.  "Not versatile," as I think Dan is saying, and certainly beyond what one person who types as slowly as I do can code.

Our brains are self-organizing, self-indexing, free-associating kinds of things.  A new idea gets connected with a memory, a word, a goal, a plan, an action, whatever.  I cringe at what a table to build such a thing looks like (several of the tables for some of Android's core apps are similar), but then I revel in how I imagine the object model will work, and the versatility that will be inherent.  I also revel in how I can build a few web pages that can maintain an entire brain with a myriad of different things going on, and one day set others loose to expand that brain.

I had an experience on a consulting gig years ago for a Fortune 500 company.  I was the first person hired on as the "Software Architect" for a series of projects to follow.  The client was about to build around 300-400 user interface "windows" organized into various apps, all with the same look and feel.  I spent a couple of days in a hotel room before emerging on a Monday to present the idea: "I am going to put around 12-15 tables in your database and I am going to build you ONE form that will become ANY form on the fly with no user interface programming.  Your designers can create any form, using a form that was itself created by that form."  Suffice it to say, the project was completed on time, way, way under budget, with a fairly non-existent bug list.  I made it a point to put my feet up on my cubicle desk most of the day we went live to show how confident I was that it was going to be a boring day.  It was really cool that you could redesign an app that was being used by hundreds of people, while they were actually using it, with no crashes.  Not advisable from a training standpoint, but amazing that it could be done.  The moral of the story: sometimes it pays to spot the underlying generic pattern and go for it.

Versatility, and very few moving parts, can go a long way.  Then again, some things feel like a good idea until they don't.  The loss of DB-enforced referential integrity for this atomspace definitely sucks...I never would have done that for a company.  Not being able to use SQL as effectively is really scary.

I wish I could learn to express myself in shorter posts.

I was thinking about robotic cognition the other day.

One of the things that marks human cognition is remembering similar situations. For a concrete example, think about a robot getting caught in a bathroom. To a robot, there are probably many clues that a room is a bathroom: a small room (mostly: my mom-in-law had a master bathroom + sewing room), towels hanging, tile, books to be read, etc. For a robot of any size, maneuvering in my bathroom would be a pain. In Lee's bathroom there are privacy concerns and perhaps a wheelchair in the room.

It would be nice if the robot could somehow generalize the concept of a bathroom and the special rules regarding it: privacy, and the problems maneuvering.

Without programming the robot for the specific case of bathrooms, I don't see how a learning robot can associate all of these things together.

I would expect to have to tell the robot about the privacy concerns, but how to generalize this to "don't store images or post them on the net" I have no idea.

I'm sorry, I'm sure there is a thought in this mess somewhere, but I suspect my pain killers have kicked in and my thoughts are heading towards la-la-land. I'll write more on this later when I can rub two thoughts together to make a discussion.

Have a nice day.

A Google or Amazon Vision Service is badly needed...

Input:  An Image

Output:  An array of objects in the image, and their projected positions in 2D (in the image) and 3D space (relative to camera).

If this service existed, then we might be able to do the stuff you are talking about.

Anna already knows things like "A kitchen has a faucet" and "A bathroom has a toilet", so mapping words like "faucet" to whatever words Google gave to things should be doable if the service existed.  Inferring location should then be possible.
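To show what I mean, here is the kind of response such a service might return, and a trivial "what room am I in?" guess against facts of the "a kitchen has a faucet" kind.  The service, labels, and fields are all hypothetical.

```python
# Hypothetical output from the wished-for vision service, plus a trivial
# room inference against "a kitchen has a faucet"-style facts.  All made up.
detected = [
    {"label": "faucet", "box_2d": [210, 80, 290, 160],  "xyz_m": [0.4,  0.1, 1.8]},
    {"label": "sink",   "box_2d": [180, 140, 330, 240], "xyz_m": [0.4, -0.2, 1.9]},
    {"label": "stove",  "box_2d": [400, 120, 620, 300], "xyz_m": [1.1,  0.0, 2.4]},
]

room_clues = {
    "kitchen":  {"faucet", "sink", "stove", "refrigerator"},
    "bathroom": {"faucet", "sink", "toilet", "towel"},
}

labels = {obj["label"] for obj in detected}
scores = {room: len(labels & clues) for room, clues in room_clues.items()}
print(max(scores, key=scores.get), scores)   # best guess plus the evidence
```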

I would think that if it isn't already there in the new Amazon phone, Google's Project Tango, or others, there are teams somewhere working on this right now.  It is just too logical and too important not to do.

There was a Kickstarter that was trying to produce this service, but I think they didn't get funded.

It doesn't have to be from Amazon or Google. It could be from one of us if we had the knowledge to do this.

I could attempt it. In fact I'll have to attempt something like this if Groucho is to get to where I want him to be.

The trouble is that vision is very much like NLP in that it involves some really deep brain structures and we only know the basic ones. I know that I can do it, but the algorithm I'd have to use is slow, and it wouldn't work in all cases.

1. Break the image down into concrete objects.

2. Kill the background from each object, such that each object exists in an image all its own.

3. For each image, compare to a database of generic objects in various rotations.

It's step 3 that is so slow. Comparing images is slow when you're not expecting an exact match. I'll have to check on this. Maybe there has been progress. I'll do some checking around.
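One way to try step 3 is OpenCV's template matching; it is slow and rotation-blind, which is exactly the weakness described above. A bare-bones sketch, assuming the candidate objects have already been cropped out (steps 1 and 2) and that reference images live in a folder:

```python
# Bare-bones sketch of step 3 with OpenCV template matching.  Assumes each
# candidate object is already cropped out (steps 1-2) and reference images
# of generic objects live in ./objects/.  Slow, and blind to rotation.
import glob
import cv2

def best_match(object_img_path, reference_glob="objects/*.png"):
    obj = cv2.imread(object_img_path, cv2.IMREAD_GRAYSCALE)
    best_path, best_score = None, -1.0
    for ref_path in glob.glob(reference_glob):
        ref = cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE)
        # The template must fit inside the object crop; shrink it if needed.
        scale = min(obj.shape[0] / ref.shape[0], obj.shape[1] / ref.shape[1], 1.0)
        if scale < 1.0:
            ref = cv2.resize(ref, (max(1, int(ref.shape[1] * scale)),
                                   max(1, int(ref.shape[0] * scale))))
        score = cv2.minMaxLoc(cv2.matchTemplate(obj, ref, cv2.TM_CCOEFF_NORMED))[1]
        if score > best_score:
            best_path, best_score = ref_path, score
    return best_path, best_score   # closest reference image and its similarity

# print(best_match("candidate_object.png"))
```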

I know that somebody is doing a database of images from the Kinect sensor, but what about other sensors that are more diffuse such as the various ultrasonic sensors available?

I can picture identifying a room by its sonar fingerprint, but this is something that has to be up to the individual builder. Each robot will have its own set of sensors.

Just some thoughts.

I heard about the Kinect data from another member.  Interesting.  I wouldn't know how to do the math on the pattern matching for that.  It could be very useful for human gesture / stickfigure recognition, which I wrote a little about elsewhere in this forum.

I hear there is some new thermopile camera attachment that hooks up to a phone for about $300-$400.  A db of images and pattern matching based on heat signatures could be useful.  Once again, the math is beyond me right now.  Anna's thermopile is only 16x4 pixels, probably not enough to do anything all that great.

I have long wanted some sort of visual landmark memory, one that might be simple enough for me to make work.  I speculated that you could point a robot due west and record the rarest color every, say, 5 degrees, swinging around to the north and on to due east, adding each "rarest color" into a color "barcode" of sorts.  (Instead of the rarest color, the change in color (up or down) might work better, so as to remove the significance of the actual color number in HSV color space.)  You could do it to the south as well, or in a full 360.  Then you could pattern match this "barcode" against a memory of barcode scans from around a house.  This might be able to tell you where you are in the house.  If there were a couple of rarer patterns in the barcode, the bearings to those could be found, and a position could be estimated within the room.  This is just a theory.  It's really localization and not a general purpose visual memory.  There are some OpenCV techniques that are probably more sound; I was just trying to come up with something new.  
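A rough sketch of that barcode idea, just to show the shape of the matching; the sweep, the hue sampling, and the remembered scans are all placeholder stand-ins.

```python
# Rough sketch of the "color barcode" localization idea: sweep the camera,
# record a color statistic every few degrees, and later match the resulting
# barcode against scans remembered from known spots in the house.

def barcode_from_sweep(hue_samples):
    """hue_samples: one representative hue per 5-degree step of the sweep.
    Store the change between steps rather than the hue itself, as noted above."""
    return [hue_samples[i + 1] - hue_samples[i] for i in range(len(hue_samples) - 1)]

def match_score(scan, remembered):
    """Smaller is better: sum of absolute differences between two barcodes."""
    return sum(abs(a - b) for a, b in zip(scan, remembered))

def locate(scan, memory):
    """memory: {location_name: remembered_barcode} built on earlier walks."""
    return min(memory, key=lambda loc: match_score(scan, memory[loc]))

memory = {"kitchen": [4, -2, 0, 7], "hallway": [0, 0, 1, -1]}
print(locate([3, -1, 1, 6], memory))   # -> "kitchen"
```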

I am doubtful about using sonars for any kind of pattern matching, although I do follow a bit of some of the localization techniques based on them.  At the least, my thought would be to allow the developer to "Add Sensors and Services" to their bot using the website.  This would include sonars and algorithms, specifying the orientation of the sensor.  Maybe a set of algorithms (like Force Field) could be present that could be turned on and configured for the specifics of the given bot.

On a related "pattern matching" front, I am still trying to wrap my head around what a hypergraph really is and how I might pattern match hypergraphs.  Is that even a valid concept?  I'd hate to build something that is conceptually similar to hypergraphs without realizing it, and not be able to benefit from the mathematical work that has been done on them.  Also, how do you pattern match natural language parses?  So much of this is new to me.  You implied somewhere that you understood the math behind some of these things.  I don't yet.  Calc and Stats were as far as I got.