The Qualitative Learner of Action and Perception (QLAP)




This video about QLAP was accepted to the 2010 AAAI video competition.
It won the award for Best Educational Video!

An agent, human or otherwise, receives a large sensory stream from the continuous world that must be broken up into useful features. The agent must also learn to use its low-level effectors to bring about desired changes in the world. Humans and other animals have adapted to their environment through a combination of evolution and individual learning. We blur the distinction between individual and species learning and define the problem abstractly: how can an agent with low-level sensors and effectors learn high-level states and actions through autonomous experience with the environment?

Pierce and Kuipers [1997] have shown that an agent can learn the structure of its sensory and motor apparatus. Building on this, Modayil and Kuipers [2004] have shown how an agent can individuate and track objects in its sensory stream. Our approach extends this work, enabling an agent to learn a discrete sensory description and a hierarchical set of actions. We call our approach the Qualitative Learner of Action and Perception (QLAP).

QLAP learns a discretization of the environment and predictive models of its dynamics, as shown in Figure 1. QLAP assumes that the sensory stream (Fig. 1-a) is converted (Fig. 1-b) into a set of continuous variables giving the locations of objects and the distances between them. To build models of the environment, QLAP must learn the necessary discretization. It begins with a very simple discretization (Fig. 1-c) that essentially gives only the direction in which each object is moving and whether a distance between objects is increasing or decreasing. From this low-resolution representation, QLAP learns (Fig. 1-d) a set of primitive models describing the dynamics of the environment. These initial models are simple and unreliable, but they make predictions about changes in the environment that can be used to generate a supervised learning signal. This signal points QLAP toward new discretizations that make the models more reliable (Fig. 1-e). Through this synergy of discretization and model building, QLAP builds an increasingly sophisticated representation of the environment.

Figure 1: Perception in QLAP
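
To make the initial low-resolution representation concrete, here is a minimal sketch in Python. It is an illustration under simple assumptions, not QLAP's implementation: the function names, the dead band, and the landmark representation are all hypothetical.

    # A minimal sketch (not QLAP's implementation): each continuous variable is
    # reduced to its qualitative direction of change, and magnitudes are later
    # discretized against learned landmark values. Names and thresholds are
    # illustrative.

    def direction_of_change(prev_value, curr_value, dead_band=1e-3):
        """Return '+' if the variable is increasing, '-' if decreasing,
        and '0' if the change falls within a small dead band."""
        delta = curr_value - prev_value
        if delta > dead_band:
            return '+'
        if delta < -dead_band:
            return '-'
        return '0'

    def discretize(value, landmarks):
        """Map a continuous value to the index of the interval it falls into,
        given a sorted list of landmark (cut-point) values. Refining the
        discretization amounts to inserting new landmarks where they make
        the predictive models more reliable."""
        for i, landmark in enumerate(landmarks):
            if value < landmark:
                return i
        return len(landmarks)

    # Example: the distance between hand and block is decreasing.
    assert direction_of_change(0.42, 0.37) == '-'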





The process of learning actions is shown in Figure 2. Because the models predict the dynamics of the environment, each can be converted into a plan for bringing about its predicted effect (Fig. 2-a). Each plan then serves as a different way to perform an action (Fig. 2-b), and these plans are assembled into a hierarchy of actions (Fig. 2-c).

Figure 2: Actions in QLAP
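
As a rough illustration of how a predictive model can be wrapped as a plan, and plans collected under the action they achieve, here is a minimal sketch. The class names, fields, and plan-selection policy are hypothetical simplifications, not QLAP's actual data structures.

    # A minimal sketch: a model predicts that achieving an antecedent event tends
    # to cause a consequent event; wrapping it as a plan gives one way to perform
    # the action whose goal is that consequent. Names and fields are illustrative.
    from dataclasses import dataclass, field
    from typing import Callable, Optional

    @dataclass
    class Model:
        antecedent: tuple     # e.g. ('hand_x_dot', '+')
        consequent: tuple     # e.g. ('hand_to_block_dist', '-')
        reliability: float    # estimated probability that the prediction holds

    @dataclass
    class Action:
        goal: tuple
        plans: list = field(default_factory=list)   # alternative ways to achieve the goal
        motor_command: Optional[Callable] = None    # set only for primitive actions

        def perform(self):
            if self.motor_command is not None:      # primitive action: act directly
                return self.motor_command()
            # Otherwise use the plan backed by the most reliable model; its
            # subaction brings about that model's antecedent at a lower level.
            best = max(self.plans, key=lambda p: p.model.reliability)
            return best.subaction.perform()

    @dataclass
    class Plan:
        model: Model
        subaction: Action     # action that brings about the model's antecedent

Performing a high-level action then recursively invokes subactions until a primitive motor command is reached.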




Throughout this process, the agent learns autonomously. It initially motor babbles, making random movements. After learning its first models and actions, it can choose actions to ``practice.'' It chooses these actions using Intelligent Adaptive Curiosity, which directs the agent toward actions at which it is improving. Once the agent has mastered an action, it moves on to another. Since the hierarchy of actions is continually expanding, actions that were initially chosen for practice are later invoked as subactions of higher-level actions. During this process, the agent learns many models. Models that do not make sufficiently deterministic predictions are discarded; those that do are converted to plans. Plans that lead to successful completion of an action are more likely to be used again. In this way, the agent adapts to its environment in a developmental progression.
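
The practice-selection loop can be sketched as follows. This is a simplified, progress-based stand-in in the spirit of Intelligent Adaptive Curiosity; the window size, progress measure, and fallback to random babbling are assumptions for illustration, not QLAP's exact mechanism.

    # A minimal, illustrative sketch of progress-driven practice selection.
    import random
    from collections import defaultdict, deque

    class PracticeSelector:
        def __init__(self, window=20):
            # Recent outcomes per action: 1 for success, 0 for failure.
            self.window = window
            self.history = defaultdict(lambda: deque(maxlen=2 * window))

        def record(self, action, succeeded):
            self.history[action].append(1 if succeeded else 0)

        def learning_progress(self, action):
            """Improvement in success rate: the most recent window of attempts
            compared with the window before it."""
            h = list(self.history[action])
            if len(h) < 2 * self.window:
                return 0.0
            older, recent = h[:self.window], h[self.window:]
            return (sum(recent) - sum(older)) / self.window

        def choose(self, actions):
            """Prefer the action whose performance is improving fastest; fall
            back to random exploration (motor babbling) when nothing is improving."""
            best = max(actions, key=self.learning_progress)
            if self.learning_progress(best) <= 0.0:
                return random.choice(actions)
            return best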

QLAP contributes to the fields of reinforcement learning (RL) and developmental robotics. In RL, a constant challenge is how best to accommodate continuous states and actions. QLAP provides a method for discretizing the state space so that the discretization corresponds to the ``natural joints'' in the environment. Hierarchy construction is an active area of RL research, and QLAP creates a hierarchical set of actions from continuous motor variables. QLAP also autonomously creates reinforcement learning problems as part of its developmental progression. This developmental progression is QLAP's main contribution to developmental robotics, and it provides a test bed for exploring ideas in that field. For example, we have shown that giving extra emphasis to rare events aids in the autonomous learning of the action of picking up a block with a magnetic hand.


                 
We work in simulation to isolate the problem of development. Here is a picture of the simulated robot.

  1. Initially QLAP motor babbles in its environment as shown here (4 MB). The two floating objects can be sensed by the robot and are added to make the environment more realistic.
  2. As QLAP explores, its learning becomes more directed as shown here (4 MB) and here (4 MB). These videos show the agent exploring its environment after 3.33 hours of experience. It has autonomously learned actions to manipulate the block, and it interacts with the block because this is what it finds interesting at this point in its development. In the first video the block is replaced when it is "picked up." In the second video, the block causes friction with the table, making it hard to move the hand.
  3. After learning, QLAP can be given a task such as "grabbing" the block. In this video (4 MB) we see the agent attempting to perform that task after learning. The block is replaced when it is grabbed.

Details can be found in the publications.