This paper reports on learning in autonomous Lego robots programmed to play soccer. The specific focus is on how two identical robots can learn different behaviors, depending on the type of feedback each receives from the environment. Built with the same design and running the same software, the two Lego robots can perform the same basic soccer actions, such as moving to the ball, determining where the goal is, and kicking the ball. A non-deterministic version of the Q-step learning algorithm is then used to allow a robot to learn a correct series of actions based on feedback (rewards and punishments) from the environment. The goal of Q-step learning is for a robot to maximize the environmental rewards, so a series of actions is "correct" if it leads to a large cumulative reward. Different environmental feedback can therefore lead to a different "correct" series of actions. For instance, in this project, one of the robots learned to be an offensive player and the other learned to be a defensive player. This paper discusses the construction of the robots, their basic actions, their learning and training, and their interactions with each other.
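The idea that identical learners diverge under different feedback can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a standard tabular Q-learning update as a stand-in for the Q-step algorithm named above, a hypothetical five-cell field, and made-up reward functions (`offensive_reward`, `defensive_reward`). The `SLIP` parameter is likewise an assumption, used only to make action outcomes non-deterministic.

```python
import random

# Hypothetical toy field: cells 0..4. Cell 0 is the robot's own end,
# cell 4 is the opponent's end. None of this comes from the thesis.
N_POSITIONS = 5
ACTIONS = (-1, +1)   # step toward own end / step toward opponent's end
SLIP = 0.1           # assumed chance that an action has the opposite effect


def offensive_reward(pos):
    """Assumed feedback: reward reaching the opponent's end, punish retreating."""
    if pos == N_POSITIONS - 1:
        return 1.0
    return -1.0 if pos == 0 else 0.0


def defensive_reward(pos):
    """Assumed feedback: reward reaching the robot's own end, punish leaving it."""
    if pos == 0:
        return 1.0
    return -1.0 if pos == N_POSITIONS - 1 else 0.0


def train(reward_fn, episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_POSITIONS)]
    for _ in range(episodes):
        pos = N_POSITIONS // 2                 # every episode starts mid-field
        while 0 < pos < N_POSITIONS - 1:       # either end of the field is terminal
            if rng.random() < epsilon:
                a = rng.randrange(2)           # explore
            else:
                a = max(range(2), key=lambda i: q[pos][i])  # exploit
            # Non-deterministic outcome: occasionally the move is reversed.
            step = -ACTIONS[a] if rng.random() < SLIP else ACTIONS[a]
            nxt = pos + step
            terminal = nxt in (0, N_POSITIONS - 1)
            future = 0.0 if terminal else gamma * max(q[nxt])
            target = reward_fn(nxt) + future
            # The learning rate alpha averages over the noisy outcomes.
            q[pos][a] += alpha * (target - q[pos][a])
            pos = nxt
    return q


def greedy_policy(q):
    """Best learned action for each non-terminal cell (1..3)."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(1, N_POSITIONS - 1)]


if __name__ == "__main__":
    print("offensive:", greedy_policy(train(offensive_reward)))
    print("defensive:", greedy_policy(train(defensive_reward)))
```

Both calls to `train` run the same code from the same starting table; only the reward function differs. In this toy setting the two learned policies point in opposite directions, which mirrors, in a much simplified form, the offensive and defensive roles described above.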
McGehee, Jonathan, "Interactive Lego Robots Learning in a Controlled Environment" (2003). Honors College Capstone Experience/Thesis Projects. Paper 26.