Article

We are working to incorporate this article into the repository.
Please consult the publisher's Open Access policy.

Abstract:

Over the last decade, numerous contributions have been made to the use of reinforcement learning in the field of robot learning. They have focused mainly on generalization, memorization and exploration, issues that are mandatory when dealing with real robots. However, in our opinion the most difficult task today is the definition of the reinforcement function (RF). A first attempt in this direction introduced a method, the update parameters algorithm (UPA), for tuning an RF so that it is optimal during the exploration phase; the only requirement is that the RF conform to a particular expression. In this article, we propose Dynamic-UPA, an algorithm able to tune the RF parameters during the whole learning phase (exploration and exploitation). It addresses the so-called exploration versus exploitation dilemma through careful computation of the RF parameter values, controlling the ratio between positive and negative reinforcement during learning. Experiments with the mobile robot Khepera on the synthesis of obstacle-avoidance and wall-following behaviors validate our proposals.
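The mechanism named in the abstract, tuning the RF parameters so that the ratio between positive and negative reinforcement stays under control throughout learning, can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' Dynamic-UPA: the two-valued RF, the tunable threshold theta, the target ratio and the adaptation step are all assumptions introduced here for clarity.

    class DynamicRF:
        """Hypothetical sketch of a dynamically tuned reinforcement function.

        Assumed form (not taken from the paper): reward is +1 when a
        sensor-derived measure clears a threshold theta, -1 otherwise,
        and theta is re-tuned after every step so that the observed share
        of positive rewards tracks a target ratio throughout learning.
        """

        def __init__(self, theta=0.5, target_ratio=0.5, step=0.01):
            self.theta = theta                # tunable RF parameter
            self.target_ratio = target_ratio  # desired share of positive rewards
            self.step = step                  # adaptation rate for theta
            self.pos = 0                      # positive reinforcements seen
            self.neg = 0                      # negative reinforcements seen

        def reward(self, measure):
            # E.g. measure = minimum obstacle clearance read off a Khepera's
            # proximity sensors, rescaled to [0, 1].
            r = 1.0 if measure >= self.theta else -1.0
            if r > 0:
                self.pos += 1
            else:
                self.neg += 1
            self._retune()
            return r

        def _retune(self):
            # Nudge theta so the running positive/negative balance drifts
            # toward the target, during exploration and exploitation alike.
            ratio = self.pos / (self.pos + self.neg)
            if ratio > self.target_ratio:
                self.theta += self.step   # positives too frequent: be stricter
            elif ratio < self.target_ratio:
                self.theta -= self.step   # negatives too frequent: be laxer

In a complete learner, the value returned by reward() would feed a standard Q-learning update (see Watkins & Dayan 1992 in the reference list below), while the continual re-tuning keeps the reinforcement balance steady as the robot's behavior improves.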

Record:

Document: Article
Title: Dynamic update of the reinforcement function during learning
Authors: Santos, J.M.; Touzet, C.
Affiliations: Departamento de Computación, FCEyN, Universidad de Buenos Aires, 1428 Buenos Aires, Argentina
Ciudad Universitaria, Pabellón I, 1428 Buenos Aires, Argentina
Ctr. for Eng. Sci. Advanced Research, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6355, United States
Keywords: Autonomous robot; Behaviour-based approach; Reinforcement function; Reinforcement learning; Robot learning
Year: 1999
Volume: 11
Issue: 3-4
Start page: 267
End page: 288
Journal title: Connection Science
Abbreviated journal title: Connect. Sci.
ISSN: 0954-0091
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos

References:

  • Ackley, D., Littman, M., Interactions between learning and evolution (1991) Artificial Life II, SFI Studies in the Sciences of Complexity, 10, pp. 487-509. C.G. Langton, C. Taylor, J.D. Farmer & S. Rasmussen (Eds). Reading, MA: Addison-Wesley
  • Anderson, C.W., Q-learning with hidden-unit restarting (1993) Advances in Neural Information Processing Systems, 5, pp. 81-88. San Mateo, CA: Morgan Kaufmann
  • Baird, L.C., Residual algorithms: Reinforcement learning with function approximation (1995) Machine Learning: Proceedings of the Twelfth International Conference. A. Prieditis & S. Russell (Eds), San Francisco, CA: Morgan Kaufmann
  • Braitenberg, V., (1987) Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: Bradford Books, MIT Press
  • Donnart, J.-Y., Meyer, J.-A., Learning reactive and planning rules in a motivationally autonomous animat (1996) IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 22, pp. 381-395
  • Glorennec, P.Y., Jouffe, L., A reinforcement learning method for an autonomous robot (1996) Proceedings of the First Online Workshop on Soft Computing. Nagoya University
  • Godjevac, G., Steele, N., Neuro-fuzzy control of a mobile robot (1998) Proceedings of the Fourth International Conference on Neural Networks and Their Applications, pp. 231-241. Marseille
  • Harmon, M.E., (1998) On Line Reinforcement Learning Tutorial. University of Massachusetts. http://www~anw.cs.umass.edu/~mharmon/rltutorial/tut.html
  • Kretchmar, R.M., Anderson, C.W., Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning (1997) Proceedings of the International Conference on Neural Networks '97, Houston, TX
  • Lin, L.-J., Self-improving reactive agents based on reinforcement learning, planning and teaching (1992) Machine Learning, 8, pp. 293-322
  • Lin, L.-J., (1993) Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie-Mellon University
  • Mahadevan, S., Connell, J., Automatic programming of behavior-based robots using reinforcement learning (1992) Artificial Intelligence, 55, pp. 311-365
  • Martín, P., Millán, J.D.R., (1996) Learning Goal-oriented Obstacle-avoiding Strategies Through Reinforcement for a Two-link Sensor-based Manipulator. Technical Note No. I.96.138, Joint Research Centre, Ispra
  • Martín, P., Millán, J.D.R., (1997) Combining Reinforcement Learning and Differential Inverse Kinematics for Collision-free Motion of Multilink Manipulators. LNCS 1240, pp. 1324-1333. Berlin: Springer
  • Mataric, M.J., Reward functions for accelerated learning (1994) Machine Learning: Proceedings of the Eleventh International Conference, pp. 181-189. W.W. Cohen & H. Hirsch (Eds), San Francisco, CA: Morgan Kaufmann
  • Mataric, M.J., Learning to behave socially (1994) Proceedings of the Third International Conference on Simulation of Adaptive Behavior (SAB-94): From Animals to Animats 3, pp. 453-462. D. Cliff, P. Husbands, J.-A. Meyer & S. Wilson (Eds), Cambridge, MA: MIT Press
  • Millán, J.D.R., Rapid, safe and incremental learning of navigation strategies (1996) IEEE Transactions on Systems, Man and Cybernetics - Part B, 26, pp. 408-420
  • Mondada, F., Franzi, E., Ienne, P., Mobile robot miniaturisation: A tool for investigation in control algorithms (1993) Proceedings of the Third International Symposium on Experimental Robotics. Kyoto
  • Moody, J., Darken, C.J., Fast learning in networks of locally-tuned processing units (1989) Neural Computation, 1, pp. 281-294
  • Santos, J.M., (1999) Contributions to the Study and the Design of Reinforcement Functions. PhD thesis, Universidad de Buenos Aires / Université d'Aix-Marseille III
  • Santos, J.M., Touzet, C., Exploration tuned reinforcement function (1999) Neurocomputing, 28 (1-3), pp. 93-105
  • Schmidhuber, J., Zhao, J., Schraudolph, N.N., Reinforcement learning with self-modifying policies (1997) Learning to Learn, pp. 293-309. S. Thrun & L. Pratt (Eds), Dordrecht: Kluwer
  • Sehad, S., Touzet, C., Reinforcement learning and neural reinforcement learning (1994) Proceedings of the European Symposium on Artificial Neural Networks, pp. 135-140. Brussels, April
  • Sutton, R.S., Learning to predict by the methods of temporal differences (1988) Machine Learning, 3, pp. 9-44
  • Tesauro, G., Practical issues in temporal difference learning (1992) Machine Learning, 8, pp. 257-277
  • Tham, C.K., Prager, R.W., A modular Q-learning architecture for manipulator task decomposition (1994) Proceedings of the Eleventh Conference on Machine Learning, New Brunswick, pp. 309-317
  • Tham, C.K., Prager, R.W., Reinforcement learning methods for multi-linked manipulator obstacle avoidance and control (1993) Proceedings of IEEE Asia Pacific Workshop on Advances in Motion Control, Singapore
  • Thrun, S., Exploration in active learning (1996) Handbook of Brain Theory and Neural Networks. M. Arbib (Ed.)
  • Thrun, S.B., (1992) Efficient Exploration in Reinforcement Learning. Technical Report CMU-CS-92-102, Carnegie-Mellon University
  • Touzet, C., Extending immediate reinforcement learning on neural networks to multiple actions (1994) Proceedings of the European Symposium on Artificial Neural Networks, pp. 153-159. , Brussels, April
  • Touzet, C., Neural reinforcement learning for behaviour synthesis (1997) Robotics and Autonomous Systems, 22, pp. 251-281
  • Touzet, C., Sen, S., Learning agents (1997) First Conference on Autonomous Agents (invited paper). Marina del Rey, February
  • Watkins, C.J.C.H., (1989) Learning from Delayed Rewards. PhD thesis, King's College, University of Cambridge
  • Watkins, C.J.C.H., Dayan, P., Q-learning (1992) Machine Learning, 8, pp. 279-292 (Technical note)
  • Zhang, P., Canu, S., Indirect adaptive exploration in entropy based reinforcement learning (1995) Proceedings of the International Conference on Artificial Neural Networks, Perth, W.A., pp. 354-359

Citations:

---------- APA ----------
Santos, J.M. & Touzet, C. (1999). Dynamic update of the reinforcement function during learning. Connection Science, 11(3-4), 267-288.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos
---------- CHICAGO ----------
Santos, J.M., Touzet, C. "Dynamic update of the reinforcement function during learning". Connection Science 11, no. 3-4 (1999): 267-288.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos
---------- MLA ----------
Santos, J.M., Touzet, C. "Dynamic update of the reinforcement function during learning". Connection Science, vol. 11, no. 3-4, 1999, pp. 267-288.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos
---------- VANCOUVER ----------
Santos, J.M., Touzet, C. Dynamic update of the reinforcement function during learning. Connect. Sci. 1999;11(3-4):267-288.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos