Article

We are working to incorporate this article into the repository.
Please consult the publisher's Open Access policy.

Abstract:

Over the last decade, numerous contributions have been made to the use of reinforcement learning in the field of robot learning. They have focused mainly on generalization, memorization and exploration, issues that are mandatory when dealing with real robots. However, in our opinion the most difficult task today is the definition of the reinforcement function (RF). A first attempt in this direction introduced a method, the update parameters algorithm (UPA), for tuning an RF so that it is optimal during the exploration phase; the only requirement is that the RF conform to a particular expression. In this article, we propose Dynamic-UPA, an algorithm able to tune the RF parameters during the whole learning phase (exploration and exploitation). It addresses the so-called exploration versus exploitation dilemma through careful computation of the RF parameter values, controlling the ratio between positive and negative reinforcement during learning. Experiments with the mobile robot Khepera on the synthesis of obstacle-avoidance and wall-following behaviors validate our proposals.
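The mechanism named in the abstract, tuning the RF parameters so that the ratio between positive and negative reinforcement stays under control throughout learning, can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' Dynamic-UPA: the two-valued RF, the tunable threshold theta, the target ratio and the adaptation step are all assumptions introduced here for clarity.

    class DynamicRF:
        """Hypothetical sketch of a dynamically tuned reinforcement function.

        Assumed form (not taken from the paper): reward is +1 when a
        sensor-derived measure clears a threshold theta, -1 otherwise,
        and theta is re-tuned after every step so that the observed share
        of positive rewards tracks a target ratio throughout learning.
        """

        def __init__(self, theta=0.5, target_ratio=0.5, step=0.01):
            self.theta = theta                # tunable RF parameter
            self.target_ratio = target_ratio  # desired share of positive rewards
            self.step = step                  # adaptation rate for theta
            self.pos = 0                      # positive reinforcements seen
            self.neg = 0                      # negative reinforcements seen

        def reward(self, measure):
            # E.g. measure = minimum obstacle clearance read off a Khepera's
            # proximity sensors, rescaled to [0, 1].
            r = 1.0 if measure >= self.theta else -1.0
            if r > 0:
                self.pos += 1
            else:
                self.neg += 1
            self._retune()
            return r

        def _retune(self):
            # Nudge theta so the running positive/negative balance drifts
            # toward the target, during exploration and exploitation alike.
            ratio = self.pos / (self.pos + self.neg)
            if ratio > self.target_ratio:
                self.theta += self.step   # positives too frequent: be stricter
            elif ratio < self.target_ratio:
                self.theta -= self.step   # negatives too frequent: be laxer

In a complete learner, the value returned by reward() would feed a standard Q-learning update (see Watkins & Dayan 1992 in the reference list below), while the continual re-tuning keeps the reinforcement balance steady as the robot's behavior improves.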

Record:

Document: Article
Title: Dynamic update of the reinforcement function during learning
Authors: Santos, J.M.; Touzet, C.
Affiliations: Departamento de Computación, FCEyN, Universidad de Buenos Aires, 1428 Buenos Aires, Argentina
Ciudad Universitaria, Pabellón I, 1428 Buenos Aires, Argentina
Ctr. for Eng. Sci. Advanced Research, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6355, United States
Keywords: Autonomous robot; Behaviour-based approach; Reinforcement function; Reinforcement learning; Robot learning
Year: 1999
Volume: 11
Issue: 3-4
Start page: 267
End page: 288
Journal title: Connection Science
Abbreviated journal title: Connect. Sci.
ISSN: 0954-0091
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos

References:

  • Ackley, D., Littman, M., Interactions between learning and evolution (1991) Artificial Life II, SFI Studies in the Sciences of Complexity, 10, pp. 487-509. C.G. Langton, C. Taylor, J.D. Farmer & S. Rasmussen (Eds). Reading, MA: Addison-Wesley
  • Anderson, C.W., Q-learning with hidden-unit restarting (1993) Advances in Neural Information Processing Systems, 5, pp. 81-88. San Mateo, CA: Morgan Kaufmann
  • Baird, L.C., Residual algorithms: Reinforcement learning with function approximation (1995) Machine Learning: Proceedings of the Twelfth International Conference. A. Prieditis & S. Russell (Eds), San Francisco, CA: Morgan Kaufmann
  • Braitenberg, V., (1987) Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: Bradford Books, MIT Press
  • Donnart, J.-Y., Meyer, J.-A., Learning reactive and planning rules in a motivationally autonomous animat (1996) IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 22, pp. 381-395
  • Glorennec, P.Y., Jouffe, L., A reinforcement learning method for an autonomous robot (1996) Proceedings of the First Online Workshop on Soft Computing. Nagoya University
  • Godjevac, G., Steele, N., Neuro-fuzzy control of a mobile robot (1998) Proceedings of the Fourth International Conference on Neural Networks and Their Applications, pp. 231-241. Marseille
  • Harmon, M.E., (1998) On Line Reinforcement Learning Tutorial. University of Massachusetts. http://www~anw.cs.umass.edu/~mharmon/rltutorial/tut.html
  • Kretchmar, R.M., Anderson, C.W., Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning (1997) Proceedings of the International Conference on Neural Networks '97, Houston, TX
  • Lin, L.-J., Self-improving reactive agents based on reinforcement learning, planning and teaching (1992) Machine Learning, 8, pp. 293-322
  • Lin, L.-J., (1993) Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie-Mellon University
  • Mahadevan, S., Connell, J., Automatic programming of behavior-based robots using reinforcement learning (1992) Artificial Intelligence, 55, pp. 311-365
  • Martín, P., Millán, J.D.R., (1996) Learning Goal-oriented Obstacle-avoiding Strategies Through Reinforcement for a Two-link Sensor-based Manipulator. Technical Note No. I.96.138, Joint Research Centre, Ispra
  • Martín, P., Millán, J.D.R., (1997) Combining Reinforcement Learning and Differential Inverse Kinematics for Collision-free Motion of Multilink Manipulators. LNCS 1240, pp. 1324-1333. Berlin: Springer
  • Mataric, M.J., Reward functions for accelerated learning (1994) Machine Learning: Proceedings of the Eleventh International Conference, pp. 181-189. W.W. Cohen & H. Hirsch (Eds), San Francisco, CA: Morgan Kaufmann
  • Mataric, M.J., Learning to behave socially (1994) Proceedings of the Third International Conference on Simulation of Adaptive Behavior (SAB-94): From Animals to Animats 3, pp. 453-462. D. Cliff, P. Husbands, J.-A. Meyer & S. Wilson (Eds), Cambridge, MA: MIT Press
  • Millán, J.D.R., Rapid, safe and incremental learning of navigation strategies (1996) IEEE Transactions on Systems, Man and Cybernetics - Part B, 26, pp. 408-420
  • Mondada, F., Franzi, E., Ienne, P., Mobile robot miniaturisation: A tool for investigation in control algorithms (1993) Proceedings of the Third International Symposium on Experimental Robotics. Kyoto
  • Moody, J., Darken, C.J., Fast learning in networks of locally-tuned processing units (1989) Neural Computation, 1, pp. 281-294
  • Santos, J.M., (1999) Contributions to the Study and the Design of Reinforcement Functions. PhD thesis, Universidad de Buenos Aires / Université d'Aix-Marseille III
  • Santos, J.M., Touzet, C., Exploration tuned reinforcement function (1999) Neurocomputing, 28 (1-3), pp. 93-105
  • Schmidhuber, J., Zhao, J., Schraudolph, N.N., Reinforcement learning with self-modifying policies (1997) Learning to Learn, pp. 293-309. S. Thrun & L. Pratt (Eds), Dordrecht: Kluwer
  • Sehad, S., Touzet, C., Reinforcement learning and neural reinforcement learning (1994) Proceedings of the European Symposium on Artificial Neural Networks, pp. 135-140. Brussels, April
  • Sutton, R.S., Learning to predict by the methods of temporal differences (1988) Machine Learning, 3, pp. 9-44
  • Tesauro, G., Practical issues in temporal difference learning (1992) Machine Learning, 8, pp. 257-277
  • Tham, C.K., Prager, R.W., A modular Q-learning architecture for manipulator task decomposition (1994) Proceedings of the Eleventh Conference on Machine Learning, New Brunswick, pp. 309-317
  • Tham, C.K., Prager, R.W., Reinforcement learning methods for multi-linked manipulator obstacle avoidance and control (1993) Proceedings of IEEE Asia Pacific Workshop on Advances in Motion Control, Singapore
  • Thrun, S., Exploration in active learning (1996) Handbook of Brain Theory and Neural Networks. M. Arbib (Ed.)
  • Thrun, S.B., (1992) Efficient Exploration in Reinforcement Learning. Technical Report CMU-CS-92-102, Carnegie-Mellon University
  • Touzet, C., Extending immediate reinforcement learning on neural networks to multiple actions (1994) Proceedings of the European Symposium on Artificial Neural Networks, pp. 153-159. , Brussels, April
  • Touzet, C., Neural reinforcement learning for behaviour synthesis (1997) Robotics and Autonomous Systems, 22, pp. 251-281
  • Touzet, C., Sen, S., Learning agents (1997) First Conference on Autonomous Agents (invited paper). Marina del Rey, February
  • Watkins, C.J.C.H., (1989) Learning from Delayed Rewards. PhD thesis, King's College, University of Cambridge
  • Watkins, C.J.C.H., Dayan, P., Q-learning (1992) Machine Learning, 8, pp. 279-292 (Technical note)
  • Zhang, P., Canu, S., Indirect adaptive exploration in entropy based reinforcement learning (1995) Proceedings of the International Conference on Artificial Neural Networks, Perth, W.A., pp. 354-359

Citations:

---------- APA ----------
Santos, J.M. & Touzet, C. (1999). Dynamic update of the reinforcement function during learning. Connection Science, 11(3-4), 267-288.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos
---------- CHICAGO ----------
Santos, J.M., Touzet, C. "Dynamic update of the reinforcement function during learning". Connection Science 11, no. 3-4 (1999): 267-288.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos
---------- MLA ----------
Santos, J.M., Touzet, C. "Dynamic update of the reinforcement function during learning". Connection Science, vol. 11, no. 3-4, 1999, pp. 267-288.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos
---------- VANCOUVER ----------
Santos, J.M., Touzet, C. Dynamic update of the reinforcement function during learning. Connect. Sci. 1999;11(3-4):267-288.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos