
New prioritized value iteration for Markov decision processes

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author García Hernández, Ma de Guadalupe es_ES
dc.contributor.author Ruiz Pinales, José es_ES
dc.contributor.author Onaindia de la Rivaherrera, Eva es_ES
dc.contributor.author Aviña Cervantes, J. Gabriel es_ES
dc.contributor.author Ledesma Orozco, Sergio es_ES
dc.contributor.author Alvarado Mendez, Edgar es_ES
dc.contributor.author Reyes Ballesteros, Alberto es_ES
dc.date.accessioned 2014-01-07T10:48:29Z
dc.date.issued 2012-02
dc.identifier.issn 0269-2821
dc.identifier.uri http://hdl.handle.net/10251/34793
dc.description.abstract The problem of solving large Markov decision processes accurately and quickly is challenging. Since the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On one hand, algorithms such as topological sorting are able to find good orderings but their overhead is usually high. On the other hand, shortest path methods, such as Dijkstra's algorithm which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest path Markov decision processes. The experimental results on a stochastic shortest-path problem show the feasibility of our approach. © Springer Science+Business Media B.V. 2011. es_ES
dc.format.extent 11 es_ES
dc.language English es_ES
dc.publisher Springer Verlag es_ES
dc.relation.ispartof Artificial Intelligence Review es_ES
dc.rights All rights reserved es_ES
dc.subject Dijkstra's algorithm es_ES
dc.subject Markov decision processes es_ES
dc.subject Prioritized value iteration es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title New prioritized value iteration for Markov decision processes es_ES
dc.type Article es_ES
dc.embargo.lift 10000-01-01
dc.embargo.terms forever es_ES
dc.identifier.doi 10.1007/s10462-011-9224-z
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Aviña Cervantes, JG.; Ledesma Orozco, S.; Alvarado Mendez, E.; Reyes Ballesteros, A. (2012). New prioritized value iteration for Markov decision processes. Artificial Intelligence Review. 37(2):157-167. doi:10.1007/s10462-011-9224-z es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://link.springer.com/article/10.1007%2Fs10462-011-9224-z es_ES
dc.description.upvformatpinicio 157 es_ES
dc.description.upvformatpfin 167 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 37 es_ES
dc.description.issue 2 es_ES
dc.relation.senia 191953
dc.description.references Agrawal S, Roth D (2002) Learning a sparse representation for object detection. In: Proceedings of the 7th European conference on computer vision. Copenhagen, Denmark, pp 1–15 es_ES
dc.description.references Bellman RE (1954) The theory of dynamic programming. Bull Amer Math Soc 60: 503–516 es_ES
dc.description.references Bellman RE (1957) Dynamic programming. Princeton University Press, New Jersey es_ES
dc.description.references Bertsekas DP (1995) Dynamic programming and optimal control. Athena Scientific, Massachusetts es_ES
dc.description.references Bhuma K, Goldsmith J (2003) Bidirectional LAO* algorithm. In: Proceedings of Indian international conferences on artificial intelligence. pp 980–992 es_ES
dc.description.references Blackwell D (1965) Discounted dynamic programming. Ann Math Stat 36: 226–235 es_ES
dc.description.references Bonet B, Geffner H (2003a) Faster heuristic search algorithms for planning with uncertainty and full feedback. In: Proceedings of the 18th international joint conference on artificial intelligence. Morgan Kaufmann, Acapulco, México, pp 1233–1238 es_ES
dc.description.references Bonet B, Geffner H (2003b) Labeled RTDP: improving the convergence of real-time dynamic programming. In: Proceedings of the international conference on automated planning and scheduling. Trento, Italy, pp 12–21 es_ES
dc.description.references Bonet B, Geffner H (2006) Learning depth-first search: a unified approach to heuristic search in deterministic and non-deterministic settings and its application to MDP. In: Proceedings of the 16th international conference on automated planning and scheduling. Cumbria, UK es_ES
dc.description.references Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 11: 1–94 es_ES
dc.description.references Chang I, Soo H (2007) Simulation-based algorithms for Markov decision processes. Communications and control engineering series. Springer, London es_ES
dc.description.references Dai P, Goldsmith J (2007a) Faster dynamic programming for Markov decision processes. Technical report. Doctoral consortium, department of computer science and engineering. University of Washington es_ES
dc.description.references Dai P, Goldsmith J (2007b) Topological value iteration algorithm for Markov decision processes. In: Proceedings of the 20th international joint conference on artificial intelligence. Hyderabad, India, pp 1860–1865 es_ES
dc.description.references Dai P, Hansen EA (2007c) Prioritizing Bellman backups without a priority queue. In: Proceedings of the 17th international conference on automated planning and scheduling, association for the advancement of artificial intelligence. Rhode Island, USA, pp 113–119 es_ES
dc.description.references Dibangoye JS, Chaib-draa B, Mouaddib A (2008) A Novel prioritization technique for solving Markov decision processes. In: Proceedings of the 21st international FLAIRS (The Florida Artificial Intelligence Research Society) conference, association for the advancement of artificial intelligence. Florida, USA es_ES
dc.description.references Ferguson D, Stentz A (2004) Focused propagation of MDPs for path planning. In: Proceedings of the 16th IEEE international conference on tools with artificial intelligence. pp 310–317 es_ES
dc.description.references Hansen EA, Zilberstein S (2001) LAO*: a heuristic search algorithm that finds solutions with loops. Artif Intell 129: 35–62 es_ES
dc.description.references Hinderer K, Waldmann KH (2003) The critical discount factor for finite Markovian decision processes with an absorbing set. Math Methods Oper Res 57: 1–19 es_ES
dc.description.references Li L (2009) A unifying framework for computational reinforcement learning theory. PhD Thesis. The state university of New Jersey, New Brunswick. NJ es_ES
dc.description.references Littman ML, Dean TL, Kaelbling LP (1995) On the complexity of solving Markov decision problems. In: Proceedings of the 11th international conference on uncertainty in artificial intelligence. Montreal, Quebec, pp 394–402 es_ES
dc.description.references McMahan HB, Gordon G (2005a) Fast exact planning in Markov decision processes. In: Proceedings of the 15th international conference on automated planning and scheduling. Monterey, CA, USA es_ES
dc.description.references McMahan HB, Gordon G (2005b) Generalizing Dijkstra’s algorithm and gaussian elimination for solving MDPs. Technical report, Carnegie Mellon University, Pittsburgh es_ES
dc.description.references Meuleau N, Brafman R, Benazera E (2006) Stochastic over-subscription planning using hierarchies of MDPs. In: Proceedings of the 16th international conference on automated planning and scheduling. Cumbria, UK, pp 121–130 es_ES
dc.description.references Moore A, Atkeson C (1993) Prioritized sweeping: reinforcement learning with less data and less real time. Mach Learn 13: 103–130 es_ES
dc.description.references Puterman ML (1994) Markov decision processes. Wiley Editors, New York es_ES
dc.description.references Puterman ML (2005) Markov decision processes. Wiley Inter Science Editors, New York es_ES
dc.description.references Russell S (2005) Artificial intelligence: a modern approach. Making complex decisions (Ch-17), 2nd edn. Pearson Prentice Hall, USA es_ES
dc.description.references Shani G, Brafman R, Shimony S (2008) Prioritizing point-based POMDP solvers. IEEE Trans Syst Man Cybern 38(6): 1592–1605 es_ES
dc.description.references Sniedovich M (2006) Dijkstra’s algorithm revisited: the dynamic programming connexion. Control Cybern 35: 599–620 es_ES
dc.description.references Sniedovich M (2010) Dynamic programming: foundations and principles, 2nd edn. Pure and Applied Mathematics Series, UK es_ES
dc.description.references Tijms HC (2003) A first course in stochastic models. Discrete-time Markov decision processes (Ch-6). Wiley Editors, UK es_ES
dc.description.references Vanderbei RJ (1996) Optimal sailing strategies. Statistics and operations research program, University of Princeton, USA ( http://www.orfe.princeton.edu/~rvdb/sail/sail.html ) es_ES
dc.description.references Vanderbei RJ (2008) Linear programming: foundations and extensions, 3rd edn. Springer, New York es_ES
dc.description.references Wingate D, Seppi KD (2005) Prioritization methods for accelerating MDP solvers. J Mach Learn Res 6: 851–881 es_ES
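The abstract describes value iteration whose Bellman backups are ordered by Dijkstra-style priorities. Below is a minimal, generic sketch of prioritized value iteration on a hypothetical toy stochastic shortest-path MDP. The state space, transitions, and costs are invented for illustration; this shows the priority-queue ordering idea (in the spirit of Dijkstra's algorithm and the prioritized sweeping of Moore and Atkeson 1993), not the authors' exact algorithm.

```python
import heapq

# Toy stochastic shortest-path MDP (hypothetical, for illustration only):
# transitions[s][a] = list of (probability, next_state, cost) outcomes.
transitions = {
    0: {"a": [(0.8, 1, 1.0), (0.2, 0, 1.0)],  # noisy move toward state 1
        "b": [(1.0, 2, 3.0)]},                # expensive detour via state 2
    1: {"a": [(0.9, 3, 1.0), (0.1, 1, 1.0)]},
    2: {"a": [(1.0, 3, 1.0)]},
}
GOAL = 3  # absorbing goal state, value fixed at 0


def prioritized_value_iteration(transitions, goal, eps=1e-6):
    """Value iteration in which backups are ordered by a priority queue
    keyed on Bellman residual magnitude (Dijkstra-like ordering)."""
    V = {s: 0.0 for s in transitions}
    V[goal] = 0.0

    # Predecessor map: when V(s) changes, its predecessors are reprioritized.
    preds = {s: set() for s in V}
    for s, actions in transitions.items():
        for outcomes in actions.values():
            for _, s2, _ in outcomes:
                preds[s2].add(s)

    def backup(s):
        # Bellman backup for expected total cost (undiscounted SSP).
        return min(sum(p * (c + V[s2]) for p, s2, c in outcomes)
                   for outcomes in transitions[s].values())

    heap = [(0.0, s) for s in transitions]
    heapq.heapify(heap)
    while heap:
        _, s = heapq.heappop(heap)
        new_v = backup(s)
        residual = abs(new_v - V[s])
        V[s] = new_v
        if residual > eps:
            # Larger residuals get higher priority (more negative key).
            for p in preds[s]:
                if p in transitions:  # the goal has no backups
                    heapq.heappush(heap, (-residual, p))
    return V
```

On this toy problem the queue drives backups toward states whose values are still changing, so each state is reconsidered only when one of its successors moves; this is the ordering effect the paper exploits to accelerate convergence.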

