




Bellman equation
From Wikipedia, the free encyclopedia

A Bellman equation, also known as a dynamic programming equation, named after its discoverer, Richard Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into simpler subproblems, as Bellman's Principle of Optimality prescribes.

The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory. Almost any problem which can be solved using optimal control theory can also be solved by analyzing the appropriate Bellman equation. However, the term "Bellman equation" usually refers to the dynamic programming equation associated with discrete-time optimization problems.
In continuous-time optimization problems, the analogous equation is a partial differential equation which is usually called the Hamilton–Jacobi–Bellman equation.

Contents
1 Analytical concepts in dynamic programming
2 Deriving the Bellman equation
  2.1 A dynamic decision problem
  2.2 Bellman's Principle of Optimality
  2.3 The Bellman equation
  2.4 The Bellman equation in a stochastic problem
3 Solution methods
4 Applications in economics
5 Example
6 See also
7 References

Analytical concepts in dynamic programming

To understand the Bellman equation, several underlying concepts must be understood. First, any optimization problem has some objective: minimizing travel time, minimizing cost, maximizing profits, maximizing utility, et cetera. The mathematical function that describes this objective is called the objective function.

Dynamic programming breaks a multi-period planning problem into simpler steps at different points in time. Therefore, it requires keeping track of how the decision situation is evolving over time. The information about the current situation which is needed to make a correct decision is called the state (see Bellman, 1957, Ch. III.2).[1][2] For example, to decide how much to consume and spend at each point in time, people would need to know (among other things) their initial wealth. Therefore, wealth would be one of their state variables, but there would probably be others.

The variables chosen at any given point in time are often called the control variables.
For example, given their current wealth, people might decide how much to consume now. Choosing the control variables now may be equivalent to choosing the next state; more generally, the next state is affected by other factors in addition to the current control. For example, in the simplest case, today's wealth (the state) and consumption (the control) might exactly determine tomorrow's wealth (the new state), though typically other factors will affect tomorrow's wealth too.

The dynamic programming approach describes the optimal plan by finding a rule that tells what the controls should be, given any possible value of the state. For example, if consumption (c) depends only on wealth (W), we would seek a rule c(W) that gives consumption as a function of wealth. Such a rule, determining the controls as a function of the states, is called a policy function (see Bellman, 1957, Ch. III.2).[1]

Finally, by definition, the optimal decision rule is the one that achieves the best possible value of the objective. For example, if someone chooses consumption, given wealth, in order to maximize happiness (assuming happiness H can be represented by a mathematical function, such as a utility function), then each level of wealth W will be associated with some highest possible level of happiness, H(W). The best possible value of the objective, written as a function of the state, is called the value function.

Richard Bellman showed that a dynamic optimization problem in discrete time can be stated in a recursive, step-by-step form by writing down the relationship between the value function in one period and the value function in the next period.
The relationship between these two value functions is called the Bellman equation.

Deriving the Bellman equation

A dynamic decision problem

Let the state at time t be x_t. For a decision that begins at time 0, we take as given the initial state x_0. At any time, the set of possible actions depends on the current state; we can write this as a_t ∈ Γ(x_t), where the action a_t represents one or more control variables. We also assume that the state changes from x to a new state T(x, a) when action a is taken, and that the current payoff from taking action a in state x is F(x, a). Finally, we assume impatience, represented by a discount factor 0 < β < 1.

Under these assumptions, an infinite-horizon decision problem takes the following form:

    V(x_0) = max_{{a_t}_{t=0}^∞} Σ_{t=0}^∞ β^t F(x_t, a_t),

subject to the constraints

    a_t ∈ Γ(x_t),   x_{t+1} = T(x_t, a_t),   for all t = 0, 1, 2, …

Notice that we have defined notation V(x_0) to represent the optimal value that can be obtained by maximizing this objective function subject to the assumed constraints. This function is the value function.
It is a function of the initial state variable x_0, since the best value obtainable depends on the initial situation.

Bellman's Principle of Optimality

The dynamic programming method breaks this decision problem into smaller subproblems. Richard Bellman's Principle of Optimality describes how to do this:

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. (See Bellman, 1957, Chap. III.3.)[1][2][3]

In computer science, a problem that can be broken apart like this is said to have optimal substructure. In the context of dynamic game theory, this principle is analogous to the concept of subgame perfect equilibrium, although what constitutes an optimal policy in this case is conditioned on the decision-maker's opponents choosing similarly optimal policies from their points of view.

As suggested by the Principle of Optimality, we will consider the first decision separately, setting aside all future decisions (we will start afresh from time 1 with the new state x_1). Collecting the future decisions in brackets on the right, the previous problem is equivalent to:

    V(x_0) = max_{a_0} { F(x_0, a_0) + β [ max_{{a_t}_{t=1}^∞} Σ_{t=1}^∞ β^{t-1} F(x_t, a_t) ] },

subject to the constraints

    a_0 ∈ Γ(x_0),   x_1 = T(x_0, a_0),   and a_t ∈ Γ(x_t), x_{t+1} = T(x_t, a_t) for t ≥ 1.

Here we are choosing a_0, knowing that our choice will cause the time 1 state to be x_1 = T(x_0, a_0). That new state will then affect the decision problem from time 1 on. The whole future decision problem appears inside the square brackets on the right.

The Bellman equation

So far it seems we have only made the problem uglier by separating today's decision from future decisions.
But we can simplify by noticing that what is inside the square brackets on the right is V(x_1), the value of the time 1 decision problem, starting from state x_1 = T(x_0, a_0). Therefore we can rewrite the problem as a recursive definition of the value function:

    V(x_0) = max_{a_0} { F(x_0, a_0) + β V(x_1) },

subject to the constraints:

    a_0 ∈ Γ(x_0),   x_1 = T(x_0, a_0).

This is the Bellman equation. It can be simplified even further if we drop time subscripts and plug in the value of the next state:

    V(x) = max_{a ∈ Γ(x)} { F(x, a) + β V(T(x, a)) }.

The Bellman equation is classified as a functional equation, because solving it means finding the unknown function V, which is the value function. Recall that the value function describes the best possible value of the objective, as a function of the state x. By calculating the value function, we will also find the function a(x) that describes the optimal action as a function of the state; this is called the policy function.

The Bellman equation in a stochastic problem

See also: Markov decision process

Dynamic programming can be especially useful in stochastic decisions, that is, optimization problems affected by random events. For example, consider a problem exactly like the one discussed above, except that x_{t+1} is a random variable, which may be influenced by x_t and a_t, but is not determined by them exactly.
We can describe this case by defining the probability distribution of x_{t+1} conditional on x_t and a_t, for example,

    G(x' | x, a) = Prob(x_{t+1} ≤ x' | x_t = x, a_t = a).

Given this probability law determining x_{t+1} conditional on x_t and a_t, the Bellman equation can be written as

    V(x) = max_{a ∈ Γ(x)} { F(x, a) + β E_G[ V(x_{t+1}) | x, a ] },

where E_G represents a conditional expectation under distribution G.

Solution methods

The method of undetermined coefficients, also known as "guess and verify", can be used to solve some infinite-horizon, autonomous Bellman equations.

The Bellman equation can be solved by backwards induction, either analytically in a few special cases, or numerically on a computer. Numerical backwards induction is applicable to a wide variety of problems, but may be infeasible when there are many state variables, due to the curse of dimensionality. Approximate dynamic programming has been introduced by D. P. Bertsekas and J. N. Tsitsiklis with the use of artificial neural networks (multilayer perceptrons) for approximating the Bellman function.[4] This is an effective mitigation strategy for reducing the impact of dimensionality by replacing the memorization of the complete function mapping for the whole space domain with the memorization of the sole neural network parameters.

By calculating the first-order conditions associated with the Bellman equation, and then using the envelope theorem to eliminate the derivatives of the value function, it is possible to obtain a system of difference equations or differential equations called the Euler equations.
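As a concrete illustration of numerical backwards induction, consider a finite-horizon "cake-eating" problem. The specific choices here are illustrative assumptions, not from the article: payoff F(x, a) = log(a), transition T(x, a) = x − a (the state x is the cake remaining, the action a the amount eaten), discount factor β = 0.95, and a 10-period horizon, solved on a discretized state grid:

```python
import numpy as np

# Illustrative cake-eating problem (hypothetical parameters):
# F(x, a) = log(a), T(x, a) = x - a, discount beta, horizon T_horizon.
beta, T_horizon = 0.95, 10
grid = np.linspace(1e-3, 1.0, 200)           # discretized state space for x

V = np.zeros(len(grid))                       # terminal condition: V_T = 0
for t in range(T_horizon - 1, -1, -1):        # backwards induction: t = T-1, ..., 0
    V_new = np.empty(len(grid))
    for i, x in enumerate(grid):
        a = np.linspace(1e-4, x, 100)         # feasible actions: 0 < a <= x
        x_next = x - a                        # next state T(x, a) for each action
        # Bellman update: V_t(x) = max_a { log(a) + beta * V_{t+1}(x - a) },
        # with the continuation value interpolated on the grid.
        V_new[i] = np.max(np.log(a) + beta * np.interp(x_next, grid, V))
    V = V_new

print(V[-1])  # approximate period-0 value of starting with the whole cake
```

Because the horizon is finite, a single backward sweep suffices; for the infinite-horizon version one would instead iterate the same update until V stops changing (value function iteration, which converges because the Bellman operator is a contraction when β < 1).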
Standard techniques for the solution of difference or differential equations can then be used to calculate the dynamics of the state variables and the control variables of the optimization problem.

Applications in economics

The first known application of a Bellman equation in economics is due to Martin Beckmann and Richard Muth.[5] Martin Beckmann also wrote extensively on consumption theory using the Bellman equation in 1959. His work influenced Edmund S. Phelps, among others.

A celebrated economic application of a Bellman equation is Merton's seminal 1973 article on the intertemporal capital asset pricing model.[6] (See also Merton's portfolio problem.) The solution to Merton's theoretical model, one in which investors chose between income today and future income or capital gains, is a form of Bellman's equation. Because economic applications of dynamic programming usually result in a Bellman equation that is a difference equation, economists refer to dynamic programming as a recursive method.

Stokey, Lucas & Prescott describe stochastic and nonstochastic dynamic programming in considerable detail, giving many examples of how to employ dynamic programming to solve problems in economic theory.[7] This book led to dynamic programming being employed to solve a wide range of theoretical problems in economics, including optimal economic growth, resource extraction, principal–agent problems, public finance, business investment, asset pricing, factor supply, and industrial organization.
Ljungqvist & Sargent apply dynamic programming to study a variety of theoretical questions in monetary policy, fiscal policy, taxation, economic growth, search theory, and labor economics.[8] Dixit & Pindyck showed the value of the method for thinking about capital budgeting.[9] Anderson adapted the technique to business valuation, including privately held businesses.[10]

Using dynamic programming to solve concrete problems is complicated by informational difficulties, such as choosing the unobservable discount rate. There are also computational issues, the main one being the curse of dimensionality arising from the vast number of possible actions and potential state variables that must be considered before an optimal strategy can be selected. For an extensive discussion of computational issues, see Miranda & Fackler[11] and Meyn 2007.[12]

Example

In a Markov decision process (MDP), a Bellman equation refers to a recursion for expected rewards.
For example, the expected reward for being in a particular state s and following some fixed policy π has the Bellman equation:

    V^π(s) = R(s, π(s)) + γ Σ_{s'} P(s' | s, π(s)) V^π(s').

This equation describes the expected reward for taking the action prescribed by some policy π.

The equation for the optimal policy is referred to as the Bellman optimality equation:

    V*(s) = max_a { R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') }.

It describes the reward for taking the action giving the highest expected return.

See also

Dynamic programming
Hamilton–Jacobi–Bellman equation
Markov decision process
Optimal control theory
Optimal substructure
Recursive competitive equilibrium
Bellman pseudospectral method

References

1. Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
2. S. Dreyfus (2002), "Richard Bellman on the birth of dynamic programming", Operations Research 50 (1), pp. 48–51.
3. R. Bellman, "On the Theory of Dynamic Programming", Proceedings of the National Academy of Sciences, 1952.
4. Bertsekas, D. P., Tsitsiklis, J.
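The Bellman optimality equation in the example above can be solved by value iteration: repeatedly apply the right-hand side as an update until V converges, then read off the greedy policy. A minimal sketch on a made-up two-state, two-action MDP (the transition matrix P, rewards R, and γ = 0.9 are all illustrative assumptions, not from the article):

```python
# Value iteration on a tiny hypothetical MDP with states {0, 1} and actions {0, 1}.
# P[a][s][s2] = transition probability P(s2 | s, a); R[a][s] = immediate reward R(s, a).
P = {0: [[0.9, 0.1], [0.2, 0.8]],
     1: [[0.5, 0.5], [0.7, 0.3]]}
R = {0: [1.0, 0.0],
     1: [0.5, 2.0]}
gamma = 0.9

V = [0.0, 0.0]
for _ in range(1000):
    # Bellman optimality update: V(s) <- max_a { R(s,a) + gamma * sum_s2 P(s2|s,a) V(s2) }
    V_new = [max(R[a][s] + gamma * sum(P[a][s][s2] * V[s2] for s2 in (0, 1))
                 for a in (0, 1))
             for s in (0, 1)]
    if max(abs(V_new[s] - V[s]) for s in (0, 1)) < 1e-10:   # converged
        break
    V = V_new

# Greedy policy: at each state, the action attaining the maximum in the update
policy = [max((0, 1),
              key=lambda a: R[a][s] + gamma * sum(P[a][s][s2] * V[s2] for s2 in (0, 1)))
          for s in (0, 1)]
print(V, policy)
```

Convergence is guaranteed because the update is a contraction with modulus γ < 1; the fixed point is the value function V*, and the greedy policy with respect to V* is an optimal policy.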