HBase Data Backup Methods

1 Full Shutdown Backup

1.1 Shut down HBase

1.2 Using Hadoop's distcp

DistCp can copy the contents of the HBase directory in HDFS either to another directory on the same cluster or to another cluster.

1.3 Hadoop DistCp usage

DistCp (distributed copy) is a tool for copying large amounts of data within and between clusters. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into the input to map tasks, each of which copies a partition of the files in the source list. Because it uses Map/Reduce, the tool has some special semantics in both behavior and execution. This document provides a guide to common DistCp operations and describes its working model.

Usage

Basic usage

The most common use of DistCp is an inter-cluster copy:

    bash$ hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo

This command expands the namespace under /foo/bar on nn1 into a temporary file, partitions the copying of those files' contents among a set of map tasks, and then has each TaskTracker perform its part of the copy from nn1 to nn2. Note that DistCp operates on absolute paths.

Multiple source directories can be specified on the command line:

    bash$ hadoop distcp hdfs://nn1:8020/foo/a hdfs://nn1:8020/foo/b hdfs://nn2:8020/bar/foo

Or, with the -f option, from a file:

    bash$ hadoop distcp -f hdfs://nn1:8020/srclist hdfs://nn2:8020/bar/foo

where srclist contains:

    hdfs://nn1:8020/foo/a
    hdfs://nn1:8020/foo/b

When copying from multiple sources, DistCp aborts the copy with an error message if two sources collide; collisions at the destination are resolved according to the options specified. By default, files already existing at the destination are skipped (i.e. they are not replaced by the source files). The number of skipped files is reported at the end of each job, but it may be inaccurate if some copies failed and a later attempt succeeded (see the Appendix).

Every TaskTracker must be able to reach and communicate with both the source and destination file systems. For HDFS, the source and destination must run the same version of the protocol, or a backwards-compatible one.

After a copy, it is recommended to generate and cross-check listings of the source and destination to verify that the copy truly succeeded. Since DistCp employs both Map/Reduce and the FileSystem API, issues in or between any of the three can adversely affect the copy. Some failed runs can be completed by re-executing the command with the -update flag. Note that a copy is very likely to fail if another client is concurrently writing to a source file; attempting to overwrite a file that is being written to on HDFS also fails. If a source file is moved or deleted before it is copied, the copy fails with a FileNotFoundException.

Options

Option index

    -p[rbugp]
        Preserve status: r: replication number; b: block size; u: user; g: group; p: permission.
        Modification times are not preserved. Also, when -update is specified, status updates
        are not synchronized unless the file sizes also differ (i.e. the file is re-created).

    -i
        Ignore failures. As explained in the Appendix, this option yields more accurate
        statistics about the copy than the default case. It also preserves the logs of failed
        copy attempts, which can be valuable for debugging. Finally, a failing map will not
        cause the job to fail before all splits have been attempted.

    -log <logdir>
        Write logs to <logdir>. DistCp keeps a log of every file it attempts to copy as the
        output of a map. If a map fails, the log is not retained when the map is re-executed.

    -m <num_maps>
        Maximum number of simultaneous copies. Specifies the number of maps used to copy the
        data. Note that more maps does not necessarily mean higher throughput.

    -overwrite
        Overwrite the destination. If a map fails and -i is not specified, all the files in
        the split, not only those that failed, will be re-copied. As discussed below, this
        option also changes the semantics of destination-path generation, so users should
        employ it carefully.

    -update
        Overwrite if the source and destination sizes differ. As noted above, this is not a
        synchronization operation. The only criterion for overwriting is whether the source
        and destination file sizes are the same; if they differ, the source file replaces the
        destination file. As discussed below, this option also changes the semantics of
        destination-path generation, so use it carefully.

    -f <urilist_uri>
        Use <urilist_uri> as the list of sources. This is equivalent to listing every file on
        the command line. The urilist_uri list must consist of fully-qualified, valid URIs.

Update and Overwrite

Here are some examples of -update and -overwrite. Consider a copy from /foo/a and /foo/b to /bar/foo, where the source paths are:

    hdfs://nn1:8020/foo/a
    hdfs://nn1:8020/foo/a/aa
    hdfs://nn1:8020/foo/a/ab
    hdfs://nn1:8020/foo/b
    hdfs://nn1:8020/foo/b/ba
    hdfs://nn1:8020/foo/b/ab

If either -update or -overwrite is set, both sources map an entry to /bar/foo/ab at the destination: with these options, the contents of each source directory are compared with the contents of the destination directory, and DistCp aborts the copy with an error message on encountering such a conflict. Without these options, the directories /bar/foo/a and /bar/foo/b are created, and nothing collides.

Now consider a legal copy using -update:

    bash$ hadoop distcp -update hdfs://nn1:8020/foo/a hdfs://nn1:8020/foo/b hdfs://nn2:8020/bar

with source paths/sizes:

    hdfs://nn1:8020/foo/a
    hdfs://nn1:8020/foo/a/aa 32
    hdfs://nn1:8020/foo/a/ab 32
    hdfs://nn1:8020/foo/b
    hdfs://nn1:8020/foo/b/ba 64
    hdfs://nn1:8020/foo/b/bb 32

and destination paths/sizes:

    hdfs://nn2:8020/bar
    hdfs://nn2:8020/bar/aa 32
    hdfs://nn2:8020/bar/ba 32
    hdfs://nn2:8020/bar/bb 64

The result is:

    hdfs://nn2:8020/bar
    hdfs://nn2:8020/bar/aa 32
    hdfs://nn2:8020/bar/ab 32
    hdfs://nn2:8020/bar/ba 64
    hdfs://nn2:8020/bar/bb 32

Only aa on nn2 is not overwritten, since the sizes match. If -overwrite is specified, every file is overwritten.

Appendix

Number of maps

DistCp attempts to divide the work so that each map copies roughly the same number of bytes. But because files are the finest level of copy granularity, increasing the number of simultaneous copiers (i.e. maps) does not always increase the number of simultaneous copies actually performed, nor the overall throughput.

If -m is not specified, DistCp schedules min(total_bytes / bytes.per.map, 20 * num_task_trackers) maps, where bytes.per.map defaults to 256MB. For long-running or regularly run jobs, it is recommended to tune the number of maps according to the source and destination cluster sizes, the size of the copy, and the available bandwidth.

Copying between different HDFS versions

For copying between different versions of Hadoop, use HftpFileSystem. This is a read-only file system, so DistCp must be run on the destination cluster (more specifically, on TaskTrackers that can write to the destination cluster). Each source is specified as hftp://<dfs.http.address>/<path> (the default dfs.http.address is <namenode>:50070).

Map/Reduce and side effects

As mentioned earlier, a map that fails to copy one of its inputs has several side effects. Unless -i is specified, the logs generated by the task attempt are replaced by those of the new attempt. Unless -overwrite is specified, files already successfully copied by a previous map are marked as "skipped" on a re-execution. If a map fails mapred.map.max.attempts times, the remaining map tasks are killed (unless -i is set). If mapred.speculative.execution is set final and true, the result of the copy is undefined.
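Putting 1.1 and 1.2 together, a full-shutdown backup is a clean stop of HBase followed by a DistCp of its root directory; HBase must stay down until the copy completes. A minimal sketch, assuming the default HBase root directory /hbase and the hypothetical NameNode hostnames nn1 and nn2 used in the examples above:

    #!/bin/sh
    # Full Shutdown Backup sketch: stop HBase, copy its HDFS directory, restart.
    # Paths and hostnames (/hbase, nn1, nn2) are assumptions; adjust to your site.
    "$HBASE_HOME"/bin/stop-hbase.sh   # 1.1: shut down HBase cleanly

    # 1.2: copy to another directory on the same cluster...
    hadoop distcp hdfs://nn1:8020/hbase hdfs://nn1:8020/backup/hbase-$(date +%Y%m%d)
    # ...or to a second cluster:
    hadoop distcp hdfs://nn1:8020/hbase hdfs://nn2:8020/backup/hbase-$(date +%Y%m%d)

    "$HBASE_HOME"/bin/start-hbase.sh  # restart only after the copy has finished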
2 Live Cluster Backup: Replication

As of version 0.92, Apache HBase supports master/master and cyclic replication as well as replication to multiple slaves.

2.1 How it works

2.1.1 Enabling replication

The guide to enabling and using cluster replication is contained in the API documentation shipped with your Apache HBase distribution; the most up-to-date version is available in the online Apache HBase documentation.

2.1.2 Life of a log edit

The following sections describe the life of a single edit, from a client communicating with a master cluster all the way to a single slave cluster.

Normal processing

The client uses an API that sends a Put, Delete, or ICV to a region server. The key values are transformed into a WALEdit by the region server, and the WALEdit is inspected by the replication code, which adds the scope to the edit for each family that is scoped for replication. The edit is appended to the current WAL and is then applied to its MemStore.

In a separate thread, the edit is read from the log (as part of a batch) and only the KVs that are replicable are kept: those that are part of a family scoped GLOBAL in the family's schema, are non-catalog (so not .META. or -ROOT-), and did not originate in the target slave cluster, in the case of cyclic replication. The edit is then tagged with the master cluster's UUID. When the buffer is filled, or the reader hits the end of the file, the buffer is sent to a random region server on the slave cluster.

Synchronously, the region server that receives the edits reads them sequentially and separates them into buffers, one per table. Once all edits are read, each buffer is flushed using the normal HBase client (HTables managed by an HTablePool), in order to leverage parallel insertion (MultiPut). The master cluster's UUID is retained in the edits applied at the slave cluster, in order to allow cyclic replication.

Back in the master cluster's region server, the offset for the current WAL being replicated is registered in ZooKeeper.

Non-responding slave clusters

The edit is inserted in the same way. In the separate thread, the region server reads, filters, and buffers the log edits the same way as during normal processing. The slave region server that is contacted does not answer the RPC, so the master region server sleeps and retries up to a configured number of times. If the slave RS is still unavailable, the master cluster RS selects a new subset of RSs to replicate to and retries sending the buffer of edits.

In the meantime, the WALs are rolled and stored in a queue in ZooKeeper. Logs that are archived by their region server (archiving is basically moving a log from the region server's logs directory to a central logs archive directory) have their paths updated in the in-memory queue of the replicating thread.

When the slave cluster is finally available, the buffer is applied the same way as during normal processing. The master cluster RS then replicates the backlog of logs.
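In practice, the two prerequisites described above, a registered slave cluster and column families scoped for replication (GLOBAL, i.e. REPLICATION_SCOPE => 1), are set up from the HBase shell. A minimal sketch for a 0.92-era release, in which the table name 't1', family 'cf1', and the slave cluster key are hypothetical, and hbase.replication must already be set to true in hbase-site.xml on both clusters:

    # Register the slave cluster as peer id '1'; the key has the form
    # "<quorum hosts>:<client port>:<base znode>" (hostnames are placeholders).
    hbase> add_peer '1', "zk1,zk2,zk3:2181:/hbase"
    # Scope the family GLOBAL so its edits are picked up for replication.
    hbase> disable 't1'
    hbase> alter 't1', {NAME => 'cf1', REPLICATION_SCOPE => 1}
    hbase> enable 't1'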
2.1.3 Internals

This section describes in depth how each of replication's internal features operates.

Replication ZooKeeper state

HBase replication maintains all of its state in ZooKeeper. By default, this state is contained in the base znode /hbase/replication. There are three major child znodes in the base replication znode:

    State znode: /hbase/replication/state
    Peers znode: /hbase/replication/peers
    RS znode:    /hbase/replication/rs

The State znode

The state znode indicates whether or not replication is enabled on the cluster corresponding to this ZooKeeper quorum. It does not have any child znodes and simply contains a boolean value. This value is initialized on startup from the hbase.replication config parameter in the hbase-site.xml file. The status value is read and maintained by the ReplicationZookeeper.ReplicationStatusTracker class, and is also cached locally using an AtomicBoolean in the ReplicationZookeeper class. The value can be changed on a live cluster using the stop_replication command available through the HBase shell.

    /hbase/replication/state
        VALUE: true

The Peers znode

The peers znode contains a list of all peer replication clusters, together with the current replication state of each. It has one child peer znode for each peer cluster. The peer znode is named with the cluster id provided by the user in the HBase shell, and its value is the peer's cluster key, also provided by the user in the HBase shell. The cluster key consists of a list of ZooKeeper nodes in the peer cluster's quorum, the client port for the ZooKeeper quorum, and the base znode for HBase (e.g. "zk1,zk2,zk3:2181:/hbase"):

    /hbase/replication/peers
        /1    VALUE: zk1,zk2,zk3:2181:/hbase
        /2    VALUE: zk4,zk5,zk6:2181:/hbase

Each of these peer znodes has a child znode that indicates whether or not replication is enabled on that peer cluster. These peer-state znodes do not have child znodes and simply contain a boolean value (i.e. ENABLED or DISABLED). This value is read and maintained by the ReplicationPeer.PeerStateTracker class, and is also cached locally using an AtomicBoolean in the ReplicationPeer class.

    /hbase/replication/peers
        /1/peer-state    VALUE: ENABLED
        /2/peer-state    VALUE: DISABLED

The RS znode

The rs znode contains a list of all outstanding HLog files in the cluster that need to be replicated. The list is divided into a set of queues organized by region server and by the peer cluster the region server is shipping the HLogs to. The rs znode has one child znode for each region server in the cluster. The child znode name is simply the region server name (a concatenation of the region server's hostname, client port, and start code); these region servers may be either dead or alive.

    /hbase/replication/rs
        /hostname1.example.org,6020,1234
        /hostname2.example.org,6020,2856

Within its znode, each region server maintains a set of HLog replication queues, one for every peer cluster it replicates to. These queues are represented by child znodes named with the cluster id of the peer cluster they represent (see the Peers znode section).

    /hbase/replication/rs
        /hostname1.example.org,6020,1234
            /1
            /2

Each queue has one child znode for every HLog that still needs to be replicated. The value of each HLog child znode is the latest position that has been replicated, and it is updated every time an HLog entry is replicated.

    /hbase/replication/rs
        /hostname1.example.org,6020,1234
            /1
                23522342.23422    VALUE: 254
                12340993.22342    VALUE: 0

2.1.4 Configuration Parameters

ZooKeeper znode paths

All of the base znode names are configurable through parameters:

    Parameter                                  Default value
    zookeeper.znode.parent                     /hbase
    zookeeper.znode.replication                replication
    zookeeper.znode.replication.peers          peers
    zookeeper.znode.replication.peers.state    peer-state
    zookeeper.znode.replication.rs             rs

The default replication znode structure looks like the following:

    /hbase/replication/state
    /hbase/replication/peers/peerId/peer-state
    /hbase/replication/rs

Other parameters

    hbase.replication (Default: false) - Controls whether replication is enabled or disabled for the cluster.
    replication.sleep.before.failover (Default: 2000) - The amount of time, in milliseconds, a failover worker waits before attempting to replicate a dead region server's HLog queues.
    replication.executor.workers (Default: 1) - The number of dead region servers one region server should attempt to fail over simultaneously.
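The znode layout described above can be inspected directly with the ZooKeeper command-line client bundled with HBase (hbase zkcli; ZooKeeper's own zkCli.sh pointed at the cluster's quorum works the same way). A sketch, where the output shown is illustrative and mirrors the example layout above rather than any real cluster:

    $ hbase zkcli
    # List the three major child znodes of the base replication znode.
    ls /hbase/replication
    [peers, rs, state]
    # Is replication enabled on this cluster?
    get /hbase/replication/state
    true
    # Which region servers currently hold replication queues?
    ls /hbase/replication/rs
    [hostname1.example.org,6020,1234, hostname2.example.org,6020,2856]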
2.1.5 Choosing region servers to replicate to

When a master cluster RS initiates a replication source to a slave cluster, it first connects to the slave's ZooKeeper ensemble using the provided cluster key (that key is composed of the values of hbase.zookeeper.quorum, zookeeper.znode.parent, and hbase.zookeeper.property.clientPort). It then scans the rs directory to discover all the available sinks (region servers that are accepting incoming streams of edits to replicate) and randomly chooses a subset of them using a configured ratio (which defaults to 10%). For example, if a slave cluster has 150 machines, 15 will be chosen as potential recipients of the edits that this master cluster RS will be sending. Since this is done by all master cluster RSs, the probability that all slave RSs are used is very high, and the method works for clusters of any size. For example, a master cluster of 10 machines replicating to a slave cluster of 5 machines with a ratio of 10% means that each master cluster RS will choose one machine at random, so the chance of overlap, and of full usage of the slave cluster, is higher.

2.1.6 Keeping track of logs

Every master cluster RS has its own znode in the replication znodes hierarchy. It contains one znode per peer cluster (if there are 5 slave clusters, 5 znodes are created), and each of these contains a queue of HLogs to process. Each queue tracks the HLogs created by that RS, and the queues can differ in size. For example, if one slave cluster becomes unavailable for some time, its HLogs must not be deleted and therefore stay in the queue while the others are processed. See the section named Region server failover for an example.

When a source is instantiated, it contains the current HLog that the region server is writing to. During log rolling, the new file is added to the queue of each slave cluster's znode just before it is made available. This ensures that all the sources are aware that a new log exists before the HLog can have edits appended to it, but the operation is now more expensive. A queue item is discarded when the replication thread cannot read any more entries from a file (because it reached the end of the last block) and there are other files in the queue. This means that if a source is up to date and replicates from the log the region server is writing to, reading up to the end of the current file will not delete the item in the queue.

When a log is archived (because it is no longer used, or because there are too many of them per hbase.regionserver.maxlogs, typically because the insertion rate is faster than region flushing), it notifies the source threads that the path for that log has changed. If a particular source is already done with the log, it simply ignores the message. If the log is in the queue, the path is updated in memory. If the log is currently being replicated, the change is done atomically so that the reader doesn't try to open the file after it has already been moved. And since moving a file is a NameNode operation, a reader that is currently reading the log won't see any exception.

2.1.7 Reading, filtering and sending edits

By default, a source tries to read from a log file and ship log entries to a sink as fast as possible. This is first limited by the filtering of log entries: only KeyValues that are scoped GLOBAL and that don't belong to catalog tables are retained. A second limit is imposed on the total size of the list of edits to replicate per slave, which defaults to 64MB; a master cluster RS with 3 slaves would thus use at most 192MB to store data to replicate. This does not account for data that was filtered but not yet garbage collected.

Once the maximum size of edits has been buffered, or the reader has hit the end of the log file, the source thread stops reading and chooses at random a sink to replicate to (from the list generated by keeping only a subset of slave RSs). It directly issues an RPC to the chosen machine and waits for the method to return. If the RPC is successful, the source determines whether the current file has been drained or whether it should continue reading from it. In the former case, it deletes the znode in the queue; in the latter, it registers the new offset in the log's znode. If the RPC threw an exception, the source retries up to 10 times before trying to find a different sink.
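Both of the knobs just described, the 10% sink ratio of section 2.1.5 and the 64MB per-slave buffer of section 2.1.7, are configurable in hbase-site.xml. A sketch: the property names below (replication.source.ratio and replication.source.size.capacity) are my recollection of the 0.92-era settings and should be verified against your release:

    <!-- Inside <configuration> in hbase-site.xml; property names are assumptions. -->
    <property>
      <name>replication.source.ratio</name>
      <value>0.1</value>      <!-- fraction of slave RSs chosen as sinks (section 2.1.5) -->
    </property>
    <property>
      <name>replication.source.size.capacity</name>
      <value>67108864</value> <!-- 64MB cap on buffered edits per slave (section 2.1.7) -->
    </property>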
2.1.8 Cleaning logs

If replication is not enabled, the master's log-cleaning thread deletes old logs using a configured TTL. This does not work well with replication, since an archived log that is past its TTL may still be in a queue. The default behavior is therefore augmented: if a log is past its TTL, the cleaning thread looks up every queue until it finds the log (caching the queues it finds along the way). If the log is not found in any queue, it is deleted. The next time the thread has to look for a log, it starts with its cache.

2.1.9 Region server failover

As long as region servers don't fail, keeping track of the logs in ZooKeeper doesn't add any value. Unfortunately, they do fail, and since ZooKeeper is highly available, we can count on it and its semantics to help us manage the transfer of the queues.

All the master cluster RSs keep a watcher on every other one of them, in order to be notified when one dies (just as the master does). When that happens, they all race to create a znode called lock inside the dead RS's znode, which contains its queues. The one that creates it successfully proceeds by transferring all the queues to its own znode, one at a time since ZooKeeper does not support renaming znodes, and deletes the old ones when it is done.
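While such a failover is in progress, the race described above is directly visible in ZooKeeper. An illustrative sketch using the same hypothetical region server name as section 2.1.3 (the znode names follow the layout described there; the output is not from a real cluster):

    $ hbase zkcli
    # The "lock" znode shows that a surviving RS won the race and is now
    # transferring the dead server's per-peer queues (here, peers 1 and 2).
    ls /hbase/replication/rs/hostname1.example.org,6020,1234
    [lock, 1, 2]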
