
HBase master moved to a new cluster. The master node was switched to a new cluster and the original data was not deleted; after the cluster starts, the master crashes almost immediately. What are the possible causes?

Could it be that the original data is incomplete, causing block errors on the nodes? At one point I checked the logs and saw that HDFS had entered safe mode, and I forced it to exit. What is causing the master to crash right after startup, and how can it be fixed?

hbase小能手 2018-11-08 11:29:02 3860 0
1 answer
  • HBase is a distributed, column-oriented open-source database and a distributed storage system for structured data. Unlike a typical relational database, HBase is suited to storing unstructured data. The Alibaba Cloud HBase technical team discusses HBase and its ecosystem here.

    Judging from the log you provided, the master crashes because one or more DataNodes are down. Check the node-monitoring WebUI, or log on to each node, to see whether any DataNode has died, and also check the replication setting in hdfs-site.xml. Because the dead nodes make blocks unobtainable, restarting the dead DataNodes should resolve the issue. Another possible cause, though far less likely, is a communication problem between DataNodes. See this article: https://thebipalace.com/2016/05/16/hadoop-error-org-apache-hadoop-hdfs-blockmissingexception-could-not-obtain-block/
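    A minimal sketch of such a check, assuming the Hadoop client jars and the cluster's core-site.xml/hdfs-site.xml are on the classpath and that the default FileSystem is HDFS (the class name is illustrative only): it lists live and dead DataNodes and prints the configured replication factor.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

    // Illustrative helper: lists live/dead DataNodes and the configured replication factor.
    public class DataNodeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          System.err.println("Default FileSystem is not HDFS: " + fs.getUri());
          return;
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        System.out.println("dfs.replication = " + conf.get("dfs.replication", "3"));
        for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.LIVE)) {
          System.out.println("LIVE  " + dn.getHostName());
        }
        for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.DEAD)) {
          System.out.println("DEAD  " + dn.getHostName());
        }
      }
    }

    If any DataNode is reported as DEAD, bring it back and let the NameNode finish block reporting before restarting HBase.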
    The HMaster error occurs during master startup: when the activeMasterManager is started and finishActiveMasterInitialization() is called to complete active-master initialization, it executes this.fileSystemManager = new MasterFileSystem(this, this);. Because the DataNode is dead, constructing the MasterFileSystem throws an exception, which leads to the "Failed to become active master" error. Source code:
    private void startActiveMasterManager(int infoPort) throws KeeperException {
      String backupZNode = ZKUtil.joinZNode(
        zooKeeper.backupMasterAddressesZNode, serverName.toString());
      /*
       * Add a ZNode for ourselves in the backup master directory since we
       * may not become the active master. If so, we want the actual active
       * master to know we are backup masters, so that it won't assign
       * regions to us if so configured.
       *
       * If we become the active master later, ActiveMasterManager will delete
       * this node explicitly. If we crash before then, ZooKeeper will delete
       * this node for us since it is ephemeral.
       */
      LOG.info("Adding backup master ZNode " + backupZNode);
      if (!MasterAddressTracker.setMasterAddress(zooKeeper, backupZNode,
          serverName, infoPort)) {
        LOG.warn("Failed create of " + backupZNode + " by " + serverName);
      }
      activeMasterManager.setInfoPort(infoPort);
      // Start a thread to try to become the active master, so we won't block here
      Threads.setDaemonThreadRunning(new Thread(new Runnable() {
        @Override
        public void run() {
          int timeout = conf.getInt(HConstants.ZK_SESSION_TIMEOUT,
            HConstants.DEFAULT_ZK_SESSION_TIMEOUT);
          // If we're a backup master, stall until a primary to writes his address
          if (conf.getBoolean(HConstants.MASTER_TYPE_BACKUP,
            HConstants.DEFAULT_MASTER_TYPE_BACKUP)) {
            LOG.debug("HMaster started in backup mode. "
              + "Stalling until master znode is written.");
            // This will only be a minute or so while the cluster starts up,
            // so don't worry about setting watches on the parent znode
            while (!activeMasterManager.hasActiveMaster()) {
              LOG.debug("Waiting for master address ZNode to be written "
                + "(Also watching cluster state node)");
              Threads.sleep(timeout);
            }
          }
          MonitoredTask status = TaskMonitor.get().createStatus("Master startup");
          status.setDescription("Master startup");
          try {
            if (activeMasterManager.blockUntilBecomingActiveMaster(timeout, status)) {
              finishActiveMasterInitialization(status);
            }
          } catch (Throwable t) {
            status.setStatus("Failed to become active: " + t.getMessage());
            LOG.fatal("Failed to become active master", t);
            // HBASE-5680: Likely hadoop23 vs hadoop 20.x/1.x incompatibility
            if (t instanceof NoClassDefFoundError &&
              t.getMessage()
                .contains("org/apache/hadoop/hdfs/protocol/HdfsConstants$SafeModeAction")) {
              // improved error message for this special case
              abort("HBase is having a problem with its Hadoop jars.  You may need to "
                + "recompile HBase against Hadoop version "
                + org.apache.hadoop.util.VersionInfo.getVersion()
                + " or change your hadoop jars to start properly", t);
            } else {
              abort("Unhandled exception. Starting shutdown.", t);
            }
          } finally {
            status.cleanup();
          }
        }
      }, getServerName().toShortString() + ".activeMasterManager"));
    }
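    As for the safe-mode episode mentioned in the question: forcing HDFS out of safe mode does not make missing blocks readable, so constructing the MasterFileSystem can still fail afterwards. A small sketch along the same lines (again assuming the HDFS client configuration is on the classpath and the default FileSystem is HDFS; the class name is illustrative) that only queries the safe-mode state before HBase is started:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

    // Illustrative helper: reports whether the NameNode is still in safe mode.
    public class SafeModeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // SAFEMODE_GET only queries the state; it does not enter or leave safe mode.
        boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + inSafeMode);
      }
    }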

    2019-07-17 23:12:56