
HBase master moved to a new cluster. The master node was switched to a new cluster and the original data was not deleted; after the cluster starts, the master crashes almost immediately. What are the possible causes?

Could it be that the original data is incomplete, causing block errors on the nodes? At one point I checked the logs and saw that HDFS had entered safe mode, and I forced it to exit. What is causing the master to crash right after startup, and how can it be fixed?

hbase小能手 2018-11-08 11:29:02 3860 0
1 answer
  • HBase is a distributed, column-oriented open-source database and a distributed storage system for structured data. Unlike a typical relational database, HBase is suited to storing unstructured data. The Alibaba Cloud HBase technical team discusses HBase and its ecosystem here.

    Judging from the log you provided, the master crashes because one or more DataNodes are down. Check the node-monitoring WebUI, or log on to each node, to see whether any DataNode has died, and also check the replication setting in hdfs-site.xml. Because the dead nodes make blocks unobtainable, restarting the dead DataNodes should resolve the issue. Another possible cause, though far less likely, is a communication problem between DataNodes. See this article: https://thebipalace.com/2016/05/16/hadoop-error-org-apache-hadoop-hdfs-blockmissingexception-could-not-obtain-block/
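    A minimal sketch of such a check, assuming the Hadoop client jars and the cluster's core-site.xml/hdfs-site.xml are on the classpath and that the default FileSystem is HDFS (the class name is illustrative only): it lists live and dead DataNodes and prints the configured replication factor.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

    // Illustrative helper: lists live/dead DataNodes and the configured replication factor.
    public class DataNodeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          System.err.println("Default FileSystem is not HDFS: " + fs.getUri());
          return;
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        System.out.println("dfs.replication = " + conf.get("dfs.replication", "3"));
        for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.LIVE)) {
          System.out.println("LIVE  " + dn.getHostName());
        }
        for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.DEAD)) {
          System.out.println("DEAD  " + dn.getHostName());
        }
      }
    }

    If any DataNode is reported as DEAD, bring it back and let the NameNode finish block reporting before restarting HBase.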
    The HMaster error occurs during master startup: when the activeMasterManager is started and finishActiveMasterInitialization() is called to complete active-master initialization, it executes this.fileSystemManager = new MasterFileSystem(this, this);. Because the DataNode is dead, constructing the MasterFileSystem throws an exception, which leads to the "Failed to become active master" error. Source code:
    private void startActiveMasterManager(int infoPort) throws KeeperException {
      String backupZNode = ZKUtil.joinZNode(
        zooKeeper.backupMasterAddressesZNode, serverName.toString());
      /*
       * Add a ZNode for ourselves in the backup master directory since we
       * may not become the active master. If so, we want the actual active
       * master to know we are backup masters, so that it won't assign
       * regions to us if so configured.
       *
       * If we become the active master later, ActiveMasterManager will delete
       * this node explicitly. If we crash before then, ZooKeeper will delete
       * this node for us since it is ephemeral.
       */
      LOG.info("Adding backup master ZNode " + backupZNode);
      if (!MasterAddressTracker.setMasterAddress(zooKeeper, backupZNode,
          serverName, infoPort)) {
        LOG.warn("Failed create of " + backupZNode + " by " + serverName);
      }
      activeMasterManager.setInfoPort(infoPort);
      // Start a thread to try to become the active master, so we won't block here
      Threads.setDaemonThreadRunning(new Thread(new Runnable() {
        @Override
        public void run() {
          int timeout = conf.getInt(HConstants.ZK_SESSION_TIMEOUT,
            HConstants.DEFAULT_ZK_SESSION_TIMEOUT);
          // If we're a backup master, stall until a primary to writes his address
          if (conf.getBoolean(HConstants.MASTER_TYPE_BACKUP,
            HConstants.DEFAULT_MASTER_TYPE_BACKUP)) {
            LOG.debug("HMaster started in backup mode. "
              + "Stalling until master znode is written.");
            // This will only be a minute or so while the cluster starts up,
            // so don't worry about setting watches on the parent znode
            while (!activeMasterManager.hasActiveMaster()) {
              LOG.debug("Waiting for master address ZNode to be written "
                + "(Also watching cluster state node)");
              Threads.sleep(timeout);
            }
          }
          MonitoredTask status = TaskMonitor.get().createStatus("Master startup");
          status.setDescription("Master startup");
          try {
            if (activeMasterManager.blockUntilBecomingActiveMaster(timeout, status)) {
              finishActiveMasterInitialization(status);
            }
          } catch (Throwable t) {
            status.setStatus("Failed to become active: " + t.getMessage());
            LOG.fatal("Failed to become active master", t);
            // HBASE-5680: Likely hadoop23 vs hadoop 20.x/1.x incompatibility
            if (t instanceof NoClassDefFoundError &&
              t.getMessage()
                .contains("org/apache/hadoop/hdfs/protocol/HdfsConstants$SafeModeAction")) {
              // improved error message for this special case
              abort("HBase is having a problem with its Hadoop jars.  You may need to "
                + "recompile HBase against Hadoop version "
                + org.apache.hadoop.util.VersionInfo.getVersion()
                + " or change your hadoop jars to start properly", t);
            } else {
              abort("Unhandled exception. Starting shutdown.", t);
            }
          } finally {
            status.cleanup();
          }
        }
      }, getServerName().toShortString() + ".activeMasterManager"));
    }
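    As for the safe-mode episode mentioned in the question: forcing HDFS out of safe mode does not make missing blocks readable, so constructing the MasterFileSystem can still fail afterwards. A small sketch along the same lines (again assuming the HDFS client configuration is on the classpath and the default FileSystem is HDFS; the class name is illustrative) that only queries the safe-mode state before HBase is started:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

    // Illustrative helper: reports whether the NameNode is still in safe mode.
    public class SafeModeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // SAFEMODE_GET only queries the state; it does not enter or leave safe mode.
        boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + inSafeMode);
      }
    }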

    2019-07-17 23:12:56