
pgsql promote timeout under Linux HA

Resolved

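I have a primary/standby PostgreSQL pair deployed under SUSE Linux HA.
One day the monitor operation on the primary timed out, so the cluster shut down the primary and tried to promote the standby.
The promote of the standby then timed out as well, and the standby's pg_log had already been deleted during the later recovery.
Is there any way to find out why the promote timed out at that point, and why the monitor on the primary timed out?
Where should I start looking? I cannot see a specific cause in the corosync log; it only shows which operations were triggered.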

PS: the primary's pg_log from around the time of the monitor timeout shows that a backup was being taken:
01:28:06 [unknown] postgres NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived
01:28:23 [unknown] postgres NOTICE: pg_stop_backup complete, all required WAL segments have been archived
01:28:25 LOG: received fast shutdown request
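
One way to start lining these pg_log entries up against the corosync log is to work out the gaps between them. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes every line begins with an HH:MM:SS timestamp and that all lines fall on the same day.

```python
from datetime import datetime

# Log lines copied from the primary's pg_log quoted above.
LOG_LINES = [
    "01:28:06 [unknown] postgres NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived",
    "01:28:23 [unknown] postgres NOTICE: pg_stop_backup complete, all required WAL segments have been archived",
    "01:28:25 LOG: received fast shutdown request",
]


def parse_time(line):
    """Return the leading HH:MM:SS timestamp of a log line as a datetime."""
    return datetime.strptime(line.split()[0], "%H:%M:%S")


def gaps_in_seconds(lines):
    """Seconds elapsed between each pair of consecutive log lines."""
    times = [parse_time(line) for line in lines]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(gaps_in_seconds(LOG_LINES))  # [17, 2]
```

Here the WAL archiving wait took 17 seconds and the fast shutdown request arrived 2 seconds after the backup completed. Whether that window overlaps the monitor timeout still has to be checked against the timestamps in the cluster log.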

Asked by 燃烧宇宙中, 2016-03-23 11:22:38
1 answer
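  • digoal: "Charity is a lifelong commitment. I am digoal, just do it." Alibaba Cloud database team; works with PolarDB, PostgreSQL, DuckDB, ADB and others; long committed to promoting open-source database technology and its ecosystem in China and to training open-source talent. Holder of Alibaba's 麒麟布道师 (Kirin Evangelist) title and a 2018 OSCAR 开源尖峰人物 (Open Source Pioneer).
    Accepted answer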

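    There is no such thing as a promote timeout in PostgreSQL itself, so I suggest you walk through the corosync flow again, including whether this backup is itself one step of the corosync switchover flow.
    One more piece of information: promote comes in two variants, one that has to perform a checkpoint and one that does not.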

        if (fast_promote)
        {
            checkPointLoc = ControlFile->prevCheckPoint;

            /*
             * Confirm the last checkpoint is available for us to recover
             * from if we fail. Note that we don't check for the secondary
             * checkpoint since that isn't available in most base backups.
             */
            record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
            if (record != NULL)
            {
                fast_promoted = true;

                /*
                 * Insert a special WAL record to mark the end of
                 * recovery, since we aren't doing a checkpoint. That
                 * means that the checkpointer process may likely be in
                 * the middle of a time-smoothed restartpoint and could
                 * continue to be for minutes after this. That sounds
                 * strange, but the effect is roughly the same and it
                 * would be stranger to try to come out of the
                 * restartpoint and then checkpoint. We request a
                 * checkpoint later anyway, just for safety.
                 */
                CreateEndOfRecoveryRecord();
            }
        }

        if (!fast_promoted)
            RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                              CHECKPOINT_IMMEDIATE |
                              CHECKPOINT_WAIT);
    }

    If that is what led corosync to report a timeout, I suggest you use fast promote.
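
The two promote variants can be restated as a small toy model. This is not PostgreSQL code, only a sketch of the branch in the excerpt: the function name `promote_path` and the returned labels are made up for illustration.

```python
def promote_path(fast_promote_requested, checkpoint_record_readable):
    """Toy model of the promote branch quoted from the PostgreSQL source.

    Returns "end_of_recovery_record" when promotion can skip the immediate
    checkpoint, or "end_of_recovery_checkpoint" when it has to wait for one.
    """
    fast_promoted = False

    if fast_promote_requested and checkpoint_record_readable:
        # Corresponds to CreateEndOfRecoveryRecord(): only a WAL record
        # marking the end of recovery is written at this point.
        fast_promoted = True

    if not fast_promoted:
        # Corresponds to RequestCheckpoint(... | CHECKPOINT_WAIT): promotion
        # blocks until the end-of-recovery checkpoint has finished.
        return "end_of_recovery_checkpoint"

    return "end_of_recovery_record"


print(promote_path(True, True))    # end_of_recovery_record
print(promote_path(True, False))   # end_of_recovery_checkpoint
print(promote_path(False, True))   # end_of_recovery_checkpoint
```

Only the checkpoint path waits for a checkpoint before promotion completes. On a busy standby that wait can plausibly run longer than whatever promote timeout the cluster resource is configured with, which would match the symptom described in the question.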

    2019-07-17 18:35:22