Source Code Analysis of XLOG Accumulation in PostgreSQL

2019-08-02

title: XLOG generation and cleanup logic in PGSQL
date: 2018-12-01 08:00:00
categories: Postgresql

A summary of the XLOG cleanup logic

WAL archiving

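# Maximum number of log file segments between automatic WAL checkpoints
# (PostgreSQL 9.4 and earlier)
checkpoint_segments = 
# Maximum time between automatic WAL checkpoints
checkpoint_timeout = 
# Spreads checkpoint writes over part of the checkpoint interval to ease I/O pressure
checkpoint_completion_target = 
# Minimum number of log file segments to keep, so that standbys have more segments available
wal_keep_segments = 
# When enabled, completed WAL segments are sent to archive storage via archive_command
archive_mode = 
# Forces a switch to a new WAL segment file after this timeout
archive_timeout = 

# PostgreSQL 9.5 and later: these two replace checkpoint_segments
max_wal_size = 
min_wal_size =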

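Without archiving

The number of files is governed by the parameters above and normally does not exceed

(2 + checkpoint_completion_target) * checkpoint_segments + 1

or

checkpoint_segments + wal_keep_segments + 1 files.

When an old segment file is no longer needed, it is renamed and then overwritten for reuse. If a short-term peak in log output pushes the count above

3 * checkpoint_segments + 1 files, the unneeded files are deleted outright instead of being recycled.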
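
As a rough illustration of the arithmetic above, the following C sketch computes the two usual upper bounds and the threshold beyond which unneeded segments are deleted rather than recycled. The function names are hypothetical and are not part of PostgreSQL; truncating the fractional part of the first formula is also an assumption.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; these are not PostgreSQL functions. */

/* (2 + checkpoint_completion_target) * checkpoint_segments + 1 */
static int
wal_bound_by_checkpoints(int checkpoint_segments, double completion_target)
{
  assert(checkpoint_segments > 0);
  /* Assumption: the fractional part is truncated. */
  return (int) ((2.0 + completion_target) * checkpoint_segments) + 1;
}

/* checkpoint_segments + wal_keep_segments + 1 */
static int
wal_bound_by_keep(int checkpoint_segments, int wal_keep_segments)
{
  assert(checkpoint_segments > 0 && wal_keep_segments >= 0);
  return checkpoint_segments + wal_keep_segments + 1;
}

/* 3 * checkpoint_segments + 1: above this, unneeded files are deleted. */
static int
wal_delete_threshold(int checkpoint_segments)
{
  assert(checkpoint_segments > 0);
  return 3 * checkpoint_segments + 1;
}
```

For example, with checkpoint_segments = 3 and checkpoint_completion_target = 0.5 the first bound is 8 files, which is 128 MB at 16 MB per segment.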

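With archiving

Number of files: segment files are deleted once they have been archived successfully.

Viewed abstractly, a running PG produces an indefinitely long sequence of WAL records. The sequence is split into segments of 16 MB each, and the segment files are given numeric names that reflect their position in the WAL sequence. When WAL archiving is not used, the system normally creates only a few segment files and then recycles them by renaming segment files that are no longer needed to higher segment numbers.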
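
To make the naming scheme concrete, here is a minimal sketch of how a 24-character segment file name can be derived from a timeline ID and a segment number. It assumes the default 16 MB segment size (so 256 segments per 4 GB); the helper name wal_segment_name is hypothetical and only mirrors the layout of the real names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB segments, so 4 GB / 16 MB = 256 segments per "xlogid". */
#define SEGMENTS_PER_XLOGID 256

/*
 * Hypothetical helper: format a segment name as 8 hex digits of timeline,
 * 8 hex digits of segno / 256 and 8 hex digits of segno % 256.
 */
static void
wal_segment_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
  assert(buf != NULL && buflen >= 25);
  snprintf(buf, buflen, "%08X%08X%08X",
           (unsigned int) tli,
           (unsigned int) (segno / SEGMENTS_PER_XLOGID),
           (unsigned int) (segno % SEGMENTS_PER_XLOGID));
  assert(strlen(buf) == 24);
}
```

Because the names are fixed-width upper-case hex, plain string comparison orders them by position in the WAL sequence, which the removal code shown later relies on.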

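The archive command must return zero if and only if it succeeds. On getting a zero result, PostgreSQL assumes the WAL segment file has been archived successfully and will remove or recycle it later. A nonzero result tells PostgreSQL that the file was not archived, and it retries periodically until the command succeeds.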
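
The zero/nonzero contract can be sketched with the standard system() call. This is only an illustration under the assumption that a shell is available; the helper name is hypothetical and the commands passed to it are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: run a shell command and report success only when
 * system() returns zero, i.e. the shell could be started and the command
 * exited with status 0. Any other result counts as "not archived".
 */
static bool
archive_command_succeeded(const char *command)
{
  int rc;

  assert(command != NULL);
  rc = system(command);
  return rc == 0;
}
```

A command that copies the file but exits nonzero would be treated as a failure and retried, so an archive script should return zero only once the copy is known to be complete.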

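PG source code analysis

Removal logic

What triggers removal

RemoveOldXlogFiles is called from:
> CreateCheckPoint
> CreateRestartPoint

wal_keep_segments check (KeepLogSeg is called to adjust _logSegNo, which is then passed to RemoveOldXlogFiles)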

static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
  XLogSegNo segno;
  XLogRecPtr  keep;

  XLByteToSeg(recptr, segno);
  keep = XLogGetReplicationSlotMinimumLSN();

  /* compute limit for wal_keep_segments first */
  if (wal_keep_segments > 0)
  {
    /* avoid underflow, don't go below 1 */
    if (segno <= wal_keep_segments)
      segno = 1;
    else
      segno = segno - wal_keep_segments;
  }

  /* then check whether slots limit removal further */
  if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
  {
    XLogSegNo slotSegNo;

    XLByteToSeg(keep, slotSegNo);

    if (slotSegNo <= 0)
      segno = 1;
    else if (slotSegNo < segno)
      segno = slotSegNo;
  }

  /* don't delete WAL segments newer than the calculated segment */
  if (segno < *logSegNo)
    *logSegNo = segno;
}
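
The decision made by KeepLogSeg can be restated on plain segment numbers. The sketch below is a simplified stand-in, not the PostgreSQL function: the type SegNo, the function name and the has_slot flag (standing in for "a valid minimum slot LSN exists") are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SegNo;   /* stand-in for XLogSegNo */

/*
 * Hypothetical restatement of KeepLogSeg. Returns the oldest segment that
 * must be kept, given the current segment, the segment the checkpoint
 * would otherwise allow removal up to, wal_keep_segments, and the segment
 * of the oldest LSN still needed by any replication slot.
 */
static SegNo
oldest_segment_to_keep(SegNo current_segno, SegNo checkpoint_segno,
                       int wal_keep_segments, bool has_slot, SegNo slot_segno)
{
  SegNo segno = current_segno;

  assert(wal_keep_segments >= 0);

  /* wal_keep_segments first; never go below segment 1 */
  if (wal_keep_segments > 0)
  {
    if (segno <= (SegNo) wal_keep_segments)
      segno = 1;
    else
      segno -= (SegNo) wal_keep_segments;
  }

  /* then let replication slots hold removal back further */
  if (has_slot)
  {
    if (slot_segno == 0)
      segno = 1;
    else if (slot_segno < segno)
      segno = slot_segno;
  }

  /* only ever move the limit backwards, never forwards */
  return segno < checkpoint_segno ? segno : checkpoint_segno;
}
```

With the current segment at 100, a checkpoint limit of 98, wal_keep_segments = 10 and a slot that still needs segment 50, the result is 50: the slowest constraint wins, which is why a stalled slot alone can pin an unbounded amount of WAL.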

Removal logic (RemoveOldXlogFiles)

static void
RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
{
    ...
    ...
  while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
  {
    /* Ignore files that are not XLOG segments */
    if (strlen(xlde->d_name) != 24 ||
      strspn(xlde->d_name, "0123456789ABCDEF") != 24)
      continue;

    /*
     * We ignore the timeline part of the XLOG segment identifiers in
     * deciding whether a segment is still needed.  This ensures that we
     * won't prematurely remove a segment from a parent timeline. We could
     * probably be a little more proactive about removing segments of
     * non-parent timelines, but that would be a whole lot more
     * complicated.
     *
     * We use the alphanumeric sorting property of the filenames to decide
     * which ones are earlier than the lastoff segment.
     */
    if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
    {
      if (XLogArchiveCheckDone(xlde->d_name))
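        /*
         * XLogArchiveCheckDone returns:
         *   true  if archiving is disabled
         *   true  if a .done file exists
         *   false if a .ready file exists
         *   true  if a .done file exists on recheck
         *   false otherwise, after recreating the .ready file
         */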
      {
        /* Update the last removed location in shared memory first */
        UpdateLastRemovedPtr(xlde->d_name);
                
        /* Recycle or delete the segment, and clean up its .done and .ready files */
        RemoveXlogFile(xlde->d_name, endptr);
      }
    }
  }
    ...
    ...
}
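
The two string tests in the loop above can be isolated into small predicates. This is a sketch with hypothetical helper names; it reuses the same strlen, strspn and strcmp checks as the excerpt and nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if name looks like a WAL segment: exactly 24 upper-case hex digits. */
static bool
is_wal_segment_name(const char *name)
{
  return strlen(name) == 24 &&
    strspn(name, "0123456789ABCDEF") == 24;
}

/*
 * True if segment `name` is at or before `lastoff`. The first 8 characters
 * (the timeline) are skipped, so segments of other timelines are judged
 * purely by their position in the WAL sequence.
 */
static bool
is_removal_candidate(const char *name, const char *lastoff)
{
  assert(is_wal_segment_name(lastoff));
  return is_wal_segment_name(name) &&
    strcmp(name + 8, lastoff + 8) <= 0;
}
```

Passing this test only makes a file a candidate: in the loop above it is still kept until XLogArchiveCheckDone returns true for it.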

Archiving logic

static void
pgarch_ArchiverCopyLoop(void)
{
  char    xlog[MAX_XFN_CHARS + 1];
    
  /* Fetch the name of the oldest xlog file that has not been archived yet */
  while (pgarch_readyXlog(xlog))
  {
    int     failures = 0;

    for (;;)
    {
      /*
       * Do not initiate any more archive commands after receiving
       * SIGTERM, nor after the postmaster has died unexpectedly. The
       * first condition is to try to keep from having init SIGKILL the
       * command, and the second is to avoid conflicts with another
       * archiver spawned by a newer postmaster.
       */
      if (got_SIGTERM || !PostmasterIsAlive())
        return;

      /*
       * Check for config update.  This is so that we'll adopt a new
       * setting for archive_command as soon as possible, even if there
       * is a backlog of files to be archived.
       */
      if (got_SIGHUP)
      {
        got_SIGHUP = false;
        ProcessConfigFile(PGC_SIGHUP);
      }

      /* If archive_command is not set, do not go any further. */
      /* Our archive_command is not set, so this is the branch we take. */
      if (!XLogArchiveCommandSet())
      {
        /*
         * Change WARNING to DEBUG1, since we leave archive_command empty to
         * let external tools manage archiving
         */
        ereport(DEBUG1,
            (errmsg("archive_mode enabled, yet archive_command is not set")));
        return;
      }
      /* Run the archive command */
      if (pgarch_archiveXlog(xlog))
      {
        /* Success: rename .ready to .done */
        pgarch_archiveDone(xlog);

        /*
         * Tell the collector about the WAL file that we successfully
         * archived
         */
        pgstat_send_archiver(xlog, false);

        break;      /* out of inner retry loop */
      }
      else
      {
        /*
         * Tell the collector about the WAL file that we failed to
         * archive
         */
        pgstat_send_archiver(xlog, true);

        if (++failures >= NUM_ARCHIVE_RETRIES)
        {
          ereport(WARNING,
              (errmsg("archiving transaction log file \"%s\" failed too many times, will try again later",
                  xlog)));
          return;   /* give up archiving for now */
        }
        pg_usleep(1000000L);  /* wait a bit before retrying */
      }
    }
  }
}

.ready file creation logic

static void
XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
...
      if (finishing_seg)
      {
        issue_xlog_fsync(openLogFile, openLogSegNo);

        /* signal that we need to wakeup walsenders later */
        WalSndWakeupRequest();

        LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */

        /* true when archive_mode is on && wal_level >= archive */
        if (XLogArchivingActive())
          /* create the .ready file */
          XLogArchiveNotifySeg(openLogSegNo);

        XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
...
}
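
For orientation, the status file that this notification creates can be located with a small path helper. The sketch assumes the pre-10 directory layout, with status files under pg_xlog/archive_status; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: status files live in pg_xlog/archive_status. */
#define STATUS_DIR "pg_xlog/archive_status"

/*
 * Hypothetical helper: build "<STATUS_DIR>/<segment><suffix>" into buf,
 * where suffix is ".ready" or ".done". Returns 0 on success, -1 if the
 * result does not fit.
 */
static int
status_file_path(char *buf, size_t buflen,
                 const char *segment, const char *suffix)
{
  int n;

  assert(buf != NULL && segment != NULL && suffix != NULL);
  n = snprintf(buf, buflen, "%s/%s%s", STATUS_DIR, segment, suffix);
  return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

Counting the .ready entries in that directory is a quick way to see how far archiving is lagging behind WAL generation.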

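Summary

A .ready file is always created as long as archive_mode=on and wal_level>=archive (it is created from within XLogWrite)
- Because archive_command is left empty, consumption of the .ready files is entirely up to an external program
.done files are handled by PG itself; two events trigger that handling: checkpoints and restartpoints
- How many .done files get processed is limited by wal_keep_segments and replication slots (the KeepLogSeg function)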

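Causes of WAL segment accumulation

Note: the log written by a checkpoint does not get a .ready file immediately; it is created together with the next xlog segment.

1 ReplicationSlot

A streaming replication slot is in use. KeepLogSeg retains WAL back to the minimum LSN that the slots still need.

2 A large wal_keep_segments

Check the parameter setting. Note that enabling this parameter causes some delay for xlog and .ready files.

3 Recycling is failing

If PG's automatic recycling mechanism is not used, the database depends on an external program to change the .ready files, so that recycling process must be monitored.

(archive_mode=on archive_command='')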
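
In such a setup the external program has to do what pgarch_archiveDone does after a successful archive: turn the segment's .ready file into a .done file. A minimal sketch using the standard rename() call follows; the function name and the fixed buffer size are assumptions, and the directory argument is whatever status directory the tool is pointed at.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper for an external archiver: once `segment` has been
 * copied away safely, rename <status_dir>/<segment>.ready to
 * <status_dir>/<segment>.done. Returns 0 on success, -1 on failure.
 */
static int
mark_segment_done(const char *status_dir, const char *segment)
{
  char ready[1024];
  char done[1024];
  int n;

  assert(status_dir != NULL && segment != NULL);

  n = snprintf(ready, sizeof(ready), "%s/%s.ready", status_dir, segment);
  if (n < 0 || (size_t) n >= sizeof(ready))
    return -1;

  n = snprintf(done, sizeof(done), "%s/%s.done", status_dir, segment);
  if (n < 0 || (size_t) n >= sizeof(done))
    return -1;

  return rename(ready, done) == 0 ? 0 : -1;
}
```

If this step stops running, for example because the external process has died, the .ready files are never converted, XLogArchiveCheckDone keeps returning false, and no segment is removed at checkpoint time.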

4 Checkpoint interval too long

Check the parameter settings.
