Oracle数据库一次ORA-00600[4194]事件处理

简介: 一、问题背景1、20191120某客户托尔斯程序运行异常,无法正常工作2、查看程序日志显示ORA-00600、ORA-02002等数据库错误3、数据库服务器信息    ①系统:CentOS 6.8(64位)    ②数据库:Oracle RAC 11.

一、问题背景

1、20191120某客户托尔斯程序运行异常,无法正常工作
2、查看程序日志显示ORA-00600、ORA-02002等数据库错误
3、数据库服务器信息
    ①系统:CentOS 6.8(64位)
    ②数据库:Oracle RAC 11.2.0.4

二、问题排查&处理

1、查看数据库信息

    ①首先查看了Oracle RAC集群信息,数据库集群运行正常
    ②查看数据库日志,节点1运行正常,节点2 alert日志中有大量ORA600信息:

Wed Nov 20 11:28:30 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_11752.trc  (incident=51957):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 946, block 733 to scn 2776710962
Recovery of Online Redo Log: Thread 2 Group 3 Seq 946 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_3.261.989703707
Block recovery completed at rba 946.735.16, scn 0.2776710963
Block recovery from logseq 946, block 733 to scn 2776711284
Recovery of Online Redo Log: Thread 2 Group 3 Seq 946 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_3.261.989703707
Block recovery completed at rba 946.849.16, scn 0.2776711285
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_11752.trc  (incident=51958):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-02002: error while writing to audit trail
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

    ③查看当前数据库集群中每个节点上会话信息,节点1上有程序连接进来的信息,节点2基本都是系统用户会话信息

2、根据客户反馈的信息,进行下一步处理

    ①客户反馈之前也遇到过目前的情况(ORA2002),之前是对system表空间进行手动resize,之后程序就会恢复正常,但是这次操作了之后也没有恢复正常,程序报错如下:
ter_error

    ②根据客户描述的信息检查数据库审计情况
        a、发现数据库审计参数AUDIT_TRAIL=DB,是数据库默认的配置
        b、接下来检查system表空间信息,当前system表空间是3G,使用率70%(客户自己resize后),并且system表空间是自动扩展,看来问题不是由于审计表空间不足引起的
    ③system充足的情况下,程序多次重启还是报ORA2002错误,看日志显示ORA600后紧接着就是ORA2002,为了及时恢复业务,客户也表示没有使用到数据库审计,决定关闭数据库审计功能

3、数据库修改了AUDIT_TRAIL=NONE,truncate了SYS.AUD$表数据,关闭了数据库集群,进行了重启,观察程序情况:

    ①程序在关闭数据库审计后启动,仍然报错,此时报错内容只剩下了ORA600
trs_error1
    ②查看当前数据库中session信息,和之前一样,节点1有程序连接信息,节点2上没有
    ③这时再查看节点2alert日志信息,ORA2002虽然没有了,但是ORA600仍然不断打印出来

Wed Nov 20 21:36:11 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_19012.trc  (incident=100413):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/racecs/racecs2/incident/incdir_100413/racecs2_ora_19012_i100413.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Nov 20 21:36:16 2019
Dumping diagnostic data in directory=[cdmp_20191120213616], requested by (instance=2, osid=19012), summary=[incident=100413].
Block recovery from logseq 972, block 168 to scn 2776911378
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.173.16, scn 0.2776911845
Block recovery from logseq 972, block 168 to scn 2776914575
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.657.16, scn 0.2776914576
Wed Nov 20 21:38:25 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_18696.trc  (incident=100333):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/racecs/racecs2/incident/incdir_100333/racecs2_ora_18696_i100333.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
...
Wed Nov 20 21:46:09 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_18732.trc  (incident=100373):
ORA-00600: Śҿխϳպë, ӎ˽: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 972, block 1011 to scn 2776915928
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1016.16, scn 0.2776915929
Block recovery from logseq 972, block 1011 to scn 2776916012
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1031.16, scn 0.2776916013
Wed Nov 20 21:48:41 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_21433.trc  (incident=100445):
ORA-00600: Śҿխϳպë, ӎ˽: [4194], [ody "SYS.DBMS_STANDARD"
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 972, block 1011 to scn 2776915928
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1016.16, scn 0.2776915929
Block recovery from logseq 972, block 1011 to scn 2776916447
Recovery of Online Redo Log: Thread 2 Group 7 Seq 972 Reading mem 0
  Mem# 0: +DATA/racecs/onlinelog/group_7.log
Block recovery completed at rba 972.1073.16, scn 0.2776916448

4、根据报错内容在MOS查看相关问题,发现文档1428786.1和这次的现象类似,并且节点2服务器在15号多次发生了异常重启,引起数据库多次异常重启

Fri Nov 15 08:32:28 2019
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 16777216 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
 select total_size,awr_flush_emergency_count from v$ash_info; 
Fri Nov 15 09:06:34 2019
IPC Send timeout detected. Receiver ospid 7546 [
Fri Nov 15 09:06:34 2019
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_ora_7546.trc:
Fri Nov 15 09:29:15 2019
minact-scn: useg scan erroring out with error e:12751
Fri Nov 15 09:53:19 2019
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_asmb_3794.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 217 Serial number: 61
Errors in file /u01/app/oracle/diag/rdbms/racecs/racecs2/trace/racecs2_asmb_3794.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 217 Serial number: 61
ASMB (ospid: 3794): terminating the instance due to error 15064
Termination issued to instance processes. Waiting for the processes to exit
Fri Nov 15 09:53:30 2019
Instance termination failed to kill one or more processes
Instance terminated by ASMB, pid = 3794
Fri Nov 15 10:16:47 2019
Starting ORACLE instance (normal)

5、根据MOS文档里面的描述,很可能就是由于实例2服务器异常重启导致数据库异常,MOS文档描述原因如下:

The following error is occurring in the alert.log right before the database crashes.
ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []
This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:

Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Since we are adding a new undo record to our undo block, we would expect that the new record number is equal to the maximum record number in the undo block plus one. Before Oracle can add a new undo record to the undo block it validates that this is correct. If this validation fails, then an ORA-600 [4194] will be triggered.

This issue generally occurs when there is a power outage or hardware failure that initially crashes the database. On startup, the database does the normal roll forward (redo) and then rollback (undo), this is where the error is generated on the rollback.

大致意思是:
在向undo块添加一个新的undo记录,会将新的记录号等于undo块中的最大记录号加1。在Oracle向undo块添加一个新的undo记录之前,它会验证这是正确的。如果验证失败,将触发ORA-600[4194]。
这个问题通常发生在出现断电或硬件故障时,这些故障最初会导致数据库崩溃。在启动时,数据库执行普通的前滚(重做),然后回滚(撤消),这是在回滚时产生错误的地方。
从这里在结合实例2在15号多次重启的现象,可以看出很可能就是因为实例2服务器异常重启导致

6、根据上面定位到的问题对数据库进行修复

*创建一个新的undo表空间,使用新的undo段号,它比当前使用的段号要高。这样,当一个事务执行清除块时,对撤消段的引用不存在,并继续清除块,这样就不会在产生上面发生的内容。
实例2进行操作
①、Create pfile='/tmp/initsid.ora' from spfile;
②、修改undo
a、查看当前undo表空间位置和大小
b、实例2创建新的undo,替换之前的undo
  create undo tablespace undotbs3 datafile '+DATA/racecs/datafile/undotbs3.dbf' size 1500m autoextend on;(注意根据之前undo值进行替换)
  
  alter system set undo_tablespace='undotbs3' scope=spfile sid='racecs2';
③、实例2数据库重启
  shutdown immediate
  startup

7、操作完成之后,程序连接不在报错

三、其他

1、在数据库恢复之后,隔了一天又反馈出现问题,发现实例2服务器再一次发生了重启,这次数据库日志中到没有出现ORA600信息,数据库运行正常
2、接下来应该重点排查实例2服务器异常重启原因

相关文章
|
2天前
|
DataWorks Oracle 关系型数据库
DataWorks操作报错合集之尝试从Oracle数据库同步数据到TDSQL的PG版本,并遇到了与RAW字段相关的语法错误,该怎么处理
DataWorks是阿里云提供的一站式大数据开发与治理平台,支持数据集成、数据开发、数据服务、数据质量管理、数据安全管理等全流程数据处理。在使用DataWorks过程中,可能会遇到各种操作报错。以下是一些常见的报错情况及其可能的原因和解决方法。
14 0
|
9天前
|
存储 Oracle 网络协议
Oracle 11gR2学习之二(创建数据库及OEM管理篇)
Oracle 11gR2学习之二(创建数据库及OEM管理篇)
|
13天前
|
Oracle 网络协议 关系型数据库
异地使用PLSQL远程连接访问Oracle数据库【内网穿透】
异地使用PLSQL远程连接访问Oracle数据库【内网穿透】
|
13天前
|
SQL Oracle 安全
Oracle数据库中的事务和锁
【4月更文挑战第19天】Oracle数据库的事务和锁是确保数据完整性和并发控制的核心机制。事务遵循ACID原则,保证操作的原子性、一致性、隔离性和持久性。通过COMMIT或ROLLBACK来管理事务更改。锁包括共享锁(读)、排他锁(写)、行级锁和表级锁,用于控制并发访问。自动锁机制在DML操作时生效,防止数据冲突。事务和锁共同维护数据库的稳定和安全。
|
13天前
|
存储 Oracle 关系型数据库
Oracle RAC:数据库集群的舞动乐章
【4月更文挑战第19天】Oracle RAC是Oracle提供的高可用性数据库解决方案,允许多个实例共享同一数据库,确保业务连续性和数据完整性。通过集群件和全局缓存服务实现服务器间的协调和通信。RAC提供高可用性,通过故障转移应对故障,同时提升性能,多个实例并行处理请求。作为数据管理员,理解RAC的架构和管理至关重要,以发挥其在数据管理中的最大价值。
|
13天前
|
监控 Oracle 安全
Oracle用户事件触发器:数据库世界的“福尔摩斯”
【4月更文挑战第19天】Oracle用户事件触发器是数据库中的监控机制,类似于“福尔摩斯”,在用户执行特定操作时自动触发。它们关注用户行为而非数据变化,可用于权限检查、安全监控、性能优化等。通过DDL语句创建,需注意逻辑清晰、条件合适及定期更新,以适应数据库变化和业务发展。掌握其使用能有效保障数据安全与稳定。
|
13天前
|
SQL 存储 Oracle
Oracle语句级触发器:数据库的“隐形哨兵”
【4月更文挑战第19天】Oracle语句级触发器是数据库中的自动执行程序,当特定事件(如INSERT、UPDATE、DELETE)发生时,会针对整个SQL语句触发。以新员工入职记录日志为例,创建语句级触发器可自动在操作后向日志表插入信息,减少手动工作并提高性能。虽然无法处理行级详细信息,但在处理大量数据时,相比行级触发器更高效。掌握触发器使用能提升数据管理效率和安全性。
|
23小时前
|
关系型数据库 MySQL Linux
【MySQL-10】数据库函数-案例演示【字符串/数值/日期/流程控制函数】(代码演示&可cv代码)
【MySQL-10】数据库函数-案例演示【字符串/数值/日期/流程控制函数】(代码演示&可cv代码)
【MySQL-10】数据库函数-案例演示【字符串/数值/日期/流程控制函数】(代码演示&可cv代码)
|
1天前
|
SQL 关系型数据库 MySQL
【MySQL-5】DDL的数据库操作:查询&创建&删除&使用(可cv代码+演示图)
【MySQL-5】DDL的数据库操作:查询&创建&删除&使用(可cv代码+演示图)
|
1天前
|
SQL 关系型数据库 MySQL
【MySQL-1】理解关系型数据库&数据的数据模型
【MySQL-1】理解关系型数据库&数据的数据模型

推荐镜像

更多