11gR2 New Feature: Introduction to Oracle Cluster Health Monitor (CHM)
Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)
In this Document
Purpose
Questions and Answers
What is the Cluster Health Monitor?
What is the purpose of the Cluster Health Monitor?
What platform does Cluster Health Monitor support and where can I get the Cluster Health Monitor?
What is the resource name for Cluster Health Monitor in 11.2.0.2 or higher?
Is stop/start ora.crf affecting clusterware function or cluster database function?
Can the Cluster Health Monitor be installed on a single node, non-RAC server?
Do Engineered Systems like Exadata have a default usage with CHM and if so, any specific version?
Where is oclumon?
How do I collect the Cluster Health Monitor data?
Why does “diagcollection.pl --collect --chmos” return “Cannot parse master from output: ERROR : in reading init file” error?
How do you get the syntax of different options and explanations for those options for diagcollection.pl and oclumon?
What is IPD/OS?
How is the Cluster Health Monitor different from OSWatcher?
Is the Cluster Health Monitor replacing OSWatcher?
How much of overhead does the Cluster Health Monitor cause?
Does CHM on Multiple Node configurations (e.g. 4 to 8 nodes) have scaling concerns?
Will CDB and PDB result in any new information or special conditions using CHM?
How much of disk space is needed for the Cluster Health Monitor?
How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?
How can I increase the size of the Cluster Health Monitor repository?
What platforms can I run the Cluster Health Monitor?
What steps are needed to install 11.2.0.2 when the Cluster Health Monitor from OTN is already running?
Where is the Cluster Health Monitor from OTN installed in Linux?
What logs and data should I gather before logging a SR for the Cluster Health Monitor error?
How do I increase the trace level of the Cluster Health Monitor?
Can I use procwatcher to get the pstack of the Cluster Health Monitor regularly?
What are the processes and components for the Cluster Health Monitor?
What is oclumon?
What is definition of some of the files like *.bdb, _db.* , *.ldb , log.* files created by tool in the BDB (Berkeley Database) location directory ?
Where is the location for the log files for the Cluster Health Monitor from OTN (pre 11.2.0.2)?
How do I fix the problem that the time in the oclumon report is in UTC time zone instead of the time zone of my server?
Can I install CHM from OTN on 11.2.0.2? What if I stop and disable CHM resource (ora.crf) on 11.2.0.2?
Where is the trace file for client like oclumon? How do I increase the trace level for oclumon?
Can the Directory path to the CHM Repository be same on all nodes if shared storage is used?
How much of data (how long in time) does the node store CHM data locally when it cannot communicate with the master?
How often does CHM collect the system metric data? Can this be changed?
What is the CHM retention time?
How can you reduce the size of bdb file that became big for any reason?
Can you set up CHM to run locally on each node?
Can CHM be used on a single node non-RAC server?
How to start and stop CHM that is installed as a part of GI in 11.2 and higher?
Database - RAC/Scalability Community
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 12.1.0.2 [Release 10.1 to 12.1]
Information in this document applies to any platform.
PURPOSE
The Cluster Health Monitor FAQ is an evolving document that answers common questions about the Cluster Health Monitor.
QUESTIONS AND ANSWERS
What is the Cluster Health Monitor?
What is the purpose of the Cluster Health Monitor?
By monitoring the data constantly, users can use the Cluster Health Monitor to detect potential problem areas such as CPU load, memory constraints, and spinning processes before the problem causes an unwanted outage.
What platform does Cluster Health Monitor support and where can I get the Cluster Health Monitor?
The Cluster Health Monitor is an integrated part of 11.2.0.2 Oracle Grid Infrastructure for Linux (not on Linux Itanium and IBM Linux Z) and Solaris (Sparc 64 and x86-64 only), so installing 11.2.0.2 Oracle Grid Infrastructure on those platforms automatically installs the Cluster Health Monitor. AIX has the Cluster Health Monitor starting with 11.2.0.3. The Cluster Health Monitor is also enabled for Windows (except Windows Itanium) in 11.2.0.3.
Prior to 11.2.0.2 on Linux (not on Linux Itanium and IBM Linux Z), the Cluster Health Monitor can be downloaded from OTN.
http://www-content.oracle.com/technetwork/products/clustering/downloads/ipd-download-homepage-087212.html
The OTN version for Windows is not available. Please upgrade to 11.2.0.3 if you need CHM for Windows.
What is the resource name for Cluster Health Monitor in 11.2.0.2 or higher?
Is stop/start ora.crf affecting clusterware function or cluster database function?
Can the Cluster Health Monitor be installed on a single node, non-RAC server?
Do Engineered Systems like Exadata have a default usage with CHM and if so, any specific version?
Where is oclumon?
If the CHM is manually installed using the CHM file from OTN, then oclumon is located in:
Linux : /usr/lib/oracrf/bin
Windows : C:\Program Files\oracrf\bin
How do I collect the Cluster Health Monitor data?
For example, issue “/bin/diagcollection.pl --collect --crshome $ORA_CRS_HOME --chmos --incidenttime --incidentduration 05:00”
The above outputs the report that covers 5 hours from the time specified by incidenttime.
The incidenttime must be in MM/DD/YYYYHH:MN:SS format, where MM is month, DD is day, YYYY is year, HH is hour in 24-hour format, MN is minute, and SS is second. For example, if you want the incident time to start at 10:15 PM on June 01, 2011, the incident time is 06/01/201122:15:00. The incidenttime and incidentduration can be changed to capture more data.
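Because the format has no space between the date and the time, it is easy to mistype. The following is a minimal Python sketch, assuming only the MM/DD/YYYYHH:MN:SS layout described above. The helper name `format_incidenttime` is hypothetical and is not part of any Oracle tool.

```python
from datetime import datetime


def format_incidenttime(moment: datetime) -> str:
    """Render a datetime as MM/DD/YYYYHH:MN:SS.

    24-hour clock, no separator between the date and the time,
    following the layout described in the text above.
    """
    return moment.strftime("%m/%d/%Y%H:%M:%S")


# 10:15 PM on June 01, 2011 (the example used in the text)
print(format_incidenttime(datetime(2011, 6, 1, 22, 15, 0)))  # 06/01/201122:15:00
```

The resulting string is what would be passed as the value of --incidenttime.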
Alternatively, if diagcollection.pl fails for any reason, issue ‘oclumon dumpnodeview -allnodes -v -last "11:59:59" > your-filename’. This generates a report from the repository covering up to the last 12 hours. The -last value can be changed to get more or less data.
Another example of using oclumon is 'oclumon dumpnodeview -allnodes -v -s "2012-06-01 22:15:00" -e "2012-06-02 03:15:00" > /tmp/chm.log '. The difference in this command is that it specifies the start (-s flag) and end time (-e flag).
In this case, the time format used is "YYYY-MM-DD HH24:MI:SS" like "2007-11-12 23:05:00".
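As an illustration, the start and end arguments can be derived from an incident start time and a duration. This is a sketch only: the function name `dumpnodeview_window` is hypothetical, and the flags it emits (-allnodes, -v, -s, -e) are the ones shown in the example above.

```python
from datetime import datetime, timedelta


def dumpnodeview_window(start: datetime, hours: float) -> str:
    """Build an 'oclumon dumpnodeview' command line covering `hours`
    hours from `start`, using the "YYYY-MM-DD HH24:MI:SS" time format."""
    fmt = "%Y-%m-%d %H:%M:%S"
    end = start + timedelta(hours=hours)
    return (
        "oclumon dumpnodeview -allnodes -v "
        f'-s "{start.strftime(fmt)}" -e "{end.strftime(fmt)}"'
    )


# Five hours starting at 22:15 on 2012-06-01, as in the example above
print(dumpnodeview_window(datetime(2012, 6, 1, 22, 15, 0), 5))
```

The sketch only builds the command string; redirecting the output to a file such as /tmp/chm.log is left to the caller.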
Why does “diagcollection.pl --collect --chmos” return “Cannot parse master from output: ERROR : in reading init file” error?
The workaround for this is to issue
oclumon dumpnodeview -allnodes -v -last “amount of data needed”
For example, oclumon dumpnodeview -allnodes -v -last “01:00:00”
will provide the last hour of data from all nodes.
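The -last value in the examples above is written as "HH:MM:SS". A small Python sketch for producing that string from a duration follows. The helper name `last_argument` is hypothetical, and the sketch assumes nothing about which ranges oclumon itself accepts.

```python
from datetime import timedelta


def last_argument(lookback: timedelta) -> str:
    """Format a positive duration as HH:MM:SS for use with -last."""
    total = int(lookback.total_seconds())
    if total <= 0:
        raise ValueError("lookback must be a positive duration")
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(last_argument(timedelta(hours=1)))  # 01:00:00
```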
How do you get the syntax of different options and explanations for those options for diagcollection.pl and oclumon?
What is IPD/OS?
How is the Cluster Health Monitor different from OSWatcher?
Is the Cluster Health Monitor replacing OSWatcher?
On the other hand, if only one of the tools can be used, then Oracle recommends that the Cluster Health Monitor is used.
How much of overhead does the Cluster Health Monitor cause?
Does CHM on Multiple Node configurations (e.g. 4 to 8 nodes) have scaling concerns?
Will CDB and PDB result in any new information or special conditions using CHM?
How much of disk space is needed for the Cluster Health Monitor?
How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?
To estimate the space required, use the following formula:
# of nodes * 720MB * 3 = Size required for 3 days retention
e.g. for a 4-node cluster: 4 * 720 * 3 = 8,640MB (8.4GB)
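The same estimate can be written as a short Python sketch. The 720 MB per node per day figure comes from the formula above; the function name `chm_repository_mb` and the `retention_days` parameter are hypothetical, and generalizing beyond 3 days assumes the space needed grows linearly with retention.

```python
MB_PER_NODE_PER_DAY = 720  # figure taken from the formula above


def chm_repository_mb(nodes: int, retention_days: int = 3) -> int:
    """Estimate the repository size in MB: nodes * 720 MB * days."""
    return nodes * MB_PER_NODE_PER_DAY * retention_days


size_mb = chm_repository_mb(4)
print(f"{size_mb} MB = {size_mb / 1024:.1f} GB")  # 8640 MB = 8.4 GB
```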
How can I increase the size of the Cluster Health Monitor repository ?
What platforms can I run the Cluster Health Monitor?
11.2.0.2: Solaris (Sparc 64 and x86-64 only), and Linux.
11.2.0.3: AIX, Solaris (Sparc 64 and x86-64 only), Linux, and Windows.
Cluster Health Monitor is NOT available for any Itanium platform such as Linux Itanium and Windows Itanium.
What steps are needed to install 11.2.0.2 when the Cluster Health Monitor from OTN is already running?
Where is the Cluster Health Monitor from OTN installed in Linux?
What logs and data should I gather before logging a SR for the Cluster Health Monitor error?
2) output of strace -v for osysmond.bin for about 2 minutes.
3) output of strace -cp for about 2 minutes
4) oclumon dumpnodeview -v output for that node for 2 minutes.
5) output of "uname -a"
6) output of "ps -eLf | grep osysmond.bin"
7) The ologgerd and sysmond log files in the CRS_HOME/log/ directory from all nodes
How do I increase the trace level of the Cluster Health Monitor?
oclumon debug log all allcomp:
The higher the trace level, the more detailed the tracing, so do not forget to reset the trace level back to 1 (the trace level when the CHM is first installed) by issuing "oclumon debug log all allcomp:1".
Can I use procwatcher to get the pstack of the Cluster Health Monitor regularly?
What are the processes and components for the Cluster Health Monitor?
System Monitor Service (Sysmond) – the sysmond process collects the system statistics of the local node and sends the data to the master ologgerd. A sysmond process runs on every node and collects the system statistics including CPU, memory usage, platform info, disk info, nic info, process info, and filesystem info.
To find the master ologgerd, use the following command:
oclumon manage -get master
What is oclumon?
You can also use oclumon to query and print the durations and the states for a resource on a node during a specified time period. These states are based on predefined thresholds for each resource metric and are denoted as red, orange, yellow, and green, indicating decreasing order of criticality.
What is definition of some of the files like *.bdb, _db.* , *.ldb , log.* files created by tool in the BDB (Berkeley Database) location directory ?
log.* - These are Berkeley DB log files, which record changes before they are applied to the db files. Checkpointing is set up, so the log files are reused.
*.ldb - This is the local logging file and MUST be present on all servers.
Do not delete the above files unless you are trying to reduce the size of a bdb file that has grown large. To reduce the size of the bdb file, refer to the question "How can you reduce the size of bdb file that became big for any reason?" in this document.
Because it takes many days / weeks to resolve a problem like the node reboot or performance degradation, is there any way to keep the Cluster Health Monitor data for that long so that it can be replayed any time later when needed ?
Before 12.1.0.2, another way is to archive the whole BDB regularly (like every day) by making a copy of BDB file in the BDB location directory.
To read an archived BDB, CHMOS must be started in debug mode. Start it by using
ologdbg -d
After it starts, issue the oclumon dumpnodeview to get the data from the archived BDB.
For example, issue
oclumon dumpnodeview -n -s -e -v
Where is the location for the log files for the Cluster Health Monitor from OTN (pre 11.2.0.2)?
How do I fix the problem that the time in the oclumon report is in UTC time zone instead of the time zone of my server?
Can I install CHM from OTN on 11.2.0.2? What if I stop and disable CHM resource (ora.crf) on 11.2.0.2?
Where is the trace file for client like oclumon? How do I increase the trace level for oclumon?
Generally it is not generated because, at log level 0, there is no log data.
To see logs at a higher log level, do the following:
1. oclumon [Enter the interactive mode]
2. query> debug log all allcomp:3
After this, any command execution will produce finer logs in oclumon.log
Can the Directory path to the CHM Repository be same on all nodes if shared storage is used?
How much of data (how long in time) does the node store CHM data locally when it cannot communicate with the master?
With a sampling interval of 1 second, ideally it will be around 1 hour of data. With 11.2.0.3, we have moved to sampling interval of 5 seconds, hence, in that case the data that can be retained is 4-5 hours of data.
How often does CHM collect the system metric data? Can this be changed?
Currently, the collection interval cannot be changed.
What is the CHM retention time?
In 11.2.0.2, the retention time is determined by the size. The size has changed to 1GB. Depending on how large the cluster is, the retention time is different. For example, it is usually 6.9 hours for a one-node cluster when sampling interval is 1 second. Please issue "oclumon manage -get repsize" to find out the retention time of your cluster. The output is in seconds.
With the sampling interval moving to 5 seconds in 11.2.0.3, the retention time becomes five times the retention time at a 1-second sampling interval.
It is recommended to set the retention time to 72 hours.
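Since the retention time is reported in seconds, a quick conversion helps when comparing it with the 72-hour recommendation. The sketch below is illustrative only: both function names are hypothetical, and the linear scaling with the sampling interval simply restates the "5 times" relationship described above.

```python
def retention_hours(retention_seconds: int) -> float:
    """Convert a retention time in seconds to hours."""
    return retention_seconds / 3600


def scaled_retention_hours(hours_at_1s: float, interval_seconds: int) -> float:
    """Retention for a repository of the same size when samples are taken
    every `interval_seconds` seconds instead of every second
    (assumes linear scaling, as the text describes)."""
    return hours_at_1s * interval_seconds


print(retention_hours(259200))                     # 72.0
print(round(scaled_retention_hours(6.9, 5), 1))    # 34.5
```

Under that assumption, the one-node example of 6.9 hours at a 1-second interval corresponds to roughly 34.5 hours at a 5-second interval, still short of 72 hours.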
How can you reduce the size of bdb file that became big for any reason?
oclumon manage -repos changesize .
As a temporary workaround, you can kill ologgerd and delete the contents of the BDB directory. osysmond should respawn ologgerd, and a new bdb file will be created. Past data is lost when this is done.
Please note the minimum size must be >= 1024 MB (1 GB), otherwise CRS-9100 "Error setting Cluster Health Monitor repository size" will be reported.
Can you set up CHM to run locally on each node?
The Cluster Health Monitor that comes with the Grid Infrastructure install image must run with only one master ologgerd, so it cannot be set up to run locally on each node.
Can CHM be used on a single node non-RAC server?
How to start and stop CHM that is installed as a part of GI in 11.2 and higher?
To stop CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl stop res ora.crf -init
To start CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl start res ora.crf -init
Database - RAC/Scalability Community
To discuss this topic further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Database - RAC/Scalability Community
How to relocate CHM repository and increase retention time (Doc ID 2062234.1)
In this Document
Goal
Solution
11.2
12.1
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.
GOAL
CHM data often ages out if it is not collected in time. This note provides steps to increase the retention time, which is strongly recommended.
SOLUTION
11.2
In 11.2, the CHM repository is in the Grid home. To change the retention time:
$ /bin/oclumon manage -repos resize 259200
racnode1 --> retention check successful
racnode2 --> retention check successful
New retention is 259200 and will use 4525424640 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
Done
Note: the command line specifies for how many seconds to retain the data and it's recommended to be at least 259200 which is 3 days.
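To sanity-check the numbers in the output above, the following Python sketch converts days to the seconds value passed to the resize command and converts the reported byte count to more readable units. The function names are hypothetical; the figures (259200 seconds, 4525424640 bytes, two nodes) are taken from the sample output.

```python
SECONDS_PER_DAY = 86_400


def days_to_seconds(days: int) -> int:
    """Retention in seconds for a given number of days."""
    return days * SECONDS_PER_DAY


def bytes_to_gib(size_bytes: int) -> float:
    """Convert bytes to GiB (1 GiB = 2**30 bytes)."""
    return size_bytes / 2**30


def mib_per_node_per_day(size_bytes: int, nodes: int, days: int) -> float:
    """Average repository space per node per day, in MiB."""
    return size_bytes / (nodes * days) / 2**20


print(days_to_seconds(3))                                   # 259200
print(round(bytes_to_gib(4525424640), 2))                   # 4.21
print(round(mib_per_node_per_day(4525424640, 2, 3), 1))     # 719.3
```

For the two-node sample cluster this works out to about 719 MiB per node per day.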
If there is insufficient space in the Grid home, relocate the CHM data with the following command:
$ /bin/oclumon manage -repos reploc /home/grid/chm
racnode1 --> Ready to commit new location
racnode2 --> Ready to commit new location
New retention is 259200 and will use 4525424640 bytes of disk space
CRS-9113-Cluster Health Monitor repository location change completed on all nodes. Restarting Loggerd.
Done
12.1
In 12c, the CHM repository is the GIMR, which is a database, so only the retention time can be changed. To change the retention time:
1. Check how much space is needed for the expected retention time:
The Cluster Health Monitor repository is too small for the desired retention. Please first resize the repository to 3896 MB
Note: the command line specifies for how many seconds to retain the data and it's recommended to be at least 259200 which is 3 days. The output tells that the repository needs to be at least 3896 MB for 3 days.
2. Change the repository size:
The Cluster Health Monitor repository was successfully resized. The new retention is 259200 seconds.
REFERENCES
NOTE:1589394.1 - How to Move/Recreate GI Management Repository to Different Shared Storage (Diskgroup, CFS or NFS etc)
About Me
...............................................................................................................................
● This article was compiled from online sources
● This article is also published on itpub (http://blog.itpub.net/26736162), cnblogs (http://www.cnblogs.com/lhrbest), and the author's WeChat public account (xiaomaimiaolhr)
● itpub address of this article: http://blog.itpub.net/26736162/abstract/1/
● cnblogs address of this article: http://www.cnblogs.com/lhrbest
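● Written in Shanghai between 2017-06-02 09:00 and 2017-06-30 22:00
● The content comes from Xiaomaimiao's study notes, and parts of it were compiled from the web; apologies for any infringement or inaccuracy
● All rights reserved. Sharing is welcome; please retain the source when reposting
...............................................................................................................................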