HMC performance with InfluxDB and Grafana
I modified my nmon2influxdb tool to import HMC Performance and Capacity Monitoring (PCM) data into InfluxDB.
HMC PCM data
Honestly, I had known it was available for a long time, but I didn’t see a use case at first. Talking with nmon2influxdb users, I saw that most of them were not using the tool to analyze nmon files but to get a consolidated view of all their servers. Loading data from the HMC itself is easy to set up, and PCM data is pretty interesting. You also get far fewer measurements than with nmon, which makes it easier to centralize performance metrics from hundreds of partitions and servers.
Here is a gist showing an example entry for a partition.
I chose to use the Processed Metrics with the default sample rate of 30 seconds, fetching the last two hours of statistics.
So the import should run every 120 minutes, or a little less to be safe.
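For example, a crontab entry like this sketch would work (assuming the binary is installed in /usr/local/bin and the configuration file is in the user’s home directory):
# run the HMC import every hour: each run fetches the last two hours,
# so overlapping points simply overwrite identical entries in InfluxDB
0 * * * * /usr/local/bin/nmon2influxdb hmc import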
HMC import
I updated the nmon2influxdb.org site with the latest information. This post is mostly to show use cases.
Have a look at the HMC partition and HMC system dashboards on demo.nmon2influxdb.org to see what the results can look like. User and password are both demo.
After downloading the latest binary from GitHub, update the configuration file ~/.nmon2influxdb.cfg with the information needed to connect to your HMC:
hmc_server = "myhmc"
hmc_user = "hscroot"
hmc_password = "abc123"
After that, you just need to run the command:
nmon2influxdb hmc import
Getting list of managed systems
MANAGED SYSTEM: p750A
partition powerVC: 2940 points
MANAGED SYSTEM: p720-NIM_RETIRED
Error getting PCM data
MANAGED SYSTEM: POWER8-S824A
partition WM-SLES1: 17885 points
partition LV-PCM-Manager: 8330 points
partition PowerVC-LE: 7105 points
partition LVL-cluster2: 7134 points
partition lvl-cluster1: 7134 points
partition WM-SLES2: 17958 points
MANAGED SYSTEM: p750B
partition adxlpar2: 2952 points
partition adxlpar1: 4182 points
You can also specify options manually:
nmon2influxdb hmc import --hmc myhmc --hmcuser hscroot --hmcpass abc123
Note: I don’t like having cleartext passwords in my configuration file. I plan to fix it in issue #29, but I would first like a better configuration management module. I know what I want, I just need time to code it. :)
You can use the pre-built dashboards available with the new release. I am still experimenting with the display of HMC metrics and didn’t want to hardcode dashboards for now. To load them into your Grafana instance:
nmon2influxdb dashboard hmc_partition.json
nmon2influxdb dashboard hmc_server.json
Partition measurements
PartitionProcessor
What is really interesting here are the processing unit measurements: you can see capped, uncapped or donated processing units. Obviously, you can also see the maximum and entitled processing units.
Here is an example displaying the capped and uncapped processing units used by a partition:
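Behind such a chart, the query could look like this sketch. The partition name comes from the import output above, but the metric names under the name tag are my assumption here, so check yours with SHOW TAG VALUES:
SELECT MEAN(value) FROM PartitionProcessor WHERE time > now() - 1h AND "partition" = 'adxlpar2' AND ("name" = 'utilizedCappedProcUnits' OR "name" = 'utilizedUncappedProcUnits') GROUP BY time(1m), "name"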
PartitionMemory
Memory is less interesting. You can see the amount of physical and logical memory allocated, which will change if you are using Active Memory Sharing or Active Memory Expansion, but you will have no statistics about how this memory is used. You need operating system statistics for that.
PartitionVSCSIAdapters
At a quick glance, you will not see any difference from an nmon report. But here the data doesn’t show the vscsi device on the partition’s side but the vhost device on the VIO servers.
Note: it only displays the VIO server by ID, not by its partition name. I am thinking about fetching the name from the system, but it’s low priority for now. Let me know if you are interested. :)
PartitionVirtualEthernetAdapters
This is one of the most interesting measurements. Again, you can see which VIO server is used to bridge the network traffic:
You can also see the difference between virtual traffic (between partitions in the same system) and physical traffic (what is sent outside the system through the VIOS).
I am only showing some of the data aggregation capabilities here. You can also filter an entire system by vswitch ID, for example.
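As a sketch, such a filter could look like this (the tags and the sample values come from the query output shown later in this post):
SELECT MEAN(value) FROM PartitionVirtualEthernetAdapters WHERE time > now() - 1h AND "system" = 'p750B' AND "VswitchID" = '0' AND "name" = 'sentBytes' GROUP BY time(1m), "partition"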
PartitionVirtualFiberChannelAdapters
It’s really similar to vscsi. By default, you see the same kind of output as nmon:
But you can also choose to display the VIO server ID and the physical WWPN for multiple partitions on the same chart:
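A sketch of such a query could be the following. Note that I am guessing the tag names here (ViosID by analogy with the virtual Ethernet measurement; WWPN and readBytes are pure assumptions), so verify them with SHOW TAG VALUES on your own data:
SELECT MEAN(value) FROM PartitionVirtualFiberChannelAdapters WHERE time > now() - 1h AND "name" = 'readBytes' GROUP BY time(1m), "partition", "ViosID", "WWPN"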
System measurements
SystemProcessor
It’s almost the same as the partition level, but for the whole system:
SystemSharedProcessorPool
It’s also possible to see CPU usage at the system level by shared processor pool.
SystemMemory
A memory allocation view at the system level.
SystemFiberChannelAdapters
The HMC doesn’t provide metrics for physical FC adapters at the client partition level, but it does for the VIOS.
SystemGenericPhysicalAdapters
It’s the same for generic physical adapters.
SystemGenericVirtualAdapters
This measurement gives all the vhost statistics.
SystemGenericAdapters
Here again, the name is not obvious, but this is where you will find all Ethernet adapters, physical and virtual.
SystemSharedAdapters
This view can be used to see the Shared Ethernet Adapter statistics.
SystemSharedStoragePool
Last but not least :) the HMC gives great metrics on Shared Storage Pool usage.
It also gives throughput metrics:
An interesting thing about using InfluxDB is that we can display metrics from different systems belonging to the same Shared Storage Pool. So we can see the I/O activity of all the VIOS belonging to the same Shared Storage Pool in one chart.
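As a sketch (the readBytes metric name is an assumption, but the system tag follows the same pattern as the other measurements), one chart covering all members of the pool could be built with a simple GROUP BY:
SELECT MEAN(value) FROM SystemSharedStoragePool WHERE time > now() - 1h AND "name" = 'readBytes' GROUP BY time(1m), "system"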
Tagging is living :)
Without tags on the measurements, charts would not be so flexible. Tags are what give the powerful data analysis capabilities. With nmon files, tagging was pretty limited, but PCM data comes with a lot of information, allowing much richer tagging and much stronger data analysis.
One of InfluxDB’s great advantages is its SQL-like query language. It allows grouping measurements by tags and applying filters in a very natural way.
It’s easier to show the available tags on a measurement:
SELECT * FROM PartitionVirtualEthernetAdapters LIMIT 1
name: PartitionVirtualEthernetAdapters
--------------------------------------
time SEA ViosID VlanID VswitchID name partition system value
1479669301000000000 ent4 1 1130 0 sentBytes adxlpar2 p750B 39
So here we see we have these tags: SEA, ViosID, VlanID, VswitchID, name, partition and system, plus the value field.
It’s possible to see all values for a specific tag:
SHOW TAG VALUES FROM "PartitionVirtualEthernetAdapters" WITH KEY = "partition"
name: PartitionVirtualEthernetAdapters
--------------------------------------
key value
partition test_n1
partition test_n2
partition test_n3
And filter based on this value with a WHERE clause:
SELECT * FROM PartitionVirtualEthernetAdapters WHERE "partition" = 'test_n1' LIMIT 1
name: PartitionVirtualEthernetAdapters
--------------------------------------
time SEA ViosID VlanID VswitchID name partition system value
1479669301000000000 ent4 1 1130 0 ReceivedBytes test_n1 testsys1 1502
GROUP BY is really powerful. It’s also possible to perform calculations on these metrics. Here I use the MEAN function:
SELECT MEAN(value) FROM PartitionVirtualEthernetAdapters GROUP BY "VlanID" LIMIT 1
name: PartitionVirtualEthernetAdapters
tags: VlanID=1
time mean
---- ----
0 26.14571920001631
name: PartitionVirtualEthernetAdapters
tags: VlanID=10
time mean
---- ----
0 0
name: PartitionVirtualEthernetAdapters
tags: VlanID=1130
time mean
---- ----
0 754.0389195656023
It’s pretty nice to be able to query performance metrics like that, but where it becomes really great is when you combine it with the query editor provided by Grafana.
It makes complex queries fun: it displays all the available tags for you, so you can easily build your charts without knowing the InfluxDB SQL-like syntax.
Templating
Grafana adds another great feature: templating.
It will create a variable with values generated from an InfluxDB query:
SHOW TAG VALUES FROM "SystemProcessor" WITH KEY = "system"
name: SystemProcessor
---------------------
key value
system POWER8-S824A
system Server-9117-MMC-SN105C627
system p750-SSIS
system p750A
system p750B
system p755-HPC
It’s also possible to have nested templating with a query like this:
SHOW TAG VALUES FROM "PartitionProcessor" WITH KEY = "partition" where system =~ /$ManagedSystem/
It’s really useful for HMC data: it allows displaying only the partitions belonging to the selected managed system:
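Inside a dashboard panel, both variables can then be used together in the query. A sketch, assuming the nested variable defined above is named Partition and using Grafana’s built-in $timeFilter and $interval:
SELECT MEAN(value) FROM PartitionProcessor WHERE $timeFilter AND "system" =~ /$ManagedSystem/ AND "partition" =~ /$Partition/ GROUP BY time($interval), "name"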
Wrapping up
HMC developers gave us a great way to measure system performance.
Their API is maybe a little bit complex ;) but it’s very powerful.
I had a lot of fun developing this feature and I hope you will find it useful. Feedback is welcome. :)