Life, Football, Technology and Vespas…

Posts tagged “RHEL”

Migration – Tidying up the pieces

It has been 2 months since the start of our migration from qmail-ldap to ZCS. There have been a number of issues that we have had to deal with over this time, and now that things have had a chance to settle, it is a good time to jot it all down.

I mentioned previously that we had the issue of foul data in our billing/provisioning platform. A lot of man hours went into tidying up this data. Our sync-replication script still watches for changes made to the old qmail-ldap server (to which our billing platform still provisions) and then makes the corresponding changes on the Zimbra side with zmprov.
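To give a rough idea of the shape of such a sync step, here is a minimal sketch. It is not our actual script: the change-record layout (`op`, `account`, `password`, `attrs`) and the function name are made up for illustration. The only real pieces are the zmprov subcommands `ca`, `ma` and `da` (createAccount, modifyAccount, deleteAccount) and the `zimbraMailQuota` attribute. It only builds the argument list and leaves running it to the caller.

```python
def zmprov_args(change):
    """Translate one (hypothetical) change record into a zmprov argument list."""
    op = change["op"]
    account = change["account"]

    if op == "delete":
        # zmprov da <account>
        return ["zmprov", "da", account]
    if op == "add":
        # zmprov ca <account> <password> [attr value ...]
        args = ["zmprov", "ca", account, change["password"]]
    elif op == "modify":
        # zmprov ma <account> [attr value ...]
        args = ["zmprov", "ma", account]
    else:
        raise ValueError("unknown operation: %r" % op)

    # Append attribute/value pairs in a stable order.
    for attr, value in sorted(change.get("attrs", {}).items()):
        args += [attr, str(value)]
    return args


if __name__ == "__main__":
    example = {
        "op": "modify",
        "account": "user@example.com",
        "attrs": {"zimbraMailQuota": 25 * 1024 * 1024},
    }
    print(" ".join(zmprov_args(example)))
```

Returning a list rather than a shell string means the result can be handed straight to something like `subprocess.call` without worrying about quoting.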

We ran into an issue with connections to the MySQL database on our mailstores. We monitor the number of connections to the database, but it is hard to determine how many connections from the connection pool are actually being used. The mailboxd service opens all of the MySQL connections that it is configured to use, so the connection count on its own shows no trends or patterns. We would run out of connections to the database, which caused all manner of problems, notably POP3 not being available.
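One way to get a feel for how much of the pool is really busy is to compare the MySQL status counters `Threads_connected` and `Threads_running`. The sketch below is only an illustration, not what our monitoring does: it assumes you have already captured the tab-separated output of `SHOW GLOBAL STATUS LIKE 'Threads_%'` as text, and the function name and the `max_connections` argument are placeholders.

```python
def pool_usage(status_text, max_connections):
    """Summarise connection usage from captured SHOW STATUS output."""
    status = {}
    for line in status_text.splitlines():
        parts = line.split()
        # Keep only "Name<whitespace>Number" rows; this skips the header row.
        if len(parts) == 2 and parts[1].isdigit():
            status[parts[0]] = int(parts[1])

    connected = status["Threads_connected"]
    running = status["Threads_running"]
    return {
        "connected": connected,
        "running": running,
        # Connections held open by the pool but not executing anything.
        "idle": connected - running,
        "percent_of_max": 100.0 * connected / max_connections,
    }


if __name__ == "__main__":
    captured = "Variable_name\tValue\nThreads_connected\t100\nThreads_running\t7\n"
    print(pool_usage(captured, max_connections=110))
```

`Threads_running` includes the session running the status query itself, so treat the numbers as approximate. Graphing `running` alongside `connected` should show a trend even when the pool keeps `connected` flat.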

We picked up on some issues with the Zimbra cluster checking script. Zimbra implements some cron jobs and other measures to rotate logs and suchlike. The problem is that if you rotate any MTA-related log files, the post-rotate script restarts the MTA. Obviously little thought was given to the zmcluctl script, which is used by the cluster to determine the status of the cluster. Upon rotation of the logs at 4am every morning, our cluster would think that it was down due to the MTA not being up and then restart the cluster service. There is a bug logged for this and there is also a workaround that we implemented.
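The following is not the workaround from the bug report, just an illustration of the general idea behind tolerating a brief, planned restart: only report "down" if the service stays down across several polls. The function name, the `check` callable and the default timings are all assumptions made for this sketch.

```python
import time


def is_up_with_grace(check, attempts=3, delay=10.0, sleep=time.sleep):
    """Return True if check() succeeds on any attempt, False only if all fail.

    check  -- callable returning True when the service looks healthy
    sleep  -- injectable so the function can be exercised without waiting
    """
    for attempt in range(attempts):
        if check():
            return True
        # Wait before the next poll, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Simulate an MTA that is down for two polls during log rotation.
    polls = iter([False, False, True])
    print(is_up_with_grace(lambda: next(polls), sleep=lambda seconds: None))
```

The trade-off is that a genuine failure takes up to `(attempts - 1) * delay` seconds longer to be noticed.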

Disk I/O:

Disk IO - 1

Disk IO - 2

The graphs show IO operations for one of our 4 mailstores. DM-13 is the LUN for our MySQL data. DM-8 is our primary store:

 

# dmsetup ls
sysvg-tmp       (253, 1)
zimbra1_redo-redo       (253, 11)
zimbra1_sqldb-data      (253, 13)
zimbra1_secondary-first (253, 7)
sysvg-usr       (253, 4)
sysvg-var       (253, 3)
zimbra1_log-log (253, 10)
zimbra1_index-index     (253, 12)
sysvg-home      (253, 2)
zimbra1_base-base       (253, 9)
zimbra1_primary-store   (253, 8)
sysvg-opt       (253, 6)
sysvg-swap      (253, 5)
sysvg-root      (253, 0)
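The dm-N name on a graph is simply the device-mapper minor number, so dm-13 is the `(253, 13)` entry above. If you have a lot of volumes, a small helper can do the lookup for you. This is a rough sketch that assumes the `name (major, minor)` output format shown above; the function name is made up.

```python
import re

# Matches lines such as "zimbra1_sqldb-data      (253, 13)".
LINE = re.compile(r"^(\S+)\s+\((\d+),\s*(\d+)\)$")


def dm_names(dmsetup_output):
    """Map 'dm-<minor>' to the device-mapper name from `dmsetup ls` text."""
    mapping = {}
    for line in dmsetup_output.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, _major, minor = match.groups()
            mapping["dm-%s" % minor] = name
    return mapping


if __name__ == "__main__":
    sample = (
        "zimbra1_sqldb-data      (253, 13)\n"
        "zimbra1_primary-store   (253, 8)\n"
    )
    for device, name in sorted(dm_names(sample).items()):
        print(device, name)
```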

It is clear to see that the system is working hard! Our initial thoughts about the requirements did not take into account that POP3 is actually quite IO intensive. It makes sense now, obviously, but we had not considered it. The long-term plan to encourage default usage to be web based will help ease the IO on the DB and primary storage LUNs.

Disk Utilization:

Disk Utilization

This is quite a cool graph. It shows the HSM at work. Every Friday, the HSM kicks in and moves mail that is older than a week from the primary storage to the secondary storage. Remember this is a graph from only one of our mailstores. Everyone has only a 25 MB mailbox at the moment, but that will change once we have made a decision as to what we can scale to easily.

There have been some other quirky minor tweaks, but overall it is running well and there are very few complaints from our users. Most do not even know that they have been migrated. After we have implemented our customized skin and a larger default mailbox, I reckon there will be a drive to start educating customers and moving them to a web based experience.

The next process is to start analysing all the statistics that are available. We have a large number of used licenses, but we do not believe that even half of our users are active. So that operation has to start soon. We will probably munge all the audit logs and put the information into a centralised database for interrogation.
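As a rough idea of what that munging could look like, here is a small sketch that pulls successful logins out of audit log lines and counts distinct accounts in SQLite. Everything about it is an assumption: the log line layout in the regular expression is only what I would expect a `cmd=Auth` entry to look like (check it against your own audit.log), and the table and function names are made up.

```python
import re
import sqlite3

# Assumed layout: "<date> <time> ... cmd=Auth; account=<addr>; protocol=<proto>;"
AUTH = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) .*cmd=Auth; account=([^;]+); protocol=(\w+);"
)


def load_logins(lines, conn):
    """Insert one row per matching login line; return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logins (day TEXT, account TEXT, protocol TEXT)"
    )
    rows = [m.groups() for m in (AUTH.search(line) for line in lines) if m]
    conn.executemany("INSERT INTO logins VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def active_accounts(conn, since_day):
    """Count distinct accounts with a login on or after since_day (YYYY-MM-DD)."""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT account) FROM logins WHERE day >= ?", (since_day,)
    )
    return cur.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [
        "2009-03-01 04:12:45,123 INFO  [Pop3Server-5] [ip=10.0.0.1;] "
        "security - cmd=Auth; account=alice@example.com; protocol=pop3;",
    ]
    load_logins(sample, conn)
    print(active_accounts(conn, "2009-03-01"))
```

Comparing the distinct-account count over, say, the last 30 days with the number of provisioned accounts would give the active ratio we are after.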

What has to be said in closing is that we are very happy with our RHEL 5.3 + Xen + GFS + RHEL Cluster. It has been very stable for us so far (touch wood) and it was always a bit of an unknown for us during this project. The idea of the clustering of the virtual machines in theory was sound, but we always had a niggle in the back of our minds about the feasibility – especially as we were struggling with some aspects a couple of months prior to going live!

With all that said, it has really come together nicely.


RHEL5.3

We have been running RHEL5.3 for a few weeks now on our grid environment. The upgrade is not smooth if you want to do a rolling upgrade of a cluster environment. You can’t. The cluster node you upgrade will not rejoin the cluster AFTER a reboot, due to the newer version of openais. The workaround is to downgrade openais before rebooting into the new kernel.

Ok. So what are our thoughts? I think for most people, there will be little noticeable difference – which is good in my opinion. Evolution, not revolution for an enterprise OS. There are however some nice additional features in 5.3 which we are benefiting from as well as the bug fixes and resolved issues.

GFS2 – probably the most significant feature of RHEL5.3 for our environment presently. If you are still running GFS, the gfs2_convert utility will convert to GFS2 for you. Nice.

The cluster suite enhancements will also be beneficial to us. It is hard to say at the moment if we have really noticed any difference. The stability of our cluster has been good since the upgrade. Long may that continue. The same can be said about the virtualization enhancements and bug fixes.

dstat is also included in RHEL5.3, which is a welcome addition.

Some other good stuff worth looking at which is now part of RHEL5:

  • Block device encryption & eCryptfs for the paranoid.
  • FCoE
  • Explicit active-passive failover (ALUA) mode using dm-multipath on EMC Clariion storage is now available. 
  • ext4
  • Stateless linux
  • clvm mirroring

As we progress I will no doubt update my thoughts and experiences on RHEL5.3 and its bundled software. But for now, like I said, on the surface there is no major difference in the environment in which we run it. I suspect, however, looking at the desktop and drivers / utilities, that the desktop experience should be enhanced dramatically. AIGLX is included, as is better support for wireless devices.

Full Release Notes