First, an important remark. This is the first post on the Netgen Blog (powered by eZ Publish). We hope it will not be the last :)
This post describes how we solved a performance issue on an eZ Publish based web site. The site ran on several web servers sharing a SAN-based disk device. For no obvious reason, the web servers had high load averages and, even worse, increasing the load on one web server quickly increased the load on the others. The main cause of this behavior was the large number of slow IO operations on the shared device. Further inspection revealed that a large part of these IO operations were related to eZMutex files.
So there were two problems to solve:
- For some unknown reason, the number of files in the eZMutex cache folder had been growing, which slowed down Apache's access to that folder
- The var folder was mounted on a shared network device with a slow flock() system call (locking is distributed by the file system to all web servers and is therefore slower than usual)
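To make the second point concrete, here is a minimal Python sketch of a lock-file mutex built on flock(). It is not eZ Publish's code (eZMutex is written in PHP); the function names `try_flock` and `release_flock` and the lock file name are hypothetical and only illustrate the pattern. On a local disk each flock() call is cheap, while on a cluster file system every call has to be coordinated between the servers.

```python
import fcntl
import os
import tempfile


def try_flock(path):
    """Try to take an exclusive flock() on path without blocking.

    Returns the open file descriptor on success, or None if the lock
    is already held through another descriptor.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_flock(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def demo():
    """Show one holder excluding a second one; returns the three outcomes."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example_mutex.lock")  # hypothetical name
        first = try_flock(path)
        second = try_flock(path)  # separate descriptor, so it is refused
        release_flock(first)
        third = try_flock(path)   # free again after the release
        release_flock(third)
        return first is not None, second is None, third is not None
```

Calling `demo()` takes the lock, shows that a second attempt is refused while the lock is held, and shows that the lock can be taken again once released.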
To solve the performance problem, we developed an eZ Publish extension that overrides eZMutex to use memcache instead of file locking. The extension is published and shared with the community on projects.ez.no. You can read more about memcache at memcached.org.
Memcache is not intended to be a real lock manager, but it handles this scenario very well because the locks don't need to be persistent. eZMutex is generally used for two purposes:
- to lock file writes in the var folder (locks held for seconds)
- to lock cronjob scripts to prevent overlapping runs (locks held for hours)
The extension was therefore built to support both cases (by overriding the eZMutex and eZruncronjob classes) and to reuse the logic of the original mutex as much as possible.
There was one difference though. Memcache supports key expiry, which is not an option when creating a file. We set the default expiry to 60 seconds, which is a reasonable time for generating caches. Expiry in memcache means that the key/value pair is not deleted immediately but is scheduled for removal when needed. The important point is that the memcache_add() function (the basis for locking) succeeds on an expired key, so a lock that was never released can be taken again once it expires.
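The add-with-expiry idea can be sketched in a few lines of Python. This is not the extension's code: `FakeMemcache` is a hypothetical in-memory stand-in that only mimics the "add succeeds if the key is absent or expired" behavior, and `acquire`, `release`, `ManualClock` and the `mutex_` key prefix are names invented for this illustration.

```python
import time


class FakeMemcache:
    """In-memory stand-in for the small part of memcached used here."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at or None)

    def add(self, key, value, ttl):
        """Store key only if it is absent or expired; report success."""
        item = self._items.get(key)
        if item is not None:
            expires_at = item[1]
            if expires_at is None or expires_at > self._clock():
                return False  # a live entry exists, so the lock is held
        # ttl == 0 means "never expires", as in memcached
        self._items[key] = (value, self._clock() + ttl if ttl else None)
        return True

    def delete(self, key):
        self._items.pop(key, None)


DEFAULT_EXPIRY = 60  # seconds, the default chosen in the post


def acquire(cache, name, ttl=DEFAULT_EXPIRY):
    """Take the lock; returns False if someone else holds it."""
    return cache.add("mutex_" + name, 1, ttl)


def release(cache, name):
    """Give the lock back explicitly."""
    cache.delete("mutex_" + name)


class ManualClock:
    """Test clock so that expiry can be shown without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now
```

With a `ManualClock`, a second `acquire` for the same name fails while the first lock is live and succeeds again once more than 60 seconds have passed, even if `release` was never called. A cronjob lock would presumably pass a much longer `ttl` than the default.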
The direct result of using this extension with memcached was a sudden, large drop in disk IO. It is hard to measure the drop precisely, but a rough estimate would be around 90% (based on a graph from our system monitoring tool). As the disk device was shared by several web servers over an OCFS2 file system, this had a big impact on overall performance, because IO operations are more costly than on a single-server system.
Using this extension on a typical eZ Publish site (not too big, var folder on a local, non-shared disk, ezmutex folder not growing) makes no sense, as the performance boost would be very small. But if you have a situation or problem similar to ours, this extension could have a big impact on performance because there will be far fewer IO operations. Remember that for every file eZ Publish writes to disk, it also needs to create 2 ezmutex files and flock() them.