I was testing some WordPress optimization ideas, which led me to write some profiling code, which somehow led me to install Alex Rabe’s WordPress memory usage plugin. What especially caught my eye was the test demonstrating that WordPress with the German translation requires 6.5 MB more than the English version. I repeated his test with the Hebrew translation and got a difference of about 4 MB (the difference between the two languages can probably be explained by German’s notoriously long words, or by the Hebrew translation being somewhat sloppy).
This just confirmed something I have thought for quite a long time but hadn’t figured out how to quantify. The problem with the WordPress translation process is that you translate the whole of WordPress and generate a single translation file, which essentially contains pairs of strings and their translations. You then place your translation in a predefined location, and WordPress loads all of the translation pairs into memory when the first string is translated.
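To make the cost concrete, here is a minimal Python model of the single-file approach (not WordPress’s actual PHP code). The class name `EagerCatalog` and the fake loader are hypothetical, and the 3200/400 split only reuses the rough estimate discussed in this post. The point it shows is that translating a single front-end string pulls every pair, admin strings included, into memory.

```python
from typing import Callable, Dict, Optional


class EagerCatalog:
    """Hypothetical model of the single-file approach: the whole translation
    file is parsed into memory on the first lookup, however few strings the
    request actually needs."""

    def __init__(self, load_file: Callable[[], Dict[str, str]]) -> None:
        self._load_file = load_file
        self._pairs: Optional[Dict[str, str]] = None  # nothing loaded yet

    def translate(self, text: str) -> str:
        if self._pairs is None:
            # First lookup: load *every* pair, admin strings included.
            self._pairs = self._load_file()
        # Fall back to the original English string when there is no translation.
        return self._pairs.get(text, text)

    def loaded_count(self) -> int:
        return 0 if self._pairs is None else len(self._pairs)


def fake_translation_file() -> Dict[str, str]:
    # Stand-in for parsing a real translation file: 3200 made-up pairs.
    pairs = {f"admin string {i}": f"translated admin string {i}" for i in range(2800)}
    pairs.update({f"front string {i}": f"translated front string {i}" for i in range(400)})
    return pairs


catalog = EagerCatalog(fake_translation_file)
print(catalog.translate("front string 7"))  # translated front string 7
print(catalog.loaded_count())               # 3200
```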
The problem with this approach is that, as time goes by, the admin interface accumulates many more translatable strings than the front end. Of the ~3200 translatable strings in WordPress, I guess that no more than 400 are used when the actual HTML for the blog posts is generated. In other words, more than 80% of the memory consumed by the translation (at least 2800 of 3200 strings, or 87.5%) will never be used for the work WordPress does for 99% of the incoming traffic.
As if to demonstrate the problem, for version 2.8 the WordPress developers split the translation file and created a separate translation file for the cities of the world, used when configuring time zone information. That file is loaded in only one place in the admin interface, which keeps 400 strings from being loaded on every other page of the site. But those strings were still loaded when you reached that page, and, with some help from inefficient PHP code, they caused a PHP “out of memory” error. The end result was that part of the page was not displayed, and since the error message itself was not shown, it was really hard to even understand what the problem was, and harder still to find its cause.
The whole issue is not helped by the fact that there is no way to use the PHP gettext module from WordPress without hacking the core. I don’t know how much more efficient it would be, but since it is written in C it could, in theory, use less memory and respond faster than the PHP implementation.
An ideal translation implementation would have a separate translation file for each URI, but this is currently far from practical: it is hard to know all the possible execution paths just by looking at the code, and therefore hard to determine which translations each URI will require. Maybe some run-time adaptive algorithm that computes the translation file while WordPress is executing would give the best approximation of the ideal.
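One way such an adaptive scheme might look, sketched in Python under loud assumptions: the class `AdaptiveTranslator`, its methods, and the sample strings are hypothetical, and nothing like this exists in WordPress. Each URI starts with an empty subset; every string that misses is recorded, so the subset built for the next request to that URI contains exactly the strings seen so far.

```python
from collections import defaultdict
from typing import Dict, Set


class AdaptiveTranslator:
    """Hypothetical run-time adaptive translator: learns which strings each
    URI actually uses and builds a per-URI translation subset from them."""

    def __init__(self, full_catalog: Dict[str, str]) -> None:
        self._full = full_catalog  # complete translation, kept in memory here for brevity
        self._seen: Dict[str, Set[str]] = defaultdict(set)  # uri -> strings observed

    def subset_for(self, uri: str) -> Dict[str, str]:
        """The per-URI 'translation file': only strings observed for this URI."""
        return {s: self._full[s] for s in self._seen[uri] if s in self._full}

    def translate(self, uri: str, text: str, subset: Dict[str, str]) -> str:
        if text in subset:
            return subset[text]  # fast path: string already learned for this URI
        # Miss: an execution path not seen before. Remember it so the next
        # subset built for this URI includes it, then use the full catalog.
        self._seen[uri].add(text)
        return self._full.get(text, text)


full = {"Posted in": "Veröffentlicht in", "Comments": "Kommentare", "Settings": "Einstellungen"}
t = AdaptiveTranslator(full)

first_request = t.subset_for("/")  # nothing learned yet: {}
t.translate("/", "Comments", first_request)
t.translate("/", "Posted in", first_request)

print(sorted(t.subset_for("/")))  # ['Comments', 'Posted in']
```

In a real system the full catalog would be loaded only on a miss, and the learned subsets would have to be persisted between requests, since PHP starts every request from scratch.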
Meanwhile, in the real world, the translation should be split into at least two parts: the front end and the admin. This should be done either by moving all of the admin-related strings under the wp-admin directory, while the front-end strings would be found only under the wp-includes directory, or, alternatively, by having the WordPress release process generate two POT files per release, employing internal knowledge of which files/strings are used only in the admin, knowledge which most WordPress translators lack.
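The second option could be approximated mechanically, because each entry in a POT file carries `#:` reference comments naming the source files that use the string. The Python sketch below is an illustration under assumptions, not part of WordPress’s tooling: `parse_pot` and `split_by_area` are hypothetical helpers, the parser handles only single-line msgids, and the file paths and line numbers in `SAMPLE` are made up. A string goes to the admin file only when every reference is under wp-admin/, so strings shared with the front end stay in the front-end file.

```python
from typing import Dict, List, Tuple


def parse_pot(text: str) -> List[Tuple[str, List[str]]]:
    """Very simplified POT reader: returns (msgid, source files) pairs.
    Handles only single-line msgids and '#:' reference comments."""
    entries: List[Tuple[str, List[str]]] = []
    refs: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#:"):
            # '#: wp-admin/a.php:12 wp-includes/b.php:7' -> file paths without line numbers
            refs.extend(ref.rsplit(":", 1)[0] for ref in line[2:].split())
        elif line.startswith("msgid "):
            msgid = line[len("msgid "):].strip('"')
            if msgid:  # skip the header entry, whose msgid is empty
                entries.append((msgid, refs))
            refs = []
    return entries


def split_by_area(entries: List[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    parts: Dict[str, List[str]] = {"admin": [], "front": []}
    for msgid, files in entries:
        # Admin-only if every reference lives under wp-admin/; anything also
        # used elsewhere must stay in the front-end file.
        admin_only = bool(files) and all(f.startswith("wp-admin/") for f in files)
        parts["admin" if admin_only else "front"].append(msgid)
    return parts


SAMPLE = '''
msgid ""
msgstr ""

#: wp-admin/options-general.php:12
msgid "General Settings"
msgstr ""

#: wp-includes/comment-template.php:40 wp-admin/edit-comments.php:9
msgid "Comments"
msgstr ""

#: wp-includes/general-template.php:88
msgid "Archives"
msgstr ""
'''

print(split_by_area(parse_pot(SAMPLE)))
# {'admin': ['General Settings'], 'front': ['Comments', 'Archives']}
```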
While this will not necessarily solve the problem of the translation “nuking” the admin interface, it will at least reduce memory consumption and improve CPU performance on the front end.
Note: while caching plugins like Super Cache can be used to practically sidestep this problem, not every shared host has the facilities to support them, and caching has its own drawbacks that keep it from being a bulletproof solution (otherwise it would belong in the WordPress core).
Update: It looks like I am not the only one bothered by the inefficiencies in the way translation works. Johan Eenfeldt has created a plugin that caches the parsed translation file. This might not solve my memory-related problems (I would have to use the file cache option), but at least it is a performance gain.
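I have not looked at how that plugin is implemented, so the following is only a generic Python sketch of the idea: keep a serialized copy of the parsed translation and reuse it as long as the source file’s modification time is unchanged. The function names `parse_tsv` and `load_with_cache`, the tab-separated source format, and the he_IL file names are hypothetical stand-ins for a real translation file and parser.

```python
import os
import pickle
import tempfile
from typing import Callable, Dict


def parse_tsv(path: str) -> Dict[str, str]:
    # Hypothetical stand-in for a real parser: one "original<TAB>translation" per line.
    with open(path, encoding="utf-8") as fh:
        return dict(line.rstrip("\n").split("\t", 1) for line in fh if "\t" in line)


def load_with_cache(source: str, cache: str,
                    parse: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Return the parsed translation for `source`, reusing the pickled copy in
    `cache` while the source file's modification time is unchanged."""
    mtime = os.path.getmtime(source)
    try:
        with open(cache, "rb") as fh:
            cached_mtime, pairs = pickle.load(fh)  # only unpickle files you wrote yourself
        if cached_mtime == mtime:
            return pairs  # cache hit: skip the expensive parse
    except (OSError, EOFError, pickle.UnpicklingError, ValueError, TypeError):
        pass  # missing or unreadable cache: rebuild it below
    pairs = parse(source)
    with open(cache, "wb") as fh:
        pickle.dump((mtime, pairs), fh)
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "he_IL.tsv")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("Comments\tתגובות\n")
    cache_file = os.path.join(tmp, "he_IL.cache")
    first = load_with_cache(src, cache_file, parse_tsv)   # parses and writes the cache
    second = load_with_cache(src, cache_file, parse_tsv)  # served from the cache
    print(first == second)  # True
```

This saves parsing time, not memory: the returned dictionary still holds every pair, which is why a cache like this does not by itself fix the memory problem.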