Using the letter “é” in my previous post caused considerable problems for my blogging setup. Firstly, I noticed that although I’d configured WordPress to use UTF-8, pages were still being sent as ISO-8859-1. This is a known problem with the majority of Apache configurations.
Unfortunately, neither of the two solutions suggested in the bug report works for me:
AddDefaultCharset utf-8 only works for text/html pages, not PHP, and
php_value default_charset "utf-8" doesn’t get passed through the srcf-cgi-handler, a known bug with this otherwise rather useful setup.
Although I can’t see where this is documented, copying
/etc/php4/cgi/php.ini to your wordpress directory allows you to locally change the php4 settings for php files in that directory. Sadly this mechanism is not as flexible as
.htaccess so you also need to copy (or symlink) the file into the
wordpress/wp-admin directory to make the changes apply to the admin interface too. Once you have done this, you can set:
default_charset = "utf-8"
in the copied php.ini, and all your PHP pages will then be sent as UTF-8.
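A quick way to confirm that the local php.ini is actually being picked up is to drop a tiny script into the same directory and load it in a browser. This is only a sketch: the file name charset-check.php and the helper charset_report() are made up for illustration, and it assumes nothing beyond PHP's standard ini_get() function.

```php
<?php
// charset-check.php (hypothetical file name, not part of WordPress).
// Reports the default_charset value that PHP sees for this directory.
function charset_report()
{
    // ini_get() returns the current value of a configuration option.
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        return 'default_charset = (not set)';
    }
    return 'default_charset = ' . $charset;
}

echo charset_report(), "\n";
```

If the page shows utf-8 in the wordpress directory but not in wordpress/wp-admin, the copy or symlink in wp-admin is probably missing.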
At this point my feed reader no longer gave errors for my RSS feed but Feed Validator still complained that my feeds weren’t 100% proper. This was fixed with some very minor editing of the wp-rss2.php and wp-rss.php files, changing the line:
header('Content-type: text/xml', true);
to:
header('Content-type: text/xml; charset=utf-8', true);
While hacking this file I noticed another problem with RSS feeds which is specific to the srcf-cgi-handler. Often when the feed hadn’t changed I was getting a “500 Internal Server Error” instead of the expected “304 Not Modified”. This turned out to be because PHP is running in “CGI mode”, where a script is not permitted to send a raw HTTP status line itself. Further hacking of
wp-rss2.php as described in the SRCF FAQ seems to have solved the problem. Fortunately this is only a problem with the version of PHP4 in Debian woody, so it will go away once sarge is released.
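For anyone hitting the same error, the general shape of this kind of fix is roughly as follows. It is a sketch of the usual technique, not the exact change from the SRCF FAQ: under CGI the web server writes the real HTTP status line, so the script asks for a status with the CGI “Status:” header instead. The helper name status_line() is invented for illustration; php_sapi_name() and header() are standard PHP functions.

```php
<?php
// Hypothetical helper, not part of WordPress: build the header line that
// requests a given HTTP status, depending on how PHP is being run.
function status_line($sapi, $code, $text)
{
    // php_sapi_name() returns names such as "cgi" or "cgi-fcgi" when PHP
    // runs as a CGI program; there the "Status:" header must be used.
    if (strpos($sapi, 'cgi') !== false) {
        return "Status: $code $text";
    }
    // Otherwise (e.g. as an Apache module) a raw status line is accepted.
    return "HTTP/1.1 $code $text";
}

// Example use in a feed script, at the point where nothing has changed:
// header(status_line(php_sapi_name(), 304, 'Not Modified'));
// exit;
```

Passing the SAPI name in as a parameter, rather than calling php_sapi_name() inside the helper, keeps the function easy to check from the command line.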