HowTo: Create a MediaWiki Extension for XML/XSL Data
By Eric Hartwell - last updated March 26, 2006
As part of the research for my Apollo 17 project, I needed to prepare a definitive timeline/transcript of the mission. For my first pass, I built a simple HTML table with time, speaker, and text from a single transcript. Then, as I started combining multiple sources, I switched to HTML defined term / definition syntax, with inline comments and later separate footnotes. As I added more and more sources, the HTML got larger, more complex, and harder to manage.
A proper reference needs sources identified, interpretations explained, and a revision history with audit trail. I started to write a .NET web application with its own database, then realized I already had a web application that does all that - MediaWiki.
I've always believed it's as important to keep a record of what doesn't work as what does. See MediaWiki Programming Side Trips: Potholes, Detours, and Dead Ends.
MediaWiki, the software that runs the various Wikimedia projects, allows developers to write their own extensions to the wiki markup. An extension defines an XML-style tag which can be used in the wiki editor like this:
<tagname attribute="some attribute"> some text </tagname>
The attributes and the text between the tags get passed on to a PHP function you implement. This function can then return an HTML string that gets inserted into the output in place of the tags and text. Note that the return string should be HTML, not wiki markup.
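As a minimal sketch of that mechanism, assuming the $wgExtensionFunctions / $wgParser->setHook() registration used by MediaWiki releases of this period. The tag name "example" and both function names are hypothetical, chosen only for illustration:

```php
<?php
# Hypothetical extension file, included from LocalSettings.php.

# Ask MediaWiki to call the setup function once the parser exists.
$wgExtensionFunctions[] = 'wfExampleTagSetup';

function wfExampleTagSetup() {
    global $wgParser;
    # Register the <example> tag and the callback that renders it.
    $wgParser->setHook( 'example', 'renderExampleTag' );
}

# $input is the text between the tags, $argv holds the tag attributes.
# The return value must be HTML, not wiki markup, so escape everything.
function renderExampleTag( $input, $argv ) {
    $label = isset( $argv['attribute'] ) ? $argv['attribute'] : '';
    return '<div title="' . htmlspecialchars( $label, ENT_QUOTES ) . '">'
        . htmlspecialchars( $input, ENT_QUOTES ) . '</div>';
}
```

With this in place, <example attribute="some attribute"> some text </example> in a page would be replaced by the escaped text inside a div.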
At first I started writing a custom MediaWiki extension for Apollo transcripts. The custom tag approach requires an event callback function for every tag we want to show. Adding new tags means adding extra code, and each different transcript would require its own custom code and extension.
XML Transform MediaWiki Extension
A better approach is to write a generic XML/XSL extension. Since XML and XSL are totally generic, different content can be handled simply by changing the XSL. This would only require a single tag, reducing the possibility of namespace collisions. It could also be added to the MediaWiki distribution as a standard extension, requiring no coding on the user's part. MediaWiki is CSS friendly, and adding in-line CSS to the contents part of the page works fine.
For this extension to be of any use, the XML data should come from the MediaWiki page and not from an external file. Storing the XML data as the page text means all of the MediaWiki editing and revision tools can be used with it.
Caution: Breaking the style sheet will break all pages that use it - and not necessarily with a useful error message.
The XSL file needs to be external to the page, but shouldn't be a hard-coded absolute file path. It seems reasonable to store the XSL as an uploaded file, but MediaWiki restricts the types of files that can be uploaded:
To enable uploads of XML, XSD, and XSL files, I added the following to LocalSettings.php:
# Allow upload of additional file types
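The setting involved is presumably $wgFileExtensions. A minimal sketch, assuming the standard $wgEnableUploads and $wgFileExtensions configuration variables. The exact lines used on the original site may have differed:

```php
# LocalSettings.php fragment: extend the default list rather than replace it.
$wgEnableUploads = true;
$wgFileExtensions = array_merge( $wgFileExtensions, array( 'xml', 'xsd', 'xsl' ) );
```

Merging into the existing array keeps whatever image types the wiki already accepts.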
Once you upload the style sheet, it has its own page, Image:AS17FlightTranscript.xsl. Note that MediaWiki uses the term "image" for any uploaded file, regardless of type. Anyway, the XSL file now has the usual permissions and revision history managed by the framework.
After another unreasonable amount of effort (see: Loading a MediaWiki "image" from a function), I managed to locate and load the style sheet "image" and get its absolute path on the server's file system:
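The lookup can be sketched roughly as follows. This assumes the Image class shipped with MediaWiki 1.5/1.6-era releases and its newFromName(), exists(), and getImagePath() methods. Later releases replaced this API, and the function name findUploadedFilePath is hypothetical:

```php
<?php
# Returns the server path of an uploaded file, or null if it cannot be found.
# ASSUMPTION: the Image class of MediaWiki 1.5/1.6-era releases.
function findUploadedFilePath( $name ) {
    if ( !class_exists( 'Image' ) ) {
        return null;  # not running inside MediaWiki
    }
    $image = Image::newFromName( $name );
    if ( !$image || !$image->exists() ) {
        return null;  # invalid title, or nothing uploaded under that name
    }
    # Absolute path of the current version on the server's file system.
    return $image->getImagePath();
}
```

Called as findUploadedFilePath( 'AS17FlightTranscript.xsl' ), it takes the file name without the "Image:" prefix.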
I also tried loading a MediaWiki page from a function, but I got the "image" upload approach working first.
Got XML DOM?
How about running an external process? PHP theoretically supports COM, .NET, and shell-based Program Execution functions. Of course, all these require the function to be uploaded and installed on the server with appropriate rights.
Got XML DOM COM?
My server already has the basic Microsoft COM components installed since the site is hosted on a Windows server. So, this snippet should work:
$XmlDoc = new COM("MSXML2.DOMDocument");
Eventually I did get it to work: see XML DOM COM in MediaWiki.
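For orientation, a transform through MSXML might look like the sketch below. It assumes PHP's COM support on Windows and the MSXML members async, loadXML(), load(), transformNode(), and parseError. The function name transformWithMsxml is hypothetical, and this is not the listing from the linked page:

```php
<?php
# Transform an XML string with a style sheet file using MSXML through COM.
# Windows only: requires PHP's COM support and the MSXML components.
function transformWithMsxml( $xmlText, $xslPath ) {
    if ( !class_exists( 'COM' ) ) {
        throw new RuntimeException( 'PHP COM support is not available' );
    }
    $xmlDoc = new COM( 'MSXML2.DOMDocument' );
    $xmlDoc->async = false;  # load synchronously
    if ( !$xmlDoc->loadXML( $xmlText ) ) {
        throw new RuntimeException( 'XML error: ' . $xmlDoc->parseError->reason );
    }
    $xslDoc = new COM( 'MSXML2.DOMDocument' );
    $xslDoc->async = false;
    if ( !$xslDoc->load( $xslPath ) ) {
        throw new RuntimeException( 'XSL error: ' . $xslDoc->parseError->reason );
    }
    # transformNode() returns the result as a string (a BSTR on the COM side).
    return $xmlDoc->transformNode( $xslDoc );
}
```

Checking parseError after each load matters here, because a broken style sheet otherwise fails without a useful message.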
Some of the things I wish I knew before I started:
XmlTransform Extension and Syntax
<XmlTransform xmlfile="filepath.xml" xslname="imagename.xsl" xslfile="filepath.xsl">
The XML data is obtained from the text between the <XmlTransform> and </XmlTransform> tags, unless the xmlfile attribute is specified, in which case the XML is loaded from an absolute path on the server's file system.
The XSL data is obtained from an uploaded "image" file if the xslname attribute is specified, or from an absolute path on the server's file system if the xslfile attribute is specified.
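These rules can be sketched as a small helper. The function name and the returned array layout are hypothetical. The article does not say which attribute wins when both xslname and xslfile are present, so the sketch assumes xslname takes precedence:

```php
<?php
# Decide where the XML and XSL come from, following the attribute rules above.
# Returns array( 'xml' => array( kind, value ), 'xsl' => array( kind, value ) ).
# ASSUMPTION: xslname wins over xslfile when both are given.
function resolveTransformSources( $input, $argv ) {
    if ( !empty( $argv['xmlfile'] ) ) {
        $xml = array( 'file', $argv['xmlfile'] );    # absolute server path
    } else {
        $xml = array( 'inline', $input );            # text between the tags
    }
    if ( !empty( $argv['xslname'] ) ) {
        $xsl = array( 'upload', $argv['xslname'] );  # uploaded "image" file
    } elseif ( !empty( $argv['xslfile'] ) ) {
        $xsl = array( 'file', $argv['xslfile'] );    # absolute server path
    } else {
        $xsl = array( 'none', null );                # no style sheet given
    }
    return array( 'xml' => $xml, 'xsl' => $xsl );
}
```

Keeping this decision separate from the loading code makes it easy to report a missing style sheet before any COM object is created.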
The transformed output starts with a standard XML declaration:
<!-- start content -->
Even though the input is UTF-8, the output is UTF-16 because COM components pass strings as BSTRs, which hold 16-bit characters.
The MediaWiki parser seems to have problems with the opening <?xml ?> tag. It usually inserts a paragraph start (<p>), which confuses its handling of the first tag in the content. This can be a real problem, since the next tag is usually the section title, which is supposed to be treated as a header. I finally added some code to delete the <?xml ?> tag:
# Delete the
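One way to do this is a single regular-expression replace on the transform output. A minimal sketch, with a hypothetical function name, that removes only a declaration at the very start of the string:

```php
<?php
/* Remove a leading XML declaration from the transform output so that the
   MediaWiki parser sees the first real tag. Hypothetical helper name. */
function stripXmlDeclaration( $html ) {
    # Match optional whitespace, the declaration itself, and trailing whitespace.
    return preg_replace( '/^\s*<\?xml\b.*?\?>\s*/s', '', $html, 1 );
}
```

Output that has no declaration passes through unchanged.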
XmlTransform = MediaWiki + PHP + XSL "Image" + XML DOM COM
Here's the XmlTransform extension that ***finally*** works to produce the Apollo 17 Flight Journal:
This code only works on Windows servers.
The renderXmlTransform() function wraps the input text with the mandatory xml header <?xml version="1.0" encoding="utf-8" ?>, the opening tag <XmlTransformInput>, and the closing tag </XmlTransformInput> so they don't need to be manually added to every page.
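The wrapping step can be sketched as below. The header and the XmlTransformInput element are the ones named above, while the helper name wrapTransformInput and the newline placement are assumptions:

```php
<?php
/* Wrap the page text so that it forms a complete XML document.
   The function name wrapTransformInput is hypothetical. */
function wrapTransformInput( $input ) {
    return '<?xml version="1.0" encoding="utf-8" ?>' . "\n"
        . '<XmlTransformInput>' . "\n"
        . $input . "\n"
        . '</XmlTransformInput>';
}
```

A page then only needs to contain the data elements themselves, and the single root element keeps the document well formed even when the page holds several top-level elements.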