Eric Hartwell's InfoDabble


MediaWiki Programming Side Trips: Potholes, Detours, and Dead Ends

By Eric Hartwell  - last updated April 26, 2006

Potholes, Detours, and Dead Ends

I've always believed it's as important to keep a record of what doesn't work as what does. This section documents some of the obstacles (or, put more positively, opportunities) I encountered along the way.


XML DOM COM in MediaWiki

MediaWiki doesn't have a built-in XML/XSL processor, and neither does PHP 4, which is what my hosted server runs.

How about running an external process? PHP theoretically supports COM, .NET, and shell-based Program Execution functions. Of course, all these require the function to be installed on the server with appropriate rights. My site is hosted on a Windows server, so it already has to have the basic Microsoft COM components installed.
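Before committing to any of these routes, it helps to find out which of them the host's PHP build exposes at all. The following is a minimal sketch, not part of the original experiments; the function name exampleXsltOptions is hypothetical, and a "yes" only means the class or function is defined, not that the host grants the rights to use it.

```php
<?php
# Hypothetical helper: report which XSLT-capable mechanisms this PHP build exposes.
# A true value only means the class or function is defined; the host may still
# refuse to create a particular COM object or to run a shell command.
function exampleXsltOptions() {
    return array(
        'com'       => class_exists('COM'),            # Windows-only COM extension
        'dotnet'    => class_exists('DOTNET'),         # Windows-only .NET bridge
        'shell'     => function_exists('shell_exec'),  # program execution functions
        'php4_xslt' => function_exists('xslt_create'), # optional PHP 4 XSLT extension
        'php5_xsl'  => class_exists('XSLTProcessor')   # optional PHP 5 XSL extension
    );
}

# Print one line per mechanism, for example from a throwaway test page.
foreach (exampleXsltOptions() as $name => $available) {
    echo $name . ': ' . ($available ? 'yes' : 'no') . "\n";
}
```

On a shared host this is a quick way to rule options out before writing any transform code.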

After way more munging around than seems humanly reasonable, I finally got a PHP XSL script to work on the server:

This code only works on Windows servers.

<?php
    $XmlDoc = new COM("MSXML2.DOMDocument");
    $XslDoc = new COM("MSXML2.DOMDocument");
    $XmlDoc->async = false;
    $XslDoc->async = false;

    # The DOMDocument component needs a server-relative path !!!
    $XmlDoc->load("C:/Domains/ehartwell.com/wwwroot/testxsl/AS17Flight.xml");
    $XslDoc->load("C:/Domains/ehartwell.com/wwwroot/wiki/testxsl/AS17Flight.xsl");

    echo $XmlDoc->transformNode($XslDoc);
    $XmlDoc = NULL;
    $XslDoc = NULL;
?>
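The script above never checks whether either load() call succeeded, so a bad path or a malformed file simply produces empty output. Here is a hedged sketch of the same transform with error reporting through MSXML's parseError object. The function names exampleLoadDocument and exampleMsxmlTransform are hypothetical, and like the original this can only do real work on a Windows server.

```php
<?php
# Hypothetical helper: load one file into an MSXML document.
# Returns the document, or false after storing a message in $error.
function exampleLoadDocument($path, &$error) {
    $doc = new COM("MSXML2.DOMDocument");
    $doc->async = false;
    # load() returns false on failure; parseError says why and on which line.
    if (!$doc->load($path)) {
        $error = $path . ': ' . trim($doc->parseError->reason)
               . ' (line ' . $doc->parseError->line . ')';
        return false;
    }
    return $doc;
}

# Hypothetical wrapper: returns the transformed text, or false with $error set.
function exampleMsxmlTransform($xmlPath, $xslPath, &$error) {
    $error = '';
    if (!class_exists('COM')) {
        $error = 'COM extension not available (Windows only)';
        return false;
    }
    $xmlDoc = exampleLoadDocument($xmlPath, $error);
    if ($xmlDoc === false) {
        return false;
    }
    $xslDoc = exampleLoadDocument($xslPath, $error);
    if ($xslDoc === false) {
        return false;
    }
    return $xmlDoc->transformNode($xslDoc);
}
```

A caller would test the return value against false and print $error, which points straight at a wrong path or a malformed file.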

To verify that the XSL transform will work from inside MediaWiki, I built a test extension script:

This code only works on Windows servers.

<?php

# ApolloTranscript MediaWiki extension <ApolloTranscript> some text </ApolloTranscript>
# The function registered by the extension gets the text between the tags as input and can transform it into arbitrary HTML code.
# Note: The output is not interpreted as WikiText but directly included in the HTML output. So Wiki markup is not supported.
# To activate the extension, include it from your LocalSettings.php with: include("extensions/ApolloTranscript.php");

$wgExtensionFunctions[] = "wfApolloTranscript";

function wfApolloTranscript() {
    global $wgParser;

    # Register the extension with the WikiText parser.
    # The first parameter is the name of the new tag. In this case it defines the tag <ApolloTranscript> ... </ApolloTranscript>
    # The second parameter is the callback function for processing the text between the tags
    $wgParser->setHook( "ApolloTranscript", "renderApolloTranscript" );
}

# The callback function for converting the input text to HTML output
function renderApolloTranscript( $input, $argv ) {
    # $argv is an array containing any arguments passed to the extension like <example argument="foo" bar>..

    $XmlDoc = new COM("MSXML2.DOMDocument");
    $XslDoc = new COM("MSXML2.DOMDocument");
    $XmlDoc->async = false;
    $XslDoc->async = false;

    # The DOMDocument component needs a server-relative path !!!
    $XmlDoc->load("C:/Domains/ehartwell.com/wwwroot/wiki/extensions/AS17Flight.xml");
    $XslDoc->load("C:/Domains/ehartwell.com/wwwroot/wiki/extensions/AS17Flight.xsl");

    # Plain assignment: $output has no earlier value to append to
    $output = $XmlDoc->transformNode($XslDoc);

    $XmlDoc = NULL;
    $XslDoc = NULL;

    return $output;
}

?>

Some of the things I wish I knew before I started:

  • The DOMDocument component needs an absolute path on the server's file system for each file name, not a URL or a relative path.
  • XmlDoc->text and XmlDoc->xml sometimes appear to be empty, even when they're not.
  • XSLT drops whitespace-only text between tags, and XML does not define the &nbsp; entity. Many sources recommend using &#160; as a replacement. That is a valid character reference (160 is below #FF), yet MediaWiki outputs the resulting characters as ?, most likely because the string that comes back through COM is not UTF-8 while MediaWiki's pages are. This can be pretty misleading until you puzzle it out.
  • The MSXML parser is much stricter than the one in Internet Explorer; IE will display an XSL transform even when the XML and/or XSL aren't perfectly well-formed.
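One possible explanation for the ? characters is an encoding mismatch: MediaWiki serves UTF-8, while text returned through COM may arrive in the server's single-byte code page. The sketch below normalizes the transform output before returning it. The function name exampleToUtf8 is hypothetical, and treating non-UTF-8 input as ISO-8859-1 is an assumption, not something confirmed on the original server.

```php
<?php
# Hypothetical helper: make sure text returned by the transform is UTF-8
# before handing it to MediaWiki.
# Assumption: anything that is not already valid UTF-8 is ISO-8859-1 (Latin-1),
# where a non-breaking space is the single byte 0xA0.
function exampleToUtf8($text) {
    # An empty pattern with the /u modifier matches only valid UTF-8 subjects.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

# Example: a Latin-1 non-breaking space between two words.
$fromTransform = "Public\xA0Affairs";
$safeForWiki = exampleToUtf8($fromTransform);
```

In the extension this would wrap the return value, as in return exampleToUtf8($output);. PHP's COM constructor also accepts a code page as its third argument (for example CP_UTF8), which might make the conversion unnecessary; that route is untested here.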

Loading a MediaWiki "image" from a function

MediaWiki uses the term "image" for any uploaded file. MediaWiki's internal workings are not well documented, but, of course, the source is available.

The image processing is handled by the Image object defined in the file includes/Image.php. According to the source, you can load an image using Image::newFromTitle( $title );

If $wgUseSharedUploads is set, the wiki will look in the shared repository if no file of the given name is found in the local repository (for [[Image:..]], [[Media:..]] links). Thumbnails will also be looked for and generated in this directory.

I also found a promising code snippet in includes/ExternalEdit.php:

# ExternalEdit.php

if ($this->mMode=="file") {
    $type="Edit file";
    $image = Image::newFromTitle( $this->mTitle );
    $img_url = $image->getURL();
    if (strpos($img_url,"://")) {
        $url = $img_url;
    } else {
        $url = $wgServer . $img_url;
    }
    $extension=substr($name, $pos);

In wikimarkup, the standard image title format is [[Image:Name.jpg]]. What's a valid title string for Image::newFromTitle( )? A quick series of experiments gives an answer:

  • $image = Image::newFromTitle( "[[Image:AS17FlightTranscript.xsl]]" );   Result: Image constructor given bogus title.
  • $image = Image::newFromTitle( "Image:AS17FlightTranscript.xsl" );   Result: Image constructor given bogus title.
  • $image = Image::newFromTitle( "AS17FlightTranscript.xsl" );   Result: Image constructor given bogus title.

Hmmm... totally bogus, dude. Another look at the source finds the very similar function Image::newFromName( ). Back for another experiment:

  • $image = Image::newFromName( "AS17FlightTranscript.xsl" );   Result: works!
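The "bogus title" message most likely means that Image::newFromTitle( ) wants a Title object rather than a string, which is also what ExternalEdit.php passes in ($this->mTitle). A minimal sketch of that route follows, assuming it runs inside MediaWiki where Title, Image, and the NS_IMAGE constant are defined; the function name exampleImageFromName is hypothetical.

```php
<?php
# Hypothetical helper. Assumes it runs inside MediaWiki, where the Title and
# Image classes and the NS_IMAGE namespace constant are already defined.
function exampleImageFromName($name) {
    # Build a Title object in the Image: namespace from the bare file name.
    $title = Title::makeTitleSafe(NS_IMAGE, $name);
    if (!is_object($title)) {
        return null;    # the name could not be turned into a valid title
    }
    return Image::newFromTitle($title);
}
```

This is roughly the work Image::newFromName( ) saves you, which would explain why the plain file name works there and not with newFromTitle( ).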

For the XSL transform we need either the text contents of the file or its local path on the server's file system.

## Return the image path of the image in the local file system as an absolute path
function getImagePath() {
    $this->load();
    return $this->imagePath;
}

This looks promising; in fact, it does return an absolute path in the format that MSXML2.DOMDocument needs.

Note the $this->load(); function call. This actually locates the "image" file in MediaWiki's cache, database, or file system and creates a temporary file on the server.
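Putting the two findings together, a lookup from uploaded file name to local path might look like the sketch below. It assumes it runs inside MediaWiki and that the Image class offers exists( ) alongside getImagePath( ); the function name exampleUploadedFilePath is hypothetical.

```php
<?php
# Hypothetical helper. Assumes it runs inside MediaWiki, where the Image class
# is already defined. Returns the absolute local path of an uploaded file,
# or false when the name is invalid or nothing was uploaded under it.
function exampleUploadedFilePath($name) {
    $image = Image::newFromName($name);
    if (!is_object($image) || !$image->exists()) {
        return false;
    }
    return $image->getImagePath();
}
```

Inside the extension callback, the returned path could be passed straight to $XslDoc->load( ) in place of the hard-coded C:/ path, after checking for false.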


Loading a MediaWiki page from a function

It might be handy to store the XSL as its own wiki page. Internally, MediaWiki uses the Article class to load an article:

$article = new Article($result->mTitle);
$text = $article->fetchContent(0, true, false);

or, from the actual code in Parser.php line 2271:

$article = new Article( $title );
$articleContent = $article->getContentWithoutUsingSoManyDamnGlobals();
if ( $articleContent !== false ) {
    $found = true;
    $text = $articleContent;
    $replaceHeadings = true;
}

I tried loading a predefined page:

$title = "Main_Page";
$article = new Article( $title );
$output .= "Article object has title '" . $article->getTitle() . "'<br />";

This gives an error: Fatal error: Call to a member function on a non-object in Article.php on line 371. Could it be that there's a conflict between MediaWiki globals?
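A simpler explanation than a globals conflict is that Article, like Image, expects a Title object; in the Parser.php excerpt $title is presumably one already, whereas the test passes a plain string. The following untested sketch builds the Title first. It assumes it runs inside MediaWiki, and the function name exampleArticleTitle is hypothetical.

```php
<?php
# Hypothetical helper. Assumes it runs inside MediaWiki, where the Title and
# Article classes are already defined.
function exampleArticleTitle($pageName) {
    # Turn the page name into a Title object before constructing the Article.
    $title = Title::newFromText($pageName);
    if (!is_object($title)) {
        return false;    # not a valid page name
    }
    $article = new Article($title);
    $articleTitle = $article->getTitle();    # a Title object, not a string
    return $articleTitle->getPrefixedText();
}
```

If this is the cause, the same Title object could then be used with the content-loading calls shown above.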

Anyway, the uploaded "image" file approach works, so I decided to shelve this approach.


MediaWiki Headings vs XSLT Headings

The XML transcript source is organized with <section> and <subsection> tags.

Originally, I used simple inline tags of the form:

<section>Countdown</section>

and used XSL to translate it into an HTML header:

<xsl:template match="section">
<h2><xsl:value-of select="node()" /></h2>
</xsl:template>

While this is good HTML, MediaWiki (usually) doesn't recognize that the headers should be used to build a table of contents. I also tried using wikimarkup tags:

<xsl:template match="section">
==<xsl:value-of select="node()" />==
</xsl:template>

but that doesn't work either. RTFM: as the documentation states, "Note that the return string should be HTML, not wiki markup."

I finally decided to use MediaWiki headings outside the XmlTransform

It makes much more sense to use the <section> tags as XML content wrappers:

<section title="Countdown" seq="1.1">
    assorted transcript content
</section>

and to use XSL to output the section contents only:

<xsl:template match="section">
    <xsl:apply-templates />
</xsl:template>

Making the section title an attribute instead of a value keeps it available for other XSL processing, while separating it from the MediaWiki markup. I also added a seq attribute, which can be used for section numbering or just for keeping the sections in order in applications that sort by title text.

Other possibilities:

  • If we assign each transcript to its own Category, and use the time as part of each item's title, then MediaWiki will automatically build a table of contents. Unfortunately, category pages list their members in alphabetical order, grouped by first letter.
  • See also Series of articles.

Unless otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License
