Eric Hartwell's InfoDabble

 

MediaWiki Extension Parser Debugging

By Eric Hartwell - latest revision April 26, 2006

According to the MediaWiki extension documentation, you should be able to render wikitext within your extension by calling $wgOut->parse(). This works just fine for inline extensions, as long as they are all on the same page.

Things fall apart when an extension is used within a template. The parser is confused for all other extensions on the page, and even for some built-in tags like <pre>. The output text has strings like "UNIQ10842e596cbb71da-PageTransform2a0abb1e6fa1476500000001" where the tag's output should be.
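When chasing this kind of problem it helps to check the rendered HTML for leftover markers automatically. The following is a minimal sketch, assuming the markers always look like the ones quoted on this page ("UNIQ", hex digits, a hyphen, then letters and digits). The function name findLeftoverMarkers is hypothetical and is not part of MediaWiki.

```php
<?php
// Hypothetical helper (not part of MediaWiki): list strip markers that
// survived into the rendered output.
function findLeftoverMarkers( $html ) {
    // Assumed marker shape, based on the samples on this page:
    // "UNIQ" + hex digits + "-" + a run of letters and digits.
    preg_match_all( '/UNIQ[0-9a-f]+-[A-Za-z0-9]+/', $html, $matches );
    return $matches[0];
}

$sample = 'TestExtension: UNIQ601176b538715c69-TestExtension715301547906bcc500000001<br />'
        . 'pre: UNIQ601176b538715c69-pre456b93972d2a0c400000001';

// Prints the two markers found in the sample, one per array element.
print_r( findLeftoverMarkers( $sample ) );
```

An empty result means every marker was substituted back before the page was emitted.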

Here's the wikitext for my test case:

This is a test page for troubleshooting the extension reparse problem (see [[:Testing wiki extension reparse]]).

Invoke <code>TestExtension</code> inline: <TestExtension>== This is a reparse test ==</TestExtension>
Invoke <code>TestExtension</code> inline through the <code><span>{</span>{:TestExtensionTemplate}<span>}</span></code> template: {{:TestExtensionTemplate}}

The wikitext <code>&lt;pre&gt;test&lt;/pre&gt;</code> displays as this: <pre>test</pre>

-----

== Testing ==
# Same page with changes to <code>TestExtension</code> code:
#* Call <code>$wgOut->parse()</code>:<br />TestExtension: UNIQ601176b538715c69-TestExtension715301547906bcc500000001<br /><code>&lt;pre&gt;test&lt;/pre&gt;</code>: UNIQ601176b538715c69-pre456b93972d2a0c400000001
#* Don't call <code>$wgOut->parse()</code>: processes as expected

{{:TestExtensionTemplate}}

TestExtensionTemplate is a page that simply contains a TestExtension tag.

<TestExtension>== This is a reparse test ==</TestExtension>

TestExtension itself is a trivial extension that returns the input, either unchanged or after a call to  $wgOut->parse():

<?php

$wgExtensionFunctions[] = "wfTestExtension";

function wfTestExtension() {
    global $wgParser;
    $wgParser->setHook( "TestExtension", "renderTestExtension" );
}

function renderTestExtension( $input, $argv ) {
    ## Render wikitext in the output
    global $wgOut;
    #  return $input;
    return $wgOut->parse($input);
}

?>

When the parser evaluates the first {{:TestExtensionTemplate}} template reference, it invokes TestExtension. What happens next depends on which return statement is active:

  • return $input; - output is as expected
  • return $wgOut->parse($input); - even the <pre> tag no longer works! The wikitext <pre>test</pre> displays as  UNIQ446190381dab8dcf-pre7765f227b837f4800000001 instead of:
    test

How MediaWiki processes extensions

Note: The comment for the parser->parse() function definition in Parser.php clearly states, "Do not call this function recursively."

The parser first calls parser->extractTagsAndParams() to replace all occurrences of <$tag>content</$tag> in the text with a random marker, and loads the $ext_content associative array with data of the form $unique_marker =>content. Then for each tag it calls the extension's function to transform the text and put the result back in $ext_content.

# Extensions
foreach ( $this->mTagHooks as $tag => $callback ) {
    $ext_content[$tag] = array();
    $text = Parser::extractTagsAndParams( $tag, $text, $ext_content[$tag],
                         $ext_tags[$tag], $ext_params[$tag], $uniq_prefix );
    foreach( $ext_content[$tag] as $marker => $content ) {
        $full_tag = $ext_tags[$tag][$marker];
        $params = $ext_params[$tag][$marker];
        if ( $render )
            $ext_content[$tag][$marker] = call_user_func_array(
                         $callback, array( $content, $params, &$this ) );
        else {
            if ( is_null( $content ) ) {
                // Empty element tag
                $ext_content[$tag][$marker] = $full_tag;
            } else {
                $ext_content[$tag][$marker] = "$full_tag$content</$tag>";
            }
        }
    }
}

parser->extractTagsAndParams()
Replaces all occurrences of <$tag>content</$tag> in the text with a random marker and returns the new text. The output parameter $content will be an associative array filled with data of the form $unique_marker => content.

/**
* Replaces all occurrences of <$tag>content</$tag> in the text
* with a random marker and returns the new text. the output parameter
* $content will be an associative array filled with data on the form
* $unique_marker => content.
*
* If $content is already set, the additional entries will be appended
* If $tag is set to STRIP_COMMENTS, the function will extract
* <!-- HTML comments -->
*
* @access private
* @static
*/
function extractTagsAndParams($tag, $text, &$content, &$tags, &$params, $uniq_prefix = ''){
    $rnd = $uniq_prefix . '-' . $tag . Parser::getRandomString();
    if ( !$content ) {
        $content = array( );
    }
    $n = 1;
    $stripped = '';

    if ( !$tags ) {
        $tags = array( );
    }

    if ( !$params ) {
        $params = array( );
    }

    if( $tag == STRIP_COMMENTS ) {
        $start = '/<!--()()/';
        $end = '/-->/';
    } else {
        $start = "/<$tag(\\s+[^\\/>]*|\\s*)(\\/?)>/i";
        $end = "/<\\/$tag\\s*>/i";
    }

    while ( '' != $text ) {
        $p = preg_split( $start, $text, 2, PREG_SPLIT_DELIM_CAPTURE );
        $stripped .= $p[0];
        if( count( $p ) < 4 ) {
            break;
        }
        $attributes = $p[1];
        $empty = $p[2];
        $inside = $p[3];

        $marker = $rnd . sprintf('%08X', $n++);
        $stripped .= $marker;

        $tags[$marker] = "<$tag$attributes$empty>";
        $params[$marker] = Sanitizer::decodeTagAttributes( $attributes );

        if ( $empty === '/' ) {
            // Empty element tag, <tag />
            $content[$marker] = null;
            $text = $inside;
        } else {
            $q = preg_split( $end, $inside, 2 );
            $content[$marker] = $q[0];
            if( count( $q ) < 2 ) {
                # No end tag -- let it run out to the end of the text.
                break;
            } else {
                $text = $q[1];
            }
        }
    }
    return $stripped;
}

Later on, the parser substitutes the translated text back into the output. Now, if the "UNIQ..." markers are still in the output text, then either $ext_content has been corrupted, or the replacement never happens...
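To make the strip and substitute round trip concrete, here is a simplified sketch. It is not MediaWiki's code: stripTag and unstripAll are hypothetical names, the tag pattern ignores attributes and empty tags, and the "UNIQ1234" prefix is made up. It shows that the markers only disappear if the marker table is still intact when the substitution runs.

```php
<?php
// Simplified stand-in for the strip step: one tag name, no attributes.
// (Illustrative only; MediaWiki's extractTagsAndParams() does much more.)
function stripTag( $tag, $text, &$content, $prefix ) {
    $n = 1;
    $quoted = preg_quote( $tag, '/' );
    $pattern = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/s';
    return preg_replace_callback( $pattern,
        function ( $m ) use ( &$content, &$n, $prefix, $tag ) {
            // Same marker shape as above: prefix, tag name, hex counter.
            $marker = $prefix . '-' . $tag . sprintf( '%08X', $n++ );
            $content[$marker] = $m[1];
            return $marker;
        }, $text );
}

// Stand-in for the substitution step: swap each known marker for its text.
function unstripAll( $text, $content ) {
    return str_replace( array_keys( $content ), array_values( $content ), $text );
}

$state = array();
$stripped = stripTag( 'pre', 'before <pre>test</pre> after', $state, 'UNIQ1234' );

// "Render" each stored item; here that just means wrapping it in the tag again.
$rendered = array();
foreach ( $state as $marker => $inner ) {
    $rendered[$marker] = "<pre>$inner</pre>";
}

echo unstripAll( $stripped, $rendered ), "\n"; // before <pre>test</pre> after
echo unstripAll( $stripped, array() ), "\n";   // before UNIQ1234-pre00000001 after
```

The second call passes an empty table, which mimics lost state: the marker is left in the output, just like the "UNIQ..." strings on the test page.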

# Merge state with the pre-existing state, if there is one
if ( $state ) {
    $state['html'] = $state['html'] + $html_content;
    $state['nowiki'] = $state['nowiki'] + $nowiki_content;
    $state['math'] = $state['math'] + $math_content;
    $state['pre'] = $state['pre'] + $pre_content;
    $state['gallery'] = $state['gallery'] + $gallery_content;
    $state['comment'] = $state['comment'] + $comment_content;

    foreach( $ext_content as $tag => $array ) {
        if ( array_key_exists( $tag, $state ) ) {
            $state[$tag] = $state[$tag] + $array;
        }
    }
} else {
    $state = array(
        'html' => $html_content,
        'nowiki' => $nowiki_content,
        'math' => $math_content,
        'pre' => $pre_content,
        'gallery' => $gallery_content,
        'comment' => $comment_content,
    ) + $ext_content;
}
return $text;

If there's no existing state, then the $ext_content is simply tacked on to the end of the $state array without applying the translation.
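The merge relies on PHP's array union operator (+), which keeps every key from the left-hand array and only adds keys that are missing from it. A small sketch with made-up marker names (M1, M2) and made-up content shows both branches:

```php
<?php
// Existing state already holds marker M1 for <pre>.
$state = array( 'pre' => array( 'M1' => '<pre>a</pre>' ) );

// Newly stripped content: M1 again (will be ignored) plus a new marker M2.
$pre_content = array( 'M1' => 'ignored', 'M2' => '<pre>b</pre>' );

// "Existing state" branch: the left operand wins on duplicate keys.
$merged = $state['pre'] + $pre_content;
print_r( $merged ); // M1 => <pre>a</pre>, M2 => <pre>b</pre>

// "No existing state" branch: the built-in tags come first and the
// per-extension arrays are appended under their own tag names.
$ext_content = array( 'TestExtension' => array( 'M3' => 'rendered text' ) );
$fresh = array( 'pre' => $pre_content ) + $ext_content;
print_r( array_keys( $fresh ) ); // pre, TestExtension
```

Note that the union never overwrites an entry: a marker that already exists in the state keeps its old value.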

Troubleshooting the Reparse Problem

To track what's actually happening inside the parser, I added some debug code to the test extension:

trigger_error("TestExtension called " . var_export(debug_backtrace(), true), E_USER_NOTICE);
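A full backtrace dump is very long, so it can be easier to reduce each frame to one line in the "index => file:line function 'name'" shape used in the tables below. This is a sketch only; summarizeBacktrace is a hypothetical helper, not something MediaWiki provides.

```php
<?php
// Hypothetical helper: condense the array returned by debug_backtrace()
// into one short line per stack frame.
function summarizeBacktrace( $frames ) {
    $lines = array();
    foreach ( $frames as $i => $frame ) {
        // 'file' and 'line' can be missing for frames invoked internally.
        $file = isset( $frame['file'] ) ? basename( $frame['file'] ) : '(internal)';
        $line = isset( $frame['line'] ) ? $frame['line'] : '?';
        $func = isset( $frame['function'] ) ? $frame['function'] : '?';
        $lines[] = "$i => $file:$line function '$func'";
    }
    return implode( "\n", $lines );
}

// Small demonstration with two nested calls.
function inner() { return summarizeBacktrace( debug_backtrace() ); }
function outer() { return inner(); }

echo outer(), "\n";
```

Inside the extension, the same string could be passed to trigger_error() in place of the full var_export() dump.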

The results are shown for two runs: "no reparse" (return $input;) and "with reparse" (return $wgOut->parse($input);).

Wikitext input (the same for both runs):

inline: <TestExtension>== This is a reparse test ==</TestExtension>
template: {{:TestExtensionTemplate}}
pre:<pre>test</pre>
template: {{:TestExtensionTemplate}}

Output (no reparse):

inline == This is a reparse test ==
template == This is a reparse test ==
pre: <pre>test</pre>
template <p>== This is a reparse test ==
</p><p><br />

Output (with reparse):

inline UNIQ2130a2344e48f0fe-TestExtension4ce844657b03daa800000001
template UNIQ76222ba3742437c0-TestExtension5fa4b24836d5619b00000001
pre: UNIQ2130a2344e48f0fe-pre3950620b2071ed0100000001
template <div class="editsection" style="float:right;margin-left:5px;">[<a href="/wiki/index.php?title=Wiki_extension_test_page&amp;action=edit&amp;section=1" title="Edit section: This is a reparse test">edit</a>]</div><a name="This_is_a_reparse_test"></a><h2> This is a reparse test </h2>
<p><br />
1st TestExtension call (in both runs: 8 => OutputPage.php:314 function 'parse')

mStripState (no reparse):
'pre' => array (
'UNIQ2a01516467c051fe-pre4bbb1bc3b63bbca00000001' => '<pre>test</pre>',
),

'TestExtension' => array (
'UNIQ2a01516467c051fe-TestExtension5ed3df6f54c5b16d00000001' => '== This is a reparse test ==',
),

mStripState (with reparse):
'pre' => array (
'UNIQ59c1127449650efe-pre5ad453236159ce5500000001' => '<pre>test</pre>',
),

'TestExtension' => array (
'UNIQ59c1127449650efe-TestExtension722751597395f68c00000001' => '<div class="editsection" style="float:right;margin-left:5px;">[<a href="/wiki/index.php?title=Wiki_extension_test_page&amp;action=edit&amp;section=1" title="Edit section: This is a reparse test">edit</a>]</div><a name="This_is_a_reparse_test"></a><h2> This is a reparse test </h2>',
),

Comment: The 'pre' array is still populated, and one 'TestExtension' element has been parsed.

2nd TestExtension call (in both runs: 8 => OutputPage.php:314 function 'parse')

mStripState (no reparse):
'pre' => array (
'UNIQ2a01516467c051fe-pre4bbb1bc3b63bbca00000001' => '<pre>test</pre>',
),

'TestExtension' => array (
'UNIQ2a01516467c051fe-TestExtension5ed3df6f54c5b16d00000001' => '== This is a reparse test ==',
'UNIQ2a01516467c051fe-TestExtension26e9562944a1ca4100000001' => '== This is a reparse test ==',
),

Comment: Both the 'pre' and 'TestExtension' arrays are still populated.

mStripState (with reparse):
'pre' => array ( ),

'TestExtension' => array (
'UNIQ6676470e18af4763-TestExtension40a20b9465a85f1500000001' => '<div class="editsection" style="float:right;margin-left:5px;">[<a href="/wiki/index.php?title=Wiki_extension_test_page&amp;action=edit&amp;section=1" title="Edit section: This is a reparse test">edit</a>]</div><a name="This_is_a_reparse_test"></a><h2> This is a reparse test </h2>',
),

Comment: The 'pre' array is empty, and 'TestExtension' has only one element. The UNIQ codes are left in the output, which suggests they must have been deleted from the state during the first render call.

To take a closer look, do another dump after the first 'TestExtension' calls $wgOut->parse() but before it returns.

First extension call, before $wgOut->parse():
mStripState:
'pre' => array ( 'UNIQ51cab28f3b372394-pre28907c5829a2abb400000001' => '<pre>test</pre>', ),
'TestExtension' => array ( 'UNIQ51cab28f3b372394-TestExtension70e0355c37d4627d00000001' => NULL, ),

First extension call, after $wgOut->parse() but before return:
mStripState:
'pre' => array ( ),
'TestExtension' => array ( ),

So, when the extension is called while processing a template expansion, the nested $wgOut->parse() call empties the parser's mStripState array, wiping out the markers that the outer parse still needs. It shouldn't.
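The symptom can be reproduced in miniature. The class below is a toy model, not MediaWiki's Parser: ToyParser, its <x> tag, its MARK... markers and its hook signature are all invented for illustration. It only assumes what the dumps above show, namely that the marker table lives on one shared object and is empty after a nested parse. It does not say which statement in MediaWiki is responsible.

```php
<?php
// Toy model only: one shared marker table that is reset on every parse().
class ToyParser {
    public $state = array();
    private $hook;
    private $counter = 0;

    public function __construct( $hook ) {
        $this->hook = $hook;
    }

    // Strip callback: swap <x>...</x> for a marker and remember the content.
    public function strip( $m ) {
        $marker = 'MARK' . sprintf( '%08X', ++$this->counter );
        $this->state[$marker] = $m[1];
        return $marker;
    }

    public function parse( $text ) {
        $this->state = array(); // reset on entry
        $text = preg_replace_callback( '/<x>(.*?)<\/x>/s', array( $this, 'strip' ), $text );

        // Render each stripped item through the hook. The key list is taken
        // up front because a re-entrant hook replaces $this->state.
        foreach ( array_keys( $this->state ) as $marker ) {
            if ( isset( $this->state[$marker] ) ) {
                $this->state[$marker] = call_user_func( $this->hook, $this->state[$marker], $this );
            }
        }

        // Substitute the rendered items back in place of their markers.
        return str_replace( array_keys( $this->state ), array_values( $this->state ), $text );
    }
}

$input = 'A <x>one</x> B <x>two</x>';

// Hook that does not re-enter the parser: both markers are replaced.
$plain = new ToyParser( function ( $content, $parser ) { return "[$content]"; } );
echo $plain->parse( $input ), "\n";     // A [one] B [two]

// Hook that calls parse() on the same object: the second marker is lost.
$reentrant = new ToyParser( function ( $content, $parser ) {
    return '[' . $parser->parse( $content ) . ']';
} );
echo $reentrant->parse( $input ), "\n"; // A [one] B MARK00000002
```

In the second run the nested call empties the table while the outer call is still working through it, so the remaining marker can never be substituted.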

Now, take a close look at the source for the parser:

/**
* Convert wikitext to HTML
* Do not call this function recursively.
*/
function parse( $text, &$title, $options, $linestart = true, $clearState = true, $revid = null ) {
    
    /**
    * First pass--just handle <nowiki> sections, pass the rest off
    * to internalParse() which does all the real work.
    */

    global $wgUseTidy, $wgAlwaysUseTidy, $wgContLang;
    $fname = 'Parser::parse';
    wfProfileIn( $fname );

    if ( $clearState ) {
        $this->clearState();
    }

    $this->mOptions = $options;
    $this->mTitle =& $title;
    $this->mRevisionId = $revid;
    $this->mOutputType = OT_HTML;

    $this->mStripState = NULL;

    //$text = $this->strip( $text, $this->mStripState );
    // VOODOO MAGIC FIX! Sometimes the above segfaults in PHP5.
    $x =& $this->mStripState;

    wfRunHooks( 'ParserBeforeStrip', array( &$this, &$text, &$x ) );
    $text = $this->strip( $text, $x );
    wfRunHooks( 'ParserAfterStrip', array( &$this, &$text, &$x ) );
    . . . .
    . . . .

This is where the parser state gets killed. My guess is that the $this->mStripState = NULL statement shouldn't be there; the clearState() function is the place to reinitialize mStripState:

function clearState() {
    . . . .
    $this->mStripState = array();

What happens if we don't clear mStripState ? Is this mandatory, or is it a leftover? Why is it being cleared in the first place?

To figure this out, I had to study the MediaWiki source code on SourceForge.net. This is the mixed blessing of open source ...

History of $this->mStripState = NULL

The $stripState = NULL; line was added in revision 1.11 (Mar 6, 2004), probably a leftover from "Mov[ing] body of Article::preSaveTransform to Parser.php". The $stripState variable was never used in the main parse() routine.

  • Revision 1.11: Mar 6 2004 (view) Changes since 1.10: +179 -56 lines
  • "In Parser.php, generalised stripping of <nowiki>, <pre> and <math> to allow more general use such as nesting"
  • "Moved body of Article::preSaveTransform to Parser.php"
Revision 1.11 Parser.php line 59
     $stripState = NULL;
     $text = $this->strip( $text, $this->mStripState, true );
     $text = $this->doWikiPass2( $text, $linestart );
     $text = $this->unstrip( $text, $this->mStripState );
     $this->mOutput->setText( $text );
     wfProfileOut( $fname );
     return $this->mOutput;
Revision 1.11 Parser.php line 1345
function preSaveTransform( $text, &$title, &$user, $options, $clearState = true )
{
     $this->mOptions = $options;
     $this->mTitle = $title;
     if ( $clearState ) {
       $this->clearState;
     }
     $stripState = false;
     $text = $this->strip( $text, $stripState, false );
     $text = $this->pstPass2( $text, $user );
     $text = $this->unstrip( $text, $stripState );
     return $text;
}

The $stripState = NULL; line remained in the source until revision 1.423 (Apr 21, 2005), when it was changed to $this->mStripState = NULL; as part of the fix for (bug 1931) "cleanup, removing unused code and variables". The bug fix description states, "The $stripState var is not used, ever. I am positive this is supposed to be $this->mStripState."

  • Revision 1.423: Apr 21 2005 - (view) Changes since 1.422: +15 -20 lines
  • (bug 1931) cleanup, removing unused code and variables.
    • "The $stripState var is not used, ever. I am positive this is supposed to be $this->mStripState."
Line 165, before (revision 1.422):
$this->mTitle =& $title;
$this->mOutputType = OT_HTML;
$stripState = NULL;
global $fnord; $fnord = 1;
//$text = $this->strip( $text, $this->mStripState );

Line 165, after (revision 1.423):
$this->mTitle =& $title;
$this->mOutputType = OT_HTML;
$this->mStripState = NULL;
global $fnord; $fnord = 1;
//$text = $this->strip( $text, $this->mStripState );

Alas, that's not the case. It's true that "The $stripState var is not used, ever." But the line should have been deleted, not changed to $this->mStripState.

Nulling out $this->mStripState causes the parser to reset its state. This is not normally a problem, unless you call $wgOut->parse() from an extension function that is referenced from a template, or otherwise recursively.

(Note: Yes, I also wondered about the "VOODOO MAGIC FIX!" - but that turned out to be unrelated.)

Fixing $this->mStripState = NULL

Here's the test page with $wgOut->parse() and $this->mStripState = NULL:

I deleted the "$this->mStripState = NULL" line, and here's the test page with $wgOut->parse():

The result:

No change.

D'Oh!

Revision History

  • April 26, 2006 - initial version

Creative Commons License

Unless otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License

 
