Grouping based on attributes, xslt-2.0

Tag: xslt-2.0 Author: shl886690 Date: 2012-12-03

Is it possible to convert this

<boxed-text>
<para role="Box legend">Box 2 Caption</para>
<para role="Box head">Text Text Text</para>
<para role="Box text">Text Text Text.<sup>1</sup></para>
<para role="Box subhead A">Text Text Text</para>
<para role="Box text">Text Text Text.</para>
<para role="Box subhead A">Text Text Text</para>
<para role="Box text">Text Text Text.</para>
<para role="Box subhead B">Text Text Text</para>
<para role="Box text">Text Text Text.</para>
</boxed-text>

into something like this?

<boxed-text>
<caption><para>Box 2 Caption</para></caption>
<para>Text Text Text</para>
<para>Text Text Text.<sup>1</sup></para>
<sec>
<title>Text Text Text</title>
<para>Text Text Text.</para>
</sec>
<sec>
<title>Text Text Text</title>
<para>Text Text Text.</para>
<sec>
<title>Text Text Text</title>
<para>Text Text Text.</para>
</sec>
</sec>
</boxed-text>

However, the subheads are optional, so an input without any of them, such as

<boxed-text>
<para role="Box legend">Box 2 Caption</para>
<para role="Box head">Text Text Text</para>
<para role="Box text">Text Text Text.<sup>1</sup></para>
<para role="Box text">Text Text Text.</para>
<para role="Box text">Text Text Text.</para>
<para role="Box text">Text Text Text.</para>
</boxed-text>

should produce

<boxed-text>
<caption><para>Box 2 Caption</para></caption>
<para>Text Text Text</para>
<para>Text Text Text.<sup>1</sup></para>
<para>Text Text Text.</para>
<para>Text Text Text.</para>
<para>Text Text Text.</para>
</boxed-text>

I am having a hard time doing this with xsl:for-each-group. Any help would be much appreciated. Thanks in advance!

Best Answer

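Here is my edited suggestion. It groups recursively: each "Box subhead A" para starts a sec, the paras inside that sec are grouped again by "Box subhead B", and so on. Compared to my first version, it also makes sure all para elements are processed when there are no subheads to group by: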

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf"
  version="2.0">

<xsl:param name="group-role" select="'Box subhead '"/>

<xsl:output indent="yes"/>

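<!-- Groups $paras so that each "Box subhead $head" para starts a sec;
     the content of each sec is grouped again one level deeper (A, B, C, ...) -->
<xsl:function name="mf:group" as="element()*">
  <xsl:param name="paras" as="element(para)*"/>
  <xsl:param name="head" as="xs:string"/>
  <xsl:for-each-group select="$paras" group-starting-with="para[@role = concat($group-role, $head)]">
    <xsl:choose>
      <!-- The group starts with a subhead of the current level: wrap it in a sec -->
      <xsl:when test="self::para[@role = concat($group-role, $head)]">
        <sec>
          <!-- The subhead itself becomes the title (see the title template below) -->
          <xsl:apply-templates select="."/>
          <!-- Group the rest of this section at the next level: 'A' becomes 'B', and so on -->
          <xsl:sequence select="mf:group(current-group() except ., codepoints-to-string(string-to-codepoints($head)[1] + 1))"/>
        </sec>
      </xsl:when>
      <!-- Paras before the first subhead of this level are output without a sec wrapper -->
      <xsl:otherwise>
        <xsl:apply-templates select="current-group()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each-group>
</xsl:function>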

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

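<xsl:template match="boxed-text">
  <xsl:copy>
    <!-- The first level A subhead, if there is one -->
    <xsl:variable name="first-sh" select="para[@role = concat($group-role, 'A')][1]"/>
    <!-- Paras before the first subhead are processed normally; without any subhead, all paras are -->
    <xsl:apply-templates select="if ($first-sh) then $first-sh/preceding-sibling::para else para"/>
    <!-- Group everything from the first subhead onwards (an empty sequence produces no output) -->
    <xsl:sequence select="mf:group(($first-sh, $first-sh/following-sibling::para), 'A')"/>
  </xsl:copy>
</xsl:template>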

<xsl:template match="para[matches(@role, concat($group-role, '[A-Z]'))]">
  <title>
    <xsl:apply-templates/>
  </title>
</xsl:template>

<xsl:template match="para[@role = 'Box legend']">
  <caption>
    <para>
      <xsl:apply-templates/>
    </para>
  </caption>
</xsl:template>

<xsl:template match="para[@role = ('Box head', 'Box text')]">
  <para>
    <xsl:apply-templates/>
  </para>
</xsl:template>

</xsl:stylesheet>
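In mf:group, the level passed to the recursive call is computed with codepoints-to-string(string-to-codepoints($head)[1] + 1): take the codepoint of the first character of $head, add 1 and turn it back into a one-character string, so 'A' gives 'B', 'B' gives 'C', and so on. As a sketch, here is that expression pulled out into a small hypothetical helper, mf:next-level (not part of the stylesheet above), in a standalone XSLT 2.0 stylesheet that can be run against any input document:

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf"
  version="2.0">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: the same expression mf:group uses to go one level deeper -->
  <xsl:function name="mf:next-level" as="xs:string">
    <xsl:param name="head" as="xs:string"/>
    <xsl:sequence select="codepoints-to-string(string-to-codepoints($head)[1] + 1)"/>
  </xsl:function>

  <!-- Matches the root of whatever input document is supplied; the input itself is ignored -->
  <xsl:template match="/">
    <xsl:for-each select="('A', 'B', 'Y')">
      <xsl:value-of select="concat(., ' -> ', mf:next-level(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With an XSLT 2.0 processor such as Saxon this should print A -> B, B -> C and Y -> Z on separate lines. Since the expression just steps through Unicode codepoints, it only suits single-letter levels like the A to Z used in the roles.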

When applied to the input

<boxed-text>
  <para role="Box legend">Box 2 Caption</para>
  <para role="Box head">Text Text Text</para>
  <para role="Box text">Text Text Text.<sup>1</sup></para>
  <para role="Box subhead A">Text Text Text</para>
  <para role="Box text">Text Text Text.</para>
  <para role="Box subhead A">Text Text Text</para>
  <para role="Box text">Text Text Text.</para>
  <para role="Box subhead B">Text Text Text</para>
  <para role="Box text">Text Text Text.</para>
</boxed-text>

with Saxon 9.4 I get the result

<boxed-text>
   <caption>
      <para>Box 2 Caption</para>
   </caption>
   <para>Text Text Text</para>
   <para>Text Text Text.<sup>1</sup>
   </para>
   <sec>
      <title>Text Text Text</title>
      <para>Text Text Text.</para>
   </sec>
   <sec>
      <title>Text Text Text</title>
      <para>Text Text Text.</para>
      <sec>
         <title>Text Text Text</title>
         <para>Text Text Text.</para>
      </sec>
   </sec>
</boxed-text>
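The nested sec comes from the recursive call for level B. For the second "Box subhead A" group, that call receives the rest of the group: a Box text para, the "Box subhead B" para and another Box text para. With group-starting-with, the first item always starts a group even if it does not match the pattern, so the leading Box text para forms a group of its own, goes through the xsl:otherwise branch and is output without a sec, while the subhead B para starts the nested sec. The sketch below shows that grouping behaviour in isolation on a made-up sample (the roles mirror the question; the t1/h1 contents are placeholders):

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="text"/>

  <!-- Made-up sample: roughly what mf:group sees at level B -->
  <xsl:variable name="sample">
    <boxed-text>
      <para role="Box text">t1</para>
      <para role="Box subhead B">h1</para>
      <para role="Box text">t2</para>
      <para role="Box subhead B">h2</para>
    </boxed-text>
  </xsl:variable>

  <!-- Matches the root of whatever input document is supplied; the input itself is ignored -->
  <xsl:template match="/">
    <xsl:for-each-group select="$sample/boxed-text/para"
                        group-starting-with="para[@role = 'Box subhead B']">
      <!-- position() is the group number; t1 starts group 1 although it does not match -->
      <xsl:value-of select="concat('group ', position(), ': ', string-join(current-group()/string(), ' '), '&#10;')"/>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

It should print group 1: t1, group 2: h1 t2 and group 3: h2, one per line.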

When applied to the input

<boxed-text>
<para role="Box legend">Box 2 Caption</para>
<para role="Box head">Text Text Text</para>
<para role="Box text">Text Text Text.<sup>1</sup></para>
<para role="Box text">Text Text Text.</para>
<para role="Box text">Text Text Text.</para>
<para role="Box text">Text Text Text.</para>
</boxed-text>

I get the result

<boxed-text>
   <caption>
      <para>Box 2 Caption</para>
   </caption>
   <para>Text Text Text</para>
   <para>Text Text Text.<sup>1</sup>
   </para>
   <para>Text Text Text.</para>
   <para>Text Text Text.</para>
   <para>Text Text Text.</para>
</boxed-text>
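This second case relies on the conditional in the boxed-text template. Without a "Box subhead A" para, $first-sh is an empty sequence, so $first-sh/preceding-sibling::para selects nothing; only the else branch makes sure every para is still processed, and mf:group((), 'A') contributes nothing. As a hypothetical check (not part of the answer), this sketch counts what each selection returns for a made-up sample without subheads:

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="text"/>

  <!-- Made-up boxed-text without any "Box subhead" paras -->
  <xsl:variable name="sample">
    <boxed-text>
      <para role="Box legend">Caption</para>
      <para role="Box head">Head</para>
      <para role="Box text">Text 1</para>
      <para role="Box text">Text 2</para>
    </boxed-text>
  </xsl:variable>

  <!-- Matches the root of whatever input document is supplied; the input itself is ignored -->
  <xsl:template match="/">
    <xsl:for-each select="$sample/boxed-text">
      <!-- Same selection as the answer's template, with $group-role inlined -->
      <xsl:variable name="first-sh" select="para[@role = 'Box subhead A'][1]"/>
      <!-- Without the if: nothing is selected when there is no subhead -->
      <xsl:value-of select="concat('preceding-sibling only: ', count($first-sh/preceding-sibling::para), '&#10;')"/>
      <!-- With the if: falls back to all para children -->
      <xsl:value-of select="concat('with fallback: ', count(if ($first-sh) then $first-sh/preceding-sibling::para else para), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

It should print preceding-sibling only: 0 and with fallback: 4.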

comments:

Wow. A lot to digest, but this does the job. You saved me a lot of headaches. Thanks a lot!!!
Hi @MartinHonnen, I forgot to mention that "subheads" may not exist in <boxed-text>. It may appear as <boxed-text> <para role="Box legend">Box 2 Caption</para> <para role="Box head">Text Text Text</para> <para role="Box text">Text Text Text.<sup>1</sup></para> <para role="Box text">Text Text Text.</para> <para role="Box text">Text Text Text.</para> <para role="Box text">Text Text Text.</para> </boxed-text> Could you modify your code above? Thanks again!
And what do you want the output to be in that case? Code samples are hard to read in comments, so please edit your question to show all the relevant input samples and the results you need; then I am sure we can fix the stylesheet.
A newbie mistake, sorry about that. I have edited the question.
I edited the stylesheet code, it should now do the job for both types of input documents.