Now that we have looked at all the pieces needed for the extracttext function, let's see how it looks as a whole:

function nodeToHtml( $node_content )
{
  //Check whether this node needs bold formatting
  preg_match( "'\<w:b\>\<\/w:b\>'", $node_content, $bold );
  //Check whether this node needs italic formatting
  preg_match( "'\<w:i\>\<\/w:i\>'", $node_content, $italic );
  //Get the font size (the w:val attribute of w:sz, which Word stores in half-points)
  preg_match( "'\<w:sz w:val=\"(.*?)\"\>\<\/w:sz\>'", $node_content, $font_size );
  //Get the text
  preg_match( "'\<w:t(.*?)\>(.*?)\<\/w:t\>'", $node_content, $text );
  //We do not try to get the font name, because we will most likely want to use our own
  $tag_name = 'span';
  $style = ' style="';
  //A run may contain no w:t element at all, so fall back to an empty string
  $content = isset( $text[ 2 ] ) ? $text[ 2 ] : '';
  if( count( $bold ) > 0 )
    $style .= 'font-weight: bold; ';
  if( count( $italic ) > 0 )
    $style .= 'font-style: italic; ';
  if( count( $font_size ) > 0 )
    $style .= 'font-size: ' . ( $font_size[1] / 2 ) . 'pt; ';
  $style .= '" ';
  return '<' . $tag_name . $style . '>' . $content . '</' . $tag_name . '>';
}
function extracttext($filename)
{
  $ext = pathinfo($filename, PATHINFO_EXTENSION);
  if($ext == 'docx')
    $dataFile = "word/document.xml";
  else
    $dataFile = "content.xml";
  $zip = new ZipArchive;
  if (true === $zip->open($filename))
  {
    if (($index = $zip->locateName($dataFile)) !== false)
    {
      $text = $zip->getFromIndex($index);
      $xml = new DOMDocument();
      $xml->loadXML($text);
      $ret = $xml->saveHTML();
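      //Mark the end of every paragraph with a newline so that paragraphs can be split later
      $ret = str_replace("</w:p>", "</w:p>\n", $ret);
      //Replace every run (w:r) with its span representation
      preg_match_all( "'<w:r(.*?)\<\/w:r\>'", $ret, $get, PREG_OFFSET_CAPTURE);
      foreach ( $get[0] as $node_key => $node )
        $ret = str_replace($node[0], nodeToHtml($node[0]), $ret);
      //Collect the paragraphs
      preg_match_all( "'\<w:p[ >](.*?)\<\/w:p\>\n'", $ret, $p);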
      $data = array();
      foreach ($p[0] as $key => $paragraph)
      {
        $data[ $key ] = '';
        preg_match_all( "'\<span(.*?)\>(.*?)\<\/span\>'", $paragraph, $spans );
        foreach ($spans[0] as $span)
          $data[ $key ] .= $span;
      }
      $zip->close();
      return $data;
    }
    $zip->close();
  }
  return 'File not found';
}
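
The regular expressions above depend on the exact shape of the markup that saveHTML() produces. As a hedged alternative sketch (not the approach used in this article), the same runs can be read with DOMXPath. The function name runsToSpans and the sample XML are made up for illustration; the sketch assumes the standard WordprocessingML namespace and converts w:sz from half-points to points.

```php
<?php
// WordprocessingML main namespace used by word/document.xml.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical helper: returns one HTML string of <span> tags per paragraph.
function runsToSpans(string $documentXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', W_NS);

    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $html = '';
        foreach ($xpath->query('.//w:r', $p) as $run) {
            // Collect the text of all w:t children of this run.
            $text = '';
            foreach ($xpath->query('./w:t', $run) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue; // runs without text (breaks, images) are skipped
            }
            $style = '';
            if ($xpath->query('./w:rPr/w:b', $run)->length > 0) {
                $style .= 'font-weight: bold; ';
            }
            if ($xpath->query('./w:rPr/w:i', $run)->length > 0) {
                $style .= 'font-style: italic; ';
            }
            $size = $xpath->query('./w:rPr/w:sz/@w:val', $run);
            if ($size->length > 0) {
                // w:sz is stored in half-points.
                $style .= 'font-size: ' . ((int) $size->item(0)->nodeValue / 2) . 'pt; ';
            }
            $html .= '<span style="' . $style . '">' . htmlspecialchars($text) . '</span>';
        }
        $paragraphs[] = $html;
    }
    return $paragraphs;
}

// Made-up sample document with one paragraph and two runs.
$xml = '<w:document xmlns:w="' . W_NS . '"><w:body>'
     . '<w:p><w:r><w:rPr><w:b/><w:sz w:val="24"/></w:rPr><w:t>Hi</w:t></w:r>'
     . '<w:r><w:t xml:space="preserve"> there</w:t></w:r></w:p>'
     . '</w:body></w:document>';
print_r(runsToSpans($xml));
```

In extracttext, such a helper could take the string returned by $zip->getFromIndex($index). It would apply to docx files only, because content.xml in ODT files uses different element names.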

For some reason, and this is simply what Word does, words (sometimes even single letters) often end up in separate span tags. So before returning the paragraph array, we can clean out the redundant tags by merging neighbouring spans that have identical formatting:

foreach ( $data as $key => $p )
{
  $data[$key] = '';
  preg_match_all( "'\<span(.*?)\>(.*?)\<\/span\>'", $p, $spans );
  $curr_format = '';
  $curr_text = '';
  foreach ( $spans[ 1 ] as $span_key => $span_format )
  {
    if( $span_format == $curr_format )
    {
      //Same formatting as the previous span: append the text
      $curr_text .= $spans[2][$span_key];
    }
    else
    {
      //Formatting changed: write out the collected span and start a new one
      if( $curr_format != '' AND $curr_text != '' )
        $data[$key] .= '<span'.$curr_format.'>'.$curr_text.'</span>';
      $curr_text = $spans[2][$span_key];
      $curr_format = $span_format;
    }
  }
  //Write out the last collected span, if there is one
  if( $curr_text != '' )
    $data[$key] .= '<span'.$curr_format.'>'.$curr_text.'</span>';
}
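
To try the merging step on its own, without a Word file, the same idea can be wrapped in a small function. This is a minimal sketch: the name mergeAdjacentSpans and the sample input are made up for illustration, and it assumes PHP 7.1 or newer and that the spans are not nested.

```php
<?php
// Hypothetical helper: merges neighbouring <span> tags that share the same attributes.
function mergeAdjacentSpans(string $paragraph): string
{
    preg_match_all('~<span(.*?)>(.*?)</span>~s', $paragraph, $spans, PREG_SET_ORDER);

    $merged = '';
    $currFormat = null; // attributes of the span being collected
    $currText = '';     // text collected for that span so far
    foreach ($spans as $span) {
        [, $format, $text] = $span;
        if ($format === $currFormat) {
            // Same formatting as the previous span: just extend the text.
            $currText .= $text;
            continue;
        }
        // Formatting changed: flush what has been collected so far.
        if ($currFormat !== null && $currText !== '') {
            $merged .= '<span' . $currFormat . '>' . $currText . '</span>';
        }
        $currFormat = $format;
        $currText = $text;
    }
    // Flush the last collected span.
    if ($currFormat !== null && $currText !== '') {
        $merged .= '<span' . $currFormat . '>' . $currText . '</span>';
    }
    return $merged;
}

$input = '<span style="font-weight: bold; " >Hel</span>'
       . '<span style="font-weight: bold; " >lo</span>'
       . '<span style="" > world</span>';
echo mergeAdjacentSpans($input), "\n";
// prints: <span style="font-weight: bold; " >Hello</span><span style="" > world</span>
```

Comparing the raw attribute strings is enough here only because nodeToHtml always builds the style attribute in the same order; spans produced elsewhere might need their styles normalized first.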