Cropping HTML the standards way

Tuesday, 26th January 2010 - 03:38 GMT

While writing the blog script for my site, I realised I would need to create a preview of each post. The quick and easy approach would be to use substr() to crop the HTML string to the required preview length. However, this has a major flaw: substr() counts the characters of the markup as well as the text, so the cropped HTML would likely leave tags open and might even cut a tag in half. A quick fix would be to use strip_tags() to remove the HTML tags before cropping the string, but I didn't want to do this, as it would also remove the post's formatting and so not give a true preview. So, I decided to write something myself which would:

The script I created, released under the GPLv2 (see the bottom of the page for the download link), contains a few helper functions and the main html_crop() function. The helper functions (like the rest of the script) are documented in the source with PHPDoc, but as they're not important I won't document them again here. As html_crop() is the function you will actually use, I'll list its four parameters here:

To use the html_crop() function, one may do something simple like …

PHP code:
  <?php

    require_once('html_crop.php');

    print html_crop('<p><a href="http://www.google.com/">Google</a></p>', 4);

  ?>

… which would produce the following output:

HTML code:
  <p><a href="http://www.google.com/">Goo…</a></p>

As you can see, the <p> and <a> tags are closed even though the cropped text ends before their closing tags. This keeps the returned HTML valid, provided the input HTML is valid. The returned string is obviously longer than the requested length of 4, but the only visible text after cropping is "Goo…", which is exactly 4 characters long.
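The source of html_crop() isn't reproduced in this post, so what follows is only a minimal sketch of the same idea, not the author's implementation. The function name crop_html_sketch() is hypothetical, and the sketch assumes well-formed input, ASCII text with no HTML entities, a length of at least 1, and that the ellipsis counts as one visible character (as in the "Goo…" output above). It splits the HTML into tag and text tokens, counts only the text towards the length, keeps a stack of open tags and closes them at the end. For contrast it also prints what a plain substr() call returns for the same input and length.

```php
<?php
/**
 * Hypothetical sketch, NOT the author's html_crop(): crop $html so that at
 * most $length visible characters remain (the ellipsis counts as one) and
 * close any tags left open. Assumes well-formed input, ASCII text, no HTML
 * entities and $length >= 1.
 */
function crop_html_sketch(string $html, int $length, string $ellipsis = '…'): string
{
    // Visible text already fits: nothing to crop.
    if (strlen(strip_tags($html)) <= $length) {
        return $html;
    }

    // Split into tag and text tokens; the capture group keeps the tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link']; // never closed
    $open = [];                        // stack of open tag names
    $out = '';
    $remaining = max(0, $length - 1);  // reserve one character for the ellipsis

    foreach ($tokens as $token) {
        if ($token[0] === '<') {
            // Tag token: copy it unchanged and update the open-tag stack.
            $out .= $token;
            if (preg_match('#^<(/?)([a-zA-Z][a-zA-Z0-9]*)#', $token, $m)) {
                if ($m[1] === '/') {
                    array_pop($open); // well-formed input: matches the last open tag
                } elseif (!in_array(strtolower($m[2]), $void, true)
                          && substr($token, -2) !== '/>') {
                    $open[] = strtolower($m[2]);
                }
            }
            continue;
        }
        // Text token: keep only as much as still fits, then stop.
        if (strlen($token) >= $remaining) {
            $out .= substr($token, 0, $remaining) . $ellipsis;
            break;
        }
        $out .= $token;
        $remaining -= strlen($token);
    }

    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

$html = '<p><a href="http://www.google.com/">Google</a></p>';
echo substr($html, 0, 4), "\n";          // <p><  (naive substr() cuts into the markup)
echo crop_html_sketch($html, 4), "\n";  // <p><a href="http://www.google.com/">Goo…</a></p>
```

The stack of open tag names is what makes the closing tags come out in the right, innermost-first order. A fuller version would also need to treat entities such as &amp; as single characters and handle multibyte text, for example with the mbstring functions.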

Attached files