Quick and Dirty Content Filtering with PHP
The PHP language includes lots of helpful functions for easily filtering, cleaning and manipulating content, all of which are excellent tools in the hands of a skilled developer. A solid knowledge of these filtering tools will help you achieve enhanced security and functionality in your projects.
Today, I’m going to give you a crash course on PHP’s basic filtering functions so that by the end of the tutorial you’ll be able to easily escape data, strip tags, remove words and more.
Escaping Strings
First up is string escaping, implemented with what is probably the most basic of PHP’s filtering functions: addslashes(). This function puts a backslash in front of single quotes, double quotes, backslashes and NUL bytes, allowing you to (more) safely accept form data and the like. Say, for example, you have an input field (named ‘title’) and someone types "Suzie's Blog". Those double and single quotes can cause some problems, but not for long:
```php
$title = addslashes($_POST['title']);
// quotes and backslashes in $title are now escaped

echo $title;
// outputs \"Suzie\'s Blog\"
```

As you might guess, addslashes() has an inverse function: stripslashes(). On a side note, in case you ever find yourself developing a custom WordPress plugin, stripslashes() is incredibly useful for removing the slashes that WordPress adds to saved option values.
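A quick way to see what addslashes() and stripslashes() actually do is a round trip in a standalone script. This is a minimal sketch that hard-codes the input instead of reading $_POST; the variable names are just for illustration.

```php
<?php
// The raw text a user might type into the 'title' field (hard-coded here).
$input = '"Suzie\'s Blog"';

// addslashes() puts a backslash before ', ", \ and NUL bytes.
$escaped = addslashes($input);      // \"Suzie\'s Blog\"

// stripslashes() removes that escaping again.
$restored = stripslashes($escaped); // "Suzie's Blog"

echo $escaped, PHP_EOL;
echo $restored, PHP_EOL;
```

Because stripslashes() undoes exactly what addslashes() adds, the round trip gives back the original string.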
So all this is pretty handy, but for MySQL queries it’s smart to use something designed specifically for the database. Up next:
Escaping MySQL Queries
SQL injection attacks are a very real concern, making data sanitization a must for any web developer. Thankfully, mysql_real_escape_string() provides an easy way to escape the characters that have a special meaning inside a MySQL string literal (quotes, backslashes, NUL bytes, newlines and a few others) before the value goes into a query. This is perhaps the most often used PHP sanitization function. Here’s an example:
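```php
$title = $_POST['title'];
// $title could be anything, including an injection attempt

$title = mysql_real_escape_string($title);
// It's now escaped, but it still has to go inside quotes in the SQL:
mysql_query("INSERT INTO blogs (title) VALUES ('$title')");
```

This function is one that anyone working with PHP and MySQL will use quite often: it’s elegant and potent (it even works on binary data). Two caveats: the escaped value is only safe inside a quoted string literal, as above, and the whole mysql_* extension was deprecated in PHP 5.5 and removed in PHP 7.0. On current PHP versions, use mysqli_real_escape_string() (which takes the connection as its first argument) or, better still, prepared statements via mysqli or PDO.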
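Prepared statements are a different technique from escaping: the value is sent separately from the SQL text, so there is nothing to escape. The sketch below assumes PDO with an in-memory SQLite database (the pdo_sqlite extension) purely so it runs without a MySQL server; against MySQL you would use a mysql: DSN instead. The blogs table is created just for the demo.

```php
<?php
// Assumption: pdo_sqlite is available; an in-memory SQLite DB stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // throw on SQL errors
$pdo->exec('CREATE TABLE blogs (title TEXT)');                  // demo table

// Hostile input that would break out of a naively built, unescaped query.
$title = "x'); DROP TABLE blogs; --";

// The :title placeholder keeps the value out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO blogs (title) VALUES (:title)');
$stmt->execute([':title' => $title]);

// The row holds the input verbatim, and the table still exists.
$stored = $pdo->query('SELECT title FROM blogs')->fetchColumn();
echo $stored, PHP_EOL;
```

The hostile string is stored as plain data rather than executed, which is exactly what escaping tries to achieve by hand.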
Encoding HTML Entities
htmlentities() is another fun and useful function. It automatically converts characters into their HTML entities: < becomes &lt;, & becomes &amp; and " becomes &quot; (along with every other character that has a named entity, such as é). It’s most useful for taking ordinary user input that happens to contain special characters and formatting it for display. Here’s how you might use it, supposing someone submitted the title Me & My Dog, "Buddy" > An Essay:
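```php
$title = $_POST['title'];

$title = htmlentities($title);
// encode the string

echo $title;
// outputs Me &amp; My Dog, &quot;Buddy&quot; &gt; An Essay
// which the browser displays as the original title
```

This function isn’t a full security filter: it doesn’t remove or validate anything, it simply makes sure user data is encoded correctly. That said, because < and > become &lt; and &gt;, any markup in the input is displayed as text instead of being interpreted when you echo the result into the body of a page. It also has an inverse function, html_entity_decode().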
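To check the encoding for yourself, the sketch below round-trips the example title through htmlentities() and html_entity_decode(). It passes ENT_QUOTES and 'UTF-8' explicitly (our choice, not something the example above does) so the result doesn’t depend on the PHP version’s defaults, and it contrasts htmlentities() with its narrower sibling htmlspecialchars().

```php
<?php
$title = 'Me & My Dog, "Buddy" > An Essay';

// Explicit flags and charset; ENT_QUOTES also encodes single quotes.
$encoded = htmlentities($title, ENT_QUOTES, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $encoded, PHP_EOL; // Me &amp; My Dog, &quot;Buddy&quot; &gt; An Essay
echo $decoded, PHP_EOL; // Me & My Dog, "Buddy" > An Essay

// htmlentities() converts every character that has a named entity;
// htmlspecialchars() only converts &, <, >, " and (with ENT_QUOTES) '.
$word = "caf\u{e9}"; // "café"
echo htmlentities($word, ENT_QUOTES, 'UTF-8'), PHP_EOL;     // caf&eacute;
echo htmlspecialchars($word, ENT_QUOTES, 'UTF-8'), PHP_EOL; // café (unchanged)
```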
Stripping Tags
Sometimes you don’t want to just encode HTML tags, you want to strip them out completely. PHP’s strip_tags() is the perfect solution, doing just what the function name implies. Say someone sends in malicious data:
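```php
$title = $_POST['title'];
// $title's value = 'Happy <script src="http://evilsite.com/hack.js" ></script> Birthday!'

$title = strip_tags($title);
// remove all tags

echo $title;
// outputs "Happy  Birthday!" (the spaces on either side of the removed tags remain)
```

That’s it: all tags are removed just like that. Note that strip_tags() removes the tags themselves, not the text between them, so the body of an inline <script>...</script> block would be left behind as plain text. A useful function indeed. But what if you want to strip some tags (like script and img) but leave others (strong, a, p)? Read on!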
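strip_tags() can answer part of that question itself: it accepts an optional second argument listing the tags to keep. A small sketch with a made-up input string:

```php
<?php
$input = '<p>Hello <strong>world</strong><script>alert(1)</script></p>';

// No second argument: every tag goes, but the text inside the tags stays.
$plain = strip_tags($input);

// Second argument: the tags to keep (here <p> and <strong>).
$partial = strip_tags($input, '<p><strong>');

echo $plain, PHP_EOL;   // Hello worldalert(1)
echo $partial, PHP_EOL; // <p>Hello <strong>world</strong>alert(1)</p>
```

Keep in mind that kept tags keep all their attributes, so something like <p onclick="..."> would pass through untouched; strip_tags() does not clean attributes.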
Advanced Data Filtering
The functions we’ve just been through will work the majority of the time, but there will be situations where they aren’t quite versatile or powerful enough. Thankfully, we have regular expressions. Using some regex patterns and PHP’s powerful preg_replace() function, we can filter, strip, replace or remove pretty much anything we want without much trouble at all. Believe me, this thing is powerful.
You can read more about preg_replace() in the PHP manual, but the basic idea is that it takes three arguments: the pattern to look for (a regular expression wrapped in delimiters, such as /.../), what to replace each match with, and the subject string to look in. It returns the modified string. The pattern and replacement can be strings or arrays (if you have multiple phrases, words or patterns to search for).
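To make the three arguments concrete, here is a small sketch with a made-up phone-number string. It also shows the optional fourth and fifth arguments: a replacement limit and a by-reference count of the replacements made.

```php
<?php
$subject = 'Call me on 555-1234 or 555-9876 today.';

// One pattern, one replacement: every match is replaced.
$masked = preg_replace('/\d{3}-\d{4}/', '[number]', $subject);

// Arrays: each pattern is replaced by the replacement at the same position.
$reworded = preg_replace(['/Call/', '/today/'], ['Email', 'tomorrow'], $subject);

// Limit of 1 replacement; $count receives how many replacements were made.
$first = preg_replace('/\d{3}-\d{4}/', '[number]', $subject, 1, $count);

echo $masked, PHP_EOL;   // Call me on [number] or [number] today.
echo $reworded, PHP_EOL; // Email me on 555-1234 or 555-9876 tomorrow.
echo $first, PHP_EOL;    // Call me on [number] or 555-9876 today.
echo $count, PHP_EOL;    // 1
```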
Here’s an example of how you’d set up preg_replace to strip all script tags and leave other tags:
```php
$dangerous_content = "Hello, <script type='text/javascript'>alert('hacked!')</script> how are you?";
// this is the malicious content we need to sanitize

$script_tags = '/<script\b[^>]*>.*?<\/script\s*>/is';
// match an opening script tag (with or without attributes), everything up to the
// next closing tag, and the closing tag itself; i = ignore case, s = let . match newlines

$fixed_content = preg_replace($script_tags, '', $dangerous_content);
// the script block has now been removed: "Hello,  how are you?"
```

You could also set it up to strip out a series of forbidden words (profanity, spam words, etc.) like this:
```php
$forbidden = array('/\bforbidden1\b/i', '/\bforbidden2\b/i', '/\bforbiddenN\b/i');
// the words to strip out, written as regex patterns: each needs delimiters,
// \b limits matches to whole words and i ignores case

$fixed_content = preg_replace($forbidden, '', $_POST['comment_text']);
// goodbye, forbidden words
```

As you can see, it’s actually surprisingly easy to manipulate data with PHP and prepare it for use. Nothing stands in your way!
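One caveat about stripping script tags with a regex: a single pass can be fooled by input built so that removing one match leaves a new match behind, for example a script tag split around another one. The sketch below assumes the pattern /<script\b[^>]*>.*?<\/script\s*>/is and a hypothetical helper name, strip_script_blocks(); it simply re-applies the pattern until the string stops changing.

```php
<?php
// Hypothetical helper: removes <script>...</script> blocks, repeating until
// the string stops changing so split tags can't reassemble after one pass.
function strip_script_blocks(string $html): string
{
    // Assumed pattern: opening tag with optional attributes, lazy body, closing tag.
    $pattern = '/<script\b[^>]*>.*?<\/script\s*>/is';

    do {
        $before = $html;
        $result = preg_replace($pattern, '', $html);
        if ($result === null) {
            // preg_replace() returns null on errors such as hitting the backtrack limit.
            throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
        }
        $html = $result;
    } while ($html !== $before);

    return $html;
}

$tricky = '<scr<script>x</script>ipt>alert(1)</script>';

// One pass leaves a brand-new script tag behind: <script>alert(1)</script>
echo preg_replace('/<script\b[^>]*>.*?<\/script\s*>/is', '', $tricky), PHP_EOL;

// Repeating until nothing changes removes it completely.
echo '[' . strip_script_blocks($tricky) . ']', PHP_EOL;
```

Even so, this only targets <script> blocks: event-handler attributes such as onclick or onerror on other tags pass straight through. Treat regex filtering as the quick-and-dirty measure it is; for untrusted HTML, a dedicated HTML sanitizer is the safer route.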