Friday, September 21, 2012

How to remove all non-printable and non-UTF-8 characters from a string

In PHP this is quite simple, but you can spend hours searching online for a solution, especially if you want to keep non-US (non-ASCII) characters such as accented letters.

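   $a = 'some string that you want to clean';
   # Drop invalid byte sequences instead of replacing them with '?' (the mbstring default).
   mb_substitute_character('none');
   # Remove all byte sequences that are not valid UTF-8.
   $a = mb_convert_encoding($a, 'UTF-8', 'UTF-8');
   # Remove non-printable control characters (ASCII codes below 32),
   # keeping tab (\x09), line feed (\x0A) and carriage return (\x0D).
   $a = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/u', '', $a);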
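
If you need this in more than one place, the two steps can be wrapped in a small function. This is only a sketch: the name clean_string is made up for illustration, and it assumes you want invalid bytes dropped rather than replaced with '?', so it switches the mbstring substitute character to 'none' for the duration of the call and restores the previous setting afterwards.

```php
<?php
// Hypothetical helper (the name is not from the original post).
function clean_string(string $s): string
{
    // Assumption: invalid bytes should be dropped, not turned into '?'.
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $s = mb_convert_encoding($s, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous); // restore the caller's setting

    // Strip control characters, keeping tab, line feed and carriage return.
    // preg_replace() returns null on error, so fall back to an empty string.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/u', '', $s) ?? '';
}

// The input contains a NUL byte, an invalid 0xFF byte and a BEL character.
$dirty = "Caf\xC3\xA9\x00 \xFFok\x07\n";
var_dump(clean_string($dirty) === "Caf\xC3\xA9 ok\n");
```

The accented "é" (bytes C3 A9) is valid UTF-8 and survives, which is the whole point of not simply stripping everything outside the ASCII range.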

I hope this saves someone some time (thanks to Gregor for the help).

