StringHelper

Package | craft.app.helpers
Namespace | Craft
Inheritance | class StringHelper
Since | 1.0
Source Code | craft/app/helpers/StringHelper.php

Class StringHelper

Public Methods #

Method | Description | Defined By
--- | --- | ---
UUID() | | StringHelper
arrayToString() | Converts an array to a string. | StringHelper
asciiString() | Converts extended ASCII characters to ASCII. | StringHelper
checkForIconv() | Returns whether iconv is installed and not buggy. | StringHelper
convertToUTF8() | Attempts to convert a string to UTF-8 and clean any non-valid UTF-8 characters. | StringHelper
encodeMb4() | HTML-encodes any 4-byte UTF-8 characters. | StringHelper
escapeCommas() | Backslash-escapes any commas in a given string. | StringHelper
escapeRegexChars() | | StringHelper
getAsciiCharMap() | Returns ASCII character mappings. | StringHelper
getAsciiPunctuation() | Returns the asciiPunctuation array. | StringHelper
getCharAt() | Returns the character at a specific point in a potentially multibyte string. | StringHelper
getEncoding() | Gets the current encoding of the given string. | StringHelper
isNotNullOrEmpty() | | StringHelper
isNullOrEmpty() | | StringHelper
isUTF8() | Checks if the given string is UTF-8 encoded. | StringHelper
isUUID() | Returns whether the given string matches a UUID pattern. | StringHelper
lowercaseFirst() | Lowercases the first character of a multibyte string. | StringHelper
normalizeKeywords() | Normalizes search keywords. | StringHelper
parseMarkdown() | Runs a string through Markdown. | StringHelper
parseMarkdownLine() | Runs a string through Markdown, but removes any paragraph tags that get added. | StringHelper
randomString() | | StringHelper
splitOnWords() | Splits a string into an array of the words in the string. | StringHelper
stripHtml() | Strips HTML tags out of a given string. | StringHelper
toCamelCase() | camelCases a string. | StringHelper
toKebabCase() | kebab-cases a string. | StringHelper
toLowerCase() | Returns a multibyte-aware lowercase version of a string. Note: Not using mb_strtoupper because of https://bugs.php.net/bug.php?id=47742. | StringHelper
toPascalCase() | PascalCases a string. | StringHelper
toSnakeCase() | snake_cases a string. | StringHelper
toUpperCase() | Returns a multibyte-aware uppercase version of a string. Note: Not using mb_strtoupper because of https://bugs.php.net/bug.php?id=47742. | StringHelper
uppercaseFirst() | Uppercases the first character of a multibyte string. | StringHelper

Method Details #

UUID() #

public static function UUID()
{
     return sprintf('%04x%04x-%04x-%04x-%04x-%04x%04x%04x',

          // 32 bits for "time_low"
          mt_rand(0, 0xffff), mt_rand(0, 0xffff),

          // 16 bits for "time_mid"
          mt_rand(0, 0xffff),

          // 16 bits for "time_hi_and_version", four most significant bits holds version number 4
          mt_rand(0, 0x0fff) | 0x4000,

          // 16 bits, 8 bits for "clk_seq_hi_res", 8 bits for "clk_seq_low", two most significant bits holds zero and
          // one for variant DCE1.1
          mt_rand(0, 0x3fff) | 0x8000,

          // 48 bits for "node"
          mt_rand(0, 0xffff), mt_rand(0, 0xffff), mt_rand(0, 0xffff)
     );
}
Returns | string
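
The listing above fills the version 4 UUID layout with `mt_rand()`, which PHP does not document as cryptographically secure. As a minimal sketch, assuming PHP 7 or later, the same layout can be filled with `random_int()`. The function name `uuidV4Sketch` is hypothetical and not part of StringHelper.

```php
<?php
// Hypothetical standalone helper; not part of StringHelper.
function uuidV4Sketch(): string
{
    return sprintf(
        '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',
        random_int(0, 0xffff), random_int(0, 0xffff), // 32 bits for time_low
        random_int(0, 0xffff),                        // 16 bits for time_mid
        random_int(0, 0x0fff) | 0x4000,               // version 4 in the top four bits
        random_int(0, 0x3fff) | 0x8000,               // variant bits "10" in the top two bits
        random_int(0, 0xffff), random_int(0, 0xffff), random_int(0, 0xffff) // 48 bits for node
    );
}

echo uuidV4Sketch(), PHP_EOL;
```

The third group always starts with `4` and the fourth group with `8`, `9`, `a`, or `b`, which is how the version and variant bits show up in the formatted string.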

arrayToString() #

public static function arrayToString($arr, $glue = ',')
{
     if (is_array($arr) || $arr instanceof \IteratorAggregate)
     {
          $stringValues = array();

          foreach ($arr as $value)
          {
               $stringValues[] = static::arrayToString($value, $glue);
          }

          return implode($glue, $stringValues);
     }
     else
     {
          return (string) $arr;
     }
}
$arr | mixed
$glue | string
Returns | string

Converts an array to a string.
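
Because the method calls itself for every element, nested arrays are flattened depth-first with the same glue. The sketch below is a standalone copy of that recursion for illustration; `arrayToStringSketch` is a hypothetical name, not the Craft method.

```php
<?php
// Standalone copy of the recursion shown above, for illustration only.
function arrayToStringSketch($arr, string $glue = ','): string
{
    if (is_array($arr) || $arr instanceof IteratorAggregate) {
        $parts = [];
        foreach ($arr as $value) {
            // Nested arrays are flattened with the same glue.
            $parts[] = arrayToStringSketch($value, $glue);
        }
        return implode($glue, $parts);
    }
    // Scalars (and anything else castable) become strings.
    return (string) $arr;
}

echo arrayToStringSketch([1, [2, 3], 'a']), PHP_EOL;  // 1,2,3,a
echo arrayToStringSketch(['x', ['y']], ' '), PHP_EOL; // x y
```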

asciiString() #

public static function asciiString($str)
{
     $asciiStr = '';
     $strlen = strlen($str);
     $asciiCharMap = static::getAsciiCharMap();

     // Code adapted from http://php.net/ord#109812
     $offset = 0;

     while ($offset < $strlen)
     {
          // ord() doesn't support UTF-8 so we need to do some extra work to determine the ASCII code
          $ascii = ord(substr($str, $offset, 1));

          if ($ascii >= 128) // otherwise 0xxxxxxx
          {
               if ($ascii < 224)
               {
                    $bytesnumber = 2; // 110xxxxx
               }
               else if ($ascii < 240)
               {
                    $bytesnumber = 3; // 1110xxxx
               }
               else if ($ascii < 248)
               {
                    $bytesnumber = 4; // 11110xxx
               }

               $tempAscii = $ascii - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);

               for ($i = 2; $i <= $bytesnumber; $i++)
               {
                    $offset++;
                    $ascii2 = ord(substr($str, $offset, 1)) - 128; // 10xxxxxx
                    $tempAscii = $tempAscii * 64 + $ascii2;
               }

               $ascii = $tempAscii;
          }

          $offset++;

          // Is this an ASCII character?
          if ($ascii >= 32 && $ascii < 128)
          {
               $asciiStr .= chr($ascii);
          }
          // Do we have an ASCII mapping for it?
          else if (isset($asciiCharMap[$ascii]))
          {
               $asciiStr .= $asciiCharMap[$ascii];
          }
     }

     return $asciiStr;
}
$str | string
Returns | string

Converts extended ASCII characters to ASCII.

checkForIconv() #

public static function checkForIconv()
{
     if (!isset(static::$_iconv))
     {
          // Check if iconv is installed. Note we can't just use HTMLPurifier_Encoder::iconvAvailable() because they
          // don't consider iconv "installed" if it's there but "unusable".
          if (function_exists('iconv') && \HTMLPurifier_Encoder::testIconvTruncateBug() === \HTMLPurifier_Encoder::ICONV_OK)
          {
               static::$_iconv = true;
          }
          else
          {
               static::$_iconv = false;
          }
     }

     return static::$_iconv;
}
Returns | bool

Returns whether iconv is installed and not buggy.

convertToUTF8() #

public static function convertToUTF8($string)
{
     // Don't wrap in a class_exists in case the server already has it's own version of HTMLPurifier and they have
     // open_basedir restrictions
     require_once Craft::getPathOfAlias('system.vendors.htmlpurifier').'/HTMLPurifier.standalone.php';

     // If it's already a UTF8 string, just clean and return it
     if (static::isUTF8($string))
     {
          return \HTMLPurifier_Encoder::cleanUTF8($string);
     }

     // Otherwise set HTMLPurifier to the actual string encoding
     $config = \HTMLPurifier_Config::createDefault();
     $config->set('Core.Encoding', static::getEncoding($string));

     // Clean it
     $string = \HTMLPurifier_Encoder::cleanUTF8($string);

     // Convert it to UTF8 if possible
     if (static::checkForIconv())
     {
          $string = \HTMLPurifier_Encoder::convertToUTF8($string, $config, null);
     }
     else
     {
          $encoding = static::getEncoding($string);
          $string = mb_convert_encoding($string, 'utf-8', $encoding);
     }

     return $string;
}
$string | string
Returns | bool / string

Attempts to convert a string to UTF-8 and clean any non-valid UTF-8 characters.

encodeMb4() #

public static function encodeMb4($string)
{
     // Does this string have any 4+ byte Unicode chars?
     if (max(array_map('ord', str_split($string))) >= 240)
     {
          $string = preg_replace_callback('/./u', function(array $match)
          {
               if (strlen($match[0]) >= 4)
               {
                    // (Logic pulled from WP's wp_encode_emoji() function)
                    // UTF-32's hex encoding is the same as HTML's hex encoding.
                    // So, by converting from UTF-8 to UTF-32, we magically
                    // get the correct hex encoding.
                    $unpacked = unpack('H*', mb_convert_encoding($match[0], 'UTF-32', 'UTF-8'));
                    return isset($unpacked[1]) ? '&#x'.ltrim($unpacked[1], '0').';' : '';
               }
               else
               {
                    return $match[0];
               }
          }, $string);
     }

     return $string;
}
$string | string | The string
Returns | string | The string with converted 4-byte UTF-8 characters

HTML-encodes any 4-byte UTF-8 characters.
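
As a rough illustration of the conversion step, the sketch below re-implements the callback outside Craft. It assumes the mbstring extension is loaded and PHP 7 or later (for the `\u{...}` escape); `encodeMb4Sketch` is a hypothetical name.

```php
<?php
// Illustrative re-implementation; requires the mbstring extension.
function encodeMb4Sketch(string $string): string
{
    $result = preg_replace_callback('/./u', function (array $match) {
        if (strlen($match[0]) < 4) {
            return $match[0]; // 1- to 3-byte characters pass through unchanged
        }
        // The big-endian UTF-32 bytes, written as hex, are the code point itself,
        // which is exactly what an HTML hex entity expects.
        $hex = bin2hex(mb_convert_encoding($match[0], 'UTF-32BE', 'UTF-8'));
        return '&#x' . ltrim($hex, '0') . ';';
    }, $string);

    // preg_replace_callback() returns null on invalid UTF-8; fall back to the input.
    return $result ?? $string;
}

echo encodeMb4Sketch("Hi \u{1F600}!"), PHP_EOL; // Hi &#x1f600;!
```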

escapeCommas() #

public static function escapeCommas($str)
{
     return preg_replace('/(?<!\\\),/', '\,', $str);
}
$str | string | The string.
Returns | string

Backslash-escapes any commas in a given string.
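
The negative lookbehind in the pattern means a comma that already has a backslash in front of it is left alone. A small standalone sketch of that behaviour follows; `escapeCommasSketch` is a hypothetical name.

```php
<?php
// Same idea as the listing: match a comma that is not preceded by a backslash.
function escapeCommasSketch(string $str): string
{
    return preg_replace('/(?<!\\\\),/', '\\\\,', $str);
}

// The second comma is already escaped, so only the first one changes.
echo escapeCommasSketch('red,green\,blue'), PHP_EOL; // red\,green\,blue
```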

escapeRegexChars() #

public static function escapeRegexChars($string)
{
     $charsToEscape = str_split("\\/^$.,{}[]()|<>:*+-=");
     $escapedChars = array();

     foreach ($charsToEscape as $char)
     {
          $escapedChars[] = "\\".$char;
     }

     return  str_replace($charsToEscape, $escapedChars, $string);
}
$string | string
Returns | mixed
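
This method has no description in the source. Judging from the listing, it prefixes each character in a fixed list with a backslash so the result can be embedded in a regular expression. The sketch below is a standalone copy for illustration (`escapeRegexCharsSketch` is a hypothetical name). Note that the list omits some characters, such as `?` and `#`; PHP's built-in `preg_quote()` is the standard function for this job.

```php
<?php
// Standalone copy of the listing, for illustration only.
function escapeRegexCharsSketch(string $string): string
{
    $chars = str_split('\\/^$.,{}[]()|<>:*+-=');
    $escaped = array_map(function ($c) {
        return '\\' . $c;
    }, $chars);

    // str_replace() works through the list in order. The backslash comes first,
    // so backslashes added for later characters are not escaped a second time.
    return str_replace($chars, $escaped, $string);
}

$needle = escapeRegexCharsSketch('1+1=2 (approx.)');
echo $needle, PHP_EOL; // 1\+1\=2 \(approx\.\)

// The escaped text now matches itself literally inside a pattern.
var_dump(preg_match('/' . $needle . '/', 'so 1+1=2 (approx.) holds')); // int(1)
```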

getAsciiCharMap() #

public static function getAsciiCharMap()
{
     if (!isset(static::$_asciiCharMap))
     {
          static::$_asciiCharMap = array(
               216 => 'O',  223 => 'ss', 224 => 'a',  225 => 'a',  226 => 'a',
               229 => 'a',  227 => 'ae', 230 => 'ae', 228 => 'ae', 231 => 'c',
               232 => 'e',  233 => 'e',  234 => 'e',  235 => 'e',  236 => 'i',
               237 => 'i',  238 => 'i',  239 => 'i',  241 => 'n',  242 => 'o',
               243 => 'o',  244 => 'o',  245 => 'o',  246 => 'oe', 248 => 'o',
               249 => 'u',  250 => 'u',  251 => 'u',  252 => 'ue', 255 => 'y',
               257 => 'aa', 269 => 'ch', 275 => 'ee', 291 => 'gj', 299 => 'ii',
               311 => 'kj', 316 => 'lj', 326 => 'nj', 353 => 'sh', 363 => 'uu',
               382 => 'zh', 256 => 'aa', 268 => 'ch', 274 => 'ee', 290 => 'gj',
               298 => 'ii', 310 => 'kj', 315 => 'lj', 325 => 'nj', 337 => 'o',
               352 => 'sh', 362 => 'uu', 369 => 'u',  381 => 'zh', 260 => 'A',
               261 => 'a',  262 => 'C',  263 => 'c',  280 => 'E',  281 => 'e',
               321 => 'L',  322 => 'l',  323 => 'N',  324 => 'n',  211 => 'O',
               346 => 'S',  347 => 's',  377 => 'Z',  378 => 'z',  379 => 'Z',
               380 => 'z',  388 => 'z',
          );

          foreach (craft()->config->get('customAsciiCharMappings') as $ascii => $char)
          {
               static::$_asciiCharMap[$ascii] = $char;
          }
     }

     return static::$_asciiCharMap;
}
Returns | array

Returns ASCII character mappings.

getAsciiPunctuation() #

public static function getAsciiPunctuation()
{
     if (!isset(static::$_asciiPunctuation))
     {
          static::$_asciiPunctuation =  array(
               33, 34, 35, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 58, 59, 60, 62, 63,
               64, 91, 92, 93, 94, 123, 124, 125, 126, 161, 162, 163, 164, 165, 166,
               167, 168, 169, 170, 171, 172, 174, 175, 176, 177, 178, 179, 180, 181,
               182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 215, 402, 710, 732,
               8211, 8212, 8213, 8216, 8217, 8218, 8220, 8221, 8222, 8224, 8225, 8226,
               8227, 8230, 8240, 8242, 8243, 8249, 8250, 8252, 8254, 8260, 8364, 8482,
               8592, 8593, 8594, 8595, 8596, 8629, 8656, 8657, 8658, 8659, 8660, 8704,
               8706, 8707, 8709, 8711, 8712, 8713, 8715, 8719, 8721, 8722, 8727, 8730,
               8733, 8734, 8736, 8743, 8744, 8745, 8746, 8747, 8756, 8764, 8773, 8776,
               8800, 8801, 8804, 8805, 8834, 8835, 8836, 8838, 8839, 8853, 8855, 8869,
               8901, 8968, 8969, 8970, 8971, 9001, 9002, 9674, 9824, 9827, 9829, 9830
          );
     }

     return static::$_asciiPunctuation;
}
Returns | array

Returns the asciiPunctuation array.

getCharAt() #

public static function getCharAt($str, $i)
{
     return mb_substr($str, $i, 1);
}
$str | string
$i | int
Returns | string

Returns the character at a specific point in a potentially multibyte string.
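
The difference from byte-based indexing only shows up with multibyte text. The sketch below assumes the mbstring extension and passes the encoding explicitly, whereas the listing relies on mbstring's internal encoding; `getCharAtSketch` is a hypothetical name.

```php
<?php
// Character-based lookup, with the encoding spelled out for clarity.
function getCharAtSketch(string $str, int $i): string
{
    return mb_substr($str, $i, 1, 'UTF-8');
}

$str = "na\u{00EF}ve"; // "naïve"; the ï occupies two bytes in UTF-8

echo getCharAtSketch($str, 2), PHP_EOL;   // ï
echo strlen(substr($str, 2, 1)), PHP_EOL; // 1: substr() returns only the first byte of ï
```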

getEncoding() #

public static function getEncoding($string)
{
     return StringHelper::toLowerCase(mb_detect_encoding($string, mb_detect_order(), true));
}
$string | string
Returns | string

Gets the current encoding of the given string.

isNotNullOrEmpty() #

public static function isNotNullOrEmpty($value)
{
     return !static::isNullOrEmpty($value);
}
$value | string / null
Returns | bool

isNullOrEmpty() #

public static function isNullOrEmpty($value)
{
     if ($value === null || $value === '')
     {
          return true;
     }

     if (!is_string($value))
     {
          throw new Exception(Craft::t('IsNullOrEmpty requires a string.'));
     }

     return false;
}
$value | string / null
Returns | bool
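
This method has no description in the source. From the listing, it returns true only for `null` and the empty string, returns false for any other string, and throws for non-string values. The sketch below mirrors that behaviour outside Craft; `isNullOrEmptySketch` is a hypothetical name, and `InvalidArgumentException` stands in for Craft's own `Exception` class.

```php
<?php
function isNullOrEmptySketch($value): bool
{
    if ($value === null || $value === '') {
        return true;
    }

    if (!is_string($value)) {
        // The listing throws Craft's Exception; a built-in type stands in for it here.
        throw new InvalidArgumentException('isNullOrEmpty requires a string.');
    }

    return false;
}

var_dump(isNullOrEmptySketch(null)); // bool(true)
var_dump(isNullOrEmptySketch(''));   // bool(true)
var_dump(isNullOrEmptySketch(' '));  // bool(false): whitespace is not trimmed
var_dump(isNullOrEmptySketch('0'));  // bool(false): unlike empty(), '0' counts as a value

try {
    isNullOrEmptySketch(0);
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), PHP_EOL;
}
```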

isUTF8() #

public static function isUTF8($string)
{
     return static::getEncoding($string) == 'utf-8' ? true : false;
}
$string | string | The string to check.
Returns | bool

Checks if the given string is UTF-8 encoded.

isUUID() #

public static function isUUID($uuid)
{
     return !empty($uuid) && preg_match("/[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}/uis", $uuid);
}
$uuid | string
Returns | bool

Returns whether the given string matches a UUID pattern.
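
The pattern in the listing is unanchored and accepts any letter or digit, so it also matches strings that merely contain something UUID-shaped, or that use non-hexadecimal letters. Where a stricter check is wanted, one possible variant is sketched below; `isStrictUuidSketch` is a hypothetical helper, not part of StringHelper.

```php
<?php
// Pattern copied from the listing above.
$listingPattern = '/[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}/uis';

// Hypothetical stricter check: anchored at both ends, hexadecimal digits only.
function isStrictUuidSketch(string $uuid): bool
{
    $pattern = '/\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/i';
    return preg_match($pattern, $uuid) === 1;
}

$candidates = [
    '123e4567-e89b-42d3-a456-426614174000',     // well-formed
    'zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz',     // right shape, not hexadecimal
    'id=123e4567-e89b-42d3-a456-426614174000;', // UUID embedded in other text
];

foreach ($candidates as $candidate) {
    printf(
        "%-42s listing=%d strict=%d\n",
        $candidate,
        preg_match($listingPattern, $candidate),
        (int) isStrictUuidSketch($candidate)
    );
}
```

The listing's pattern accepts all three candidates, while the stricter sketch accepts only the first.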

lowercaseFirst() #

public static function lowercaseFirst($string)
{
     $strlen = mb_strlen($string, static::UTF8);
     $firstChar = mb_substr($string, 0, 1, static::UTF8);
     $remainder = mb_substr($string, 1, $strlen - 1, static::UTF8);
     return static::toLowerCase($firstChar).$remainder;
}
$string | string | The multibyte string.
Returns | string | The string with the first character converted to lowercase.

Lowercases the first character of a multibyte string.
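
The sketch below shows the same split-and-recombine approach outside Craft. It assumes the mbstring extension and that the class constant `UTF8` used in the listing holds `'UTF-8'`; `lowercaseFirstSketch` is a hypothetical name.

```php
<?php
function lowercaseFirstSketch(string $string): string
{
    // Split by characters rather than bytes so a multibyte first letter stays intact.
    $first = mb_substr($string, 0, 1, 'UTF-8');
    $rest  = mb_substr($string, 1, null, 'UTF-8');

    return mb_convert_case($first, MB_CASE_LOWER, 'UTF-8') . $rest;
}

echo lowercaseFirstSketch("\u{00C9}cole"), PHP_EOL; // école
echo lowercaseFirstSketch('Craft'), PHP_EOL;        // craft
```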

normalizeKeywords() #

public static function normalizeKeywords($str, $ignore = array(), $processCharMap = true)
{
     // Flatten
     if (is_array($str)) $str = static::arrayToString($str, ' ');

     // Get rid of tags
     $str = strip_tags($str);

     // Convert non-breaking spaces entities to regular ones
     $str = str_replace(array('&nbsp;', '&#160;', '&#xa0;') , ' ', $str);

     // Get rid of entities
     $str = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $str);

     // Normalize to lowercase
     $str = StringHelper::toLowerCase($str);

     if ($processCharMap)
     {
          // Remove punctuation and diacritics
          $str = strtr($str, static::_getCharMap());
     }

     // Remove ignore-words?
     if (is_array($ignore) && ! empty($ignore))
     {
          foreach ($ignore as $word)
          {
               $word = preg_quote(static::normalizeKeywords($word));
               $str  = preg_replace("/\b{$word}\b/u", '', $str);
          }
     }

     // Strip out new lines and superfluous spaces
     $str = preg_replace('/[\n\r]+/u', ' ', $str);
     $str = preg_replace('/\s{2,}/u', ' ', $str);

     // Trim white space and return
     return trim($str);
}
$str | string | The dirty keywords.
$ignore | array | Ignore words to strip out.
$processCharMap | bool
Returns | string | The cleansed keywords.

Normalizes search keywords.
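
To make the order of the steps concrete, here is a simplified sketch of the pipeline. It skips the character-map step, as if `$processCharMap` were false, because the private `_getCharMap()` is not shown on this page. It also assumes string input and the mbstring extension; `normalizeKeywordsSketch` is a hypothetical name.

```php
<?php
// Simplified sketch: no character map, string input only.
function normalizeKeywordsSketch(string $str, array $ignore = []): string
{
    $str = strip_tags($str);                                       // drop tags
    $str = str_replace(['&nbsp;', '&#160;', '&#xa0;'], ' ', $str); // nbsp entities become spaces
    $str = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $str);         // drop remaining entities
    $str = mb_convert_case($str, MB_CASE_LOWER, 'UTF-8');          // lowercase

    foreach ($ignore as $word) {
        // Lowercase and quote each ignore word before using it in a pattern.
        $word = preg_quote(mb_convert_case($word, MB_CASE_LOWER, 'UTF-8'), '/');
        $str = preg_replace("/\\b{$word}\\b/u", '', $str);
    }

    $str = preg_replace('/[\n\r]+/u', ' ', $str); // newlines become spaces
    $str = preg_replace('/\s{2,}/u', ' ', $str);  // collapse runs of whitespace

    return trim($str);
}

echo normalizeKeywordsSketch("<p>The&nbsp;Quick  Brown\nFox &amp; friends</p>", ['the']), PHP_EOL;
// quick brown fox friends
```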

parseMarkdown() #

public static function parseMarkdown($str)
{
     if (!class_exists('\Markdown_Parser', false))
     {
          require_once craft()->path->getFrameworkPath().'vendors/markdown/markdown.php';
     }

     $md = new \Markdown_Parser();
     return $md->transform($str);
}
$str | string
Returns | string

Runs a string through Markdown.

parseMarkdownLine() #

public static function parseMarkdownLine($str)
{
     // Prevent line breaks from getting treated as paragraphs
     $str = preg_replace('/[\r\n]/', '  $0', $str);

     // Parse with Markdown
     $str = self::parseMarkdown($str);

     // Return without the <p> and </p>
     $str = trim(str_replace(array('<p>', '</p>'), '', $str));

     return $str;
}
$str | string
Returns | string

Runs a string through Markdown, but removes any paragraph tags that get added.

randomString() #

public static function randomString($length = 36, $extendedChars = false)
{
     if ($extendedChars)
     {
          $validChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890`~!@#$%^&*()-_=+[]\{}|;:\'",./<>?"';
     }
     else
     {
          $validChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
     }

     $randomString = '';

     // count the number of chars in the valid chars string so we know how many choices we have
     $numValidChars = mb_strlen($validChars);

     // repeat the steps until we've created a string of the right length
     for ($i = 0; $i < $length; $i++)
     {
          // pick a random number from 1 up to the number of valid chars
          $randomPick = mt_rand(1, $numValidChars);

          // take the random character out of the string of valid chars
          $randomChar = $validChars[$randomPick - 1];

          // add the randomly-chosen char onto the end of our string
          $randomString .= $randomChar;
     }

     return $randomString;
}
$length | int
$extendedChars | bool
Returns | string
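
This method has no description in the source. From the listing, it builds a string of `$length` characters picked with `mt_rand()` from an alphanumeric set, or from a larger set with punctuation when `$extendedChars` is true. `mt_rand()` is not intended for security-sensitive values, so for tokens or passwords a variant built on `random_int()` (PHP 7 or later) may be preferable. The sketch below shows one such variant under that assumption; `randomStringSketch` is a hypothetical name.

```php
<?php
// Hypothetical variant of the listing that draws indexes from random_int().
function randomStringSketch(int $length = 36): string
{
    $validChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    $maxIndex = strlen($validChars) - 1;
    $result = '';

    for ($i = 0; $i < $length; $i++) {
        // Pick a zero-based index directly instead of picking 1..n and subtracting one.
        $result .= $validChars[random_int(0, $maxIndex)];
    }

    return $result;
}

echo randomStringSketch(12), PHP_EOL;
```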

splitOnWords() #

public static function splitOnWords($string)
{
     // Split on anything that is not alphanumeric, or a period, underscore, or hyphen.
     // Reference: http://www.regular-expressions.info/unicode.html
     preg_match_all('/[\p{L}\p{N}\p{M}\._-]+/u', $string, $matches);
     return ArrayHelper::filterEmptyStringsFromArray($matches[0]);
}
$string | string | The string
Returns | string[]

Splits a string into an array of the words in the string.
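
The character class keeps letters, numbers, combining marks, periods, underscores, and hyphens together, so identifiers and version numbers survive as single words. The sketch below runs the same pattern outside Craft. It leaves out the `ArrayHelper::filterEmptyStringsFromArray()` call, since that helper is not shown on this page; `splitOnWordsSketch` is a hypothetical name.

```php
<?php
function splitOnWordsSketch(string $string): array
{
    // Same pattern as the listing: runs of letters, numbers, marks, '.', '_' and '-'.
    preg_match_all('/[\p{L}\p{N}\p{M}\._-]+/u', $string, $matches);
    return $matches[0];
}

print_r(splitOnWordsSketch("Hello, w\u{00F6}rld! foo_bar-baz v1.2"));
// Words found: Hello, wörld, foo_bar-baz, v1.2
```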

stripHtml() #

public static function stripHtml($str)
{
     return preg_replace('/<(.*?)>/u', '', $str);
}
$str | string | The string.
Returns | string

Strips HTML tags out of a given string.
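
The pattern removes anything between a `<` and the next `>` on the same line, which is worth keeping in mind for text that uses those characters for comparisons. A standalone sketch of both cases follows; `stripHtmlSketch` is a hypothetical name.

```php
<?php
function stripHtmlSketch(string $str): string
{
    // Same pattern as the listing: lazily match from '<' to the next '>'.
    return preg_replace('/<(.*?)>/u', '', $str);
}

echo stripHtmlSketch('<b>Hi</b> <a href="#">there</a>'), PHP_EOL; // Hi there

// Not only tags: everything from '<' to '>' is removed here as well.
var_dump(stripHtmlSketch('1 < 2 and 3 > 2')); // string(4) "1  2"
```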

toCamelCase() #

public static function toCamelCase($string)
{
     $words = self::_prepStringForCasing($string);

     if (!$words)
     {
          return '';
     }

     $string = array_shift($words).implode('', array_map(array(get_called_class(), 'uppercaseFirst'), $words));

     return $string;
}
$string | string | The string
Returns | string

camelCases a string.

toKebabCase() #

public static function toKebabCase($string, $glue = '-', $lower = true, $removePunctuation = true)
{
     $words = self::_prepStringForCasing($string, $lower, $removePunctuation);
     $string = implode($glue, $words);

     return $string;
}
$string | string | The string
$glue | string | The string used to glue the words together (default is a hyphen)
$lower | bool | Whether the string should be lowercased (default is true)
$removePunctuation | bool | Whether punctuation marks should be removed (default is true)
Returns | string

kebab-cases a string.

toLowerCase() #

public static function toLowerCase($string)
{
     return mb_convert_case($string, MB_CASE_LOWER, static::UTF8);
}
$string | string
Returns | string

Returns a multibyte-aware lowercase version of a string. Note: Not using mb_strtoupper because of https://bugs.php.net/bug.php?id=47742.

toPascalCase() #

public static function toPascalCase($string)
{
     $words = self::_prepStringForCasing($string);
     $string = implode('', array_map(array(get_called_class(), 'uppercaseFirst'), $words));

     return $string;
}
$string | string | The string
Returns | string

PascalCases a string.

toSnakeCase() #

public static function toSnakeCase($string)
{
     $words = self::_prepStringForCasing($string);
     $string = implode('_', $words);

     return $string;
}
$string | string | The string
Returns | string

snake_cases a string.
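
The four casing methods share one shape: split the input into words with the private `_prepStringForCasing()`, then join the words in different ways. That helper is not shown on this page, so the sketch below starts from a hypothetical, already split and lowercased word list and shows only the joining step. It uses PHP's byte-based `ucfirst()` in place of `uppercaseFirst()`, which is only equivalent for ASCII words.

```php
<?php
// Hypothetical input: stands in for the output of the private _prepStringForCasing().
$words = ['entry', 'type', 'handle'];

// camelCase: first word as-is, remaining words capitalized.
$camel = $words[0] . implode('', array_map('ucfirst', array_slice($words, 1)));

// PascalCase: every word capitalized.
$pascal = implode('', array_map('ucfirst', $words));

// snake_case and kebab-case: only the glue differs.
$snake = implode('_', $words);
$kebab = implode('-', $words);

echo $camel, PHP_EOL;  // entryTypeHandle
echo $pascal, PHP_EOL; // EntryTypeHandle
echo $snake, PHP_EOL;  // entry_type_handle
echo $kebab, PHP_EOL;  // entry-type-handle
```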

toUpperCase() #

public static function toUpperCase($string)
{
     return mb_convert_case($string, MB_CASE_UPPER, static::UTF8);
}
$string | string
Returns | string

Returns a multibyte-aware uppercase version of a string. Note: Not using mb_strtoupper because of https://bugs.php.net/bug.php?id=47742.

uppercaseFirst() #

public static function uppercaseFirst($string)
{
     $strlen = mb_strlen($string, static::UTF8);
     $firstChar = mb_substr($string, 0, 1, static::UTF8);
     $remainder = mb_substr($string, 1, $strlen - 1, static::UTF8);
     return static::toUpperCase($firstChar).$remainder;
}
$string | string | The multibyte string.
Returns | string | The string with the first character converted to uppercase.

Uppercases the first character of a multibyte string.