Sorting UTF-8 strings in PHP
Created: 29 Apr 2017 16:41:28, in Web development
Sort UTF-8 strings in PHP.
Sorting the easy way
Set the LC_COLLATE value with setlocale:
setlocale(LC_COLLATE, 'language_territory.codeset')
For example, the correct LC_COLLATE value for content written in British English and encoded in UTF-8 is en_GB.UTF-8. Similarly, the LC_COLLATE value for content in Polish, for Poland, is pl_PL.UTF-8.
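A locale can only be applied if it is installed on the system (on Linux, `locale -a` lists the available ones), and setlocale returns false when it is not. The sketch below checks that return value. The second candidate name, pl_PL.utf8, is an assumption about how some systems spell the same locale, not something every platform provides.

```php
<?php
// Candidate names for the same locale; the codeset spelling can differ between systems.
$candidates = array('pl_PL.UTF-8', 'pl_PL.utf8');

// setlocale() tries each name in turn and returns the one it applied,
// or false if none of them is installed.
$applied = setlocale(LC_COLLATE, $candidates);

if ($applied === false) {
    echo "No Polish UTF-8 locale is installed; collation is unchanged.\n";
} else {
    echo "LC_COLLATE is now set to {$applied}\n";
}
```

Checking the result matters because a failed setlocale call is silent: strcoll keeps working, just with whatever collation was active before.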
Sorting examples
With the collation locale configured, strcoll can be used for string comparisons, and strings can be sorted with usort (passing strcoll as the comparison callback) or with sort and the SORT_LOCALE_STRING flag.
See below:
Case-sensitive sorting of strings in Polish language encoded in UTF-8:
setlocale(LC_COLLATE,'pl_PL.UTF-8');
$PL = array('łyżka','Żeźnia','żebrak','grzegrzółka','Ósemka','2-mięsieczny źrebak');
usort($PL,'strcoll');
=> array('2-mięsieczny źrebak','grzegrzółka','łyżka','Ósemka','żebrak','Żeźnia')
Case-sensitive sorting of strings in German language encoded in UTF-8:
setlocale(LC_COLLATE,'de_DE.UTF-8');
$DE = array('unglück','laßt','schönen','blühe','waschbär','schildkröte');
usort($DE,'strcoll');
=> array('blühe','laßt','schildkröte','schönen','unglück','waschbär')
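Because setlocale changes a process-wide setting, it can be useful to restore the previous collation after sorting. The following is a minimal sketch of that idea. The function name sort_with_locale is hypothetical, and returning null when no requested locale is installed is just one possible design choice.

```php
<?php
/**
 * Hypothetical helper: sort $strings using the collation of the first
 * installed locale from $locales, then restore the previous LC_COLLATE.
 * Returns null when none of the requested locales is installed.
 */
function sort_with_locale(array $strings, array $locales): ?array
{
    // Passing '0' only reads the current setting without changing it.
    $previous = setlocale(LC_COLLATE, '0');

    if (setlocale(LC_COLLATE, $locales) === false) {
        return null;
    }

    usort($strings, 'strcoll');

    // Put the process-wide setting back the way it was found.
    setlocale(LC_COLLATE, $previous);

    return $strings;
}

$sorted = sort_with_locale(
    array('łyżka', 'żebrak', 'grzegrzółka'),
    array('pl_PL.UTF-8', 'pl_PL.utf8')
);
var_dump($sorted);
```

The resulting order depends on the locale data installed on the machine, so the output is not shown here.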
Using Collator PHP library
Alternatively, collation-aware string comparison and sorting can be carried out with the Collator class. It is only available when PHP's internationalization extension (intl) is installed, which on Debian GNU/Linux and many of its derivatives can be done with either pecl or, preferably, the apt-get utility.
Logged in as a privileged user, enter:
apt-get install php[your-php-version]-intl
Replace [your-php-version] with the PHP version you need the internationalization module for. Once the installation is complete, the Collator class is available to both php-cli and Apache (you might need to reload your server configuration first) and is ready to use.
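To confirm that the extension is actually loaded in the PHP build you are running, a quick check such as the one below can help. It is only a sketch; the variable name $intlAvailable is made up for illustration.

```php
<?php
// Collator is provided by the intl extension, so both conditions
// should be true together on a working installation.
$intlAvailable = extension_loaded('intl') && class_exists('Collator');

echo $intlAvailable
    ? "intl is loaded, Collator can be used.\n"
    : "intl is not loaded in this PHP build.\n";
```

Run it once from the command line and once through the web server, since the two can use different PHP configurations.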
Here is a quick example of comparing and sorting some Polish words with Collator.
$collator = new Collator('pl_PL');
// comparing
$collator->compare('świerk', 'sosna');
=> 1
// sorting
$sortable = array('ściana','słowo','ćwikła','cena');
$collator->sort($sortable);
// new order of words in $sortable
=> array('cena','ćwikła','słowo','ściana')
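Collator::compare can also serve as a usort callback, which helps when the strings to sort are fields inside larger records. The sketch below assumes the intl extension is installed. The function sort_rows_by_name and the sample rows are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Hypothetical helper: order rows by their 'name' field using the
 * collation rules of $locale. Requires the intl extension.
 */
function sort_rows_by_name(array $rows, string $locale): array
{
    $collator = new Collator($locale);

    usort($rows, function (array $a, array $b) use ($collator) {
        // compare() returns -1, 0 or 1; the cast keeps usort's callback contract.
        return (int) $collator->compare($a['name'], $b['name']);
    });

    return $rows;
}

// Only run the demonstration when Collator is available.
if (class_exists('Collator')) {
    $rows = array(
        array('id' => 1, 'name' => 'ściana'),
        array('id' => 2, 'name' => 'cena'),
        array('id' => 3, 'name' => 'słowo'),
    );

    foreach (sort_rows_by_name($rows, 'pl_PL') as $row) {
        echo $row['id'], ' ', $row['name'], "\n";
    }
}
```

Unlike the setlocale approach, the collator object carries its own locale, so nothing process-wide has to be changed or restored.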
This post was updated on 06 Oct 2021 21:41:11
Author, Copyright and citation
Author
The author of this article, Sylwester Wojnowski, is a sWWW web developer. He has been writing computer code for websites and web applications since 1998.
Copyrights
© Copyright 2024 Sylwester Wojnowski. This article may not be reproduced or published, in whole or in part, without permission from the author. If you share it, please give the author credit and do not remove embedded links.
Computer code, if present in the article, is excluded from the above and licensed under GPLv3.
Citation
Cite this article as:
Wojnowski, Sylwester. "Sorting UTF-8 strings in PHP." From sWWW - Code For The Web . https://swww.com.pl//main/index/sorting-utf-8-strings-in-php