Sorting UTF-8 strings in PHP

Created: 29 Apr 2017 16:41:28, in Web development

This post shows how to sort UTF-8 strings in PHP so that the order follows the rules of a given language.

Sorting the easy way

Set the LC_COLLATE category with setlocale:


setlocale(LC_COLLATE, 'language_TERRITORY.codeset');

For example, the correct LC_COLLATE value for content written in English for Great Britain and encoded in UTF-8 is en_GB.UTF-8. Similarly, the value for content in Polish for Poland is pl_PL.UTF-8. The locale has to be installed on the system; otherwise setlocale returns false and the collation stays unchanged.
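Because setlocale fails quietly when a locale is missing, it can help to check its return value. The sketch below is only an illustration: the helper name set_collation_locale and the list of candidate names are assumptions, not part of the original post. It relies on setlocale accepting an array of locale names and applying the first one the system supports.

```php
<?php
// Hypothetical helper: try each candidate locale and report which one was applied.
function set_collation_locale(array $candidates): ?string
{
    // setlocale() accepts an array of names and applies the first supported one.
    // It returns the applied locale name, or false when none could be set.
    $applied = setlocale(LC_COLLATE, $candidates);
    return $applied === false ? null : $applied;
}

// Assumption: systems name the same locale differently (UTF-8 vs utf8).
$locale = set_collation_locale(array('pl_PL.UTF-8', 'pl_PL.utf8'));

if ($locale === null) {
    echo "No Polish UTF-8 locale is installed\n";
} else {
    echo "LC_COLLATE set to {$locale}\n";
}
```

On Debian-like systems, the installed locales can usually be listed with locale -a.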

Sorting examples

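With the collation locale configured, the strcoll function can be used for string comparisons, and functions that accept a comparison callback, such as usort, can use it for sorting strings. Note that natsort does not take the locale into account; for a locale-aware sort without a callback, sort accepts the SORT_LOCALE_STRING flag.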

See below:

Case-sensitive sorting of Polish strings encoded in UTF-8:


setlocale(LC_COLLATE, 'pl_PL.UTF-8');
$PL = array('łyżka', 'Żeźnia', 'żebrak', 'grzegrzółka', 'Ósemka', '2-mięsieczny źrebak');
// usort passes pairs of items to strcoll, which compares them using LC_COLLATE
usort($PL, 'strcoll');
// => array('2-mięsieczny źrebak', 'grzegrzółka', 'łyżka', 'Ósemka', 'żebrak', 'Żeźnia')

Case-sensitive sorting of German strings encoded in UTF-8:


setlocale(LC_COLLATE, 'de_DE.UTF-8');
$DE = array('unglück', 'laßt', 'schönen', 'blühe', 'waschbär', 'schildkröte');
usort($DE, 'strcoll');
// => array('blühe', 'laßt', 'schildkröte', 'schönen', 'unglück', 'waschbär')
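A locale-aware sort is also possible without a callback. The following is a minimal sketch, assuming the de_DE.UTF-8 locale is installed; the helper name sort_by_current_locale is hypothetical. It uses the SORT_LOCALE_STRING flag of sort, which compares items as strings according to the current locale.

```php
<?php
// Hypothetical helper: returns a copy of $items sorted with the current LC_COLLATE rules.
function sort_by_current_locale(array $items): array
{
    // SORT_LOCALE_STRING makes sort() compare items as strings using the current locale.
    sort($items, SORT_LOCALE_STRING);
    return $items;
}

// Assumption: de_DE.UTF-8 is installed; setlocale() returns false when it is not.
if (setlocale(LC_COLLATE, 'de_DE.UTF-8') === false) {
    echo "de_DE.UTF-8 is not installed, the current collation is used instead\n";
}

$DE = array('unglück', 'laßt', 'schönen', 'blühe', 'waschbär', 'schildkröte');
print_r(sort_by_current_locale($DE));
```

Unlike usort with strcoll, this variant needs no comparison function, but it still depends on the locale set through setlocale.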

Using the Collator class

As an alternative, collation-aware string comparison and sorting can be carried out with the Collator class. It is not available unless the PHP internationalization extension (intl) has been installed. On Debian GNU/Linux and many of its derivatives, this can be done with either pecl or, preferably, apt-get.

Logged in as a privileged user, enter:


apt-get install php[php-version]-intl

Replace [php-version] with the PHP version you need the internationalization module for. Once the installation is complete, Collator is available for both php-cli and Apache (you might need to reload your server configuration first) and is ready to use.
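Whether the extension is actually active can be checked from PHP itself. This is a small sketch rather than a required step; the function name intl_available is an assumption made for illustration. It relies on extension_loaded and class_exists, both standard PHP functions.

```php
<?php
// Hypothetical helper: reports whether the intl extension and its Collator class are usable.
function intl_available(): bool
{
    // extension_loaded() checks the extension in the PHP binary running this script,
    // so php-cli and the web server module may give different answers.
    return extension_loaded('intl') && class_exists('Collator');
}

echo intl_available()
    ? "intl is loaded, Collator can be used\n"
    : "intl is missing, install it or use setlocale() with strcoll()\n";
```

Running the check once from the command line and once through the web server shows whether both environments picked up the module.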

Here is a quick example of comparing and sorting some Polish words with Collator.


$collator = new Collator('pl_PL');

// comparing: returns 1 because 'świerk' sorts after 'sosna'
$collator->compare('świerk', 'sosna');
// => 1

// sorting: sort() reorders the array in place
$sortable = array('ściana', 'słowo', 'ćwikła', 'cena');
$collator->sort($sortable);

// new order of words in $sortable
// => array('cena', 'ćwikła', 'słowo', 'ściana')
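Collator::compare can also serve as a callback for usort, which is handy when the strings to sort are fields of larger records. The sketch below assumes the intl extension is loaded; the helper sort_rows_by_field and the 'id' and 'word' keys are hypothetical and only reuse the words from the example above.

```php
<?php
// Hypothetical helper: returns $rows sorted by one string field using the given Collator.
function sort_rows_by_field(array $rows, string $field, Collator $collator): array
{
    usort($rows, function (array $a, array $b) use ($collator, $field) {
        // compare() returns -1, 0 or 1 (or false on error, cast to 0 here).
        return (int) $collator->compare($a[$field], $b[$field]);
    });
    return $rows;
}

// Guarded so the script does not fail when intl is not installed.
if (class_exists('Collator')) {
    // Illustrative data: the ids are made up, the words come from the example above.
    $rows = array(
        array('id' => 1, 'word' => 'ściana'),
        array('id' => 2, 'word' => 'słowo'),
        array('id' => 3, 'word' => 'ćwikła'),
        array('id' => 4, 'word' => 'cena'),
    );

    foreach (sort_rows_by_field($rows, 'word', new Collator('pl_PL')) as $row) {
        echo $row['id'], ' ', $row['word'], "\n";
    }
}
```

Passing the Collator in as an argument keeps the helper independent of any particular language, so the same function can sort with a German or English collator.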

This post was updated on 06 Oct 2021 21:41:11

Tags:  php ,  sort 


Author, Copyright and citation

Author

Sylwester Wojnowski

The author of this article, Sylwester Wojnowski, is a sWWW web developer. He has been writing code for websites and web applications since 1998.

Copyrights

©Copyright, 2024 Sylwester Wojnowski. This article may not be reproduced or published as a whole or in parts without permission from the author. If you share it, please give author credit and do not remove embedded links.

Computer code, if present in the article, is excluded from the above and licensed under GPLv3.

Citation

Cite this article as:

Wojnowski, Sylwester. "Sorting UTF-8 strings in PHP." From sWWW - Code For The Web . https://swww.com.pl//main/index/sorting-utf-8-strings-in-php
