Feature #1956

Add cyrillic mojibake title decoder for icecast

Added by Leonid Protasov over 6 years ago. Updated over 6 years ago.

Status: New
Start date: 02/09/2014
Priority: Low
Due date:
Assignee: Andreas Smas
% Done: 0%
Category: General
Target version: -

Description

icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69 74 6c 65 3d 27 5a 61 70 StreamTitle='Zap
icymeta [DEBUG]: 0x000010: 61 73 6b 61 20 2d 20 c3 8d c3 a5 c3 b1 c3 af c3 aska - .........
icymeta [DEBUG]: 0x000020: ae c3 a4 c2 b3 c3 a2 c3 a0 c3 ad c3 ae 20 5b d0 ............. [.
icymeta [DEBUG]: 0x000030: ad d1 82 d0 bd d0 be 5d 27 3b .......]';
Radio [DEBUG]: Title decoded as Zapaska - Íåñïîä³âàíî [Этно] to 'Decoding as UTF-8'

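Let me explain. ST detects the input as UTF-8, which is correct. The problem is that the station converts the title incorrectly: it apparently treats its cp1251 text as cp1252 and re-encodes that to UTF-8.
Note that the mojibake characters (Íåñïîä³âàíî) are all two-byte UTF-8 sequences with lead byte 0xc3 (0xc2 for ³). So to display the title properly you just need: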
$string = iconv('utf-8', 'cp1252', $string);
$string = iconv('cp1251', 'utf-8', $string);
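
The same two steps can be sketched in JavaScript, which is what a plugin would use. This is only an illustration under stated assumptions: `undoDoubleEncoding` and `CP1251_EXTRA` are hypothetical names, and the table covers just the Russian and Ukrainian letters, not the whole cp1251 code page. It works per character because the sample title mixes mojibake with correctly encoded Cyrillic ([Этно]), which a whole-string conversion to cp1252 may reject.

```javascript
// Assumed partial cp1251 table: letters that live outside 0xC0..0xFF.
var CP1251_EXTRA = {
    0xA8: '\u0401', // Ё
    0xB8: '\u0451', // ё
    0xB3: '\u0456', // і
    0xB2: '\u0406', // І
    0xBA: '\u0454', // є
    0xAA: '\u0404', // Є
    0xBF: '\u0457', // ї
    0xAF: '\u0407', // Ї
    0xB4: '\u0491', // ґ
    0xA5: '\u0490'  // Ґ
};

// Hypothetical helper: undo "cp1251 bytes read as cp1252, then sent as UTF-8".
function undoDoubleEncoding(title) {
    var out = '';
    for (var i = 0; i < title.length; i++) {
        // Step 1 (utf-8 -> cp1252): for U+00A0..U+00FF the cp1252 byte
        // value is the same as the code point.
        var b = title.charCodeAt(i);
        // Step 2 (cp1251 -> utf-8): reinterpret that byte as cp1251.
        if (b >= 0xC0 && b <= 0xFF) {
            // cp1251 0xC0..0xFF maps contiguously onto U+0410..U+044F.
            out += String.fromCharCode(0x0410 + (b - 0xC0));
        } else if (CP1251_EXTRA.hasOwnProperty(b)) {
            out += CP1251_EXTRA[b];
        } else {
            // ASCII and already-correct Cyrillic pass through unchanged.
            out += title.charAt(i);
        }
    }
    return out;
}
```

Applied to the title from the log above, `undoDoubleEncoding('Zapaska - Íåñïîä³âàíî [Этно]')` gives 'Zapaska - Несподівано [Этно]'. The sketch converts unconditionally, so legitimate Latin-1 text such as accented Western European titles would be damaged; real use would need some guard in front of it.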

Adding that feature would be great.

You can see that mojibake for example here:

http://music.myradio.com.ua:8000/sheshory128.mp3

History

#1 Updated by Andreas Smas over 6 years ago

  • Target version deleted (4.6)

#2 Updated by Leonid Protasov over 6 years ago

One more cp1252->cp1251 mojibake. Here the title arrives as raw cp1251 bytes and is decoded as Latin-1:

icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69  74 6c 65 3d 27 c4 f3 ec    StreamTitle='...
icymeta [DEBUG]: 0x000010: ea e8 20 c2 e3 ee eb ee  f1 20 2d 20 cc ee ff 20    .. ...... - ... 
icymeta [DEBUG]: 0x000020: c7 ee f0 ff 27 3b 53 74  72 65 61 6d 55 72 6c 3d    ....';StreamUrl=
icymeta [DEBUG]: 0x000030: 27 68 74 74 70 3a 2f 2f  65 72 61 64 69 6f 2e 6e    'http://eradio.n
icymeta [DEBUG]: 0x000040: 65 74 2e 75 61 27 3b                                et.ua';
Radio [DEBUG]: Title decoded as Äóìêè Âãîëîñ - Ìîÿ Çîðÿ to 'Unable to determine character encoding, decoding as ISO-8859-1 (Latin-1)'

#3 Updated by Leonid Protasov over 6 years ago

icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69  74 6c 65 3d 27 d3 ea f0    StreamTitle='...
icymeta [DEBUG]: 0x000010: e0 bf ed f1 fc ea b3 20  c0 f0 f2 e8 f1 f2 e8 20    ....... ....... 
icymeta [DEBUG]: 0x000020: 2d 20 c1 f0 e0 f2 20 c7  e0 20 c1 f0 e0 f2 e0 27    - .... .. .....'
icymeta [DEBUG]: 0x000030: 3b 53 74 72 65 61 6d 55  72 6c 3d 27 68 74 74 70    ;StreamUrl='http
icymeta [DEBUG]: 0x000040: 3a 2f 2f 65 72 61 64 69  6f 2e 6e 65 74 2e 75 61    ://eradio.net.ua
icymeta [DEBUG]: 0x000050: 27 3b                                               ';
Radio [DEBUG]: Title decoded as ΣκπΰΏνρόκ³ ΐπςθρςθ - Απΰς Ηΰ Απΰςΰ to 'Decoded as ISO-8859-7 (detected language: el)'

#4 Updated by Leonid Protasov over 6 years ago

Radio [DEBUG]: Title decoded as �²� - �במםוםע to 'Decoded as ISO-8859-8 (detected language: he)'
icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69  74 6c 65 3d 27 27 3b 53    StreamTitle='';S
icymeta [DEBUG]: 0x000010: 74 72 65 61 6d 55 72 6c  3d 27 68 74 74 70 3a 2f    treamUrl='http:/
icymeta [DEBUG]: 0x000020: 2f 65 72 61 64 69 6f 2e  6e 65 74 2e 75 61 27 3b    /eradio.net.ua';
Radio [DEBUG]: Title decoded as  to 'Decoding as UTF-8'
icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69  74 6c 65 3d 27 54 61 6c    StreamTitle='Tal
icymeta [DEBUG]: 0x000010: 69 74 61 20 4b 75 6d 20  2d 20 37 20 df 27 3b 53    ita Kum - 7 .';S
icymeta [DEBUG]: 0x000020: 74 72 65 61 6d 55 72 6c  3d 27 68 74 74 70 3a 2f    treamUrl='http:/
icymeta [DEBUG]: 0x000030: 2f 65 72 61 64 69 6f 2e  6e 65 74 2e 75 61 27 3b    /eradio.net.ua';
Radio [DEBUG]: Title decoded as Talita Kum - 7 ß to 'Decoded as ISO-8859-1 (detected language: it)'

#5 Updated by Leonid Protasov over 6 years ago

icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69  74 6c 65 3d 27 44 61 72    StreamTitle='Dar
icymeta [DEBUG]: 0x000010: 77 69 6e 20 2d 20 ce e7  ee ed 20 c7 e8 ec e0 27    win - .... ....'
icymeta [DEBUG]: 0x000020: 3b 53 74 72 65 61 6d 55  72 6c 3d 27 68 74 74 70    ;StreamUrl='http
icymeta [DEBUG]: 0x000030: 3a 2f 2f 65 72 61 64 69  6f 2e 6e 65 74 2e 75 61    ://eradio.net.ua
icymeta [DEBUG]: 0x000040: 27 3b                                               ';
Radio [DEBUG]: Title decoded as Darwin - 敁鍙 я憵 to 'Decoded as BIG5 (detected language: <Unknown>)'
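
The hex dumps in comments #2, #3 and #5 are not valid UTF-8 but decode cleanly as cp1251, while the detector guesses Latin-1, Greek and Big5. One possible fallback, sketched below, is to check whether the bytes are well-formed UTF-8 and, if not, to try cp1251. This assumes the station is known to be Cyrillic (for example through a per-station setting); `looksLikeUtf8`, `decodeCp1251` and `decodeTitleBytes` are hypothetical names, and the cp1251 table is deliberately partial.

```javascript
// Simplified structural check: lead byte followed by the right number of
// continuation bytes. Overlong forms and surrogates are not rejected.
function looksLikeUtf8(bytes) {
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i];
        var extra;
        if (b < 0x80) extra = 0;
        else if (b >= 0xC2 && b <= 0xDF) extra = 1;
        else if (b >= 0xE0 && b <= 0xEF) extra = 2;
        else if (b >= 0xF0 && b <= 0xF4) extra = 3;
        else return false;
        for (var k = 1; k <= extra; k++) {
            if (i + k >= bytes.length || (bytes[i + k] & 0xC0) !== 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Partial cp1251 decoder (assumption: only Russian/Ukrainian letters matter).
function decodeCp1251(bytes) {
    var extra = {
        0xA8: '\u0401', 0xB8: '\u0451', 0xB3: '\u0456', 0xB2: '\u0406',
        0xBA: '\u0454', 0xAA: '\u0404', 0xBF: '\u0457', 0xAF: '\u0407',
        0xB4: '\u0491', 0xA5: '\u0490'
    };
    var out = '';
    for (var i = 0; i < bytes.length; i++) {
        var b = bytes[i];
        if (b < 0x80) out += String.fromCharCode(b);                         // ASCII
        else if (b >= 0xC0) out += String.fromCharCode(0x0410 + (b - 0xC0)); // А..я
        else out += extra.hasOwnProperty(b) ? extra[b] : '\uFFFD';           // unmapped
    }
    return out;
}

// Returns null when the bytes are valid UTF-8 (leave them to the normal
// UTF-8 path), otherwise the cp1251 interpretation.
function decodeTitleBytes(bytes) {
    return looksLikeUtf8(bytes) ? null : decodeCp1251(bytes);
}
```

Fed the title bytes from comment #5 (44 61 72 77 69 6e 20 2d 20 ce e7 ee ed 20 c7 e8 ec e0), this returns 'Darwin - Озон Зима' instead of the Big5 guess.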

#6 Updated by Leonid Protasov over 6 years ago

Radio [DEBUG]: http://radio.tstu.edu.ua:8000/ukr.m3u guessed to be an m3u based on content
icymeta [DEBUG]: 0x000000: 53 74 72 65 61 6d 54 69  74 6c 65 3d 27 46 6c 65    StreamTitle='Fle
icymeta [DEBUG]: 0x000010: 75 72 20 2d 20 ff fe 1a  04 30 04 40 04 43 04 41    ur - ....0.@.C.A
icymeta [DEBUG]: 0x000020: 04 35 04 3b 04 4c 04 27  3b                         .5.;.L.';
Radio [DEBUG]: Title decoded as 'Decoded as ISO-8859-1 (detected language: es)' to 'Fleur - ÿþ0@CA5;L'
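
In this dump the bytes after "Fleur - " start with ff fe, the UTF-16 little-endian byte-order mark, and the rest reads as 16-bit Cyrillic code units. A minimal sketch of handling that case follows; `decodeEmbeddedUtf16le` is a hypothetical name, and the assumption that everything before the BOM is plain ASCII is mine, based only on this one sample.

```javascript
// Hypothetical helper: decode a title whose tail is UTF-16LE introduced
// by an FF FE byte-order mark in the middle of the byte stream.
function decodeEmbeddedUtf16le(bytes) {
    var out = '';
    var i = 0;
    // Copy the leading bytes as-is (assumed ASCII) until the BOM is found.
    while (i < bytes.length && !(bytes[i] === 0xFF && bytes[i + 1] === 0xFE)) {
        out += String.fromCharCode(bytes[i]);
        i++;
    }
    i += 2; // skip the BOM itself
    // The remaining bytes are little-endian 16-bit code units.
    for (; i + 1 < bytes.length; i += 2) {
        out += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
    }
    return out;
}
```

With the title bytes from the log (46 6c 65 75 72 20 2d 20 ff fe 1a 04 30 04 40 04 43 04 41 04 35 04 3b 04 4c 04) the result is 'Fleur - Карусель'. If no BOM is present the input comes back unchanged, byte for byte.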

#7 Updated by Leonid Protasov over 6 years ago

  • Target version set to 4.6
    // Characters at the same index correspond: cp1252 mojibake -> intended cp1251 letter.
    var cp1252 = 'ÀÁÂÃÄÅ¨ÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕ×ÖØÙÜÚÛÝÞßàáâãäå¸æçèéêëìíîïðñòóôõ÷öøùüúûýþÿ³º';
    var cp1251 = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЧЦШЩЬЪЫЭЮЯабвгдеёжзийклмнопрстуфхчцшщьъыэюяіє';
    function fixMB(s) {
        // Convert only if three consecutive characters look like mojibake.
        for (var i = 0; i < s.length - 2; i++) {
            if (cp1252.indexOf(s[i]) > -1 && cp1252.indexOf(s[i + 1]) > -1 && cp1252.indexOf(s[i + 2]) > -1) {
                var fixed = '';
                for (var j = 0; j < s.length; j++) {
                    var pos = cp1252.indexOf(s[j]);
                    fixed += pos > -1 ? cp1251[pos] : s[j];
                }
                showtime.print("mojibake fixed " + fixed);
                return fixed;
            }
        }
        return s;
    }

This function fixes Cyrillic mojibake in titles that were decoded as UTF-8, which is the most common case. ST or the plugin should apply it to the title after it has been detected as UTF-8. It is not applicable to titles in other encodings.

#8 Updated by Leonid Protasov over 6 years ago

Added Ukrainian support:

    // Characters at the same index correspond: cp1252 mojibake -> intended cp1251 letter.
    var cp1252 = 'ÀÁÂÃÄÅ¨ÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕ×ÖØÙÜÚÛÝÞßàáâãäå¸æçèéêëìíîïðñòóôõ÷öøùüúûýþÿ³²ºª¿¯´¥';
    var cp1251 = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЧЦШЩЬЪЫЭЮЯабвгдеёжзийклмнопрстуфхчцшщьъыэюяіІєЄїЇґҐ';
    function fixMB(s) {
        // Convert only if three consecutive characters look like mojibake.
        for (var i = 0; i < s.length - 2; i++) {
            if (cp1252.indexOf(s[i]) > -1 && cp1252.indexOf(s[i + 1]) > -1 && cp1252.indexOf(s[i + 2]) > -1) {
                var fixed = '';
                for (var j = 0; j < s.length; j++) {
                    var pos = cp1252.indexOf(s[j]);
                    fixed += pos > -1 ? cp1251[pos] : s[j];
                }
                showtime.print("Before: " + s + " After: " + fixed);
                return fixed;
            }
        }
        return s;
    }
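
Hand-typed lookup strings like the two above only work if they stay exactly the same length, and a single merged or missing character shifts every later letter. As a possible alternative, the tables could be generated from byte values. This is a sketch, not part of the proposed patch: `buildTables` is a hypothetical name, and the list of extra letters is an assumption covering the same Russian and Ukrainian characters.

```javascript
// Build the two lookup strings from byte values so they cannot drift apart.
function buildTables() {
    var from = '';
    var to = '';
    // 0xC0..0xFF: cp1252 equals Latin-1 here, cp1251 holds U+0410..U+044F.
    for (var b = 0xC0; b <= 0xFF; b++) {
        from += String.fromCharCode(b);
        to += String.fromCharCode(0x0410 + (b - 0xC0));
    }
    // Letters outside that block: [cp1251 byte, Unicode code point].
    var extra = [
        [0xA8, 0x0401], [0xB8, 0x0451], // Ё ё
        [0xB3, 0x0456], [0xB2, 0x0406], // і І
        [0xBA, 0x0454], [0xAA, 0x0404], // є Є
        [0xBF, 0x0457], [0xAF, 0x0407], // ї Ї
        [0xB4, 0x0491], [0xA5, 0x0490]  // ґ Ґ
    ];
    for (var k = 0; k < extra.length; k++) {
        from += String.fromCharCode(extra[k][0]);
        to += String.fromCharCode(extra[k][1]);
    }
    return { from: from, to: to };
}
```

The returned `from` and `to` strings have 74 characters each and could stand in for the `cp1252` and `cp1251` variables without changing the lookup logic.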

#9 Updated by Andreas Smas over 6 years ago

  • Category changed from Audio to General

#10 Updated by Andreas Smas over 6 years ago

  • Priority changed from Normal to Low

#11 Updated by Andreas Smas over 6 years ago

  • Target version deleted (4.6)
