ST sometimes doesn't display SRT subtitles (ansi->utf8 conversion bug)
|Assignee:||Andreas Smas||% Done:|
|Found in version:||All||Platform:||Linux|
I'm currently watching a movie split into 3 parts (Bei Qing Cheng Shi CD1, CD2 and CD3).
While the subtitles for CD1 are decoded and rendered properly, the ones for CD2 are not rendered at all. The debug log says that the subtitle file is 'not valid UTF-8. Decoding it as ISO-8859-1 (Latin-1)'. However, nothing is displayed on the screen. Both subtitle files are formatted in exactly the same way (I'm attaching both).
#6 Updated by Ema Nymton over 7 years ago
- File Engrenages.S02E02.DVDrip.576p.H264.en.srt added
Sorry for the late feedback Andreas... ! I had the same issue with another file, see attached.
01:48:39.999: Subtitles [INFO]:smb://192.168.1.103/Downloads/Engrenages.Season.2.DVDrip.576p.H264//Engrenages.S02E03.DVDrip.576p.H264.en.srt is not valid UTF-8. Decoded as windows-1252 (detected language: en)
Basically, when the 'Default character set' option in ST is set to 'Auto', ST falls back to the windows-1252 charset for any file it detects as non-UTF-8. My sample file is in ANSI.
The subtitles are no longer decoded from line 379 onwards. That line includes some characters that are not part of the ANSI charset and are therefore not decoded properly. However, this shouldn't prevent the rest of the file from being decoded. I'm personally fine with an occasional mojibake.
Note that the problem can be worked around by manually setting the 'Default character set' to any other value. Even Latin-2 works, and it actually displays some mojibake when line 379 is played:
02:00:54.569: Subtitles [INFO]:smb://192.168.1.103/Downloads/Engrenages.Season.2.DVDrip.576p.H264//Engrenages.S02E02.DVDrip.576p.H264.en.srt is not valid UTF-8. Decoded as ISO-8859-2 (Latin-2) (specified by user)
I think ST should still display the subtitles even if the file includes some characters that cannot be rendered in the charset the file is coded in.
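To illustrate the behaviour requested above, here is a minimal Python sketch of lenient decoding. It is not ST's actual code (ST is not written in Python), and the function name `decode_subtitle_bytes` is made up for this example. It assumes the fallback charset is windows-1252, where a few byte values such as 0x81 have no assigned character.

```python
def decode_subtitle_bytes(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode subtitle bytes without ever failing on a stray byte.

    Hypothetical helper: try UTF-8 first; if the data is not valid UTF-8,
    decode with the fallback charset and substitute U+FFFD for any byte
    that the fallback charset does not define.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" keeps decoding past undefined bytes instead of
        # aborting, so the rest of the file is still available.
        return raw.decode(fallback, errors="replace")


# 0xE9 is 'é' in windows-1252 but makes the data invalid as UTF-8;
# 0x81 is undefined in windows-1252.
sample = b"Caf\xe9 \x81 ouvert"
text = decode_subtitle_bytes(sample)
```

With this approach a single undecodable character shows up as a replacement character, and every later cue in the file is still decoded.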
#8 Updated by Leonid Protasov over 7 years ago
Still reproducible on 4.3.739.
Both files are encoded the same way. But for the first file:
Subtitles [DEBUG]: Trying to load file:///root/Video/Bei Qing Cheng Shi (CD1).srt
Subtitles [DEBUG]: Loaded file:///root/Видео/Bei Qing Cheng Shi (CD1).srt OK
Subtitles [DEBUG]: Trying to load file:///root/Video/Bei Qing Cheng Shi (CD2).srt
Subtitles [INFO]: file:///root/Video/Bei Qing Cheng Shi (CD2).srt is not valid UTF-8. Decoded as ISO-8859-1 (detected language: en)
Subtitles [DEBUG]: Loaded file:///root/Video/Bei Qing Cheng Shi (CD2).srt OK
With the first file the subtitles are displayed, but with the second they are not.
Interestingly, if you open the first file in Notepad++ it is detected as 'ANSI as UTF-8',
and the second as just ANSI.
If you open them in a hex editor, they are encoded the same way. Looks like a bug in the detection algorithm...
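One way to compare the two files more precisely is to check where, if anywhere, each one stops being valid UTF-8. The Python sketch below is a diagnostic aid only, not ST's detection code; the function names are invented for this example. It relies on the fact that pure 7-bit ASCII is always valid UTF-8, which may be why an editor reports one file as UTF-8 and the other as plain ANSI.

```python
from typing import Optional


def first_invalid_utf8_offset(raw: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def classify(raw: bytes) -> str:
    """Roughly classify data as 'ascii', 'utf-8' or 'not utf-8'."""
    if raw.isascii():
        # Only bytes below 0x80: identical in ASCII, UTF-8 and the ANSI code pages.
        return "ascii"
    if first_invalid_utf8_offset(raw) is None:
        return "utf-8"
    return "not utf-8"


# Example with in-memory data; for a real file, read it in binary mode first,
# e.g. raw = open(path, "rb").read() with a path of your choice.
offset = first_invalid_utf8_offset(b"abc\xe9 def")
```

The reported offset points at the first byte that triggers the non-UTF-8 fallback, which can then be compared with the position where the subtitles stop being displayed.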
#10 Updated by Leonid Protasov over 7 years ago
- Subject changed from ST sometimes doesn't display SRT subtitles (codepage detection issue) to ST sometimes doesn't display SRT subtitles (ansi->utf8 conversion bug)
All affected SRT files are plain ANSI encoded. But when the detection algorithm decides a file is not UTF-8 and decodes it as an ISO charset, something goes wrong, as the converted subtitles are not displayed at all.