Bug #1492
ST sometimes doesn't display SRT subtitles (ansi->utf8 conversion bug)
| Status: | Fixed | Start date: | 01/02/2013 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | | % Done: | 100% |
| Category: | Subtitles | | |
| Target version: | 4.4 | | |
| Found in version: | All | Platform: | Linux |
Description
I'm currently watching a movie split into three parts (Bei Qing Cheng Shi (CD1), CD2 and CD3).
The subtitles for CD1 are decoded and rendered properly, but the ones for CD2 are not rendered at all. In debug mode, the log says the subtitle file is 'not valid UTF-8. Decoding it as ISO-8859-1 (Latin-1)', yet nothing is displayed on screen. Both subtitle files are formatted in exactly the same way (I'm attaching both).
Associated revisions
charset conversion: Avoid accidental converts into NUL byte
Fixes #1492
History
#1
Updated by open ps3 about 10 years ago
- File Bei Qing Cheng Shi (CD2).srt added
It's not UTF-8 or Latin-1.
Notepad++ reports it as Dos\Windows ANSI.
Please try this file (I converted it):
#2
Updated by open ps3 about 10 years ago
I tested it! It seems the problem is on your side. I converted it.
#3
Updated by Andreas Smas about 10 years ago
- Tracker changed from Bug to Feature
#4
Updated by Ema Nymton about 10 years ago
open ps3 wrote:
I tested it! It seems the problem is on your side. I converted it.
Thanks for checking, I appreciate your help :)
#5
Updated by Andreas Smas over 9 years ago
- Status changed from New to Need feedback
This bug can be closed, right?
#6
Updated by Ema Nymton about 9 years ago
- File Engrenages.S02E02.DVDrip.576p.H264.en.srt added
Sorry for the late feedback, Andreas! I had the same issue with another file (see attached).
01:48:39.999: Subtitles [INFO]:smb://192.168.1.103/Downloads/Engrenages.Season.2.DVDrip.576p.H264//Engrenages.S02E03.DVDrip.576p.H264.en.srt is not valid UTF-8. Decoded as windows-1252 (detected language: en)
In short, when the 'Default character set' option in ST is set to 'Auto', ST falls back to the windows-1252 charset whenever it detects a non-UTF-8 file. My sample file is in ANSI.
Nothing from line 379 onwards is decoded. That line contains some characters that are not part of the ANSI charset and are therefore not decoded properly. However, this shouldn't prevent the rest of the subtitle file from being decoded; I'm personally fine with an occasional mojibake.
Note that the problem goes away if 'Default character set' is set manually to any other value. Even Latin-2 works, and simply shows some mojibake when line 379 is played:
02:00:54.569: Subtitles [INFO]:smb://192.168.1.103/Downloads/Engrenages.Season.2.DVDrip.576p.H264//Engrenages.S02E02.DVDrip.576p.H264.en.srt is not valid UTF-8. Decoded as ISO-8859-2 (Latin-2) (specified by user)
I think ST should still display the subtitles even if the file contains some characters that cannot be decoded in the file's charset.
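The 'Auto' behaviour described in this comment (valid UTF-8 is used as-is, anything else is decoded as windows-1252) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ST's actual code: `is_valid_utf8` and `pick_charset` are hypothetical names, and the validity check follows RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8 per
 * RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool is_valid_utf8(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        uint8_t c = buf[i];
        size_t n;      /* number of continuation bytes that must follow */
        uint32_t cp;   /* code point being decoded */

        if (c < 0x80) {                  /* plain ASCII */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false;  /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

        if (len - i <= n)
            return false;   /* sequence runs past the end of the buffer */
        for (size_t k = 1; k <= n; k++) {
            uint8_t cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;            /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        /* Reject overlong 3/4-byte forms, UTF-16 surrogates, > U+10FFFF. */
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += n + 1;
    }
    return true;
}

/* Hypothetical 'Auto' policy: valid UTF-8 is used as-is, anything else
 * falls back to windows-1252, as in the log line quoted above. */
static const char *pick_charset(const uint8_t *buf, size_t len)
{
    return is_valid_utf8(buf, len) ? "UTF-8" : "windows-1252";
}
```

A single stray `0xE9` byte ("é" in windows-1252) is enough to fail the check and send the whole file down the fallback path. ST's log also reports a detected language, which this sketch does not attempt.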
#7
Updated by Leonid Protasov about 9 years ago
- Tracker changed from Feature to Bug
- Subject changed from Some subtitles are incorrectly handled to ST doesn't detect SRT in ANSI format.
- Found in version set to All
- Platform set to Linux
#8
Updated by Leonid Protasov about 9 years ago
Still reproducible on 4.3.739.
Both files are encoded the same way. With the first file:
Subtitles [DEBUG]: Trying to load file:///root/Video/Bei Qing Cheng Shi (CD1).srt
Subtitles [DEBUG]: Loaded file:///root/Видео/Bei Qing Cheng Shi (CD1).srt OK
With the second file:
Subtitles [DEBUG]: Trying to load file:///root/Video/Bei Qing Cheng Shi (CD2).srt
Subtitles [INFO]: file:///root/Video/Bei Qing Cheng Shi (CD2).srt is not valid UTF-8. Decoded as ISO-8859-1 (detected language: en)
Subtitles [DEBUG]: Loaded file:///root/Video/Bei Qing Cheng Shi (CD2).srt OK
With the first file the subtitles are displayed, but with the second they are not.
Interestingly, Notepad++ detects the first file as "ANSI as UTF-8" and the second as plain "ANSI".
In a hex editor, however, the two files are encoded the same way. This looks like a bug in the detection algorithm...
#9
Updated by Leonid Protasov about 9 years ago
- Subject changed from ST doesn't detect SRT in ANSI format. to ST sometimes doesn't display SRT subtitles (codepage detection issue)
- Priority changed from Normal to High
- Target version set to 4.4
#10
Updated by Leonid Protasov about 9 years ago
- Subject changed from ST sometimes doesn't display SRT subtitles (codepage detection issue) to ST sometimes doesn't display SRT subtitles (ansi->utf8 conversion bug)
All affected SRT files are plain ANSI-encoded. When the detection algorithm decides a file is not UTF-8 and decodes it as ISO-8859-1, something goes wrong: the converted subtitles are not displayed at all.
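The title of the fixing commit ("charset conversion: Avoid accidental converts into NUL byte") points to one plausible mechanism. If a single-byte charset table stores 0 for an undefined byte, a naive converter writes a NUL into the UTF-8 output, and anything that treats the result as a C string stops there. That would match subtitles disappearing from a given line onwards. The sketch below is hypothetical (the table layout and the names `cp1252_80_9f`, `put_utf8` and `cp1252_to_utf8` are ours, not ST's). It converts windows-1252 to UTF-8 and substitutes U+FFFD for the five undefined bytes instead of emitting NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical windows-1252 table for bytes 0x80..0x9F. The five bytes
 * that windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are
 * stored as 0. Bytes 0xA0..0xFF map to the same code point, as in Latin-1. */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

/* Writes cp (must be below U+10000) to out as UTF-8; returns byte count. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Converts len windows-1252 bytes to NUL-terminated UTF-8 in dst, which
 * must hold at least 3 * len + 1 bytes. Returns the output length. */
static size_t cp1252_to_utf8(const uint8_t *src, size_t len, char *dst)
{
    assert(dst != NULL);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = src[i];
        if (cp >= 0x80 && cp <= 0x9F) {
            cp = cp1252_80_9f[cp - 0x80];
            /* The guard: a table hole must not become a NUL byte, or every
             * strlen()-style consumer would silently drop the rest. */
            if (cp == 0)
                cp = 0xFFFD;   /* U+FFFD REPLACEMENT CHARACTER */
        }
        o += put_utf8(cp, dst + o);
    }
    dst[o] = '\0';
    return o;
}
```

Substituting U+FFFD keeps the rest of the file intact and produces exactly the "occasional mojibake" the reporter said was acceptable. Dropping the byte would be an equally reasonable choice.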
#11
Updated by Andreas Smas about 9 years ago
- Status changed from Need feedback to Fixed
- % Done changed from 0 to 100
Applied in changeset git|commit:e475ad6ed35a65d2202a7018404b6832baf9c5c0.