Bug #1492

ST sometimes doesn't display SRT subtitles (ansi->utf8 conversion bug)

Added by Ema Nymton about 10 years ago. Updated about 9 years ago.

Status: Fixed
Start date: 01/02/2013
Priority: High
Due date:
Assignee: Andreas Smas
% Done: 100%
Category: Subtitles
Target version: 4.4
Found in version: All
Platform: Linux

Description

I'm currently watching a movie split into 3 parts: Bei Qing Cheng Shi (CD1), (CD2) and (CD3).
While the subtitles for CD1 are properly decoded and rendered, the ones for CD2 are not rendered at all. In debug mode, the log says that the subtitle file is 'not valid UTF-8. Decoding it as ISO-8859-1 (Latin-1)'. However, nothing is displayed on the screen. Both subtitle files are formatted in exactly the same way (I'm attaching both).

Bei Qing Cheng Shi (CD1).srt (32.8 KB) Ema Nymton, 01/02/2013 10:06 AM

Bei Qing Cheng Shi (CD2).srt (30.5 KB) Ema Nymton, 01/02/2013 10:06 AM

Bei Qing Cheng Shi (CD2).srt (30.5 KB) open ps3, 01/11/2013 06:50 PM

Engrenages.S02E02.DVDrip.576p.H264.en.srt (49.4 KB) Ema Nymton, 01/12/2014 04:35 PM

Associated revisions

Revision e475ad6e
Added by Andreas Smas about 9 years ago

charset conversion: Avoid accidental converts into NUL byte

Fixes #1492

History

#1 Updated by open ps3 about 10 years ago

It's not UTF-8 or Latin-1.
Notepad++ says Dos\Windows ANSI.
Please try this file (I converted it):

#2 Updated by open ps3 about 10 years ago

I tested it! Seems it's your problem. I converted it.

#3 Updated by Andreas Smas about 10 years ago

  • Tracker changed from Bug to Feature

#4 Updated by Ema Nymton about 10 years ago

open ps3 wrote:

I tested it! Seems it's your problem. I converted it.

Thanks for checking, I appreciate your help :)

#5 Updated by Andreas Smas over 9 years ago

  • Status changed from New to Need feedback

This bug can be closed, right?

#6 Updated by Ema Nymton about 9 years ago

Sorry for the late feedback, Andreas! I had the same issue with another file, see attached.

01:48:39.999: Subtitles [INFO]:smb://192.168.1.103/Downloads/Engrenages.Season.2.DVDrip.576p.H264//Engrenages.S02E03.DVDrip.576p.H264.en.srt is not valid UTF-8. Decoded as windows-1252 (detected language: en)

Basically, when the 'Default character set' option in ST is set to 'Auto' and a file is not valid UTF-8, ST falls back to the windows-1252 charset. My sample file is in ANSI.
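The fallback described above can be sketched roughly as follows. This is a minimal illustration in Python, not ST's actual code: the function name `decode_subtitle` and the `default_charset` parameter are hypothetical, and the language detection mentioned in the log line is left out.

```python
def decode_subtitle(data: bytes, default_charset: str = "auto"):
    """Return (text, charset_used) for raw subtitle bytes.

    Hypothetical helper: strict UTF-8 first, then a single-byte fallback.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Assumption: "auto" ends up at windows-1252, as in the log line above.
    charset = "cp1252" if default_charset == "auto" else default_charset
    # errors="replace" keeps decoding past bytes the charset does not define.
    return data.decode(charset, errors="replace"), charset
```

With this sketch, `decode_subtitle(b"caf\xe9")` takes the fallback path because a lone 0xE9 is not valid UTF-8, while the same text encoded as UTF-8 is returned unchanged.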

From line 379 onwards, the subtitle is no longer decoded. That line includes some characters that are not part of the ANSI charset and are therefore not decoded properly. However, this shouldn't prevent the rest of the subtitle from being decoded. I'm personally fine with an occasional mojibake :D

Note that the problem can be worked around by manually setting the 'Default character set' to any other value. Even Latin-2 works, and it actually displays some mojibake when playing line 379:

02:00:54.569: Subtitles [INFO]:smb://192.168.1.103/Downloads/Engrenages.Season.2.DVDrip.576p.H264//Engrenages.S02E02.DVDrip.576p.H264.en.srt is not valid UTF-8. Decoded as ISO-8859-2 (Latin-2) (specified by user)

I think ST should still display the subtitles even if the file includes some characters that cannot be rendered in the charset the file is coded in.
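The behaviour requested here, carrying on past characters the charset cannot represent, can be illustrated with Python's `cp1252` codec, which leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. The sample bytes below are made up for the illustration and are not taken from the attached file.

```python
# Made-up sample: 0x92 is a right single quote in windows-1252,
# 0x81 has no character assigned in Python's cp1252 codec.
line = b"It\x92s over\x81"

try:
    line.decode("cp1252")          # strict mode stops at the undefined byte
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Tolerant mode substitutes U+FFFD and keeps the rest of the text.
tolerant = line.decode("cp1252", errors="replace")
```

Strict decoding fails on the whole line, whereas the tolerant variant loses only the one undefined byte, which is the "occasional mojibake" trade-off described above.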

#7 Updated by Leonid Protasov about 9 years ago

  • Tracker changed from Feature to Bug
  • Subject changed from Some subtitles are incorrectly handled to ST doesn't detect SRT in ANSI format.
  • Found in version set to All
  • Platform set to Linux

#8 Updated by Leonid Protasov about 9 years ago

Still reproducible on 4.3.739.

Both files are encoded the same way. For the first file:

Subtitles [DEBUG]: Trying to load file:///root/Video/Bei Qing Cheng Shi (CD1).srt
Subtitles [DEBUG]: Loaded file:///root/Видео/Bei Qing Cheng Shi (CD1).srt OK

Second file:

Subtitles [DEBUG]: Trying to load file:///root/Video/Bei Qing Cheng Shi (CD2).srt
Subtitles [INFO]: file:///root/Video/Bei Qing Cheng Shi (CD2).srt is not valid UTF-8. Decoded as ISO-8859-1 (detected language: en)
Subtitles [DEBUG]: Loaded file:///root/Video/Bei Qing Cheng Shi (CD2).srt OK

With the first file, subtitles are displayed; with the second, they are not.

Interestingly, if you open the first file in Notepad++, it is detected as 'ANSI as UTF-8', while the second is detected as just ANSI.
If you open them in a hex editor, they are encoded the same way. Looks like a bug in the detection algorithm...
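One possible explanation for the Notepad++ difference, offered as an assumption rather than something verified against the attachments: a file containing only 7-bit ASCII is byte-for-byte valid UTF-8, whereas a single byte above 0x7F that is not part of a valid multi-byte sequence makes the whole file fail UTF-8 validation. A quick check along these lines (the helper name is hypothetical) reports where validation first fails:

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset where UTF-8 decoding first fails, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None
```

Running it on the raw bytes of each file, for example `first_invalid_utf8_offset(open(path, "rb").read())`, would show whether the two files really differ only by a few high bytes.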

#9 Updated by Leonid Protasov about 9 years ago

  • Subject changed from ST doesn't detect SRT in ANSI format. to ST sometimes doesn't display SRT subtitles (codepage detection issue)
  • Priority changed from Normal to High
  • Target version set to 4.4

#10 Updated by Leonid Protasov about 9 years ago

  • Subject changed from ST sometimes doesn't display SRT subtitles (codepage detection issue) to ST sometimes doesn't display SRT subtitles (ansi->utf8 conversion bug)

All affected SRT files are plain ANSI encoded. But when the detection algorithm decides a file is not UTF-8 and decodes it as ISO-8859-1, something goes wrong, as the converted subtitles are not displayed at all.
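The commit message for the fix ("Avoid accidental converts into NUL byte") hints at why converted text could vanish: in C, a string ends at the first NUL byte, so a conversion that emits code point 0 for an unmapped input byte cuts off everything after it. The sketch below simulates that effect in Python. The table, the function names, and the choice of U+FFFD as the substitute are illustrative assumptions, not the code from the changeset.

```python
# Hypothetical partial charset table: byte -> Unicode code point.
# Missing entries stand for bytes the charset leaves undefined.
TABLE = {0x80: 0x20AC, 0x92: 0x2019}

def convert(data: bytes, guard: bool) -> str:
    out = []
    for b in data:
        if b < 0x80:
            cp = b                    # ASCII maps to itself
        else:
            cp = TABLE.get(b, 0)      # unmapped byte yields 0
        if cp == 0 and guard:
            cp = 0xFFFD               # substitute instead of emitting NUL
        out.append(chr(cp))
    return "".join(out)

def as_c_string(text: str) -> str:
    """Mimic C string semantics: stop at the first NUL."""
    return text.split("\x00", 1)[0]
```

For the input `b"one\x81two"`, the unguarded conversion leaves only `one` visible once C string semantics are applied, while the guarded one keeps the trailing text with a replacement character in between.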

#11 Updated by Andreas Smas about 9 years ago

  • Status changed from Need feedback to Fixed
  • % Done changed from 0 to 100

Applied in changeset e475ad6ed35a65d2202a7018404b6832baf9c5c0.
