
Web Client - Subtitles - Character Encoding


johnorange


Hi there,

 

My first post here. I have installed the latest beta version of Media Browser and have been using it for a couple of days. The application is superb, and I must say I prefer it to a lot of other servers. It has been working flawlessly so far, but I have a subtitle issue that I can't get my head around. I am writing this post in the hope that you can help me with it.

 

I have numerous movies with Romanian subtitles and I mostly use the web client. Unfortunately, special characters are not displayed correctly. I attach a screen capture as an example. I think these subtitles are mostly internal to the mkv, but some are external. Can you point me in the right direction as to how I can fix this?

 

A couple of notes that may help other users as well:

1. As far as external srt files are concerned, I managed to fix this by opening each file in Notepad++, changing the character set to Windows-1250, and then converting the file to UTF-8. It is cumbersome when one has tens of movies, but it works.

2. I still do not know how to approach this for mkvs with internal subtitles.
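For readers who would rather script the Notepad++ step from note 1, here is a minimal Java sketch of the same re-encoding. It assumes the source file really is Windows-1250, which is the charset that worked for the files above and is not detected here. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SubtitleTranscoder {
    /** Re-encodes raw subtitle bytes from a known source charset to UTF-8. */
    static byte[] toUtf8(byte[] raw, Charset source) {
        // Decode with the charset the file was actually written in,
        // then encode the resulting text as UTF-8.
        String text = new String(raw, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java SubtitleTranscoder <input.srt> <output.srt>");
            return;
        }
        // Assumption: the input is Windows-1250 (Central European).
        Charset source = Charset.forName("windows-1250");
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), toUtf8(raw, source));
    }
}
```

Writing to a separate output file keeps the original intact in case the assumed source charset turns out to be wrong.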

 

Ideally, the web client should allow the user to change character sets on the fly, as one can do generally with media players. This would allow the encoding service to adjust accordingly without trying to guess encoding based on locale in the name of the srt or otherwise. I am happy to change the dashboard-ui files to add the necessary control, but I need an entry point into the actual encoding service on the server side. Perhaps a developer can chime in and provide further guidance?

 

Many thanks and best regards,

JO

post-6673-0-75210800-1414938298_thumb.jpg


I'm not sure I like that because if the web client does it then every client has to do it. That's a lot of work for the whole team. As you say it's not possible to guess the encoding with 100% accuracy. We will probably just have to add settings regarding this.


johnorange

Hi Luke,

 

Thanks for the prompt follow up. Much appreciated.

 

I think the point I am trying to get across is to allow the encoding service on the server to react to character encoding parameters when they are sent by the client. As I see it, the encoding service would operate as it does right now, so that you do not break compatibility in any way, but would also take into account parameters sent by the web client from the dashboard-ui (similarly to how you change resolution, subtitles or audio tracks by selecting from a pre-populated list). These would override the existing character sets for the respective streaming session that the client initiated.

 

The other clients would be updated in time, as and if their user base feels this is necessary.

 

I have seen this functionality in all mediaboxes and it would no doubt be a solution to what seems to be a rather common issue with media servers, and the variety of devices used to consume media.

 

I am ready to contribute with this if I am given a bit more info on the relevant cs classes on the server side.

 

2. On a different point, do you have any suggestions about the character set issue when dealing with internal subtitles?

 

I remain very much indebted to you for your time,

 

Best regards,

JO


Well I know this is the most powerful way, but I also think there are easier ways that will fit for the vast majority of people, and will then work immediately from every client. For instance, we can have a setting to enable/disable utf8 auto-detection, along with a setting to assume the encoding is a given value based on language. That's the direction I'm leaning towards.
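A rough sketch of how the two settings described above could combine. It is written in Java to match the other code in this thread rather than the server's C#. The class name, the language-to-charset table and its ISO 639 keys are all assumptions for illustration, not Media Browser's actual configuration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class SubtitleCharsetSettings {
    // Hypothetical per-language defaults; a real table would be user-configurable.
    private static final Map<String, String> FALLBACK_BY_LANGUAGE = Map.of(
            "ro", "windows-1250",
            "rum", "windows-1250",
            "ron", "windows-1250",
            "chi", "GB18030",
            "zho", "GB18030");

    private final boolean utf8AutoDetect;

    SubtitleCharsetSettings(boolean utf8AutoDetect) {
        this.utf8AutoDetect = utf8AutoDetect;
    }

    /** Returns true only if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Picks UTF-8 when detection is enabled and succeeds, else the language default. */
    Charset choose(byte[] data, String language) {
        if (utf8AutoDetect && isValidUtf8(data)) {
            return StandardCharsets.UTF_8;
        }
        // Map.of rejects null keys, so guard before the lookup.
        String fallback = language == null ? null : FALLBACK_BY_LANGUAGE.get(language);
        return fallback != null ? Charset.forName(fallback) : StandardCharsets.UTF_8;
    }
}
```

Strict UTF-8 validation is a reasonable first test because text in a legacy single-byte charset with accented characters almost never happens to form valid UTF-8 sequences. Pure ASCII files pass the check, which is harmless since ASCII decodes identically either way.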


johnorange

That's certainly simpler for the user and more pragmatic. Excellent idea. May I ask if you have a tentative timing for this?

 

On a second note, can you please suggest which files on the server side I would need to look at if I decide to pursue the other approach I mentioned above?

 

Best regards,

JO


Here is the current detection:

 

https://github.com/MediaBrowser/MediaBrowser/blob/master/MediaBrowser.MediaEncoding/Subtitles/SubtitleEncoder.cs#L170

 

The method is called in two or three different places. You could update it to allow the encoding to be passed in rather than detected.
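The shape of that change, sketched in Java for consistency with the rest of the thread. This is not the code in SubtitleEncoder.cs: the class, the method names and the BOM-only detector are stand-ins, and the real detection logic would take the place of `detect`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SubtitleCharsetResolver {
    /**
     * Uses the caller-supplied charset name when one is given,
     * and only falls back to detection when it is absent.
     */
    static Charset resolve(String requestedCharset, byte[] head) {
        if (requestedCharset != null && !requestedCharset.isEmpty()) {
            // Charset.forName throws for illegal or unsupported names,
            // so a bad client parameter fails loudly instead of being ignored.
            return Charset.forName(requestedCharset);
        }
        return detect(head);
    }

    /** Stand-in detector: recognises common byte order marks, else assumes UTF-8. */
    static Charset detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return StandardCharsets.UTF_8;
    }
}
```

Existing callers would pass `null` for the new parameter and keep today's behaviour, so only the code path that receives a client-selected charset needs to change.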


  • 2 months later...

Hi, the same issue has been bothering me for a long time, and it is basically the only reason why I'm still using Plex.

MB3 is superior in all other aspects, but not being able to support subtitles in other languages is a pity.

I think this can also be fixed on the server side by adding an option to specify the charset of the subtitle (the XBMC way).

Auto-detection is probably difficult to implement, but I do see it working in some media centers such as MediaPortal.



Hi, I found a way to transcode all my srt subtitles from the GB2312 and BIG5 charsets used for my local language to UTF-8 (with a BOM, as otherwise it doesn't work).

I found a Java library that can detect (or guess) the encoding of a file, and it worked pretty well, but I don't know if .NET has a similar library.

Below is the source code of the program:

 

 
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.swing.JFileChooser;
import javax.swing.JOptionPane;

import org.apache.commons.io.FileUtils;
import org.mozilla.universalchardet.UniversalDetector;

public class Main {
    public static void main(String... args) throws Exception {
        final JFileChooser fc = new JFileChooser();
        fc.setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY);
        if (JFileChooser.APPROVE_OPTION == fc.showOpenDialog(null)) {
            // Option 0 converts the files; option 1 (the default) only prints
            // the detected encoding. Closing the dialog also only prints.
            Object[] options = { "Convert to UTF-8", "Print Encoding" };
            int n = JOptionPane.showOptionDialog(fc,
                    "Are you sure you want to convert all files to UTF-8?",
                    "Confirm the operation", JOptionPane.YES_NO_OPTION,
                    JOptionPane.WARNING_MESSAGE, null, options, options[1]);

            File[] files = fc.getSelectedFile().listFiles();
            changeEncodingToUTF8(files, n == 0);
        }
    }

    // Walks the directory tree and processes every .srt file it finds.
    private static void changeEncodingToUTF8(File[] files, boolean convert)
            throws Exception {
        if (files == null) {
            return; // listFiles() returns null when a directory cannot be read
        }
        for (File file : files) {
            if (file.isDirectory()) {
                changeEncodingToUTF8(file.listFiles(), convert);
            } else if (file.getName().endsWith(".srt")) {
                String encoding = detectEncoding(file);
                if (encoding == null) {
                    System.out.println(file.getName() + ": N/A");
                    continue;
                }
                System.out.println(file.getName() + ": " + encoding);
                if (convert) {
                    if (encoding.equals("BIG5")) {
                        big5ToUTF8(file);
                    } else if (encoding.equals("GB18030")) {
                        gbToUTF8(file);
                    }
                }
            }
        }
    }

    // Returns the charset name guessed by the detector, or null if it cannot tell.
    private static String detectEncoding(File file) throws IOException {
        byte[] buf = new byte[4096];

        // (1) Create a fresh detector for this file.
        UniversalDetector detector = new UniversalDetector(null);

        // (2) Feed it data until it is confident or the file ends.
        // try-with-resources closes the stream before the file is rewritten.
        try (InputStream fis = new FileInputStream(file)) {
            int nread;
            while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, nread);
            }
        }
        // (3) Signal the end of the data.
        detector.dataEnd();
        // (4) Read the result.
        return detector.getDetectedCharset();
    }

    // Rewrites the file in place as UTF-8 with a leading BOM.
    private static void gbToUTF8(File file) throws Exception {
        String content = FileUtils.readFileToString(file, "GB18030");
        FileUtils.write(file, "\uFEFF" + content, "UTF-8");
    }

    // Rewrites the file in place as UTF-8 with a leading BOM.
    private static void big5ToUTF8(File file) throws Exception {
        String content = FileUtils.readFileToString(file, "BIG5");
        FileUtils.write(file, "\uFEFF" + content, "UTF-8");
    }
}

  • 1 year later...
anderbytes

Hey, anyone looking into this?

 

People who use subtitles outside the US are heavily affected by this problem.



We have added encoding detection. What exactly is your issue? Please do not say "same issue". We need to look at it in detail. Please see how to report a problem. Thanks.


anderbytes

Okay, and sorry for the delay.

 

I'll be trying to reproduce the problem. From what I remember, it was a specific situation or client.


anderbytes

Tried and tried and couldn't reproduce it. Damn...

 

Probably it was a specific subtitle of a specific media that I don't remember right now.

If I ever stumble on it again, I'll make sure to note which file it is and post its details here.

 

You can close this topic, for now.

Again, Thanks!


  • 1 month later...

I know we're probably in the minority, but even though you have encoding detection, there are still issues with subtitles in Romanian, for example.
It's fine at home, since I have Emby for Kodi (Kodi has the option to select "Central Europe"), but away from home on my iPhone or tablet it's annoying to have to fix this every time there is a new episode.

 

The characters Ț and Ș are displayed as þ and º respectively.
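That pattern is what one would expect if Windows-1250 bytes are being decoded as Windows-1252 (or ISO-8859-1). The small Java demonstration below shows the effect. It is only an illustration of the mismatch, under the assumption that the affected files are Windows-1250 as the manual fix suggests, and it says nothing about where in the server or client the wrong charset is chosen.

```java
import java.nio.charset.Charset;

final class MojibakeDemo {
    /** Decodes the same raw bytes under whichever charset the caller names. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    /** Formats each character as U+XXXX so the output does not depend on the console encoding. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two bytes Windows-1250 uses for the lowercase t and s with cedilla.
        byte[] raw = { (byte) 0xFE, (byte) 0xBA };

        // Wrong charset: yields U+00FE (thorn) and U+00BA (masculine ordinal).
        System.out.println("windows-1252: " + codePoints(decode(raw, "windows-1252")));

        // Right charset: yields U+0163 and U+015F, the Romanian letters.
        System.out.println("windows-1250: " + codePoints(decode(raw, "windows-1250")));
    }
}
```

Note that Windows-1250 only contains the cedilla forms of these letters (U+0163, U+015F), not the comma-below forms (U+021B, U+0219), so converted files will contain the cedilla variants.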

 

The manual fix is the same: open each file in Notepad++, change the character set to Windows-1250, and then convert the srt file to UTF-8.

 

Any chance this can be looked into? If not, could there at least be a script that runs automatically over my Emby media folders on a Synology?

Thanks for your time.

 

Thanks.
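In the meantime, a folder-wide version of the manual fix could look roughly like the Java sketch below. It treats any .srt file that is not already valid UTF-8 as Windows-1250, which is an assumption that only suits a library where the non-UTF-8 subtitles are all Central European. The class name is invented, and the program rewrites files in place, so it should be tried on a copy of the folder first.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SrtBatchConverter {
    /** Returns true only if the bytes decode as UTF-8 without errors. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Converts every non-UTF-8 .srt under root in place; returns how many were rewritten. */
    static int convertTree(Path root, Charset assumed) throws IOException {
        List<Path> srtFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            srtFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".srt"))
                    .collect(Collectors.toList());
        }
        int converted = 0;
        for (Path file : srtFiles) {
            byte[] raw = Files.readAllBytes(file);
            if (isValidUtf8(raw)) {
                continue; // already UTF-8 (or plain ASCII): leave untouched
            }
            String text = new String(raw, assumed);
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            converted++;
        }
        return converted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SrtBatchConverter <media-folder>");
            return;
        }
        // Assumption: every non-UTF-8 subtitle in this folder is Windows-1250.
        int n = convertTree(Paths.get(args[0]), Charset.forName("windows-1250"));
        System.out.println("Converted " + n + " file(s)");
    }
}
```

Because files that are already valid UTF-8 are skipped, the program can be run repeatedly (for example from a scheduled task) without corrupting subtitles it has converted before.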

Edited by ciss