Jump to content

Mediabrowser server unresponsive after idle time


puithove

Recommended Posts

puithove

I'm seeing an issue in the past couple updates of the server where the service is unresponsive after idling for a while - meaning, running fine when I go to bed, but in the morning doesn't respond to clients or to the web console.  Page loading for the web console will just hang until the browser times out.  My Kodi clients will attempt to load a media file (I'm using the built-in HTTP streaming, not SMB) but just hang on the loading indicator.

 

A simple restart of the service brings it back to life.

 

When in this state, I see this continually repeating every couple seconds (this is from the console log collected by journald)

Mar 01 07:14:55 big1 mediabrowser-server[24304]: getifaddrs() failed
Mar 01 07:14:55 big1 mediabrowser-server[24304]: System.SystemException
Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed
Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed
Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:17 big1 mediabrowser-server[24304]: UnhandledException

I also see occasionally this mixed inbetween (may or may not be related):

Mar 01 07:14:54 big1 mediabrowser-server[24304]: many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:54 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too many open files
Mar 01 07:14:55 big1 mediabrowser-server[24304]: exception inside UnhandledException handler: Too manUnhandledException
Mar 01 07:14:55 big1 mediabrowser-server[24304]: getifaddrs() failed
Mar 01 07:14:55 big1 mediabrowser-server[24304]: System.SystemException
Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:14:55 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed
Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: UnhandledException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: getifaddrs() failed
Mar 01 07:15:11 big1 mediabrowser-server[24304]: System.SystemException
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.LinuxNetworkInterface.ImplGetAllNetworkInterfaces () [0x00000] in <filename unknown>:0
Mar 01 07:15:11 big1 mediabrowser-server[24304]: at System.Net.NetworkInformation.NetworkInterface.GetAllNetworkInterfaces () [0x00000] in <filename unknown>:0

I will say that this machine has a somewhat unusual networking setup because I run several VMs on it using KVM.  I have 4 NIC ports in a bond, then that is part of a bridge which is the active network interface that the host OS uses as well as where the virtual NICs for the VMs are bridged in.  Bridge interface has both IPv4 & IPv6 addresses statically assigned.  Mediabrowser server is running on the host OS directly.

 

I was thinking because of this and because the error seems to indicate an issue with getting interface info, it would be good to tell MB to use a certain interface only, but I don't see that option anywhere.

 

Either way, this didn't seem to be an issue until something in the last couple updates (or maybe a kernel update or something).

 

Any ideas for something to try?

Edited by puithove
Link to comment
Share on other sites

puithove

Running on Arch Linux with the following versions

 

Mediabrowser server:  3.0.5518.7

Mono:  3.12.0

Kernel:  3.18.6

Edited by puithove
Link to comment
Share on other sites

thefirstofthe300

This might be over my head but the first thing I would recommend is using downgrade from the AUR to install mono 3.10. Mono 3.12 has had some known issues with the server due to a bug in mono.

Edited by DaBungalow
Link to comment
Share on other sites

puithove

Just had the issue when I went to start a video now.  Was working fine several hours ago.

 

Had mono 3.10 in my pacman cache, so have downgraded.  Will try that and see how things are tomorrow.

 

thanks.

Link to comment
Share on other sites

puithove

Hmmmm.... interesting.  That might be it.  All good this morning.  I'll leave it that way for another day or so.  If still good, I'll verify by switching back to 3.12.

Link to comment
Share on other sites

puithove

Nope, unfortunately it seems to have fallen down again the same way while I was at work.  No change.

Link to comment
Share on other sites

Hi,

 

I'm on Debian Jessy with 3.16 kernel, running mono 3.10.

 

I get the same error messages, but it happens quite randomly. Sometimes the server runs fine for a couple of days, sometimes it crashes after one day.

Link to comment
Share on other sites

puithove

 

Sure.  This is the log that runs up until I checked it this morning.  It was having the issue when I restarted the service at 6:36 (just a few minutes ago).  The last entry in the log was at 5:35

 

http://pastebin.com/WigibfCz

Link to comment
Share on other sites

thefirstofthe300

Considering that server log was nothing but errors, I would say something is definitely wrong. Those errors I have never seen before though so this is a bit out of my depth.

 

@@Luke

Link to comment
Share on other sites

i don't see any problems there. those are all errors the server can recover from and continue on.

Link to comment
Share on other sites

Jackrats

I've now had 2 instances of MBS becoming unhappy with "Too many files open." Interestingly, when it happens, I can still get to the WebServer. I would have thought that the new connection would introduce an additional filehandle, but alas it responds there happily. But playing from Roku fails in this state and there are entries in the WebUI's notifications that scheduled functions are failing due to the filehandle exhaustion, such as:

 

 

Cache file cleanup failed.

3 days ago

Too many open files

Check for plugin updates failed.

3 days ago

Error: ConnectFailure (Too many open files)

 

The couple times it happened, I did a cursory look at lsof and was amazed at the line count for MBS related open files being > 125000 files! I thought I'd poke around today while in the working state and saw 112000 items, but after looking closer, I realize that every instance of a file, library, binary, socket, etc for every MBS thread is reported there. As the filehandles are shared across all threads of the process, this was actually a very misleading count. My /proc/xxx/fd for the MBS PID shows presently only 74 FD's in use. Ulimit on my system is default at 1024 with the kernel hard limit at 4096.

 

The next time I see a failure, I'll investigate a bit closer to see where the extra 900+ FD's have gone.

 

Some additional details: My MBS is also running on a KVM guest running CentOS 7 on a CentOS 7 host, with a simple bridge on the host network. However, I don't see the getifaddrs noted by the OP as likely being relevant to the root cause, but rather a side effect error of not having any more FD's available to the process.

 

Timing wise, I feel like this issue started after bringing in the Roku thumbnail functionality. Based on what logs I do have available, I upgraded to the BIF capable release on Feb 10 (3.0.5518.1-Beta.31.1.noarch) and subsequently started the thumbnail creation process. On Feb 16, I installed lsof, which I did when I first ran into the FD exhaustion.

 

Snipped a long block that basically says: Nothing in the log files stands out, the "Too many open files" message starts out of the blue when nothing else seems to be going on.

 

This second instance was while running the 3.0.5518.4-Beta.32.1.noarch.rpm.

 

At this point, I think my best hope of pinning this down further is to investigate what's actually open the next time it occurs, but now I'm ready for it this time!

 

Anyone else running into it, you'll probably want to start with:

1. Determine the PID for MediaBrowserServer ("ps -ef | grep MediaBrowserServer")

2. See what the open file descriptors are for ("ls -l /proc/PIDFROMSTEP1/fd")

 

That will show what each FD is for. Hopefully the leak will be on a filesystem file and not a socket. At present time I have 41 files and 35 sockets open. If you see a thousand'ish socket entries, pick some of the higher numbered ones (first part of the output after the date) and use "lsof" to see what they are exactly.  For example:

[root@mbserver mediabrowser]# ls -l /proc/32452/fd
... snip ...
lrwx------. 1 MediaBrowserServer media 64 Mar  3 11:44 63 -> socket:[12357592]

That shows FD 63 is a socket identified by 12357592.

[root@mbserver mediabrowser]# lsof | grep 12357592
Threadpoo 32452 15530 MediaBrowserServer   63u     IPv4           12357592       0t0        TCP mbserver.mydomain.com:8096->roku.mydomain.com:46629 (ESTABLISHED)

You'll actually see a ton of entries, one for each thread spawned by MBS. But they will all have the same details on the right as far as where they go and what ports are used.

 

It might also be interesting to see how many threads are spawned. My instance currently has 452 threads. This seems particularly high to me. I just restarted by MBS service and now have 16.

Link to comment
Share on other sites

puithove

FWIW - I haven't installed the Roku thumbs plugin.

 

I'll do a little digging on the files when it happens again (and if the wife isn't trying to watch tv at the time).  Right now things are working, but the open file count does seem high.  Output from lsof shows lots of NETLINK sockets like this (just a small snippet):

Threadpoo 27211 29824     mediabrowser  211u     sock                0,7          0t0    9684816 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  212u     sock                0,7          0t0    9668292 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  213u     sock                0,7          0t0    9686518 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  214u     sock                0,7          0t0    9684827 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  215u     sock                0,7          0t0    9723198 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  216u     sock                0,7          0t0    9686536 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  217u     sock                0,7          0t0    9721231 protocol: NETLINK
Threadpoo 27211 29824     mediabrowser  218u     sock                0,7          0t0    9684866 protocol: NETLINK

And the fd from proc (also just a snippet):  

lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 210 -> socket:[9648782]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 211 -> socket:[9684816]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 212 -> socket:[9668292]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 213 -> socket:[9686518]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 214 -> socket:[9684827]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 215 -> socket:[9723198]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 216 -> socket:[9686536]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 217 -> socket:[9721231]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 06:57 218 -> socket:[9684866]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:00 219 -> socket:[9671237]

The number here is growing as I'm sitting watching it.  Looks like about 1 every 15-20 secs or so.  Which actually, now that I think about it, that's reflected in the timestamps on the file:

lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:03 220 -> socket:[9686684]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:03 221 -> socket:[9724578]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:03 222 -> socket:[9686716]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:03 223 -> socket:[9668351]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:03 224 -> socket:[9687435]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:04 225 -> socket:[9671312]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:05 226 -> socket:[9724614]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:05 227 -> socket:[9724618]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:06 228 -> socket:[9687663]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  4 07:06 229 -> socket:[9724654]
Link to comment
Share on other sites

puithove

After restarting of course the count in /fd for the new process is drastically lower.  It seems to continue the same behavior of adding sockets a couple minutes after starting up.

Link to comment
Share on other sites

puithove

As of this morning it's in the hung state again.  Mediabrowser has 1024 open files under /proc/[pid]/fd

 

So I did take a couple examples from there to get the lsof output.  Examples:

lrwx------ 1 mediabrowser mediabrowser 64 Mar  5 05:59 998 -> socket:[11688527]
lrwx------ 1 mediabrowser mediabrowser 64 Mar  5 05:59 999 -> socket:[11693484]

lsof|grep 11693484

mono      23963           mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 10271     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
IO\x20Thr 23963 12789     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 14553     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 14731     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 16495     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 22438     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 23814     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Finalizer 23963 23964     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Timer-Sch 23963 23965     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 23966     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 23967     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 23969     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
mono      23963 24053     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
IO\x20Thr 23963 24054     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 24456     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 24888     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
IO\x20Thr 23963 26333     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 26594     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK
Threadpoo 23963 29018     mediabrowser  999u     sock                0,7          0t0   11693484 protocol: NETLINK

lsof|grep 11688527

mono      23963           mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 10271     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
IO\x20Thr 23963 12789     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 14553     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 14731     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 16495     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 22438     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 23814     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Finalizer 23963 23964     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Timer-Sch 23963 23965     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 23966     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 23967     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 23969     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
mono      23963 24053     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
IO\x20Thr 23963 24054     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 24456     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 24888     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
IO\x20Thr 23963 26333     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 26594     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK
Threadpoo 23963 29018     mediabrowser  998u     sock                0,7          0t0   11688527 protocol: NETLINK

Link to comment
Share on other sites

Jackrats

Of course, it would be a NETLINK socket, which I am admittedly clueless about. Looks like it is a mechanism to communicate with the kernel for things like routing table updates, multicast traffic, and some other things. My guess would be it's being used for multicast for UPNP functions. Which sort of makes sense from the one instance that I was able to dig into my logs for before they rolled -- the first complainer in the logs was UPnP messages failing to go out.

 

My system currently only has 1 NETLINK socket open. I'm trying to figure out how to dig more into the NETLINK socket, but on the next instance it might be worth looking at /proc/net/netlink:

 

# grep PIDOFMBS /proc/link/netlink

 

Not sure if that will yield anything useful or not yet, but it's the most I've found so far for looking into NETLINK sockets. Also trying to see if GDB might give anything useful on them.

Link to comment
Share on other sites

Jackrats

So, actually, in reading further on netlink, grep'ing to the PID won't give the goods. The PID is only used as the reference for the first netlink socket for a given process:

 

       nl_pid is the unicast address of netlink socket.  It's always 0 if       the destination is in the kernel.  For a user-space process, nl_pid       is usually the PID of the process owning the destination socket.       However, nl_pid identifies a netlink socket, not a process.  If a       process owns several netlink sockets, then nl_pid can be equal to the       process ID only for at most one socket.  There are two ways to assign       nl_pid to a netlink socket.  If the application sets nl_pid before       calling bind(2), then it is up to the application to make sure that       nl_pid is unique.  If the application sets it to 0, the kernel takes       care of assigning it.  The kernel assigns the process ID to the first       netlink socket the process opens and assigns a unique nl_pid to every       netlink socket that the process subsequently creates.

 

So, it's probably worth just cat'ing the entire /proc/net/netlink file. My system currently only has 38 entries, and my MBS currently only has 1 NETLINK socket open. The "Groups" column appears to decode to the type, here mine is 11 for the NETLINK_CONNECTOR type. The columns don't line up well.

[root@mbserver mediabrowser]# cat /proc/net/netlink
sk       Eth Pid    Groups   Rmem     Wmem     Dump     Locks     Drops     Inode
ffff88013971f800 0   32485  00000011 0        0        0000000000000000 2        0        9807871 

Also, looks like I typo'ed in my previous port -- it's /proc/net/netlink not /proc/link/netlink.

Link to comment
Share on other sites

puithove

I didn't know anything about NETLINK either when looking at it this morning, but did a quick google and turned up basically what you mention above.  I'll try to look a bit more when I can.  

 

Interestingly I do remember seeing in my logs the complaints about UPNP as well, so that's definitely worth looking a little further into.

Link to comment
Share on other sites

Jackrats

It looks like the "ss" command that comes with the iproute package also shows some info on netlink sockets:

ss -a -f netlink
Link to comment
Share on other sites

puithove

Well this is interesting.  Restarted the mediabrowser service yesterday because it was unresponsive.  Took a look at the file descriptors right after the restart and it was running around 50 open files/sockets.  This morning, all is well, and I looked at the count again - it's still right at 50.

 

The only thing I can think of that would be different is that I did a full library refresh yesterday afternoon (for completely separate reasons - it seemed a lot of my metadata & images had gone missing).  Looking at the server log now, the majority of the errors shown in the log I posted previously seem to be gone.

 

I'm not yet ready to say that's the cure here, but others having the issue might want to try and see if that helps.

 

For each of my sections, in Metadata Manager, I did an Advanced Refresh, Refresh all data / Replace Images.  Then after that was complete, ran a refresh in the auto box sets plugin.

Link to comment
Share on other sites

puithove

So after having to reboot the server, this issue is back.  Seems that "good" run may have been a fluke, and I probably should have restarted the service to check it more thoroughly.

 

Same symptoms as before - number of open sockets slow grows (leaks) until it reaches the limit of open files (1024).

 

Oddly enough, I did again observe quite a few items in my library that were missing images like before.  These are items that I KNOW had images downloaded before, had gone missing before, and were re-downloaded when I did the last refresh.  I've refreshed everything again, but the socket leak issue is still happening this time.  Can't say that it's related, but definitely odd.

Link to comment
Share on other sites

Jackrats

Were those items missing images after restarting the service (ie  post restart to free up the used filehandles)? Or did you notice them missing while MB3 was in the "Too many open filehandles" state?

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...