Discussion:
[bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies
Heiko Carstens
2014-10-11 08:36:27 UTC
Hi all,

it just came to my attention that commit e341694e3eb5
"netlink: Convert netlink_lookup() to use RCU protected hash table"
causes network latencies for me on s390.

The testcase is quite simple and 100% reproducible on s390:

Simply log in via ssh to a remote system which has the above mentioned
patch applied. Any action like pressing return now has significant
latencies. Or in other words, working via such a connection becomes
a pain ;)

I haven't debugged it, however I assume the problem is that a) the
commit introduces a synchronize_net() call and b) s390 kernels
usually get compiled with CONFIG_HZ_100 while most other architectures
use CONFIG_HZ_1000.
If I change the kernel config to CONFIG_HZ_1000 the problem goes away,
however I don't consider this a fix...

Another reason why this hasn't been observed on x86 may or may not be
that we haven't implemented CONFIG_HAVE_CONTEXT_TRACKING on s390 (yet).
But that's just guessing...
Eric Dumazet
2014-10-11 19:32:44 UTC
CC Paul and Sasha
Thomas Graf
2014-10-11 22:25:14 UTC
I think the issue here is obvious and a fix is on the way to move
the insertion and removal to a worker to no longer require the
synchronize_rcu().

What bothers me is that the synchronize_rcu() should only occur
on expand/shrink and not for every table update. The default table
size is 64.
David Miller
2014-10-11 23:08:29 UTC
From: Thomas Graf <***@suug.ch>
Date: Sat, 11 Oct 2014 23:25:14 +0100
Post by Thomas Graf
I think the issue here is obvious and a fix is on the way to move
the insertion and removal to a worker to no longer require the
synchronize_rcu().
What bothers me is that the synchronize_rcu() should only occur
on expand/shrink and not for every table update. The default table
size is 64.
Not true, every netlink socket release incurs a synchronize_net()
now, because we added such a call to netlink_release().

I specifically brought this up as a possible problem when the
changes went in...
Heiko Carstens
2014-10-20 08:21:08 UTC
*ping* ... is there already any patch available?