Discussion:
PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)
(too old to reply)
Mirko Parthey
2005-01-31 16:22:02 UTC
Permalink
My Debian machine hangs during shutdown, with messages like this:
unregister_netdevice: waiting for br0 to become free. Usage count = 1

I narrowed it down to the command
# brctl delbr br0
which does not return in the circumstances shown below.

The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
Debian kernel-image-2.6.10-1-686.

My .config is taken from the above mentioned Debian kernel, adapted to
2.6.11-rc2 and the processor type set to Pentium Classic. (I can email
the .config on request).

How to reproduce the problem (I tried this on a Pentium 4 machine):

boot: linux init=/bin/bash
[...booting...]
# mount proc -t proc /proc
# ifconfig lo 127.0.0.1
# brctl addbr br0
# modprobe e100 # also reproducible with 3c5x9
# brctl addif br0 eth0
# ifconfig br0 192.168.1.1
# ifconfig eth0 up
# ifconfig lo down
# lsmod | grep bridge
bridge 48888 0
# brctl delif br0 eth0
# ifconfig br0 down
# brctl delbr br0
unregister_netdevice: waiting for br0 to become free. Usage count = 1
unregister_netdevice: waiting for br0 to become free. Usage count = 1
[...this message again and again, but no progress...]

I'm also surprised that the reference count for the bridge module is
zero, even when it is in use.

Please let me know if you need any further information,
and please Cc me on replies (I am not subscribed to linux-kernel).

Thanks,
Mirko
Matthias-Christian Ott
2005-01-31 17:40:51 UTC
Permalink
Post by Mirko Parthey
unregister_netdevice: waiting for br0 to become free. Usage count = 1
I narrowed it down to the command
# brctl delbr br0
which does not return in the circumstances shown below.
The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
Debian kernel-image-2.6.10-1-686.
My .config is taken from the above mentioned Debian kernel, adapted to
2.6.11-rc2 and the processor type set to Pentium Classic. (I can email
the .config on request).
boot: linux init=/bin/bash
[...booting...]
# mount proc -t proc /proc
# ifconfig lo 127.0.0.1
# brctl addbr br0
# modprobe e100 # also reproducible with 3c5x9
# brctl addif br0 eth0
# ifconfig br0 192.168.1.1
# ifconfig eth0 up
# ifconfig lo down
# lsmod | grep bridge
bridge 48888 0
# brctl delif br0 eth0
# ifconfig br0 down
# brctl delbr br0
unregister_netdevice: waiting for br0 to become free. Usage count = 1
unregister_netdevice: waiting for br0 to become free. Usage count = 1
[...this message again and again, but no progress...]
I'm also surprised that the reference count for the bridge module is
zero, even when it is in use.
Please let me know if you need any further information,
and please Cc me on replies (I am not subscribed to linux-kernel).
Thanks,
Mirko
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Hi!
I have a similar problem with my wlan modul, this patch should fix it (I
disables the check function [untested]):

diff -Nru linux-2.6.11-rc2/net/core/dev.c
linux-2.6.11-rc2-ott/net/core/dev.c
--- linux-2.6.11-rc2/net/core/dev.c 2005-01-26 22:27:31.000000000 +0100
+++ linux-2.6.11-rc2-ott/net/core/dev.c 2005-01-31 00:47:34.000000000
+0100
@@ -2974,7 +2974,7 @@
netdev_unregister_sysfs(dev);
dev->reg_state = NETREG_UNREGISTERED;

- netdev_wait_allrefs(dev);
+ //netdev_wait_allrefs(dev);

/* paranoia */
BUG_ON(atomic_read(&dev->refcnt));

Matthias-Christian Ott
Mirko Parthey
2005-02-01 17:36:24 UTC
Permalink
Post by Mirko Parthey
unregister_netdevice: waiting for br0 to become free. Usage count = 1
I narrowed it down to the command
# brctl delbr br0
which does not return in the circumstances shown below.
The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
Debian kernel-image-2.6.10-1-686.
[...]
The problem was introduced between 2.6.8.1 and 2.6.9,
and it is still present in 2.6.11-rc2-bk9.

Mirko
Matthias-Christian Ott
2005-02-02 16:08:01 UTC
Permalink
Post by Mirko Parthey
Post by Mirko Parthey
unregister_netdevice: waiting for br0 to become free. Usage count = 1
I narrowed it down to the command
# brctl delbr br0
which does not return in the circumstances shown below.
The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
Debian kernel-image-2.6.10-1-686.
[...]
The problem was introduced between 2.6.8.1 and 2.6.9,
and it is still present in 2.6.11-rc2-bk9.
Mirko
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jep it appears in 2.6.8 and above. I don't know the process who's using it:

Before shutdown:
PID TTY TIME CMD
1 ? 00:00:00 init
2 ? 00:00:00 ksoftirqd/0
3 ? 00:00:00 events/0
4 ? 00:00:00 khelper
9 ? 00:00:00 kthread
19 ? 00:00:00 kacpid
17 ? 00:00:00 vesafb
100 ? 00:00:00 kblockd/0
164 ? 00:00:00 pdflush
165 ? 00:00:00 pdflush
166 ? 00:00:00 kswapd0
167 ? 00:00:00 aio/0
168 ? 00:00:00 jfsIO
169 ? 00:00:00 jfsCommit
170 ? 00:00:00 jfsSync
171 ? 00:00:00 xfslogd/0
172 ? 00:00:00 xfsdatad/0
173 ? 00:00:00 xfsbufd
774 ? 00:00:00 kseriod
862 ? 00:00:00 ata/0
882 ? 00:00:00 kcryptd/0
883 ? 00:00:00 kmirrord/0
887 ? 00:00:00 xfssyncd
939 ? 00:00:00 udevd
1061 ? 00:00:00 devfsd
1242 ? 00:00:00 khubd
1616 ? 00:00:00 syslog-ng
2562 ? 00:00:00 khpsbpkt
2646 ? 00:00:00 knodemgrd_0
3987 ? 00:00:00 crond
4027 tty2 00:00:00 agetty
4028 tty3 00:00:00 agetty
4029 tty4 00:00:00 agetty
4030 tty5 00:00:00 agetty
4031 tty6 00:00:00 agetty
4473 tty1 00:00:00 bash
4553 tty1 00:00:00 ps

And my System can't shutdown because of this bug. Currently disabling
this checking function seems to be the only solution (see my diff).

Matthias-Christian Ott
Matthias-Christian Ott
2005-02-05 12:48:28 UTC
Permalink
Post by Matthias-Christian Ott
Post by Mirko Parthey
Post by Mirko Parthey
unregister_netdevice: waiting for br0 to become free. Usage count = 1
I narrowed it down to the command
# brctl delbr br0
which does not return in the circumstances shown below.
The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
Debian kernel-image-2.6.10-1-686.
[...]
The problem was introduced between 2.6.8.1 and 2.6.9,
and it is still present in 2.6.11-rc2-bk9.
Mirko
-
To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
PID TTY TIME CMD
1 ? 00:00:00 init
2 ? 00:00:00 ksoftirqd/0
3 ? 00:00:00 events/0
4 ? 00:00:00 khelper
9 ? 00:00:00 kthread
19 ? 00:00:00 kacpid
17 ? 00:00:00 vesafb
100 ? 00:00:00 kblockd/0
164 ? 00:00:00 pdflush
165 ? 00:00:00 pdflush
166 ? 00:00:00 kswapd0
167 ? 00:00:00 aio/0
168 ? 00:00:00 jfsIO
169 ? 00:00:00 jfsCommit
170 ? 00:00:00 jfsSync
171 ? 00:00:00 xfslogd/0
172 ? 00:00:00 xfsdatad/0
173 ? 00:00:00 xfsbufd
774 ? 00:00:00 kseriod
862 ? 00:00:00 ata/0
882 ? 00:00:00 kcryptd/0
883 ? 00:00:00 kmirrord/0
887 ? 00:00:00 xfssyncd
939 ? 00:00:00 udevd
1061 ? 00:00:00 devfsd
1242 ? 00:00:00 khubd
1616 ? 00:00:00 syslog-ng
2562 ? 00:00:00 khpsbpkt
2646 ? 00:00:00 knodemgrd_0
3987 ? 00:00:00 crond
4027 tty2 00:00:00 agetty
4028 tty3 00:00:00 agetty
4029 tty4 00:00:00 agetty
4030 tty5 00:00:00 agetty
4031 tty6 00:00:00 agetty
4473 tty1 00:00:00 bash
4553 tty1 00:00:00 ps
And my System can't shutdown because of this bug. Currently disabling
this checking function seems to be the only solution (see my diff).
Matthias-Christian Ott
-
To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Well I think my problem was not ipv6 problem Herbert discribed, it was
syslog-ng. I replaced it with metalog and it seems to work well (I'll
test it with 2.6.11-rc3 today).

Matthias-Christian Ott
Herbert Xu
2005-02-05 05:24:07 UTC
Permalink
Post by Mirko Parthey
boot: linux init=/bin/bash
[...booting...]
# mount proc -t proc /proc
# ifconfig lo 127.0.0.1
# brctl addbr br0
# modprobe e100 # also reproducible with 3c5x9
# brctl addif br0 eth0
# ifconfig br0 192.168.1.1
# ifconfig eth0 up
# ifconfig lo down
This is the key to the problem.

It took me a while to find the cause of this. Along the way I
found a few other ref counting bugs in this area as well.

All of these bugs stem from the idev reference held in rtable/rt6_info.
Looking back in my mailbox, it's amazing how many problems this piece
of infrastructure has caused since it was installed last June. AFAIK
to this day there is still no code in the kernel that actually uses
this infrastructure.

Anyway, this particular problem is due to IPv6 adding local addresses
with split devices. That is, routes to local addresses are added with
rt6i_dev set to &loopback_dev and rt6i_idev set to the idev of the
device where the address is added.

This is just not going to work unless IPv6 creates its own dst garbage
collection routine since the generic GC in net/core/dst.c has no way
of finding all the rt6_info's referring to a specific net_device through
rt6i_idev.

It is also different from the IPv4 behaviour where we set both dev
and idev to loopback_dev.

Now this stuff is apparently going to be used for IP statistics. As
it is packets sent to/received from local addresses are counted against
the loopback device. Linux has behaved this way for a long time. So
when these statistics actually get implemented it will be very strange
if they were accounted to the device owning the local addresses instead
of loopback.

It also goes against the Linux philosophy where the addresses are owned
by the host, not the interface.

Therefore I propose the simple solution of not doing the split device
accounting in rt6_info.

Signed-off-by: Herbert Xu <***@gondor.apana.org.au>

I will send the patches for the other bugs separately.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David S. Miller
2005-02-05 05:38:13 UTC
Permalink
On Sat, 5 Feb 2005 16:24:07 +1100
Post by Herbert Xu
This is the key to the problem.
...
Post by Herbert Xu
All of these bugs stem from the idev reference held in rtable/rt6_info.
...
Post by Herbert Xu
Anyway, this particular problem is due to IPv6 adding local addresses
with split devices. That is, routes to local addresses are added with
rt6i_dev set to &loopback_dev and rt6i_idev set to the idev of the
device where the address is added.
...
Post by Herbert Xu
It also goes against the Linux philosophy where the addresses are owned
by the host, not the interface.
Therefore I propose the simple solution of not doing the split device
accounting in rt6_info.
I agree with your analysis, however... this change is not sufficient.
You have to then walk over all the uses of rt6i_dev and sanitize the
cases that still expect the split semantics. For example, things like
this piece of coe in rt6_device_match():

if (dev->flags & IFF_LOOPBACK) {
if (sprt->rt6i_idev == NULL ||
sprt->rt6i_idev->dev->ifindex != oif) {
if (strict && oif)
continue;
if (local && (!oif ||
local->rt6i_idev->dev->ifindex == oif))
continue;
}
local = sprt;
}

It is just the first such thing I found, scanning rt6i_idev uses
will easily find several others.
Herbert Xu
2005-02-05 06:11:10 UTC
Permalink
Post by David S. Miller
It is just the first such thing I found, scanning rt6i_idev uses
will easily find several others.
You're right of course. I thought they were all harmless but I was
obviously wrong about this one.

So here is a patch that essentially reverts the split devices
semantics introduced by these two changesets:

[IPV6] addrconf_dst_alloc() to allocate new route for local address.
[IPV6] take rt6i_idev into account when looking up routes.

Signed-off-by: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David S. Miller
2005-02-05 06:13:44 UTC
Permalink
On Sat, 5 Feb 2005 17:11:10 +1100
Post by Herbert Xu
You're right of course. I thought they were all harmless but I was
obviously wrong about this one.
So here is a patch that essentially reverts the split devices
[IPV6] addrconf_dst_alloc() to allocate new route for local address.
[IPV6] take rt6i_idev into account when looking up routes.
Ok.

But Herbert, let's take a step back real quick because I want
to point something out. IPv6 does try to handle the dangling
mismatched idev's, in route.c:ip6_dst_ifdown(), this is called
via net/core/dst.c:dst_ifdown(), and this releases the ipv6
idev correctly in the split device case.

Did your analysis of this bridging release bug take this into
account? That's why we added this dst->ops method, specifically
to handle this problem.

This was added by Yoshifuji-san in ChangeSet 1.1722.137.17 which
has the checking comment:

[NET]: Add dst->ifdown callback.

Use it to release protocol specific objects that may be
tied to a dst cache object, at ifdown time. Currently
this is used to release ipv4/ipv6 specific device state.
Herbert Xu
2005-02-05 06:46:43 UTC
Permalink
Post by David S. Miller
But Herbert, let's take a step back real quick because I want
to point something out. IPv6 does try to handle the dangling
mismatched idev's, in route.c:ip6_dst_ifdown(), this is called
via net/core/dst.c:dst_ifdown(), and this releases the ipv6
idev correctly in the split device case.
Did your analysis of this bridging release bug take this into
account? That's why we added this dst->ops method, specifically
to handle this problem.
This doesn't work because net/core/dst.c can only search based
on dst->dev. For the split device case, dst->dev is set to
loopback_dev while rt6i_idev is set to the real device.

Therefore net/core/dst.c stands no chance of finding the correct
local address routes that it needs to process.

If we wanted to preserve the split device semantics, then we
can create a local GC list in IPv6 so that it can search based
on rt6i_idev as well as the other keys. Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.

However, IMHO the split device semantics is inconsistent with
the philosphy that addresses belong to the host and not the
interface. So it doesn't really make sense in the current
networking stack.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-05 10:45:59 UTC
Permalink
Post by Herbert Xu
This doesn't work because net/core/dst.c can only search based
on dst->dev. For the split device case, dst->dev is set to
loopback_dev while rt6i_idev is set to the real device.
Although I still think this is a bug, I'm now starting to suspect
that there is another bug around as well.

There is probably an ifp leak which in turn leads to a split dst
leak that allows the first bug to make its mark.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-06 11:41:45 UTC
Permalink
Post by Herbert Xu
Although I still think this is a bug, I'm now starting to suspect
that there is another bug around as well.
There is probably an ifp leak which in turn leads to a split dst
leak that allows the first bug to make its mark.
Found it. This is what happens:

lo goes down =>
rt6_ifdown =>
eth0's local address route gets deleted

eth0 goes down =>
__ipv6_ifa_notify =>
ip6_del_rt fails so we fall through to the
dst_free path. At this point the refcount
taken by __ipv6_ifa_notify is leaked.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 12:30:18 UTC
Permalink
Post by Herbert Xu
Post by Herbert Xu
Although I still think this is a bug, I'm now starting to suspect
that there is another bug around as well.
There is probably an ifp leak which in turn leads to a split dst
leak that allows the first bug to make its mark.
lo goes down =>
rt6_ifdown =>
eth0's local address route gets deleted
eth0 goes down =>
__ipv6_ifa_notify =>
ip6_del_rt fails so we fall through to the
dst_free path. At this point the refcount
taken by __ipv6_ifa_notify is leaked.
Oh, you're right! Thanks.

How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?

Signed-off-by: Hideaki YOSHIFUJI <***@linux-ipv6.org>

===== include/linux/ipv6_route.h 1.6 vs edited =====
--- 1.6/include/linux/ipv6_route.h 2004-10-26 12:54:23 +09:00
+++ edited/include/linux/ipv6_route.h 2005-02-06 21:27:02 +09:00
@@ -26,6 +26,7 @@
#define RTF_FLOW 0x02000000 /* flow significant route */
#define RTF_POLICY 0x04000000 /* policy route */

+#define RTF_ANYCAST 0x40000000
#define RTF_LOCAL 0x80000000

struct in6_rtmsg {
===== net/ipv6/route.c 1.105 vs edited =====
--- 1.105/net/ipv6/route.c 2005-01-15 17:44:48 +09:00
+++ edited/net/ipv6/route.c 2005-02-06 21:26:35 +09:00
@@ -1408,7 +1408,9 @@
rt->u.dst.obsolete = -1;

rt->rt6i_flags = RTF_UP | RTF_NONEXTHOP;
- if (!anycast)
+ if (anycast)
+ rt->rt6i_flags |= RTF_ANYCAST;
+ else
rt->rt6i_flags |= RTF_LOCAL;
rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
if (rt->rt6i_nexthop == NULL) {
@@ -1427,7 +1429,8 @@
static int fib6_ifdown(struct rt6_info *rt, void *arg)
{
if (((void*)rt->rt6i_dev == arg || arg == NULL) &&
- rt != &ip6_null_entry) {
+ rt != &ip6_null_entry &&
+ !(rt->rt6i_flags & (RTF_LOCAL|RTF_ANYCAST))) {
RT6_TRACE("deleted by ifdown %p\n", rt);
return -1;
}
--
Hideaki YOSHIFUJI @ USAGI Project <***@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Herbert Xu
2005-02-06 21:01:50 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?
Great, that definitely fixes the local address problem.

I'm not sure about anycast routes though. Who's going to delete them
when the real device goes down?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Andre Tomt
2005-02-09 20:45:23 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
Oh, you're right! Thanks.
How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?
This patch seems to work fine here; cleared out the other patches to be
sure. So, herberts first patch fixes it, and so do this one. I have not
tried the one in between.
YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 10:50:39 UTC
Permalink
Post by Herbert Xu
If we wanted to preserve the split device semantics, then we
can create a local GC list in IPv6 so that it can search based
on rt6i_idev as well as the other keys. Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.
Yes, IPv6 needs "split device" semantics
(for per-device statistics such as Ip6InDelivers etc),
and I like later solution.

Thanks.

--yoshfuji
Herbert Xu
2005-02-05 18:32:18 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
Yes, IPv6 needs "split device" semantics
(for per-device statistics such as Ip6InDelivers etc),
and I like later solution.
OK. Is there any reason why IPv4 should be different from IPv6 in
this respect though?

If the split device dst's are to be kept, we'll need a way to clean
them up. There are two choices:

1) Put the dst's on IPv6's own GC so that we can search by rt6i_idev.
2) Change the generic GC so that dst->ops->ifdown is always called even
if dst->dev does not match with the device going down. We also need to
change the individual ifdown functions to check for ->dev. The IPv6
ifdown function can then check for ->rt6i_idev as well.

What's your preference?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David S. Miller
2005-02-06 04:02:42 UTC
Permalink
On Sat, 05 Feb 2005 19:50:39 +0900 (JST)
Post by YOSHIFUJI Hideaki / 吉藤英明
Post by Herbert Xu
If we wanted to preserve the split device semantics, then we
can create a local GC list in IPv6 so that it can search based
on rt6i_idev as well as the other keys. Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.
Yes, IPv6 needs "split device" semantics
(for per-device statistics such as Ip6InDelivers etc),
and I like later solution.
Ok. I never read whether ipv6, like ipv4, is specified to support
a model of host based ownership of addresses. Does anyone know?
YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 05:01:35 UTC
Permalink
Post by David S. Miller
Post by YOSHIFUJI Hideaki / 吉藤英明
Yes, IPv6 needs "split device" semantics
(for per-device statistics such as Ip6InDelivers etc),
and I like later solution.
Ok. I never read whether ipv6, like ipv4, is specified to support
a model of host based ownership of addresses. Does anyone know?
I'm not sure it is explicitly specified, but there're some hints:

1. we need to allow multiple addresses on multiple interfaces.
e.g. link-local address

2. if a packet has come from A to link-local address on the other side B,
we should not receive it.
+-------+
---->|A B|
+-------+

Currently, it does not happen in usual because ndisc does NOT handle
addresses on other device.

3. mib document states that we should take statistics on interface which
the address belongs to; not the interface where the packet has come
from:

cited from RFC2011bis:
Local (*) packets on the input side are counted on the interface
associated with their destination address, which may not be the
interface on which they were received. This requirement is caused by
the possibility of losing the original interface during processing,
especially re-assembly.

(*): here it means incoming, but not forwarding.


BTW, BSD has similar reference to interface structure in routeing entry.
--
Hideaki YOSHIFUJI @ USAGI Project <***@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
David S. Miller
2005-02-06 04:10:44 UTC
Permalink
On Sat, 5 Feb 2005 17:46:43 +1100
Post by Herbert Xu
This doesn't work because net/core/dst.c can only search based
on dst->dev. For the split device case, dst->dev is set to
loopback_dev while rt6i_idev is set to the real device.
Indeed. I didn't catch that.
Post by Herbert Xu
If we wanted to preserve the split device semantics, then we
can create a local GC list in IPv6 so that it can search based
on rt6i_idev as well as the other keys.
Ok, so this would entail changing each ipv6 dst_free() call
into one to ip6_dst_free(), which would:

ip6_garbage_add(dst);
dst_free(dst);

It would mean that dst_run_gc() would need to have some callback
like dst->ops->gc_destroy() or similar, which would allow ipv6
to delete the dst from it's local garbage list.
Post by Herbert Xu
Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.
I think there is a hole in this idea.... maybe.

If the idea is to scan dst_garbage_list down in ipv6 specific code,
you can't do that since 'dst' objects from every pool in the kernel
get put onto the dst_garbage_list. It is generic.

They have no identity, so it's illegal to treat any member of that
list as an rt_entry, rt6_entry or any specific higher level dst
type.
YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 04:37:23 UTC
Permalink
Post by David S. Miller
Post by Herbert Xu
Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.
I think there is a hole in this idea.... maybe.
If the idea is to scan dst_garbage_list down in ipv6 specific code,
you can't do that since 'dst' objects from every pool in the kernel
get put onto the dst_garbage_list. It is generic.
How about making dst->ops->dev_check() like this:

static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
{
if (dst->ops->dev_check)
return dst->ops->dev_check(dst, dev)
else
return dst->dev == dev;
}

--yoshfuji
David S. Miller
2005-02-06 05:04:11 UTC
Permalink
On Sun, 06 Feb 2005 13:37:23 +0900 (JST)
Post by YOSHIFUJI Hideaki / 吉藤英明
static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
{
if (dst->ops->dev_check)
return dst->ops->dev_check(dst, dev)
else
return dst->dev == dev;
}
Oh I see. That would work, and it seems the simplest, and
lowest risk fix for this problem.

Herbert, what do you think?
YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 05:31:07 UTC
Permalink
Post by David S. Miller
On Sun, 06 Feb 2005 13:37:23 +0900 (JST)
Post by YOSHIFUJI Hideaki / 吉藤英明
static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
{
if (dst->ops->dev_check)
return dst->ops->dev_check(dst, dev)
else
return dst->dev == dev;
}
Oh I see. That would work, and it seems the simplest, and
lowest risk fix for this problem.
Well...

Here, lo is going down.
rt->rt6i_dev = lo and rt->rt6i_idev = ethX.
I think we already see dst->dev == dev (==lo) now.
So, I doubt that fix the problem.

The source of problem is entry (*) which still on routing entry,
not on gc list. And, the owner of entry is not routing table but
unicast/anycast address structure(s).
We need to "kill" active address on the other interfaces.

*: rt->rt6i_dev = lo and rt->rt6i_idev = ethX


BTW, I wish we could shut down eth0 during lo is pending...

--yoshfuji
YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 05:50:07 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
The source of problem is entry (*) which still on routing entry,
not on gc list. And, the owner of entry is not routing table but
unicast/anycast address structure(s).
We need to "kill" active address on the other interfaces.
Which means in addrconf_notiry(), if the dev == &loopback_dev,
call addrconf_ifdown for every device like this:

Signed-off-by: Hideaki YOSHIFUJI <***@linux-ipv6.org>

===== net/ipv6/addrconf.c 1.129 vs edited =====
--- 1.129/net/ipv6/addrconf.c 2005-01-18 06:13:31 +09:00
+++ edited/net/ipv6/addrconf.c 2005-02-06 14:48:25 +09:00
@@ -1961,6 +1961,20 @@
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
/*
+ * If lo is doing down we need to kill
+ * all addresses on the host because it owns
+ * route on lo. --yoshfuji
+ */
+ if (dev == &loopback_dev) {
+ struct net_device *dev1;
+ for (dev1 = dev_base; dev1; dev1 = dev->next) {
+ if (dev1 == &loopback_dev ||
+ __in6_dev_get(dev1) == NULL)
+ continue;
+ addrconf_ifdown(dev1, event != NETDEV_DOWN);
+ }
+ }
+ /*
* Remove all addresses from this interface.
*/
addrconf_ifdown(dev, event != NETDEV_DOWN);
--
Hideaki YOSHIFUJI @ USAGI Project <***@linux-ipv6.org>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Herbert Xu
2005-02-06 07:00:50 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
Which means in addrconf_notiry(), if the dev == &loopback_dev,
This should fix the reported issue. However, I'm not sure if it's
a good idea to stop all IP traffic when lo goes down. We don't do
that for IPv4.

Besides, we'll still need to fix the rt6i_idev GC issue since the
same bug can occur when eth0 goes down and some appliation is holding
a dst to a local address route. It can become a dead-lock if the
said application then invokes a syscall that takes the RTNL.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-06 06:53:40 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
Here, lo is going down.
rt->rt6i_dev = lo and rt->rt6i_idev = ethX.
I think we already see dst->dev == dev (==lo) now.
So, I doubt that fix the problem.
The source of problem is entry (*) which still on routing entry,
not on gc list. And, the owner of entry is not routing table but
unicast/anycast address structure(s).
We need to "kill" active address on the other interfaces.
*: rt->rt6i_dev = lo and rt->rt6i_idev = ethX
Sorry I don't think this is right. Although lo going down is
required to cause those symptoms, it is not the trigger.

The problem only occurs when eth0 itself is unregistered.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-06 06:52:19 UTC
Permalink
Post by David S. Miller
Oh I see. That would work, and it seems the simplest, and
lowest risk fix for this problem.
Herbert, what do you think?
Yes I agree.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-06 06:51:17 UTC
Permalink
Post by David S. Miller
Post by Herbert Xu
Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.
I think there is a hole in this idea.... maybe.
If the idea is to scan dst_garbage_list down in ipv6 specific code,
you can't do that since 'dst' objects from every pool in the kernel
get put onto the dst_garbage_list. It is generic.
The idea is to move the check into dst->ops->ifdown. By definition
ipv6_dst_ifdown will only see rt6_info entries. So dst_dev_event
will become

for (dst = dst_garbage_list; dst; dst = dst->next) {
dst_ifdown(dst, event != NETDEV_DOWN);
}

dst_ifdown will become

...

do {
if (dst->dev == dev && unregister) {
...
}

dst->ops->ifdown(dst, dev, unregister);
} while ((dst = dst->child) && dst->flags & DST_NOHASH);

...

Note the extra dev argument to ifdown. ipv6_dst_ifdown will be

static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
int how)
{
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;

if (idev != NULL && idev->dev != &loopback_dev && idev->dev == dev) {
struct inet6_dev *loopback_idev = in6_dev_get(&loopback_dev);
if (loopback_idev != NULL) {
rt->rt6i_idev = loopback_idev;
in6_dev_put(idev);
}
}
}

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-08 01:29:29 UTC
Permalink
Post by Herbert Xu
The idea is to move the check into dst->ops->ifdown. By definition
ipv6_dst_ifdown will only see rt6_info entries. So dst_dev_event
will become
Here are the patches to do this. Do they look sane?

This one moves the dst->child processing from dst_ifdown into
xfrm_dst_ifdown.

Signed-off-by: Herbert Xu <***@gondor.apana.org.au>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-02-08 01:31:40 UTC
Permalink
Post by Herbert Xu
This one moves the dst->child processing from dst_ifdown into
xfrm_dst_ifdown.
This patch adds a net_device argument to ifdown. After all,
it's a bit silly to notify someone of an ifdown event without
telling them what which device it was for :)

Signed-off-by: Herbert Xu <***@gondor.apana.org.au>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
David S. Miller
2005-02-15 22:53:48 UTC
Permalink
On Tue, 8 Feb 2005 12:31:40 +1100
Post by Herbert Xu
Post by Herbert Xu
This one moves the dst->child processing from dst_ifdown into
xfrm_dst_ifdown.
This patch adds a net_device argument to ifdown. After all,
it's a bit silly to notify someone of an ifdown event without
telling them what which device it was for :)
Ok, I'm going to try and get these two patches into 2.6.11

Andre Tomt
2005-02-05 11:14:04 UTC
Permalink
Post by Herbert Xu
Post by David S. Miller
It is just the first such thing I found, scanning rt6i_idev uses
will easily find several others.
You're right of course. I thought they were all harmless but I was
obviously wrong about this one.
So here is a patch that essentially reverts the split devices
<..>

This patch fixes my problems with hangs when dot1q VLAN interfaces gets
removed when loopback is down, as reported in the thread "2.6.10
ipv6/8021q lockup on vconfig on interface removal".

Yay :)
YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 11:39:00 UTC
Permalink
Post by Andre Tomt
This patch fixes my problems with hangs when dot1q VLAN interfaces gets
removed when loopback is down, as reported in the thread "2.6.10
ipv6/8021q lockup on vconfig on interface removal".
Please tell me, why your lo is down...

Anyway, if we really want to "fix" this,
we should do in other way.

I think "Make loopback idev stick around" patches
(for IPv4 and IPv6) could be start of that.

--yoshfuji
Andre Tomt
2005-02-05 11:48:15 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
Post by Andre Tomt
This patch fixes my problems with hangs when dot1q VLAN interfaces gets
removed when loopback is down, as reported in the thread "2.6.10
ipv6/8021q lockup on vconfig on interface removal".
Please tell me, why your lo is down...
Anyway, if we really want to "fix" this,
we should do in other way.
I think "Make loopback idev stick around" patches
(for IPv4 and IPv6) could be start of that.
"ifdown -a" gets run on shutdown and reboot here, and ifdown -a in
Debian brings down loopback before any other interfaces.
YOSHIFUJI Hideaki / 吉藤英明
2005-02-05 11:55:07 UTC
Permalink
Post by Andre Tomt
Post by YOSHIFUJI Hideaki / 吉藤英明
Please tell me, why your lo is down...
"ifdown -a" gets run on shutdown and reboot here, and ifdown -a in
Debian brings down loopback before any other interfaces.
Okay, thanks. (I now remember someone told me this before.) hmm...

--yoshfuji
Herbert Xu
2005-02-05 18:33:46 UTC
Permalink
Post by YOSHIFUJI Hideaki / 吉藤英明
I think "Make loopback idev stick around" patches
(for IPv4 and IPv6) could be start of that.
Unfortunately that patch can't fix this particular problem. This
problem will show up whenever there is a dst on the GC list that
has split devices and a non-zero refcnt.

So if you had a process holding that dst you can still get a dead-lock.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Andre Tomt
2005-02-06 10:55:19 UTC
Permalink
Post by Herbert Xu
Post by David S. Miller
It is just the first such thing I found, scanning rt6i_idev uses
will easily find several others.
You're right of course. I thought they were all harmless but I was
obviously wrong about this one.
So here is a patch that essentially reverts the split devices
[IPV6] addrconf_dst_alloc() to allocate new route for local address.
[IPV6] take rt6i_idev into account when looking up routes.
Now that this fix have been written off as probably wrong; how much does
it break? As far as I've understood it "just" reverts to old semantics,
probably not correct semantics, but still not unexpected semantics
(like, say, hang on device unregistration ;) )

I'm contemplating just using it as a quick-fix until 2.6.11 to get this
problem under control.
YOSHIFUJI Hideaki / 吉藤英明
2005-02-06 11:28:13 UTC
Permalink
Post by Andre Tomt
I'm contemplating just using it as a quick-fix until 2.6.11 to get this
problem under control.
Would you find if my patch works? Thanks.

--yoshfuji
Continue reading on narkive:
Loading...