Discussion:
WARNING at fs/nfs/write.c:743 nfs_inode_remove_request with -rc6
Will Deacon
2014-09-23 13:03:52 UTC
Hi all,

I've been running into the following warning on an arm64 system running
3.17-rc6 with 64k pages. I've been unable to reproduce with a smaller page
size (4k).

I don't yet have a concrete reproducer, but I've seen it hit a few times
today just running a machine with an NFS root filesystem and using ssh.
The warning seems to happen in parallel on two CPUs, but I'm pretty
confident that our test_and_clear_bit implementation has the relevant
atomic instructions and memory barriers.
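
For reference, the check that fires looks roughly like this (paraphrased
from fs/nfs/write.c in -rc6 and heavily trimmed, so treat it as a sketch
rather than the exact source):

static void nfs_inode_remove_request(struct nfs_page *req)
{
	/* ... unlink the request from the page and inode under i_lock ... */

	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags))
		nfs_release_request(req);	/* drop the extra inode reference */
	else
		WARN_ON_ONCE(1);		/* fs/nfs/write.c:743 */
}

So the warning means we reached teardown without PG_INODE_REF set.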

Any ideas?

Will

--->8

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1023 at fs/nfs/write.c:743 nfs_inode_remove_request+0xe4/0xf0()
Modules linked in:
CPU: 1 PID: 1023 Comm: kworker/1:2 Not tainted 3.17.0-rc6 #1
Workqueue: nfsiod rpc_async_release
Call trace:
[<fffffe0000096410>] dump_backtrace+0x0/0x130
[<fffffe0000096550>] show_stack+0x10/0x1c
[<fffffe00004cda94>] dump_stack+0x74/0xbc
[<fffffe00000b4d20>] warn_slowpath_common+0x8c/0xb4
[<fffffe00000b4e0c>] warn_slowpath_null+0x14/0x20
[<fffffe000027a6a8>] nfs_inode_remove_request+0xe0/0xf0
[<fffffe000027b704>] nfs_write_completion+0xb4/0x150
[<fffffe0000276ef4>] nfs_pgio_release+0x34/0x44
[<fffffe00004ac2d0>] rpc_free_task+0x24/0x4c
[<fffffe00004ac5c0>] rpc_async_release+0xc/0x18
[<fffffe00000c89e8>] process_one_work+0x140/0x32c
[<fffffe00000c9338>] worker_thread+0x13c/0x470
[<fffffe00000cd9e4>] kthread+0xd0/0xe8
---[ end trace 6f044efb83f0811b ]---

------------[ cut here ]------------
WARNING: CPU: 0 PID: 621 at fs/nfs/write.c:743 nfs_inode_remove_request+0xe4/0xf0()
CPU: 0 PID: 621 Comm: kworker/0:2 Tainted: G W 3.17.0-rc6 #1
Workqueue: nfsiod rpc_async_release
Call trace:
[<fffffe0000096410>] dump_backtrace+0x0/0x130
[<fffffe0000096550>] show_stack+0x10/0x1c
[<fffffe00004cda94>] dump_stack+0x74/0xbc
[<fffffe00000b4d20>] warn_slowpath_common+0x8c/0xb4
[<fffffe00000b4e0c>] warn_slowpath_null+0x14/0x20
[<fffffe000027a6a8>] nfs_inode_remove_request+0xe0/0xf0
[<fffffe000027b704>] nfs_write_completion+0xb4/0x150
[<fffffe0000276ef4>] nfs_pgio_release+0x34/0x44
[<fffffe00004ac2d0>] rpc_free_task+0x24/0x4c
[<fffffe00004ac5c0>] rpc_async_release+0xc/0x18
[<fffffe00000c89e8>] process_one_work+0x140/0x32c
[<fffffe00000c9338>] worker_thread+0x13c/0x470
[<fffffe00000cd9e4>] kthread+0xd0/0xe8
---[ end trace 6f044efb83f0811c ]---
Weston Andros Adamson
2014-09-23 13:33:06 UTC
Post by Will Deacon
Hi all,

I've been running into the following warning on an arm64 system running
3.17-rc6 with 64k pages. I've been unable to reproduce with a smaller page
size (4k).

I don't yet have a concrete reproducer, but I've seen it hit a few times
today just running a machine with an NFS root filesystem and using ssh.
The warning seems to happen in parallel on two CPUs, but I'm pretty
confident that our test_and_clear_bit implementation has the relevant
atomic instructions and memory barriers.

Any ideas?

Will
So it looks like we're either calling nfs_inode_remove_request twice on a
request, or somehow not grabbing the inode reference for some request that
is in the async write path. It's interesting that these come in pairs -
that has to mean something!
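
For anyone following along, that reference is taken when the request is
first attached to the inode, roughly like this (my paraphrase of the 3.17
code, so details may differ from the real source):

static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
{
	/* ... link req into the inode's page list under i_lock ... */

	/* head request of a page group: take an extra reference and
	 * record it, so nfs_inode_remove_request() can find and drop
	 * exactly this reference later - that's what the WARN checks */
	WARN_ON(test_and_set_bit(PG_INODE_REF, &req->wb_flags));
	kref_get(&req->wb_kref);
}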

Any more info on how to reproduce this would be really great. Unfortunately
I don't have access to an arm64 system.

If it's possible, could we get a packet trace around when this happens?
This is pure speculation, but this might have something to do with the
resend path - a commit fails and all the requests on the commit list have
to be resent.

Have you noticed any side effects from this? That WARN_ON_ONCE was added
to sanity test the new page group code and we need to fix this, but I'm
wondering if anything "bad" happens...

-dros
Post by Will Deacon
[backtraces snipped]
Will Deacon
2014-09-23 13:59:38 UTC
Post by Weston Andros Adamson
Post by Will Deacon
[original report snipped]
So it looks like we're either calling nfs_inode_remove_request twice on a
request, or somehow not grabbing the inode reference for some request that
is in the async write path. It's interesting that these come in pairs -
that has to mean something!

Indeed. I have 6 CPUs on this system too, so it's not a per-cpu thing.
Post by Weston Andros Adamson
Any more info on how to reproduce this would be really great. Unfortunately
I don't have access to an arm64 system.
I've not spotted a pattern other than using 64k pages, yet. If I manage to
get a reproducer, I'll let you know.
Post by Weston Andros Adamson
If it's possible, could we get a packet trace around when this happens?
This is pure speculation, but this might have something to do with the
resend path - a commit fails and all the requests on the commit list have
to be resent.
Sure, once I can reproduce it reliably, I'll try to do that.
Post by Weston Andros Adamson
Have you noticed any side effects from this? That WARN_ON_ONCE was added
to sanity test the new page group code and we need to fix this, but I'm
wondering if anything "bad" happens...
I've not noticed anything. In fact, this happened during an LTP run and I
didn't see any regressions in the test results.

Will
Will Deacon
2014-09-23 14:53:56 UTC
On Tue, Sep 23, 2014 at 02:33:06PM +0100, Weston Andros Adamson wrote:
Post by Weston Andros Adamson
Any more info on how to reproduce this would be really great. Unfortunately
I don't have access to an arm64 system.
I've not spotted a pattern other than using 64k pages, yet. If I manage to
get a reproducer, I'll let you know.
Post by Weston Andros Adamson
If it's possible, could we get a packet trace around when this happens?
This is pure speculation, but this might have something to do with the
resend path - a commit fails and all the requests on the commit list have
to be resent.
Sure, once I can reproduce it reliably, I'll try to do that.
Right, a bunch of dd'ing from /dev/zero over an ssh session triggers this
very quickly. I've put a binary tcpdump here:

http://www.willdeacon.ukfsn.org/bitbucket/oopsen/nfs/tcpdump.bin

You may want to filter out the ssh packets. The only `interesting' thing
to me is a retransmission about halfway through, but I don't know what I'm
looking for.

The NFS server is 10.1.203.204 and the client is 10.1.203.24.

Will
Weston Andros Adamson
2014-09-23 14:59:25 UTC
Post by Will Deacon
Right, a bunch of dd'ing from /dev/zero over an ssh session triggers this
very quickly. I've put a binary tcpdump here:

http://www.willdeacon.ukfsn.org/bitbucket/oopsen/nfs/tcpdump.bin

You may want to filter out the ssh packets. The only `interesting' thing
to me is a retransmission about halfway through, but I don't know what I'm
looking for.

The NFS server is 10.1.203.204 and the client is 10.1.203.24.
Thanks! You are using NFSv2, so there are no commits - that rules out my
hunch.

-dros
Weston Andros Adamson
2014-09-23 15:02:26 UTC
Post by Weston Andros Adamson
Thanks! You are using NFSv2, so there are no commits - that rules out my
hunch.

Wait a second - the whole point of this extra reference (that the
WARN_ON_ONCE is related to) is to handle the pass off to commit lists.
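
To illustrate (a simplified sketch of the 3.17 write completion path,
paraphrased rather than the verbatim source): requests that still need a
COMMIT are supposed to keep the extra reference and move to the commit
list, while finished requests drop it via nfs_inode_remove_request():

static void nfs_write_completion(struct nfs_pgio_header *hdr)
{
	struct nfs_commit_info cinfo;	/* set up earlier in the real code */

	while (!list_empty(&hdr->pages)) {
		struct nfs_page *req = nfs_list_entry(hdr->pages.next);

		nfs_list_remove_request(req);
		if (test_bit(NFS_IOHDR_NEED_COMMIT, &hdr->flags)) {
			/* ownership of the extra ref moves to the commit list */
			nfs_mark_request_commit(req, hdr->lseg, &cinfo);
			continue;
		}
		/* data is stable on the server: drop the inode reference */
		nfs_inode_remove_request(req);
	}
}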

Maybe I'm just doing the wrong thing here with NFSv2!

More soon.

-dros
Weston Andros Adamson
2014-09-23 15:08:36 UTC
Post by Weston Andros Adamson
Wait a second - the whole point of this extra reference (that the
WARN_ON_ONCE is related to) is to handle the pass off to commit lists.

Maybe I'm just doing the wrong thing here with NFSv2!

More soon.
Actually, can you see if this is reproducible with NFSv3?

Thanks,

-dros
Will Deacon
2014-09-23 15:25:58 UTC
Post by Weston Andros Adamson
Wait a second - the whole point of this extra reference (that the
WARN_ON_ONCE is related to) is to handle the pass off to commit lists.

Maybe I'm just doing the wrong thing here with NFSv2!

More soon.

Actually, can you see if this is reproducible with NFSv3?
Just tried that, and I'm unable to trigger the problem with NFSv3.

Will
Will Deacon
2014-09-25 17:15:45 UTC
Hi Andros,
On Tue, Sep 23, 2014 at 04:08:36PM +0100, Weston Andros Adamson wrote:
Post by Weston Andros Adamson
Actually, can you see if this is reproducible with NFSv3?
Just tried that, and I'm unable to trigger the problem with NFSv3.
Any more ideas? I can run with NFSv3 instead, but it's a shame to leave
v2 broken.

Will
Weston Andros Adamson
2014-09-25 17:27:35 UTC
Post by Will Deacon
Hi Andros,

On Tue, Sep 23, 2014 at 04:08:36PM +0100, Weston Andros Adamson wrote:
Post by Weston Andros Adamson
Actually, can you see if this is reproducible with NFSv3?
Just tried that, and I'm unable to trigger the problem with NFSv3.
Any more ideas? I can run with NFSv3 instead, but it's a shame to leave
v2 broken.
Hi Will,

I do plan on looking at this more very soon, but since it's v2 only and
just a warning (and not crashing anything) it didn't jump to the top of my
list. It is only printing the warnings, right? That's no reason to switch
things up ;)

It might be interesting to get some rpcdebug output around the time of the
warning, but I'm not sure it will be useful.

Since I cannot reproduce this, my next step is to inspect the code with an
NFSv2 + 64k pages setup in mind.
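
One thing worth keeping in mind here (my own back-of-the-envelope
arithmetic, not something we've confirmed on the wire): NFSv2 caps
transfers at NFS2_MAXDATA (8k), so a single 64k page must be split into
eight sub-requests, while a 4k page is never split at all. That would
explain why only the 64k configuration exercises the page group
subrequest path:

#include <stdio.h>

#define NFS2_MAXDATA	8192	/* NFSv2 maximum transfer size, in bytes */

/* Sub-requests needed to write one page over NFSv2. */
static int subreqs_per_page(int page_size)
{
	return (page_size + NFS2_MAXDATA - 1) / NFS2_MAXDATA;
}

int main(void)
{
	printf("64k pages: %d sub-requests per page\n",
	       subreqs_per_page(64 * 1024));	/* 8 */
	printf(" 4k pages: %d sub-request per page\n",
	       subreqs_per_page(4 * 1024));	/* 1 */
	return 0;
}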

-dros

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Weston Andros Adamson
2014-10-11 13:32:29 UTC
Hey Will,

I've been very busy, but I haven't forgotten about your bug report!

I think the WARN_ON_ONCE is just wrong; there are cases where the
PG_INODE_REF flag is legitimately not set. The flag is set so that sub
requests can mimic the parent request's reference count.
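
For context, this is roughly how the flag propagates from a head request
to its subrequests (paraphrased from nfs_page_group_init() in
fs/nfs/pagelist.c - a sketch, not the verbatim source):

void nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
{
	if (!prev) {
		/* head request: it forms its own group */
		req->wb_head = req;
		req->wb_this_page = req;
		return;
	}

	/* subrequest: link into the head's circular list */
	req->wb_head = prev->wb_head;
	req->wb_this_page = prev->wb_this_page;
	prev->wb_this_page = req;

	/* mirror the head's extra inode reference, if it holds one */
	if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags)) {
		set_bit(PG_INODE_REF, &req->wb_flags);
		kref_get(&req->wb_kref);
	}
}

So a subrequest only carries PG_INODE_REF if the head already had it when
the group was built - which is where I suspect the legitimate "not set"
cases come from.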

Could you please run the reproducer again, unmount, then check the contents
of /proc/fs/nfsfs/servers?

If it's empty like the example output below, then we know that requests are
being properly dereferenced.

# cat /proc/fs/nfsfs/servers
NV SERVER   PORT USE HOSTNAME
#

If there are any servers still listed (I'm assuming this is the only mount
on your client for the test), then we have a problem.

-dros
Post by Weston Andros Adamson
[previous exchange snipped]
Will Deacon
2014-10-20 13:57:37 UTC
Post by Weston Andros Adamson
Hey Will,
Hi again, Andros,
Post by Weston Andros Adamson
I've been very busy, but I haven't forgotten about your bug report!

No problem, I've been busy too. I just checked with -rc1 and I can
reproduce the issue there too.
Post by Weston Andros Adamson
I think the WARN_ON_ONCE is just wrong; there are cases where the
PG_INODE_REF flag is legitimately not set. The flag is set so that sub
requests can mimic the parent request's reference count.
Could you please run the reproducer again, unmount, then check the contents
of /proc/fs/nfsfs/servers?
Sure. I just tried this and that file is empty after unmounting.

Will