Discussion:
silent semantic changes with reiser4
Christoph Hellwig
2004-08-24 20:25:21 UTC
After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems and have a slight potential to break
things even in the kernel.

o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
- meaning of the -x permission. This one has different meanings on
directories vs files on UNIX systems. If we want to support
directories as files we'll probably have to find a way to work
around this.
- dentry aliasing. I can't find a formal guarantee in the code that this
can't happen.
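On a conventional Linux filesystem, O_DIRECTORY makes open(2) fail with ENOTDIR unless the path is a directory, so a program gets the type check and the descriptor in one step instead of racing between a stat() and an open(). A minimal Python sketch of that userland pattern, with hypothetical helper names; the reiser4 remark afterwards restates the behaviour reported above rather than anything tested here:

```python
import errno
import os


def open_dir_only(path: str) -> int:
    """Return a descriptor for `path`, but only if it is a directory.

    The type check happens inside open(2) itself, so there is no window
    between a separate stat() and open() in which `path` could be
    replaced by something else.
    """
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def opens_as_directory(path: str) -> bool:
    """True if O_DIRECTORY accepts `path`, False if it fails with ENOTDIR."""
    try:
        fd = open_dir_only(path)
    except OSError as exc:
        if exc.errno == errno.ENOTDIR:
            return False
        raise
    os.close(fd)
    return True
```

On a POSIX-style filesystem opens_as_directory() is False for a regular file; with the reiser4 behaviour described above it would be True for every file, which is exactly what callers relying on the check cannot tolerate.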

o metafiles - ..metas as a magic name that's just taken out of the
namespace doesn't sound like a good idea. If we want this it should
be a VFS-level option and there should be a translation-layer to
xattrs. Not doing this will again confuse applications greatly that
expect uniform filesystem behaviour.

Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge, and that, if they come back later, they be
added at the proper VFS level after discussion on linux-kernel and
linux-fsdevel, like we did for xattrs.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Lee Revell
2004-08-24 20:35:18 UTC
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
So `find -type d' would list every file on the system?

Lee

Christoph Hellwig
2004-08-24 20:38:44 UTC
Post by Lee Revell
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
So `find -type d' would list every file on the system?
the find I have here is using lstat and not open with O_DIRECTORY, so
no.
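Classifying by lstat() rather than by attempting an O_DIRECTORY open is what keeps such a find unaffected: the type comes from st_mode and no file is ever opened. A sketch of that check in Python (the function name is hypothetical):

```python
import os
import stat


def is_dir_entry(path: str) -> bool:
    """Decide `-type d` the lstat() way: read the object's own st_mode,
    without following a final symlink and without opening anything."""
    return stat.S_ISDIR(os.lstat(path).st_mode)
```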

Lee Revell
2004-08-24 20:42:08 UTC
Post by Christoph Hellwig
Post by Lee Revell
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
So `find -type d' would list every file on the system?
the find I have here is using lstat and not open with O_DIRECTORY, so
no.
Ugh, how embarrassing, I completely forgot about stat().

Lee

Jamie Lokier
2004-08-24 21:18:35 UTC
Post by Christoph Hellwig
Post by Lee Revell
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
So `find -type d' would list every file on the system?
the find I have here is using lstat and not open with O_DIRECTORY, so
no.
The find-like program I use (called treescan) uses O_DIRECTORY as an
optimisation. It assumes that O_DIRECTORY will only open objects
which are directories and can be read using readdir().

However, if reiser4 returns d_type values, then it won't even attempt
an open on non-DT_DIR objects, and that's a better optimisation.
(reiserfs doesn't return d_type values, unfortunately).

So the list of files that treescan finds depends on whether reiser4
implements d_type.
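The d_type shortcut can be sketched with Python's os.scandir(): DirEntry.is_dir(follow_symlinks=False) answers from the d_type that readdir() returns and only falls back to a stat call when the filesystem reports DT_UNKNOWN. This is a hypothetical stand-in for illustration, not treescan itself:

```python
import os
from typing import Iterator


def walk_dirs(root: str) -> Iterator[str]:
    """Yield `root` and every directory below it.

    Non-directories are never opened: an entry is descended into only
    when its d_type (or, if d_type is unknown, an lstat) says it is a
    directory.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
```

What such a walk finds therefore depends entirely on the type the filesystem reports for each name.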

This is nothing like a POSIX filesystem. You untar a tree, and then
listing it recursively shows extra things created by reiser4.

I quite like the principle, but because it's not like POSIX and
doesn't match some programs' expectations, it's a problem in its
present form.

xattrs aren't a complete solution as you can't store structured data
in an xattr. For example, with reiser4's model, you can cd into a
.tar, .zip, .mp3 or .xml file and list the internal structure along
with the file's metadata. You can't do that with xattrs.
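For a sense of the structured data involved, this is what listing the inside of a .tar looks like when done in userspace with Python's tarfile module. In the model described here, the same listing would come from an ordinary `ls` on the file treated as a directory; the helper below is a hypothetical illustration, not a reiser4 interface:

```python
import tarfile


def list_archive_members(path: str) -> list:
    """Return the sorted member names stored inside a tar archive,
    roughly what `cd file.tar; ls -R` would show in a
    files-as-directories namespace."""
    with tarfile.open(path) as archive:
        return sorted(archive.getnames())
```

An xattr could hold such a list only as an opaque blob, not as entries that cd, ls and cat can walk.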

Programs exist which quite reasonably assume that when you create a
file, you can't opendir() the file, and recursive listings (like find,
ls -R et al.) won't automatically traverse into every file.

On the other hand, being able to enter a file in a directory-like way
allows structured representations of the contents to be accessed in
the very useful "everything's a file" way -- i.e. ordinary tools.

So here's a semantic proposal:

1. O_DIRECTORY won't open an ordinary file.
Corollary: opendir("file") won't open an ordinary file.

2. An ordinary file path followed by "/" won't open an ordinary file.
Corollary: opendir("file/") won't open an ordinary file.

This is because appending a trailing slash is an alternate
way for userspace to get the same results as O_DIRECTORY.

3. An ordinary file path followed by "/" _and_ one or more path
components will open the file as a directory and enter it.
Corollary: opendir("file/.") will open an ordinary file.

4. The type of "file/." shall be S_IFDIR, _not_ S_IFREG.
Corollary: stat("file/.") will return that it's a directory.

The intention here is that explicit requests to examine the
metadata or alternate structure representations of a file will
create such a view, but the view is only available if requested
explicitly.

When such a view is created, the results of stat(), O_DIRECTORY
and opendir() are absolutely consistent. This will minimise
confusion. Programs which recurse over a directory tree won't
look inside any of the files. However they can be explicitly
asked to recurse starting from a path inside a file: then
they'll recurse over a single file's metadata and structured data.
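The four rules above can be written down as a small decision function. This is only a model of the proposal (no kernel implements it) for a path made of an ordinary file F followed by a suffix; opendir() is treated as an open with O_DIRECTORY, and entries inside the view are left untyped because the proposal only fixes the type of "F/.":

```python
import errno
import stat


def proposed_lookup(suffix: str, o_directory: bool):
    """Model the proposed rules for the path F + suffix, F an ordinary file.

    Returns a (kind, st_mode type) pair, or raises NotADirectoryError.
    """
    if suffix == "":
        if o_directory:
            # Rule 1: O_DIRECTORY (and so opendir) won't open an ordinary file.
            raise NotADirectoryError(errno.ENOTDIR, "O_DIRECTORY on an ordinary file")
        return ("file", stat.S_IFREG)
    if suffix == "/":
        # Rule 2: a bare trailing slash acts like O_DIRECTORY.
        raise NotADirectoryError(errno.ENOTDIR, "trailing slash on an ordinary file")
    if suffix == "/.":
        # Rules 3 and 4: "F/." enters the file, and its type is S_IFDIR.
        return ("directory view", stat.S_IFDIR)
    if suffix.startswith("/"):
        # Rule 3: "/" plus further components looks inside the file's view.
        return ("entry inside view", None)
    raise ValueError("suffix must be empty or start with '/'")
```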

Regarding the problems of safe locking in the VFS: the VFS assumes
that directories are not hard linked: i.e. that they cannot appear at
more than one path in a filesystem. Files-as-directories breaks that.

However, VFS does support directories on multiple paths, using bind mounts.

So it wouldn't be out of the question if entering a file (as described
above) effectively auto-mounted a bind mount at that point.


-- Jamie
Jeff Garzik
2004-08-24 20:38:25 UTC
Post by Christoph Hellwig
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
Ouch.

I would definitely classify this as a security hole, since userland
definitely uses O_DIRECTORY to avoid races.

Jeff


v***@parcelfarce.linux.theplanet.co.uk
2004-08-24 20:53:44 UTC
Post by Jeff Garzik
Post by Christoph Hellwig
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
this flag entirely useless and reopens the races it is meant to
prevent.
Ouch.
I would definitely classify this as a security hole, since userland
definitely uses O_DIRECTORY to avoid races.
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.
v***@parcelfarce.linux.theplanet.co.uk
2004-08-24 21:22:32 UTC
Post by v***@parcelfarce.linux.theplanet.co.uk
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.
While we are at it - consider these hybrids vetoed until
a) sys_link()/sys_link() deadlock is fixed
b) sys_link()/sys_rename() deadlock is fixed
c) correctness proof of the locking scheme (in
Documentation/filesystems/directory-locking) is updated to match the
presence of the file/directory hybrids.

Rationale: (a) and (b) - immediately exploitable by any user, (c) - "convince
us that there's no more crap of that kind". IMO a reasonable request, seeing
that the first look at the patches in -mm4 had turned up two exploits in
that area, despite the *YEARS* of warnings about potential trouble and need
to be careful there (actually, I've given Hans too much credit and assumed
that link/link never happens since nobody would be dumb enough to provide
->link() method for non-directory inodes; turns out that somebody is dumb
enough and link/link is as exploitable as link/rename).
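The link/link and link/rename deadlocks are instances of the classic ABBA problem: two operations take the same pair of locks in opposite orders. A generic Python illustration of the usual cure, taking any pair of locks in one global order, follows. It is not the VFS code, whose real rules and proof live in Documentation/filesystems/directory-locking, and the id()-based order is an arbitrary choice for the sketch:

```python
import threading


def lock_pair(a: threading.Lock, b: threading.Lock) -> tuple:
    """Acquire two locks in a fixed global order (here: by id()).

    Callers that pass the same two locks in opposite orders still
    acquire them in the same order, so neither can hold one lock while
    waiting forever for the other.
    """
    if a is b:
        a.acquire()
        return (a,)
    first, second = (a, b) if id(a) < id(b) else (b, a)
    first.acquire()
    second.acquire()
    return (first, second)


def unlock_pair(held: tuple) -> None:
    """Release locks in the reverse of the order they were taken."""
    for lock in reversed(held):
        lock.release()


def hammer(x: threading.Lock, y: threading.Lock, rounds: int = 10000) -> None:
    """Repeatedly take and drop the pair, as a stress test."""
    for _ in range(rounds):
        unlock_pair(lock_pair(x, y))
```

Two threads running hammer(x, y) and hammer(y, x) both finish; with a naive `x.acquire(); y.acquire()` in one and the reverse in the other, they can hang forever.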
Hans Reiser
2004-08-25 18:28:56 UTC
I allowed myself to get talked out of a final top to bottom code audit,
and obviously that was a mistake.

It will probably take about 6 weeks. Apologies for wasting your time
before that was done.

Hans
Post by v***@parcelfarce.linux.theplanet.co.uk
Post by v***@parcelfarce.linux.theplanet.co.uk
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.
While we are at it - consider these hybrids vetoed until
a) sys_link()/sys_link() deadlock is fixed
b) sys_link()/sys_rename() deadlock is fixed
c) correctness proof of the locking scheme (in
Documentation/filesystems/directory-locking) is updated to match the
presence of the file/directory hybrids.
Rationale: (a) and (b) - immediately exploitable by any user, (c) - "convince
us that there's no more crap of that kind". IMO a reasonable request, seeing
that the first look at the patches in -mm4 had turned up two exploits in
that area, despite the *YEARS* of warnings about potential trouble and need
to be careful there (actually, I've given Hans too much credit and assumed
that link/link never happens since nobody would be dumb enough to provide
->link() method for non-directory inodes; turns out that somebody is dumb
enough and link/link is as exploitable as link/rename).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Christoph Hellwig
2004-08-25 18:45:23 UTC
Post by Hans Reiser
I allowed myself to get talked out of a final top to bottom code audit,
and obviously that was a mistake.
It will probably take about 6 weeks. Apologies for wasting your time
before that was done.
I don't think you'll get anywhere with auditing. We need to write down
the semantics you want, define them at the VFS level and make sure
they're not conflicting with defined userspace semantics or kernel
assumptions.

I think you need to learn the basic distinction between the VFS layer
and a low-level filesystem driver.
Hans Reiser
2004-08-25 19:53:28 UTC
I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus. Sigh.

Dear Christoph,

Let me see if I can summarize what you and your contingent are saying,
and if I misconstrue anything, let me know. ;-)

You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles. When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.

Making files into directories caused only two applications out of the
entire OS to notice the change, and that was because of a bug in what
error code we returned that we are going to fix. You think that was a
disaster; I think it was a triumph.

Now a cleanly architected filesystem with no attributes and just files
and directories that can do everything attributes are used for exists.
You don't want it to have the competitive advantage. Instead, you want
it to have its clean design excised until you have something that
duplicates it ready to go, and only then should it be allowed that users
will use the features of your competitor's filesystem which you
disdained implementing for so long.

Since you never studied or understood namespace design principles (or
you would not have created and supported xattrs), you want to rename it
to be called VFS, rewrite what we have done, and take over as the
maintainer, mangling its design in a committee clusterfuck as you go.

We have just implemented very trivial semantic enhancements of the FS
namespace, nothing like as ambitious as www.namesys.com/whitepaper.html
or WinFS, and you are already pissing your pants.

Is that a fair summary?

Eat my dust.

Hans

PS

I should of course qualify what I have said. The use of files and
directories in place of attributes is not a finished work. It has bugs,
sys_reiser4() does not yet work, and there are little features still
missing, such as files that readdir ignores.

Still, except for the bugs, what we have is usable, and there are a lot
of happy reiser4 users right now even with the bugs. It will need a
little bit more time, and then all the pieces will be in place.

PPS

If you implement your filesystems as reiser4 plugins, and rename
reiser4's plugin code to be called "vfs", your filesystems will go
faster. Not as fast as reiser4 though, because it has a better layout
and that affects performance a lot, but faster is faster.... See
www.namesys.com/benchmarks.html for details.

PPPS

Since we have such a performance lead, Namesys is about to change its
focus from the storage layer to semantics, look at
www.namesys.com/whitepaper.html for details. Semantic enhancements are
the important stuff, and finally Namesys is where we have all the
storage layer prerequisites done right, and the real work can begin.
The gap between us is about to widen further.
Post by Christoph Hellwig
After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems
if we leave you in the dust, run faster.... not my problem....
Post by Christoph Hellwig
Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge, and that, if they come back later, they be
added at the proper VFS level after discussion on linux-kernel and
linux-fsdevel, like we did for xattrs.
If you can't help fight WinFS, then get out of the way. Namesys is on
the march. Read www.namesys.com/whitepaper.html.

Or, be smart, recognize that reiser4 is faster and more flexible than
your storage layers because we are older and wiser and worked harder at
it, join the team, and start contributing plugins that tap into the
higher performance it offers.

Microsoft tried to build a storage layer that could handle small objects
without losing performance, failed, and gave up at considerable cost to
their architecture and pocketbook.

We just broke a hole in the enemy line. You could come swarming through
it with us, but it sounds like you prefer complaining to HQ that we are
getting too far in front of you.
Matthew Wilcox
2004-08-25 20:06:48 UTC
That's a nice marketing talk. Get back to us when you have some technical
contribution to make.
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
Christoph Hellwig
2004-08-25 20:08:59 UTC
Post by Hans Reiser
You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles.
Actually, in most of the discussion you simply didn't participate. While
xattrs might not be the nicest interface, they have the advantage of not
breaking the SuS assumption of what directories vs files are, and they
do not break the Linux O_DIRECTORY semantics either, which are defined and
needed to solve real-world races.
Post by Hans Reiser
When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.
For one thing, _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel for how you intend to solve
the problems with files as directories.
Post by Hans Reiser
Now a cleanly architected filesystem with no attributes and just files
and directories that can do everything attributes are used for exists.
You don't want it to have the competitive advantage. Instead, you want
it to have its clean design excised until you have something that
duplicates it ready to go, and only then should it be allowed that users
will use the features of your competitor's filesystem which you
disdained implementing for so long.
My competitor's filesystem? If you look at MAINTAINERS, I maintain only
vxfs and sysvfs, neither of which I'd suggest anyone to run their system
on.
Post by Hans Reiser
Since you never studied or understood namespace design principles (or
you would not have created and supported xattrs), you want to rename it
to be called VFS, rewrite what we have done, and take over as the
maintainer, mangling its design in a committee clusterfuck as you go.
Hans, please stop the personal crap or the black helicopters will kidnap
you. When was the last time you actually worked on kernel namespace
code instead of talking marketing bullshit and ignoring all real-world
problems?
Post by Hans Reiser
If you implement your filesystems as reiser4 plugins, and rename
reiser4's plugin code to be called "vfs", your filesystems will go
faster. Not as fast as reiser4 though, because it has a better layout
and that affects performance a lot, but faster is faster.... See
www.namesys.com/benchmarks.html for details.
Could you pass on that crack pipe please?
Christoph Hellwig
2004-08-25 20:19:29 UTC
Btw, I was just reminded that you might take what I said as a "piss off,
your idea stinks" or similar. So let me restate the actual technical and
project management issues once more before we start getting really
personal :)

Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem. Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.

So now go on and try to work together with the other people doing
VFS-level work instead of hiding, or, if you think you can't work together
with us, find a nice research OS where you can take over the VFS layer;
if your ideas prove to be good, I'm sure Linux will pick them up sooner
or later.

Christoph
Linus Torvalds
2004-08-25 20:24:36 UTC
Post by Christoph Hellwig
Over at least the last five years we've moved as much of the semantics
as possible out of the filesystems and into the VFS layer, thus having
a separation between the semantic layer (VFS) and the low-level
filesystem. Your attributes are absolutely a VFS thing and as such
should not happen at the filesystem layer, and no, that doesn't mean
they're bad per se, I just think they are a rather bad fit for Linux.
Now this I agree with, in the sense that I think that if we want to
support this, it should be supported at a VFS layer.

On the other hand, I think doing it inside the filesystem with ugly hacks
is an acceptable way to prototype the idea before it's been proven to
really be workable. Maybe it has more problems with legacy apps than we'd
expect..

Linus
Christoph Hellwig
2004-08-25 20:25:39 UTC
Post by Linus Torvalds
Now this I agree with, in the sense that I think that if we want to
support this, it should be supported at a VFS layer.
On the other hand, I think doing it inside the filesystem with ugly hacks
is an acceptable way to prototype the idea before it's been proven to
really be workable. Maybe it has more problems with legacy apps than we'd
expect..
Oh, I'm the last person to tell anyone how to prototype things. I just
don't want such inconsistencies in the mainline kernel.
Linus Torvalds
2004-08-25 20:22:55 UTC
Post by Christoph Hellwig
For one thing, _I_ didn't decide about xattrs anyway. And I still
haven't seen a design from you on -fsdevel for how you intend to solve
the problems with files as directories.
Hey, files-as-directories are one of my pet things, so I have to side with
Hans on this one. I think it just makes sense. A hell of a lot more sense
than xattrs, anyway, since it allows scripts and other standard tools to
touch the attributes.

It's the UNIX way.

And yes, the semantics can _easily_ be solved in very unixy ways.

One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what
you do is:

- without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
- with the slash, it won't open _without_ O_DIRECTORY (EISDIR)

Problem solved. Very user-friendly, and very intuitive.

Will it potentially break something? Sure. Do we care? Me, I'll take that
kind of extension _any_ day over xattrs, that are fundamentally flawed in
my opinion and totally useless. The argument that applications like "tar"
won't understand the file-as-directory thing is _flawed_, since legacy
apps won't understand xattrs either.

Oh, and add an O_NOXATTRS flag to force a path lookup to only use regular
directories, the same way we have O_NOFOLLOW and friends. That allows
people to see the difference, if they care (ie a file server might decide
that it doesn't want to expose things like this).
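The trailing-slash rule, plus the O_NOXATTRS flag suggested above, can be modelled as a decision table for an object that is both a file and a directory. O_NOXATTRS was only a proposal in this thread, and the ENOTDIR it produces in this sketch is an assumption, since the mail does not say which error it should give:

```python
import errno


def hybrid_open(trailing_slash: bool, o_directory: bool, o_noxattrs: bool = False) -> str:
    """Model the proposed open() semantics for a file-as-directory object.

    Returns "file" or "directory", or raises the error the proposal gives.
    `o_noxattrs` stands for the hypothetical O_NOXATTRS flag.
    """
    if not trailing_slash:
        if o_directory:
            # Without the slash, O_DIRECTORY refuses the object (ENOTDIR).
            raise NotADirectoryError(errno.ENOTDIR, "file-as-dir without slash")
        return "file"
    if o_noxattrs:
        # Assumption: a lookup restricted to regular directories refuses it.
        raise NotADirectoryError(errno.ENOTDIR, "O_NOXATTRS forbids file-as-dir")
    if not o_directory:
        # With the slash, it won't open without O_DIRECTORY (EISDIR).
        raise IsADirectoryError(errno.EISDIR, "slash without O_DIRECTORY")
    return "directory"
```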

I never liked the xattr stuff. It makes little sense, and is totally
useless for 99.9999% of everything. I still don't see the point of it,
except for samba. Ugly.

Linus
Christoph Hellwig
2004-08-25 20:35:49 UTC
Post by Linus Torvalds
And yes, the semantics can _easily_ be solved in very unixy ways.
One way to solve it is to just realize that a final slash at the end
implies pretty strongly that you want to treat it as a directory. So what
- without the slash, a file-as-dir won't open with O_DIRECTORY (ENOTDIR)
- with the slash, it won't open _without_ O_DIRECTORY (EISDIR)
Problem solved. Very user-friendly, and very intuitive.
That would solve the O_DIRECTORY issue, the dentry aliasing still needs
work though with the semantics for link/unlink/rename.

Maybe Hans & you should start 2.7 to work this out? :)

Jeremy Allison
2004-08-25 20:20:22 UTC
Post by Hans Reiser
You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles. When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.
I don't want to comment on any of the technical issues about VFS etc. as
I would be completely out of my depth, however I do want to say 2 things. Firstly,
this is a feature that Samba users have been needing for many years to maintain
compatibility with NTFS and Windows clients. Microsoft no longer sell any servers
or clients without support for multiple data streams per file, and their latest
XP SP2 code *does* use this feature. Whatever the kernel issues I'm really glad
that Hans and Namesys have created something we can use to match this
functionality - soon we will need it in order to be able to exist in
a Microsoft client-dominated world.

My second point is the following. Hans - did you *really* have to reinvent
the wheel w.r.t. userspace API calls? Did you look at this work (done in 2001
for Solaris)?

http://bama.ua.edu/cgi-bin/man-cgi?fsattr+5
http://bama.ua.edu/cgi-bin/man-cgi?attropen+3C
http://bama.ua.edu/cgi-bin/man-cgi?openat+2
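The openat(2) interface linked above opens a name relative to an already-open directory descriptor; Python exposes the same call through the dir_fd argument of os.open(). The sketch below shows only that plain *at pattern. Solaris's O_XATTR flag, which the linked pages pair with openat() to reach a file's attribute directory, has no Linux equivalent, so it does not appear here; the helper name is hypothetical:

```python
import os


def read_at(dir_path: str, name: str) -> bytes:
    """Read `name` relative to a directory descriptor, openat(2)-style.

    The lookup of `name` starts from `dfd`, so it is unaffected by later
    changes to the current directory or to the path leading to dir_path.
    """
    dfd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fd = os.open(name, os.O_RDONLY, dir_fd=dfd)
        with os.fdopen(fd, "rb") as f:  # fdopen takes ownership of fd
            return f.read()
    finally:
        os.close(dfd)
```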

I'm complaining here as someone who will have to write portable code
to try and work on all these "files with streams" systems.

Jeremy.
Chris Mason
2004-08-25 20:22:14 UTC
Post by Hans Reiser
I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus. Sigh.
Dear Christoph,
Let me see if I can summarize what you and your contingent are saying,
and if I misconstrue anything, let me know. ;-)
Just for fun why don't we look at the way things are today:

1) reiser4 has semantics that do belong at the VFS level. They weren't
implemented at the VFS level for a variety of reasons, none of which
really matter right now.

2) new kernel patches that fragment the application developers between
apis are a bad thing. There does need to be one interface here, and it
is in Hans' best interest to unify his work by working with people to
introduce new kernel wide apis.

This starts with exactly what Christoph described in writing a short
summary of how you want things to work today. Since we can't resist,
we'll also go ahead and rehash all the old flame wars over this, but try
to include some new ideas about where you want to see the reiser4
interfaces in 6 months as well.

-chris