Discussion:
silent semantic changes with reiser4
Christoph Hellwig
2004-08-24 20:25:21 UTC
After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems and have a slight potential to break
things even inside the kernel.

o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.
- meaning of the -x permission. This one has different meanings on
directories vs files on UNIX systems. If we want to support
directories as files we'll probably have to find a way to work
around this.
- dentry aliasing. I can't find a formal guarantee in the code that this
can't happen.

o metafiles - ..metas as a magic name that's just taken out of the
namespace doesn't sound like a good idea. If we want this it should
be a VFS-level option and there should be a translation-layer to
xattrs. Not doing this will again confuse applications greatly that
expect uniform filesystem behaviour.

Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge and, if they are added later, that they be
added at the proper VFS level after discussion on linux-kernel and
linux-fsdevel, like we did for xattrs.
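
The O_DIRECTORY complaint can be checked from userspace. What follows is a minimal sketch in Python, assuming a filesystem with conventional POSIX behaviour; the helper names `open_as_directory` and `probe` are hypothetical and not taken from any of the programs mentioned. On such a filesystem the open of a regular file fails with ENOTDIR, whereas the report above is that on reiser4 it succeeds.

```python
import errno
import os
import tempfile


def open_as_directory(path):
    """Open path with O_DIRECTORY; return the errno, or None on success."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


def probe():
    """Create one directory and one regular file, then try to open both."""
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, "plain.txt")
        with open(regular, "w") as handle:
            handle.write("data\n")
        return {
            "directory": open_as_directory(tmp),
            "regular": open_as_directory(regular),
        }


if __name__ == "__main__":
    # Conventional expectation: directory -> None, regular -> errno.ENOTDIR.
    print(probe(), "ENOTDIR =", errno.ENOTDIR)
```

Running the same probe on a mount where regular files can be opened as directories would be expected to return None for both entries.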
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Lee Revell
2004-08-24 20:35:18 UTC
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

Lee

Christoph Hellwig
2004-08-24 20:38:44 UTC
Post by Lee Revell
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

The find I have here is using lstat and not open with O_DIRECTORY, so
no.
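
To illustrate why an lstat-based tool is unaffected, here is a small sketch in Python of classifying entries by the file type that lstat() reports, without opening anything. It is not the source of any particular find implementation; `directories_by_lstat` and `demo` are hypothetical names.

```python
import os
import stat
import tempfile


def directories_by_lstat(root):
    """Return names under root whose lstat() type is S_IFDIR."""
    found = []
    for name in sorted(os.listdir(root)):
        mode = os.lstat(os.path.join(root, name)).st_mode
        # The decision uses only the reported file type; no open() is issued,
        # so the behaviour of O_DIRECTORY never comes into play.
        if stat.S_ISDIR(mode):
            found.append(name)
    return found


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "subdir"))
        with open(os.path.join(tmp, "plain.txt"), "w") as handle:
            handle.write("data\n")
        return directories_by_lstat(tmp)
```

As long as the filesystem reports S_IFREG for ordinary files, a check of this kind keeps listing only real directories.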

Lee Revell
2004-08-24 20:42:08 UTC
Post by Christoph Hellwig
Post by Lee Revell
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

The find I have here is using lstat and not open with O_DIRECTORY, so
no.

Ugh, how embarrassing, I completely forgot about stat().

Lee

Jamie Lokier
2004-08-24 21:18:35 UTC
Post by Christoph Hellwig
Post by Lee Revell
Post by Christoph Hellwig
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

So `find -type d' would list every file on the system?

The find I have here is using lstat and not open with O_DIRECTORY, so
no.

The find-like program I use (called treescan) uses O_DIRECTORY as an
optimisation. It assumes that O_DIRECTORY will only open objects
which are directories and can be read using readdir().

However, if reiser4 returns d_type values, then it won't even attempt
an open on non-DT_DIR objects, and that's a better optimisation.
(reiserfs doesn't return d_type values, unfortunately).

So the list of files that treescan finds depends on whether reiser4
implements d_type.
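
The d_type optimisation can be sketched as follows. This is not treescan's code, which is not shown in this thread; it is a hypothetical Python walker (`scan_tree`, `demo`) built on os.scandir(), whose DirEntry.is_dir() uses the type returned by readdir() when the filesystem supplies one and falls back to an lstat() call otherwise.

```python
import os
import tempfile


def scan_tree(root):
    """Yield directory paths below root, descending only into directories."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # With a usable d_type no extra system call is needed here;
                # without one, is_dir() has to stat the entry instead.
                if entry.is_dir(follow_symlinks=False):
                    yield entry.path
                    stack.append(entry.path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        with open(os.path.join(tmp, "a", "file.txt"), "w") as handle:
            handle.write("data\n")
        return sorted(os.path.relpath(p, tmp) for p in scan_tree(tmp))
```

A walker written this way never tries to open a non-directory, so what it traverses depends entirely on the types the filesystem reports, which is the dependency described above.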

This is nothing like a POSIX filesystem. You untar a tree, and then
listing it recursively shows extra things created by reiser4.

I quite like the principle, but because it's not like POSIX and
doesn't match some programs' expectations, it's a problem in its
present form.

xattrs aren't a complete solution as you can't store structured data
in an xattr. For example, with reiser4's model, you can cd into a
.tar, .zip, .mp3 or .xml file and list the internal structure along
with the file's metadata. You can't do that with xattrs.

Programs exist which quite reasonably assume that when you create a
file, you can't opendir() the file, and recursive listings (like find,
ls -R et al.) won't automatically traverse into every file.

On the other hand, being able to enter a file in a directory-like way
allows structured representations of the contents to be accessed in
the very useful "everything's a file" way -- i.e. ordinary tools.

So here's a semantic proposal:

1. O_DIRECTORY won't open an ordinary file.
Corollary: opendir("file") won't open an ordinary file.

2. An ordinary file path followed by "/" won't open an ordinary file.
Corollary: opendir("file/") won't open an ordinary file.

This is because appending a trailing slash is an alternate
way for userspace to get the same results as O_DIRECTORY.

3. An ordinary file path followed by "/" _and_ one or more path
components will open the file as a directory and enter it.
Corollary: opendir("file/.") will open an ordinary file.

4. The type of "file/." shall be S_IFDIR, _not_ S_IFREG.
Corollary: stat("file/.") will return that it's a directory.

The intention here is that explicit requests to examine the
metadata or alternate structure representations of a file will
create such a view, but the view is only available if requested
explicitly.

When such a view is created, the results of stat(), O_DIRECTORY
and opendir() are absolutely consistent. This will minimise
confusion. Programs which recurse over a directory tree won't
look inside any of the files. However they can be explicitly
asked to recurse starting from a path inside a file: then
they'll recurse over a single file's metadata and structured data.
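
The four rules can be written down as a tiny model to make the cases explicit. This is only a sketch of the proposal as worded above, in Python; `classify` and `opendir_succeeds` are hypothetical names, and the argument is the text that follows the name of an ordinary file in a path.

```python
def classify(suffix):
    """Classify what follows an ordinary file's name in a path.

    Returns "file" (the ordinary file itself), "ENOTDIR" (refused), or
    "entered" (the explicitly requested directory-like view).
    """
    if suffix == "":
        return "file"
    if not suffix.startswith("/"):
        raise ValueError("suffix must be empty or start with '/'")
    components = [part for part in suffix.split("/") if part]
    if not components:
        return "ENOTDIR"      # rule 2: "file/" acts like O_DIRECTORY
    return "entered"          # rule 3: "file/<component>" enters the file


def opendir_succeeds(suffix):
    """opendir() outcome for the cases the proposal spells out."""
    if suffix == "/.":
        return True           # rule 4: "file/." has type S_IFDIR
    if classify(suffix) in ("file", "ENOTDIR"):
        return False          # rules 1 and 2
    # What a deeper path names inside the view is not specified here.
    raise NotImplementedError("entries inside the view are not modelled")
```

For example, `opendir_succeeds("")` and `opendir_succeeds("/")` are both False, while `opendir_succeeds("/.")` is True, matching the corollaries listed under rules 1 to 3.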

Regarding the problems of safe locking in the VFS: the VFS assumes
that directories are not hard linked, i.e. that they cannot appear at
more than one path in a filesystem. Files-as-directories breaks that.

However, VFS does support directories on multiple paths, using bind mounts.

So it wouldn't be out of the question if entering a file (as described
above) effectively auto-mounted a bind mount at that point.


-- Jamie
Jeff Garzik
2004-08-24 20:38:25 UTC
Post by Christoph Hellwig
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

Ouch.

I would definitely classify this as a security hole, since userland
definitely uses O_DIRECTORY to avoid races.
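
As an illustration of the pattern being referred to, here is a hedged Python sketch of opening a directory without a separate check-then-open step. The names `open_directory_safely` and `demo` are hypothetical; the sketch relies on the kernel refusing O_DIRECTORY for non-directories, which is exactly the guarantee in question.

```python
import os
import stat
import tempfile


def open_directory_safely(path):
    """Open path only if it is a directory, with no check-then-open window."""
    # The type check is part of the open itself, so the path cannot be
    # swapped for another kind of object between a check and the open.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        # Defensive re-check on the descriptor rather than on the path.
        if not stat.S_ISDIR(os.fstat(fd).st_mode):
            raise NotADirectoryError(path)
    except BaseException:
        os.close(fd)
        raise
    return fd


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.txt")
        with open(plain, "w") as handle:
            handle.write("data\n")
        fd = open_directory_safely(tmp)
        try:
            names = os.listdir(fd)  # work through the descriptor
        finally:
            os.close(fd)
        try:
            os.close(open_directory_safely(plain))
            refused = False
        except NotADirectoryError:
            refused = True
        return names, refused
```

If the open succeeds for every file, code of this shape loses the protection it was written to obtain.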

Jeff


v***@parcelfarce.linux.theplanet.co.uk
2004-08-24 20:53:44 UTC
Post by Jeff Garzik
Post by Christoph Hellwig
o files as directories
- O_DIRECTORY opens succeed on all files on reiser4. Besides breaking
.htaccess handling in apache and glibc compilation, this also renders
the flag entirely useless and reopens the races it is meant to
prevent.

Ouch.
I would definitely classify this as a security hole, since userland
definitely uses O_DIRECTORY to avoid races.

Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.
v***@parcelfarce.linux.theplanet.co.uk
2004-08-24 21:22:32 UTC
Post by v***@parcelfarce.linux.theplanet.co.uk
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.

While we are at it - consider these hybrids vetoed until
a) sys_link()/sys_link() deadlock is fixed
b) sys_link()/sys_rename() deadlock is fixed
c) correctness proof of the locking scheme (in
Documentation/filesystems/directory-locking) is updated to match the
presence of the file/directory hybrids.

Rationale: (a) and (b) - immediately exploitable by any user, (c) - "convince
us that there's no more crap of that kind". IMO a reasonable request, seeing
that the first look at the patches in -mm4 had turned up two exploits in
that area, despite the *YEARS* of warnings about potential trouble and need
to be careful there (actually, I've given Hans too much credit and assumed
that link/link never happens since nobody would be dumb enough to provide
->link() method for non-directory inodes; turns out that somebody is dumb
enough and link/link is as exploitable as link/rename).
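
The thread does not spell out the exact lock sequences involved, so the following is only a generic illustration of the class of problem: two operations that each need two locks and take them in opposite orders can deadlock, and a fixed global order prevents that. It is a Python sketch with hypothetical names (`Inode`, `link_pair`, `demo`), not the kernel's locking scheme and not the reiser4 code.

```python
import threading


class Inode:
    """Hypothetical stand-in for an object protected by its own lock."""

    def __init__(self, number):
        self.number = number
        self.lock = threading.Lock()
        self.links = 0


def link_pair(first, second):
    """Lock two inodes in a fixed global order, then update both."""
    # Every caller acquires the lower-numbered lock first, so two callers
    # can never each hold one lock while waiting for the other.
    low, high = sorted((first, second), key=lambda inode: inode.number)
    with low.lock:
        with high.lock:
            first.links += 1
            second.links += 1


def demo(rounds=1000):
    a, b = Inode(1), Inode(2)
    workers = [
        threading.Thread(target=lambda: [link_pair(a, b) for _ in range(rounds)]),
        threading.Thread(target=lambda: [link_pair(b, a) for _ in range(rounds)]),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join(timeout=30)
    finished = not any(worker.is_alive() for worker in workers)
    return finished, a.links, b.links
```

Taking the locks in argument order instead of sorted order would let the two threads block each other, which is why a written-down ordering rule and a proof that every code path follows it are being requested.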
Hans Reiser
2004-08-25 18:28:56 UTC
I allowed myself to get talked out of a final top to bottom code audit,
and obviously that was a mistake.

It will probably take about 6 weeks. Apologies for wasting your time
before that was done.

Hans
Post by v***@parcelfarce.linux.theplanet.co.uk
Post by v***@parcelfarce.linux.theplanet.co.uk
Feh. That's far from the worst parts of the mess introduced by "hybrid"
crap - trivial sys_link(2) deadlocks triggerable by any user rate a bit
higher on the suckitude scale, IMO.

While we are at it - consider these hybrids vetoed until
a) sys_link()/sys_link() deadlock is fixed
b) sys_link()/sys_rename() deadlock is fixed
c) correctness proof of the locking scheme (in
Documentation/filesystems/directory-locking) is updated to match the
presence of the file/directory hybrids.

Rationale: (a) and (b) - immediately exploitable by any user, (c) - "convince
us that there's no more crap of that kind". IMO a reasonable request, seeing
that the first look at the patches in -mm4 had turned up two exploits in
that area, despite the *YEARS* of warnings about potential trouble and need
to be careful there (actually, I've given Hans too much credit and assumed
that link/link never happens since nobody would be dumb enough to provide
->link() method for non-directory inodes; turns out that somebody is dumb
enough and link/link is as exploitable as link/rename).
Christoph Hellwig
2004-08-25 18:45:23 UTC
Post by Hans Reiser
I allowed myself to get talked out of a final top to bottom code audit,
and obviously that was a mistake.

It will probably take about 6 weeks. Apologies for wasting your time
before that was done.

I don't think you'll get anywhere with auditing. We need to write down
the semantics you want, define them at the VFS level and make sure
they're not conflicting with defined userspace semantics or kernel
assumptions.

I think you need to learn the basic distinction between the VFS layer
and a lowlevel filesystem driver.
Hans Reiser
2004-08-26 09:02:29 UTC
Post by Christoph Hellwig
I don't think you'll get anywhere with auditing. We need to write down
the semantics you want, define them at the VFS level and make sure
they're not conflicting with defined userspace semantics or kernel
assumptions.

I think you need to learn the basic distinction between the VFS layer
and a lowlevel filesystem driver.

How old are you? I thought you were the guy at Linux Tag with fashion
oriented hair who gave a talk on his XFS work? Did I confuse you with
someone else?


Hans
Hans Reiser
2004-08-25 19:53:28 UTC
I had not intended to respond to this because I have nothing positive to
say, but Andrew said I needed to respond and suggested I should copy
Linus. Sigh.

Dear Christoph,

Let me see if I can summarize what you and your contingent are saying,
and if I misconstrue anything, let me know. ;-)

You ignored everything I said during the discussion of xattrs about how
there is no need to have attributes when you can just have files and
directories, and that xattrs reflected a complete ignorance of name
space design principles. When I said we should just add some nice
optional features to files and directories so that they can do
everything that attributes can do if they are used that way, you just
didn't get it. You instead went for the quick ugly hack called xattrs.
You then got that ugly hack done first, because quick hacks are, well,
quick. I then went about doing it the right way for Reiser4, and got
DARPA to fund doing it. I was never silent about it.

Making files into directories caused only two applications out of the
entire OS to notice the change, and that was because of a bug in what
error code we returned that we are going to fix. You think that was a
disaster; I think it was a triumph.

Now a cleanly architected filesystem with no attributes and just files
and directories that can do everything attributes are used for exists.
You don't want it to have the competitive advantage. Instead, you want
it to have its clean design excised until you have something that
duplicates it ready to go, and only then should it be allowed that users
will use the features of your competitor's filesystem which you
disdained implementing for so long.

Since you never studied or understood namespace design principles (or
you would not have created and supported xattrs), you want to rename it
to be called VFS, rewrite what we have done, and take over as the
maintainer, mangling its design in a committee clusterfuck as you go.

We have just implemented very trivial semantic enhancements of the FS
namespace, nothing like as ambitious as www.namesys.com/whitepaper.html
or WinFS, and you are already pissing your pants.

Is that a fair summary?

Eat my dust.

Hans

PS

I should of course qualify what I have said. The use of files and
directories in place of attributes is not a finished work. It has bugs,
sys_reiser4() does not yet work, and there are little features still
missing like having files readdir ignores.

Still, except for the bugs, what we have is usable, and there are a lot
of happy reiser4 users right now even with the bugs. It will need a
little bit more time, and then all the pieces will be in place.

PPS

If you implement your filesystems as reiser4 plugins, and rename
reiser4's plugin code to be called "vfs", your filesystems will go
faster. Not as fast as reiser4 though, because it has a better layout
and that affects performance a lot, but faster is faster.... See
www.namesys.com/benchmarks.html for details.

PPPS

Since we have such a performance lead, Namesys is about to change its
focus from the storage layer to semantics, look at
www.namesys.com/whitepaper.html for details. Semantic enhancements are
the important stuff, and finally Namesys is where we have all the
storage layer prerequisites done right, and the real work can begin.
The gap between us is about to widen further.
Post by Christoph Hellwig
After looking through the code and mailing lists I'm quite unhappy with
a bunch of user-visible changes that Hans sneaked in, which make reiser4
incompatible with other filesystems

If we leave you in the dust, run faster.... not my problem....
Post by Christoph Hellwig
Given these problems I request that these interfaces be removed from
reiser4 for the kernel merge and, if they are added later, that they be
added at the proper VFS level after discussion on linux-kernel and
linux-fsdevel, like we did for xattrs.

If you can't help fight WinFS, then get out of the way. Namesys is on
the march. Read www.namesys.com/whitepaper.html.

Or, be smart, recognize that reiser4 is faster and more flexible than
your storage layers because we are older and wiser and worked harder at
it, join the team, and start contributing plugins that tap into the
higher performance it offers.

Mi