Kernel Hacks & Weird Filenames
Guy Harris
guy at gorodish.Sun.COM
Fri May 6 10:02:38 AEST 1988
> HP-UX has the routine isprint (most likely all other Un*xes have it too).
One should hope so; HP certainly didn't invent it, the people at BTL Research
did.
> So it is not too hard to determine what a printable character is (HP-UX's
> implementation includes NLS as well).
Given that it includes NLS, there is no single answer to that question. The
answer depends on the character set you select.
This brings up another question: should the answer depend on the type of
terminal you're currently logged in on? I.e., if you're on a VT100, should the
upper half of ISO Latin #1 be excluded, while if you're on a VT220 it's
included?
Another question: what does "isprint" do about "wide" character sets such as
various Kanji character sets?
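To make the character-set point concrete, here is a minimal sketch. It does not call the C library's "isprint"; as a stand-in, it decodes a single byte with one of Python's codecs and asks whether the result is printable. The helper name printable_in and the choice of codecs are assumptions made for illustration only.

```python
def printable_in(byte: int, charset: str) -> bool:
    """Decode one byte in `charset` and report whether the result is printable.

    A stand-in for a locale-aware isprint(): the answer changes with the
    character set even though the byte value does not.
    """
    try:
        ch = bytes([byte]).decode(charset)
    except UnicodeDecodeError:
        # Not a complete character in this set, e.g. an 8-bit byte in
        # ASCII or the first half of a two-byte Kanji character.
        return False
    return ch.isprintable()
```

Under these assumptions, printable_in(0xE9, "ascii") is false while printable_in(0xE9, "latin-1") is true, and a lone lead byte such as printable_in(0x83, "shift_jis") is false because one byte does not make a whole character there.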
> As to the whole topic of what belongs in a valid filename, it seems to me
> that if you could truly have ANY character in a filename, then things would
> be ok, but that isn't the case. First of all, as others have pointed out,
> you have to exclude '\0' and '/'. In addition, most (all?) shells use some
> characters as metacharacters.
Most, not all. The major conventional UNIX command-line shells do; however,
you could have a "fill in the form" shell, or a "desktop metaphor" shell, that
doesn't.
> In short, I see no gain and many drawbacks to allowing arbitrary characters
> in filenames.
OK, what does "allowing" mean here? There *might* be some merit to disallowing
the creation of path names containing certain bytes (note, as per the previous
mention of Kanji character sets, that a "character" is not necessarily a single
byte). Disallowing *all* pathnames containing these bytes would be wrong,
however, as it would prohibit you from referring to some of those files if your
session weren't configured to allow all characters in file names. (No, you
can't say "you're on a terminal that doesn't support 8-bit characters, you
wouldn't be able to refer to them anyway"; consider a user logged in on a 7-bit
terminal doing an "rm -rf" on a directory containing files with 8-bit
characters in their names - or just with blanks in their names, if you choose
to disallow them.)
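The "rm -rf" case can be sketched as follows: a program that treats names as opaque byte strings can remove files whose names the terminal could never display. This is only an illustration, not how any particular "rm" is written; remove_tree and demo are hypothetical names, and the demo uses a blank and ^A in its names, since not every filesystem accepts arbitrary 8-bit bytes.

```python
import os
import tempfile


def remove_tree(top: bytes) -> int:
    """Remove the tree rooted at `top`; return how many non-directories went."""
    removed = 0
    # With a bytes argument, os.walk yields bytes names, so nothing is ever
    # decoded or filtered through the session's character set.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
            removed += 1
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)  # symlink to a directory: remove the link only
                removed += 1
            else:
                os.rmdir(path)   # already emptied, because the walk is bottom-up
    os.rmdir(top)
    return removed


def demo():
    """Build a scratch tree with awkward names, then remove it."""
    top = os.fsencode(tempfile.mkdtemp())
    sub = os.path.join(top, b"sub dir")
    os.mkdir(sub)
    for name in (b"a b", b"ctl\x01name"):
        with open(os.path.join(sub, name), "wb"):
            pass
    return remove_tree(top), os.path.exists(top)
```

Calling demo() should report two files removed and the scratch directory gone; at no point does the code need to print or type the names.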
And, once again, I bring up the question of character sets such as various
Kanji sets. If not all 16-bit combinations are valid Kanji, how can you
disallow "invalid" characters if each of the two bytes in such a character is
valid in some other character?
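A small sketch of why a byte-at-a-time check cannot answer questions about two-byte characters. Shift-JIS is chosen here purely as an example of a Kanji encoding (the discussion above does not name one), and the banned set and function names are hypothetical. In Shift-JIS the katakana "ソ" is the byte pair 0x83 0x5C, and 0x5C on its own is the backslash.

```python
BANNED = set("\\*?")  # hypothetical list of "troublesome" characters


def bytewise_rejects(name: bytes) -> bool:
    """Naive filter: look at each byte in isolation."""
    return any(chr(b) in BANNED for b in name)


def charwise_rejects(name: bytes, encoding: str) -> bool:
    """Filter that first groups bytes into characters of `encoding`."""
    return any(ch in BANNED for ch in name.decode(encoding))


name = "ソ".encode("shift_jis")  # two bytes: 0x83 0x5C
```

Here bytewise_rejects(name) is true, because the second byte looks like a backslash, while charwise_rejects(name, "shift_jis") is false. The character-wise version only works if whoever does the checking knows which character set the name is in, which is exactly the information a kernel does not have.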
Sure, it sounds nice to say "make life easier for the user by preventing
hard-to-reference filenames from being used". It's not clear that it's really
that easy. Obviously, the kernel should not provide any policy here; I'm not
sure you can even provide a reasonable policy-free mechanism atop which the
desired policies can be implemented.
BTW, note that Draft 12 of POSIX says:

    filename
        Names consisting of 1 to {NAME_MAX} bytes may be used to name a
        file. The characters composing the name may be selected from the
        set of all characters excluding the slash character and those
        containing the null byte (octal zero).

From this, I infer that no POSIX-conformant system will prohibit me from using
^A or '\353' in a file name; there may well be application writers who, for
whatever reason (bad or good), decide to do so. Turning on filename
restrictions might conceivably break these applications; before you add such
restrictions, make sure either that this won't break any important applications
or that you can live with them not working.
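As a rough sketch, the Draft 12 rule quoted above comes down to a check on bytes. The function name is hypothetical, and the default of 255 for {NAME_MAX} is an assumption; the real limit depends on the system and the filesystem.

```python
def is_posix_filename(name: bytes, name_max: int = 255) -> bool:
    """True if `name` is 1 to name_max bytes long with no slash or null byte."""
    if not 1 <= len(name) <= name_max:
        return False
    return b"/" not in name and b"\0" not in name
```

By this reading both is_posix_filename(b"\x01") (^A) and is_posix_filename(b"\353") are true; only the slash, the null byte, and the length limit rule a name out.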