[vox-tech] Question: mod_dav.1.0.3 + apache.1.3.26 and CR/LF issues w/ MacOS+MSWin

Jeff Newmiller vox-tech@lists.lugod.org
Thu, 4 Jul 2002 00:53:45 -0700 (PDT)


On Wed, 3 Jul 2002, ME wrote:

> 
> On Wed, 3 Jul 2002, Jeff Newmiller wrote:
> [chop]
> > A CRLF file on a *nix server looks funky to
> > programs on the server.  Just because your current clients aren't doing
> > text processing there doesn't mean that someone, such as yourself, might
> > not want to do text processing there.
> 
> Sure, having the text files converted to the local OS expected format in
> both directions (client-server, server-client) would be ideal. I was
> looking at minimums for my present needs - shortsighted, yes, but trying
> to simplify the problem down to smaller manageable chunks.
> 
> > > Though i do like (from a support perspective) the idea of having this
> > > placed into the RFC for WebDAV clients, I can see other reasons for not
> > > including it.
> > > 
> > > It decouples OS specific limitations and restrictions from the protocol.
> > > It means less work for the maintainers of the RFC.
> > > It means less debate and argument with the creators of the WebDAV
> > >    clients.
> > 
> > I can't tell what you are defending with these statements.  The last one
> > seems to imply that you are defending NOT including the standard text
> > format for on-the-wire transmission, but the other two imply that you are
> > defending the opposite (to me).
> 
> Ah... at this point, I am trying to come up with more reasons for *not*
> doing it. Once lists of complaints against not doing it are hammered out,
> problems with these reasons can be found. (Trying to look at things I
> might find as reasons if I were them.) The above are reasons I might cite
> if I were them and did not want to make the change.
> 
> For the first one, the argument goes something like this:
> "We are not responsible for dealing with text files any differently from
> binary image files. We do not care about the OS as that is not part of our
> scope. If we tie ourselves to any OS, or to descriptions of how an OS
> encodes files, then we limit ourselves from future OS advancement."

I have no desire to tell any OS how to store text files.

I want text files to be the same on the wire, so os-specific code at each
end of the transmission can take care of os-specific concerns without
having to be aware of how the other end takes care of things.
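
As a rough illustration of that split (a sketch only, not code from mod_dav
or any WebDAV client; the names to_wire and from_wire are made up for this
example), each end needs just two small conversions and no knowledge of the
other end's OS. It assumes CRLF is the agreed wire format:

```python
import os

CRLF = b"\r\n"  # assumed canonical newline on the wire


def to_wire(data: bytes) -> bytes:
    """Convert text using any mix of CRLF, CR, or LF newlines to CRLF."""
    # Collapse every newline style to LF first so no newline gets doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", CRLF)


def from_wire(data: bytes,
              local_newline: bytes = os.linesep.encode("ascii")) -> bytes:
    """Convert CRLF wire text to the newline the local OS expects."""
    return data.replace(CRLF, local_newline)
```

The sender calls to_wire() on its local text, and the receiver calls
from_wire() with its own newline (b"\n" on Unix, b"\r" on classic MacOS,
b"\r\n" on Windows). Neither side has to guess what the other stores on disk.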

The internet is based on two types of files: interoperable text, and
application-specific binary file formats.  Any software that generates
binary data risks never being understood by any other computer, so I am
very strongly against the idea of dropping the concept of a common
text format.  The support is there, the standards are there, and it
sounds to me like the WebDAV programmers are avoiding this issue that 
has a long history and known solutions.

> And the other two are political reasons or disinterest in making a change.

I don't understand how the RFC authors can regard the interchangeability of
text as such a horrendous problem.  Just define it on the wire, and let
the client and server authors take care of the implementation using the
appropriate os-specific mechanisms for doing so.

> These can, of course, be countered - that is not the point. Targeting
> their complaints and eliminating them is useful for me to make sure this
> is something that does not have serious risk for headaches. (This being:
> making changes to clients.)

I am amazed that this discussion is taking place.

> > > The content modification hack is only for outbound files specific
> > > to users of WebDAV clients based on extension - NOT for any other
> > > browsers. I couldn't care less what the server content looks like. Of course,
> > > a bi-directional modification shifted to the clients would be good and
> > > would not find me complaining. :-)
> > 
> > That is exactly the trap I would prefer that you avoid... if you give in
> > and let all those different types of newlines be emitted from your server,
> > then you create an environment that will be incompatible when the authors
> > of the clients are convinced to start using a single text format on the
> > wire.  That is, your users are going to be pissed off at future client
> > updates because they "break" compatibility with your hack by doing things
> > the right way.
> > 
> > It would be good in general for text files to look like text files on your
> > server, so if you feel like putting in a hack to make that happen, I was
> > saying go for it.  You currently seem to think this is of no value, but
> > having different newlines in your text files on your server is going to be
> > a headache eventually.
> 
> You offer a very good point here for making it exist on both sides. I have
> no logical counter for it. The only illogical counters include me not
> worrying about it because I was lazy. However, I will try to include it
> and try to reverse the process for storage on the server and complete the
> hack.
> 
> The present hack only does parsing for clients on files with the
> extensions .html and .txt. I should probably add .htm too. A more generic
> text/*   _seems_ like a good idea, but looking through /etc/mime.types I
> see many extension names that are unfamiliar to me. The impact of such a
> change reaches beyond my experience. A certain level of trepidation
> exists, preventing me from starting there.

Apache typically uses file extensions to decide what MIME types the files
are, but I don't think it is the only mechanism available.

When I wrote "text/*", I meant "text/" followed by whatever... "html",
"plain", "xml", etc.  Any MIME type beginning with "text/" should be
handled this way.  Any other MIME type pretty much has to be treated as
binary by the server.
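
A minimal sketch of that rule, using Python's standard mimetypes table as a
stand-in for whatever mapping the server really uses (the helper name
is_text_type is invented for this example, and treating compressed files as
binary is an added assumption):

```python
import mimetypes


def is_text_type(filename: str) -> bool:
    """Return True only if the file maps to a MIME type under text/."""
    mime_type, encoding = mimetypes.guess_type(filename)
    # A content encoding (e.g. gzip) means the bytes are not plain text,
    # and an unknown extension is treated as binary to be safe.
    if encoding is not None or mime_type is None:
        return False
    return mime_type.startswith("text/")
```

With this, index.html, page.htm and notes.txt all qualify for newline
conversion without listing extensions by hand, while photo.png and anything
unrecognized are left untouched.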

> OK, I am convinced to look at mods based on puts from Dav Clients. This
> will be a bit more difficult, but should be possible.
> 
> > > Nah. I was only looking to the two above for the immediate issue. It is
> > > much easier to tell my users to use specific editors than to find all text
> > > editors out there. Though this is a possible option, I removed it from the
> > > list due to the work involved.
> > 
> > I am recommending that you make your server behave the right way, and tell
> > your users how to deal with the problem.  Then when the clients start
> > working correctly, your users may discover that "the authors of the client
> > programs fixed things to work right now" and they will then be able to use
> > whatever text editors they want to.
> 
> Luckily, I designed the hack to be granular in enforcement and it uses a
> per client configuration and the number of users for this experimental
> service is less than 20. Also, if they wish to complain too much they can
> go find some other server. They are using my system at my whim. They are
> subject to my testing and experimentation. They get free web space with no
> ads, and I get a live userbase on which to test things for the real-world
> before I make co-workers accept something new. When a WebDAV client gets
> "fixed" to conform, I can remove the hack for that client and tell people
> to upgrade to that client if they want it nice.
> 
> > M$ users should need no training during any of this, for reasons
> > described above.
> 
> I am migrating users away from an asun-netatalk (via chooser with TCP/IP,
> ASIP) and samba with Windows users using special batch files to connect to
> the server. (Rather tough to support and secure.) Users are being
> migrated to this new DAV system.
> 
> > M$ is home free.
> 
> Not entirely. When a *.txt file is created with notepad.exe in Windows
> (only tested 2000 but I expect it to be the same for all others) and then
> stored to the server over MS WebDAV client (MS Internet Folders), the ^M^J
> convention is found on the server stored files as the file format.
> 
> If the file was properly stored on the server in "local OS line
> termination format" then I should only see ^J in the actual content of the
> server stored files.

This is a server issue, not a client issue.  Once the CRLF appears on the
wire by whatever mechanism, the other end is obligated to deal with it
because a text file should ALWAYS have a CRLF on the wire.

> Of course emacs has the ability to recognize a "DOS" file, and the vi
> clone I am using can do the same. "jove" shows me the truth of the file
> and its actual line termination.
> 
> Attempts to copy *.txt files created on the server (from scratch in emacs,
> vi, or jove) through WebDAV "MS Internet Folders" to the Windows desktop
> lead to files that (when opened with notepad) have small "block" chars
> representing the Linux ^J and no line breaks for the content shown in
> notepad.

Server issue... server should have produced CRLFs on output.
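
To see which newline convention a stored file or a captured response body
actually uses, a small counter like the following can help. It is only an
illustrative sketch; the function name count_line_endings is made up here:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in a byte string."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the characters that stand alone.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}
```

A file written by emacs or vi on the server should show only LF counts, one
saved by notepad only CRLF counts, and text that is correct on the wire
should show CRLF and nothing else.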

> This means MS Windows will have to be included too.

Nope.

> When I use the hack
> for the client side, the copied file has the client OS's line termination
> and can be read with notepad, jove or simpletext.

And when the Mac and Unix clients get fixed, you will end up with
unpredictable results on the client side.  If the clients are robust, they
may give good results anyway... but since you are feeding them
non-wire-standard data they are not obligated to work right.

> > Apple may have to fix their software, but it will be extremely easy for
> > them.
> > 
> > > Over the long term, i will agree this is a better track, but does not help
> > > me right now. :-(
> > 
> > Well, your Mac and Linux users will be inconvenienced, but you have
> > workarounds in hand.
> 
> And I can easily remove filtering for their clients when I know a fix is
> available for their client.

The funny thing is... if you have trained them to limit themselves to
multi-newline-type text editors, they won't notice the difference.  They
will only accidentally (or by notification) realize that they are no
longer limited to such a narrow range of tools.

> > > If the 3 of the 4 can be changed, I expect the 4th can be changed. If 4 of
> > > these 4 are changed, then it can become a defacto standard, and may get
> > > added to the RFC as a "should" or "may" even if it is not a "must".
> > 
> > Now it is my turn to be confused. What are you enumerating?
> 
> I was looking at inertia in code development. If you can get 75% of the
> market to switch, you have more leverage to convince the last 25% to
> switch. (1)cadaver, (2)Goliath, (3)iDisk, (4)MS Web Folders. Getting two
> to switch should be possible. If either Apple or MS switches it might be
> easier to convince the last to switch.

Yes, 1 through 3 have to change.  4 does not.

> With 100% of WebDAV clients using it, it becomes a defacto standard. Then a
> mod to the RFC might be more acceptable such that it may be included as a
> "suggested" item via "may" or "should" but not appear as a "must". A
> "MUST" in the RFC would be very nice. =)

Read Step 2 "Conversion to canonical form" in http://RFC.net/rfc2049.html.
This is as close to "MUST" as I have found so far.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
---------------------------------------------------------------------------