[vox-tech] Question: mod_dav.1.0.3 + apache.1.3.26 and CR/LF issues w/ MacOS+MSWin

ME vox-tech@lists.lugod.org
Wed, 3 Jul 2002 16:46:47 -0700 (PDT)


> > On Wed, 3 Jul 2002, ME wrote:
> > WebDAV clients (Goliath, MS Windows Web Folders, cadaver (for Linux)) do
> > not appear to do any translation like the web clients (Netscape, MSIE,
> > Lynx) have done and still do.

On Wed, 3 Jul 2002, Jeff Newmiller wrote:
> Netscape does NOT do any translation.  That was the point of my anecdote
> quoted below.

OK, you are correct on this, and I was incorrect: Netscape does not
translate the document by default, but can be made to. Netscape does
interpret all 3 types of line termination strings as line breaks when it
displays the text content in the browser. It also lets the user choose
whether to translate the displayed file's line termination when they
choose the "Save As" option and pick "Text" instead of "Source".

> IE _does_ do translation, and while it solves problems for users, it does
> so by masking the screwups of the server administrators.

Easy enough to bring the theory to a test and demo...

A text file with ^M but no ^J separating the words:
http://mike.passwall.com/test1.txt
Viewed in Netscape, you get:
this
is
a
test

A text file with ^J but no ^M separating words:
http://mike.passwall.com/test2.txt
Viewed in Netscape, you get:
this
is
a
test

A text file with ^M^J separating words:
http://mike.passwall.com/test3.txt
Viewed in Netscape, you get:
this
is
a
test

Get the same file with WebDAV based clients (Cadaver, Goliath, "MS Web
Folders") and the files appear exactly as they do on the server. (With the
included ^J or ^M or ^M^J.)
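For anyone who wants to reproduce the test locally, here is a minimal Python sketch that writes the three variants. The file names follow the test1/test2/test3 naming used in the next paragraph; the trailing separator after "test" is an assumption, since the original files are not shown byte for byte.

```python
from pathlib import Path

# The same four words separated by CR only (^M), LF only (^J), and CRLF (^M^J).
WORDS = [b"this", b"is", b"a", b"test"]
VARIANTS = {
    "test1.txt": b"\r",    # ^M, classic Mac OS convention
    "test2.txt": b"\n",    # ^J, Unix convention
    "test3.txt": b"\r\n",  # ^M^J, DOS/Windows convention
}

def write_test_files(directory: Path) -> None:
    """Write each variant in binary mode so Python performs no newline translation."""
    for name, sep in VARIANTS.items():
        # Assumption: each file also ends with its separator.
        (directory / name).write_bytes(sep.join(WORDS) + sep)
```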

Save each file from Netscape, and you should find that test1 still has the
^M separating the words, test2 has just the ^J separating the words, and
test3, which had ^M^J, now has only the ^J. Also (of course) the file
lengths on the client for the files downloaded from Netscape or DAV are
the same.
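To check saved copies without eyeballing them in an editor, a small Python sketch can count bare CR, bare LF, and CRLF and report which convention a file uses. The function name line_ending_style is made up for this example; pass it the bytes of a file (for example Path("test3.txt").read_bytes()).

```python
def line_ending_style(data: bytes) -> str:
    """Classify a byte string as "CR", "LF", "CRLF", "mixed", or "none"."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # bare CR, not followed by LF
    lf = data.count(b"\n") - crlf  # bare LF, not preceded by CR
    kinds = [name for name, n in (("CRLF", crlf), ("CR", cr), ("LF", lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"
```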

The displayed content in Netscape is exactly the same for each of
them. The display shows proper line breaks in a per-OS way, so that
every OS displays the files the same way in the same browser.

However, the saved file content is not the same for each of the three 
files.
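The browser display behavior amounts to treating CR, LF, and CRLF all as line breaks. In Python, bytes.splitlines() does exactly that (for bytes it recognizes only those three boundaries), so this sketch yields the same four display lines for all three test files. The function name and the Latin-1 decoding are assumptions for the example.

```python
def display_lines(data: bytes) -> list[str]:
    """Split on CR, LF, or CRLF alike, the way the browser display treats all three."""
    # bytes.splitlines() treats \r, \n and \r\n as line boundaries.
    return [line.decode("latin-1") for line in data.splitlines()]
```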

> A properly configured web server running on a *nix environment, when
> confronted with a "text" file on disk that has CRLFs, is supposed to
> transmit CRCRLFs, because it is supposed to translate the native newlines
> (LF) to the on-the-wire newlines (CRLF).

Try the above 3 files to see if you get this. File "test2.txt" uses CRLF
but does not come through as CRCRLF.
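Jeff's CRCRLF point is easy to see with a naive LF-to-CRLF translation: a file that already contains CRLF comes out with CRCRLF. The second function is one alternative (normalize every CR, LF, or CRLF to LF first, then emit CRLF). Both function names are hypothetical, and neither is claimed to be what Apache does; the test above suggests this server sends the bytes unchanged.

```python
def naive_to_wire(data: bytes) -> bytes:
    """Translate native LF to CRLF without looking at what is already there."""
    return data.replace(b"\n", b"\r\n")

def normalized_to_wire(data: bytes) -> bytes:
    """Collapse CRLF and bare CR to LF first, so existing CRLFs are not doubled."""
    lf_only = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return lf_only.replace(b"\n", b"\r\n")
```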

> > The RFC for HTTP/1.1 says the header is supposed to use CR/LF for HTTP,
> > but the body conversion to line breaks is left to the client. With the
> > HTTP request, the "body" becomes the actual file (for the most part) but
> > not the header.
> 
> The client is allowed to _expect_ that "text" files transmitted to it have
> CRLFs as newline separators.  Any other sequence (LF or CR) _may_ be
> misinterpreted by the client.  The fact that most clients can handle
> various alternatives is sugar coating that saves lazy administrators'
> butts.
> 
> "Text" files have type "text/*" in the ContentType header.  If the server
> is serving up a file (in response to a GET) that it has determined to be
> some variant of "text" file, it is expected to make sure the line
> separators are CRLF at a minimum, and the client is expected to be able
> to handle that.
> 
> Similarly, if the client is serving up a "text" file, it is responsible to
> tell the server that in a ContentType header, and to normalize the line
> separators.  Likewise, the server is supposed to recognize the ContentType
> header and denormalize the line separators (CRLF to LF) as appropriate.

No translation is done for files pushed with WebDAV. No translation is
done for files that are pulled with WebDAV. The files keep whatever CR,
LF, or CRLF line termination the client included.

If a Mac user formats the file to make it look nice on their mac with
SimpleText, the next PC user to edit that file from the server with
Notepad will be in hell.
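Jeff's description of the two directions, keyed on the Content-Type header (the actual HTTP header name), could be sketched in Python as below. This is not what mod_dav does (as noted above, it does no translation); the function names are hypothetical, and also turning bare CR into LF is an assumption.

```python
def is_text(content_type: str) -> bool:
    """True for text/* types, ignoring parameters such as "; charset=..."."""
    return content_type.split(";", 1)[0].strip().lower().startswith("text/")

def _to_lf(data: bytes) -> bytes:
    # Normalize CRLF and bare CR to the Unix-native LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def server_response_body(data: bytes, content_type: str) -> bytes:
    """GET: send text/* bodies with CRLF separators; leave other types byte-for-byte."""
    if not is_text(content_type):
        return data
    return _to_lf(data).replace(b"\n", b"\r\n")

def server_store_body(data: bytes, content_type: str) -> bytes:
    """PUT: store text/* bodies with native LF; leave other types untouched."""
    return _to_lf(data) if is_text(content_type) else data
```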

> > Ideally, the clients would need to perform the line break translations for
> > files of type "text" that are downloaded from a DAV-enabled web server.
> 
> Yes, they should recognize ContentType as appropriate for their local
> system.

And they (web browsers) do so for display, but Netscape does no
translation for the saved files. Neither do any of the WebDAV clients.
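On the client side, "recognize Content-Type as appropriate for the local system" could look like this sketch: a text/* body arriving with any of CR, LF, or CRLF is rewritten to the local newline. os.linesep is "\n" on Unix and Mac OS X and "\r\n" on Windows; a classic Mac OS client would pass "\r" explicitly. The function is hypothetical, not an existing option in Goliath, Cadaver, or Web Folders.

```python
import os

def to_local_newlines(body: bytes, content_type: str, newline: str = os.linesep) -> bytes:
    """Rewrite a text/* body that uses CR, LF, or CRLF to the given local newline."""
    if not content_type.split(";", 1)[0].strip().lower().startswith("text/"):
        return body  # binary types must pass through untouched
    lf_only = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return lf_only.replace(b"\n", newline.encode("ascii"))
```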

> > Unfortunately, this does not happen in the Dav clients I have tested.
> > 
> > I have posted this as an issue to the "Microsoft news groups" complaining
> > about "MS Web Folders" not doing this translation. I have yet to post this
> > to the Cadaver or Goliath developers list, but will get around to it.
> 
> I am puzzled... this is a non-issue for that operating
> environment.  On-the-wire is no different from their on-the-disk format.
> The Mac and Unix clients would be the ones to be concerned about.

Oh, after joining the Dav workgroup, I have an answer from two members...
(Waiting till end before I give it...)

> > For now, a solution exists on my server to enable content modification to
> > server text files to DAV users and based on a matching search of their
> > client name/version, perform a translation of the text file
> > "on-the-fly" to use their native OS line termination char sequence. It is
> > not a "good solution" at all, but it does work to fill in this feature
> > missing in the clients.
> 
> I am boggled.  Are you sure it isn't a client-side misconfiguration?  The
> client software has to be able to tell the difference between a text file
> and a binary file somehow.  On a Mac, this is easy.  On *nix, may have to
> depend on file extensions, but you don't seem to be.

This is not a client-side misconfiguration. The text conversion is needed
so that Notepad can work with a downloaded file that was created on a
Mac, and so that SimpleText can display a file that came from Notepad
cleanly. (SimpleText shows little "boxes" for the ^J to indicate an
"unprintable character" that is being "printed".)

Of course, Netscape for Linux (I did not check others) lets you "Save
As" a document you see in the web browser. If that document is saved as
"Source", then what you got from the server is what is saved. If you save
as "Text", then the content is saved translated to the client OS's line
termination sequence, except when the file uses ^M^J; then it is saved
with the same ^M^J.

> > It seems the "best solution" is to see if this can be added to Dav.
> 
> It must be in both the server and the clients, and the software at both
> ends has to be configured to distinguish text and binary files.... but
> MIME-type magic is pretty widespread... I thought the bad old days of ftp
> servers that had to be told the transfer type for each session were behind
> us.

If translation were part of DAV, then any client could translate the text
file to its local formatting for line termination.

Wait till you see the answer from the WebDAV group... :-/

> > OK. Fine. I am subscribing to a ietf/w3 working group on WebDAV. If the
> > RFC for Dav can be modified to be explicit on content modification for
> > text files over http, perhaps the clients will add this feature. :-)

One member's response, seconded by another:

"The problem you describe is not particularly a WebDAV problem. IMAP and
FTP are subject to the same problems.  MIME also does not attempt to fix
the problem."

(Problem being the conversion of CR/LF, LF, CR to the local OS linebreak
string.)

"...the real offenders are not WebDAV clients.  It's applications like
NotePad that can't display reasonable file formats that are problematic."

So, they know about this as a complaint, but do not see it as their issue.
They see this as a client editor problem.

Do I agree? I am not sure yet.

It creates fewer headaches for the server admin if the burden is shifted
to the user to use an editor that understands different conventions. It
creates more headaches for the desktop support people. Which is worse?

Another's response included:
"If you try again with Mac OS X's TextEdit application, you'll see
that it handles CR, LF and CR/LF correctly."

And this brings me back to my original post, where I stated that "BBEdit"
and the editor from Metrowerks CodeWarrior can both handle such files as
text editors and display the content properly, regardless of the local
OS's conventions for the text files they create. Of course, I also stated
in the original post that these were undesirable.
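What TextEdit, BBEdit, and the CodeWarrior editor are credited with here is, roughly: detect the convention on open and write the same convention back on save. Here is a minimal Python sketch of that round trip, not how any of those editors is actually implemented. The function names, the detection heuristic (prefer CRLF if present, then CR, else LF), Latin-1 decoding, and always ending the file with a newline are all assumptions.

```python
from pathlib import Path

def open_text(path: Path) -> tuple[list[str], str]:
    """Read a file, remember its newline convention, and return lines without terminators."""
    data = path.read_bytes()
    if b"\r\n" in data:
        newline = "\r\n"
    elif b"\r" in data:
        newline = "\r"
    else:
        newline = "\n"
    # bytes.splitlines() accepts CR, LF and CRLF, so mixed files still open cleanly.
    lines = [line.decode("latin-1") for line in data.splitlines()]
    return lines, newline

def save_text(path: Path, lines: list[str], newline: str) -> None:
    """Write lines back with the convention the file arrived with."""
    path.write_bytes((newline.join(lines) + newline).encode("latin-1"))
```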

So, unless I choose to help add a new feature to Goliath (a checkbox to
enable text translation) and Cadaver (a flag or argument on startup), beg
MS to allow client-side alteration of files (or to change Notepad),
and/or ask Apple to alter SimpleText, my two solutions are:

1) Tell my users to use an editor that understands the different
conventions, such as "BBEdit" or the one from CodeWarrior. (I know
WordPad will deal with this for display, but when the file is saved,
there is a risk of contaminating the text data with 8-bit binary data
and sequences to enable/disable bold, etc.)

or

2) Use the crazy-non-standard-web-server-hack that I made to automagically
serve them a text file with line termination strings appropriate to their
DAV Client's OS.
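Option 2 boils down to choosing a line ending from the client's User-Agent header and rewriting text/* bodies on GET. Here is a minimal Python sketch, not the actual hack described here. The User-Agent substrings in the table are placeholders, not the strings Goliath, Cadaver, or Web Folders really send, and the function names are hypothetical.

```python
# Hypothetical table: User-Agent substring -> newline the client's OS expects.
# These substrings are placeholders, NOT the real strings any DAV client sends.
CLIENT_NEWLINES = [
    ("ExampleMacClient", b"\r"),    # classic Mac OS client
    ("ExampleWinClient", b"\r\n"),  # Windows client
    ("ExampleUnixClient", b"\n"),   # Unix/Linux client
]

def newline_for_agent(user_agent: str, default: bytes = b"\r\n") -> bytes:
    """Return the newline for the first matching substring, else HTTP-style CRLF."""
    for needle, newline in CLIENT_NEWLINES:
        if needle in user_agent:
            return newline
    return default

def serve_text_for_agent(body: bytes, content_type: str, user_agent: str) -> bytes:
    """Rewrite text/* bodies to the client's native line ending; pass others through."""
    if not content_type.split(";", 1)[0].strip().lower().startswith("text/"):
        return body
    lf_only = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return lf_only.replace(b"\n", newline_for_agent(user_agent))
```

Matching on User-Agent is fragile (new client versions, proxies), which is part of why this feels like a hack. Uploads (PUT) would also need the reverse translation so the stored file stays in one consistent convention.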

The same problem as before; should I push this problem to the clients, or
accept the problem and own it?

-ME

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS/CM$/IT$/LS$/S/O$ !d--(++) !s !a+++(-----) C++$(++++) U++++$(+$) P+$>+++ 
L+++$(++) E W+++$(+) N+ o K w+$>++>+++ O-@ M+$ V-$>- !PS !PE Y+ !PGP
t@-(++) 5+@ X@ R- tv- b++ DI+++ D+ G--@ e+>++>++++ h(++)>+ r*>? z?
------END GEEK CODE BLOCK------
decode: http://www.ebb.org/ungeek/ about: http://www.geekcode.com/geek.html