[vox-tech] CSV with rogue EOLs

Samuel N. Merritt vox-tech@lists.lugod.org
Thu, 16 Oct 2003 19:01:26 -0700


On Thu, Oct 16, 2003 at 06:24:32PM -0700, Bill Kendrick wrote:
>
> Has anyone got a Perl or sed script handy that can take a CSV
> (comma-separated values) text file like this:
>
>   "1234","Hello","ABCD"
>   "1235","Hello
>   there","XYZ"
>   "1236","Goodbye","LLLL"
>
> and make it look like this:
>
>   "1234","Hello","ABCD"
>   "1235","Hello there","XYZ"
>   "1236","Goodbye","LLLL"
>
> e.g., wherever there are EOLs _within_ fields (between quotes), have it
> replace those with something (in my example above, just a space)

Let's see if this works. (Warning: use at own risk!)

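while (my $line = <>)
{
        chomp $line;
        # An odd number of unescaped quotes means a quoted field is still
        # open, so pull in the next line and join it with a space.
        while ((() = $line =~ /(?<!\\)"/g) % 2)
        {
                my $next = <>;
                last unless defined $next;      # unbalanced quote at EOF
                chomp $next;
                $line .= " $next";
        }
        print "$line\n";
}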

I tested this on the example you provided, and it seems to work.

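For anyone who wants to try the quote-counting idea without a file handy,
here is a minimal sketch of it as a function, run against the sample from
the question. The name join_broken_lines and its separator argument are my
own invention for illustration. It assumes that a quote preceded by a
backslash is escaped; CSV-style doubled quotes ("") add two to the count,
so they leave the parity unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: merge physical lines into logical CSV records by
# tracking quote parity. $sep replaces each EOL found inside a field.
sub join_broken_lines {
    my ($sep, @lines) = @_;
    my @records;
    my $pending;
    for my $line (@lines) {
        chomp(my $text = $line);
        $pending = defined $pending ? "$pending$sep$text" : $text;
        # Count quotes that are not preceded by a backslash.
        my $quotes = () = $pending =~ /(?<!\\)"/g;
        next if $quotes % 2;    # odd count: a quoted field is still open
        push @records, $pending;
        undef $pending;
    }
    # Input ended inside a quoted field: keep what we have as-is.
    push @records, $pending if defined $pending;
    return @records;
}

my @sample = (
    qq{"1234","Hello","ABCD"\n},
    qq{"1235","Hello\n},
    qq{there","XYZ"\n},
    qq{"1236","Goodbye","LLLL"\n},
);
print "$_\n" for join_broken_lines(' ', @sample);
```

On that sample it prints the three records in the form Bill asked for.
Same warning as before: use at own risk.
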
> I'm unfortunately dealing with Excel, and even it is too stupid to remember
> when it's within a field, so you end up with a spreadsheet like this:
>=20
>   1234  | Hello   | ABCD
>   1235  | Hello   | [blank]
>   there | XYZ     | [blank]
>   1236  | Goodbye | LLLL
>
> I've dealt with this issue before, but it was years ago.  And I used C. ;^)
>
> Thx!
>
> -bill!
>

--
Samuel Merritt
OpenPGP key is at http://meat.andcheese.org/~spam/spam_at_andcheese_dot_org.asc
Information about PGP can be found at http://www.mindspring.com/~aegreene/pgp/
