[vox-tech] collaborative data storage (of excel files)

Harold Lee harold at hotelling.net
Tue Jan 15 12:10:01 PST 2008


On Jan 15, 2008, at 10:33 AM, Dylan Beaudette wrote:

> On Tuesday 15 January 2008, Henry House wrote:
>> On 2008-01-15, wrote Dylan Beaudette:
>>> Hi,
>>>
>>> some of the people in my lab are interested in collaboratively  
>>> compiling
>>> a large quantity of environmental data- each user appending several
>>> hundred measurements of several variables every week.
>>>
>>> They are currently emailing around a spread sheet file and there  
>>> have
>>> been numerous data accidents. Now they are asking to put the file  
>>> onto a
>>> shared drive, so that they can access it remotely. This sounds  
>>> like a
>>> terrible idea to me- even worse than the previous attempt.
>>
>> Their idea is distilled evil! But you already knew that.
>
> Indeed!
>
>
>>> The data are essentially rows and cols of numbers that are added  
>>> to and
>>> edited weekly.
>>>
>>> At first I thought subversion might be helpful, but revision control
>>> doesn't work so well with binary data (excel files)... unless  
>>> there is
>>> something I don't know about. It would be hard to detect  
>>> conflicts, or to
>>> merge data. However, it would allow for timestamps and revision  
>>> numbers
>>> to provide some level of authority.
>>>
>>> Designing some kind of database-driven system seems like a logical
>>> choice, but I do not have the time to do this. Perhaps there is  
>>> already
>>> something out there.
>>>
>>> Does anyone have some insight into how to solve this data management
>>> nighmare?
>>
>> The right way to do this is to use a database. But, an easier
>> maybe-almost-as-good solution might be to use subversion and save the
>> data as CSV text files (excel can do this just fine). It is useful to
>> add comment lines (maybe you could have an internal convention about
>> this) that help subversion to figure out where to merge in changes.
>
> This is a good idea. Unfortunately we might be stuck with 4 or 5  
> mega CSV
> files which are constantly appended to, but SVN should be able to  
> deal with
> keeping things sane. The trick will be to get people to realize that  
> these
> are CSV files, and therefore no monkeying around with formulas, etc.
>

OpenOffice.org provides a database tool that would probably work for  
you. You can then use the database data in spreadsheet formulas. Since  
you have only one table (for now) it should be easy to design and  
create the database. FileMaker is better, but costs money.

Either way, you'll get admin-free database work on a shared file, but  
they're not geared towards merging different copies that people are  
working on off-line.

I advocate separating the data and code - some recent studies point  
out that bugs in Excel code are actually a widespread and expensive  
problem. As you pointed out, even just having a version-controlled CSV  
file and linking it in to a spread sheet should provide some protection.

The next step up, sorta, might be to set up a web server with a very  
simple CGI for appending records to the CSV and committing, accessed  
via a web form. From there you could proceed to using an low-admin  
database if you wanted.

-Harold

> I'll keep digging around for ideas.
>
> Dylan
>
> -- 
> Dylan Beaudette
> Soil Resource Laboratory
> http://casoilresource.lawr.ucdavis.edu/
> University of California at Davis
> 530.754.7341
> _______________________________________________
> vox-tech mailing list
> vox-tech at lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech


More information about the vox-tech mailing list