[vox-tech] SQL help - unions

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Jul 20 18:20:06 PDT 2004


On Tue, 20 Jul 2004, Peter Jay Salzman wrote:

> Total newbie here.  I know less about SQL than I do about xkb.   ;-)
> 
> I have two tables, with the same structure.  Call them [Fall 2003] and
> [Spring 2004].  They each hold data about enrolled students.  One of the
> fields of these tables is SSN (social security number).

The existence of separate tables with identical structures and disjoint
data in your schema is a red flag to me.

I highly recommend putting all the data into one table with a "Season"
column to distinguish the rows... it simplifies these kinds of queries and
makes the data much more useful for mining.

> I can get an unduplicated headcount for the academic year with:
> 
>    SELECT SSN from [Fall 2003]
>    UNION
>    SELECT SSN from [Spring 2004];
> 
> There's also a field called "race".  If the race field is 0, the student
> is African American.
> 
> How would I modify the above SQL statement to list the unduplicated
> headcount of all African Americans?  I would like to do something like:
> 
>    SELECT SSN from [Fall 2003]
>    UNION
>    SELECT SSN from [Spring 2004]
>    and race == 1;
> 
> but of course that doesn't work.  :)

As the other respondents suggested, different SQL engines have different
levels of support for the language.

    SELECT SSN from [Fall 2003] WHERE race == 1
    UNION
    SELECT SSN from [Spring 2004] WHERE race == 1

gets you a little bit closer, but there are a couple of problems with 
this... it isn't really obtaining a headcount, and unions don't
eliminate duplicates.

Imagine taking this step by step in a sequential batch, something like:

    CREATE TEMPORARY TABLE ssntbl1 (
        ssn char(9)
    );

    INSERT INTO ssntbl1 (ssn)
    SELECT SSN FROM [Fall 2003] WHERE race == 1;

    INSERT INTO ssntbl1 (ssn)
    SELECT SSN FROM [Spring 2004] WHERE race == 1;

    SELECT COUNT( DISTINCT ssn ) FROM ssntbl1;

or, in more efficient fashion:

    CREATE TEMPORARY TABLE ssntbl2 (
        season varchar(20)
      , headcount number
    );

    INSERT INTO ssntbl2 ( season, headcount )
    SELECT 'Fall 2003', COUNT(SSN) FROM [Fall 2003]
    WHERE race == 1;

    INSERT INTO ssntbl2 ( season, headcount )
    SELECT 'Spring 2004', COUNT(SSN) FROM [Spring 2004]
    WHERE race == 1;

    SELECT SUM(headcount) FROM ssntbl2;

The UNION clause in the first possible solution above is generally
implemented under the hood much like the ssntbl1 solution... that is, if a
student participates in both terms, their SSN will show up twice in the
output of the UNION.

If you had the data all in one table, the select statement for the
headcount per season for multiple seasons...

    SELECT season, COUNT( DISTINCT SSN )
    FROM UnifiedEnrollment
    WHERE season IN ( 'Fall 2003', 'Spring 2004' ) AND race=1
    GROUP BY season;

... would be about as easy as obtaining the headcount for multiple seasons
combined:

    SELECT COUNT( DISTINCT SSN )
    FROM UnifiedEnrollment
    WHERE season IN ( 'Fall 2003', 'Spring 2004' ) AND race=1;
    
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
---------------------------------------------------------------------------




More information about the vox-tech mailing list