[vox-tech] MySQL question: making stats out of orders

Tue Dec 1 18:04:31 PST 2009

On Tue, Dec 01, 2009 at 05:32:20PM -0800, Bill Kendrick wrote:
> 
> Ok, so I've got a few queries our website does that
> are used to determine which products are 'top sellers'
> in the past 90 days' worth of sales.
> 
> Right now, this involves joining the product table
> with the orders table, by way of an items-in-the-order table.
> 
> In other words, to figure out what sold the best in the
> last 90 days, it's something like:
> 
>   SELECT * FROM products
>   JOIN orderitem ON products.id = orderitem.productid
>   JOIN orders ON orders.id = orderitem.orderid
>   WHERE orders.timestamp >= { 90 days ago }
> 
> As time goes on, our records or orders, order items, and
> actual products all keep growing.  That means this task
> gets more and more intensive.

Is this task getting more intensive? If is (and the number of
transactions per day hasn't grown significantly), then you probably
don't have indices set up correctly.

I think that having an an index on orders.timestamp,
and an index on orderitem.orderid (in addition to the primary keys
products.id and orders.id) should optimize this query, but it always
pays to determine what's going on with EXPLAIN SELECT.

If that still doesn't speed things up enough, maybe consider using a
table with the MERGE storage engine to move old entries from the
orders table and the orderitem table into archives. Then only query
the table full of recent order data for queries, and only query the
other tables in the merge when you need to.

> Now, we've got some statistics tables set up for other
> purposes -- how many times a product was viewed,
> as well as when any particular logged-in-users last
> visited particular items, which allows us to do the
> "people who recently looked at X also recently looked at Y".
> 
> I'm thinking we could do something similar for sales,
> to simplify the "top sellers" queries.  They'd hit
> a single stats table (which could, on a daily basis,
> be cleared of any >90-days-ago entries), rather than
> having to JOIN two large tables that go back to day 1.
> 
> 
> My current problem -- no doubt due to lack of brain power
> at the end of the day -- is how to take the CURRENT data
> from the orders and orderitem tables so that I can do an
> initial population of a new stats table.
> 
> Our "views" stats table, for example, is something
> along the lines of:
> 
>   productid, date, views
> 
> where those first two columns act as a key.  (Date is at the
> 'day' level of granularity.)
> 
> 
> And we count a view like so:
> 
>   INSERT INTO views (productid, date, views) VALUES (###, NOW(), 1)
>   ON DUPLICATE KEY UPDATE views=views+1
> 
> In other words, the first time a product is viewed on any given day,
> we insert a new row, otherwise we update its existing row.
> 
> 
> Anyone feel like flexing their MySQL muscles and provide some
> ideas on how to populate a similar table (tracking sales) based
> on the orders/orderitem table combo explained above?

I'm not really clear on what you're looking for here. We need a better
idea of what stats you're running to know how to structure the table.

It would seem pretty simple to create the analagous table for orders:

    orderstats consisting of productid, date, orders

and every time you complete an order, you run the query:

    insert into orderstats(productid, date, orders) VALUES (###, NOW(),
    QTY) on duplicate key update orders=orders+QTY

-- 
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/