Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates. I can’t present the actual data I’m working with here, but this is an illustrative example. Consider the data set below.
SQL> SELECT * 2 FROM shop_orders; ID CATEGORY ITEM_N ---------- ---------- -------------------- 1 Office Stapler 2 Fruit Banana 3 Bread French Bread 4 Steak Ribeye 5 Fish Bass 6 Crackers Saltines 7 Fish Swordfish 8 Fruit Pear 9 Bread Rolls 10 Crackers Chips 11 Steak Prime Rib 12 Bakery Birthday Cake 13 Fish Salmon 14 Fish Flounder 15 Cleaners Detergent 16 Bakery Donuts 17 Office Ink Cartridges 18 Steak Filet Mignon 19 Vegetable Broccoli 20 Steak Flank 21 Fruit Raspberry 22 Sundries Shampoo 23 Steak Salisbury 24 Vegetable Radish 25 Vegetable Mushroom 26 Pharmacy Band-aids 27 Pharmacy Aspirin 28 Sundries Toothbrush 29 Sundries Shampoo 30 Cereal Bran 31 Cereal Corn Flakes 32 Cleaners Soap 33 Cereal Oatmeal 33 rows selected.
One glance at the data and you can see the category values repeat. But how often do they repeat? Which categories recur the most frequently? What if you needed to determine which categories have the greatest number of values, and which don’t?
To determine this, we can start with a GROUP BY, like this.
SQL> SELECT category, COUNT(*) CT 2 FROM shop_orders 3 GROUP BY category; CATEGORY CT ---------- ---------- Cereal 3 Fruit 3 Crackers 2 Vegetable 3 Office 2 Steak 5 Pharmacy 2 Fish 4 Bread 2 Cleaners 2 Sundries 3 Bakery 2 12 rows selected.
Now let’s move the GROUP BY to an inline view, and do a GROUP on the GROUP BY, like this.
SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n 2 FROM 3 ( SELECT a.category, COUNT(a.item_n) CT 4 FROM shop_orders a 5 GROUP BY a.category 6 ) b 7 GROUP BY b.ct 8 ORDER BY b.ct DESC; CT_CATEGORY CT_ITEM_N ----------- ---------- 1 5 1 4 4 3 6 2
Voila – there is one category that recurs five times, one recurs 4 times, four recur three times, and finally six categories have two values each.
In a relatively small data set like this, the answers might be obvious. I’ve been working with very large rows that number in the tens of millions and queries like this have been invaluable in confirming data shape and patterns.
If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.
DROP TABLE shop_orders; CREATE TABLE shop_orders ( id NUMBER , category VARCHAR2(15) , item_n VARCHAR2(20) ); INSERT INTO shop_orders VALUES ( 1, 'Office', 'Stapler'); INSERT INTO shop_orders VALUES ( 2, 'Fruit', 'Banana'); INSERT INTO shop_orders VALUES ( 3, 'Bread', 'French Bread'); INSERT INTO shop_orders VALUES ( 4, 'Steak', 'Ribeye'); INSERT INTO shop_orders VALUES ( 5, 'Fish', 'Bass'); INSERT INTO shop_orders VALUES ( 6, 'Crackers', 'Saltines'); INSERT INTO shop_orders VALUES ( 7, 'Fish', 'Swordfish'); INSERT INTO shop_orders VALUES ( 8, 'Fruit', 'Pear'); INSERT INTO shop_orders VALUES ( 9, 'Bread', 'Rolls'); INSERT INTO shop_orders VALUES (10, 'Crackers', 'Chips'); INSERT INTO shop_orders VALUES (11, 'Steak', 'Prime Rib'); INSERT INTO shop_orders VALUES (12, 'Bakery', 'Birthday Cake'); INSERT INTO shop_orders VALUES (13, 'Fish', 'Salmon'); INSERT INTO shop_orders VALUES (14, 'Fish', 'Flounder'); INSERT INTO shop_orders VALUES (15, 'Cleaners', 'Detergent'); INSERT INTO shop_orders VALUES (16, 'Bakery', 'Donuts'); INSERT INTO shop_orders VALUES (17, 'Office', 'Ink Cartridges'); INSERT INTO shop_orders VALUES (18, 'Steak', 'Filet Mignon'); INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli'); INSERT INTO shop_orders VALUES (20, 'Steak', 'Flank'); INSERT INTO shop_orders VALUES (21, 'Fruit', 'Raspberry'); INSERT INTO shop_orders VALUES (22, 'Sundries', 'Shampoo'); INSERT INTO shop_orders VALUES (23, 'Steak', 'Salisbury'); INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish'); INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom'); INSERT INTO shop_orders VALUES (26, 'Pharmacy', 'Band-aids'); INSERT INTO shop_orders VALUES (27, 'Pharmacy', 'Aspirin'); INSERT INTO shop_orders VALUES (28, 'Sundries', 'Toothbrush'); INSERT INTO shop_orders VALUES (29, 'Sundries', 'Shampoo'); INSERT INTO shop_orders VALUES (30, 'Cereal', 'Bran'); INSERT INTO shop_orders VALUES (31, 'Cereal', 'Corn Flakes'); INSERT INTO shop_orders VALUES (32, 'Cleaners', 'Soap'); INSERT INTO shop_orders VALUES (33, 'Cereal', 'Oatmeal'); COMMIT;
There you go! Have fun!