Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates. I can’t present the actual data I’m working with here, but this is an illustrative example. Consider the data set below.
SQL> SELECT *
2 FROM shop_orders;
ID CATEGORY ITEM_N
---------- ---------- --------------------
1 Office Stapler
2 Fruit Banana
3 Bread French Bread
4 Steak Ribeye
5 Fish Bass
6 Crackers Saltines
7 Fish Swordfish
8 Fruit Pear
9 Bread Rolls
10 Crackers Chips
11 Steak Prime Rib
12 Bakery Birthday Cake
13 Fish Salmon
14 Fish Flounder
15 Cleaners Detergent
16 Bakery Donuts
17 Office Ink Cartridges
18 Steak Filet Mignon
19 Vegetable Broccoli
20 Steak Flank
21 Fruit Raspberry
22 Sundries Shampoo
23 Steak Salisbury
24 Vegetable Radish
25 Vegetable Mushroom
26 Pharmacy Band-aids
27 Pharmacy Aspirin
28 Sundries Toothbrush
29 Sundries Shampoo
30 Cereal Bran
31 Cereal Corn Flakes
32 Cleaners Soap
33 Cereal Oatmeal
33 rows selected.
One glance at the data and you can see the category values repeat. But how often do they repeat? Which categories recur the most frequently? What if you needed to determine which categories have the greatest number of values, and which don’t?
To determine this, we can start with a GROUP BY, like this.
SQL> SELECT category, COUNT(*) CT
2 FROM shop_orders
3 GROUP BY category;
CATEGORY CT
---------- ----------
Cereal 3
Fruit 3
Crackers 2
Vegetable 3
Office 2
Steak 5
Pharmacy 2
Fish 4
Bread 2
Cleaners 2
Sundries 3
Bakery 2
12 rows selected.
Now let’s move the GROUP BY to an inline view, and do a GROUP on the GROUP BY, like this.
SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n
2 FROM
3 ( SELECT a.category, COUNT(a.item_n) CT
4 FROM shop_orders a
5 GROUP BY a.category
6 ) b
7 GROUP BY b.ct
8 ORDER BY b.ct DESC;
CT_CATEGORY CT_ITEM_N
----------- ----------
1 5
1 4
4 3
6 2
Voila – there is one category that recurs five times, one recurs 4 times, four recur three times, and finally six categories have two values each.
In a relatively small data set like this, the answers might be obvious. I’ve been working with very large rows that number in the tens of millions and queries like this have been invaluable in confirming data shape and patterns.
If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.
DROP TABLE shop_orders;
CREATE TABLE shop_orders
( id NUMBER
, category VARCHAR2(15)
, item_n VARCHAR2(20)
);
INSERT INTO shop_orders VALUES ( 1, 'Office', 'Stapler');
INSERT INTO shop_orders VALUES ( 2, 'Fruit', 'Banana');
INSERT INTO shop_orders VALUES ( 3, 'Bread', 'French Bread');
INSERT INTO shop_orders VALUES ( 4, 'Steak', 'Ribeye');
INSERT INTO shop_orders VALUES ( 5, 'Fish', 'Bass');
INSERT INTO shop_orders VALUES ( 6, 'Crackers', 'Saltines');
INSERT INTO shop_orders VALUES ( 7, 'Fish', 'Swordfish');
INSERT INTO shop_orders VALUES ( 8, 'Fruit', 'Pear');
INSERT INTO shop_orders VALUES ( 9, 'Bread', 'Rolls');
INSERT INTO shop_orders VALUES (10, 'Crackers', 'Chips');
INSERT INTO shop_orders VALUES (11, 'Steak', 'Prime Rib');
INSERT INTO shop_orders VALUES (12, 'Bakery', 'Birthday Cake');
INSERT INTO shop_orders VALUES (13, 'Fish', 'Salmon');
INSERT INTO shop_orders VALUES (14, 'Fish', 'Flounder');
INSERT INTO shop_orders VALUES (15, 'Cleaners', 'Detergent');
INSERT INTO shop_orders VALUES (16, 'Bakery', 'Donuts');
INSERT INTO shop_orders VALUES (17, 'Office', 'Ink Cartridges');
INSERT INTO shop_orders VALUES (18, 'Steak', 'Filet Mignon');
INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli');
INSERT INTO shop_orders VALUES (20, 'Steak', 'Flank');
INSERT INTO shop_orders VALUES (21, 'Fruit', 'Raspberry');
INSERT INTO shop_orders VALUES (22, 'Sundries', 'Shampoo');
INSERT INTO shop_orders VALUES (23, 'Steak', 'Salisbury');
INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish');
INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom');
INSERT INTO shop_orders VALUES (26, 'Pharmacy', 'Band-aids');
INSERT INTO shop_orders VALUES (27, 'Pharmacy', 'Aspirin');
INSERT INTO shop_orders VALUES (28, 'Sundries', 'Toothbrush');
INSERT INTO shop_orders VALUES (29, 'Sundries', 'Shampoo');
INSERT INTO shop_orders VALUES (30, 'Cereal', 'Bran');
INSERT INTO shop_orders VALUES (31, 'Cereal', 'Corn Flakes');
INSERT INTO shop_orders VALUES (32, 'Cleaners', 'Soap');
INSERT INTO shop_orders VALUES (33, 'Cereal', 'Oatmeal');
COMMIT;
There you go! Have fun!