A new book from a major technical publisher …
A new book from a major technical publisher …
Four years ago I blogged about the fact that medal counts are often reported on a per-country basis without regard for the population of any country. I’ve always thought that approach to be silly. The per-capita medal count is much more interesting to me.
So this morning I used Bing to get the latest count by country of gold medals at the ongoing Rio Olympics. I included only those nations that have won two or more gold medals.
I then visited Wikipedia to get recent population counts for those nations.
I used all this data to determine the per-capita win per nation and ranked them. The results are below.
|Ranking||Country||Per Capita Ratio
|Gold Medal Count||Population||Date Population Reported|
You could argue that some country populations and/or medal counts are too small to prove anything other than an interesting anomaly. Or not. But there’s no question that Hungary and the Netherlands are very impressive, each with six gold medals for an overall per-capita rate that’s dramatically higher than, for example, Great Britain, the United States, and especially China, who – in spite of their very high medal count, is actually very far behind the other nations on a per capita basis.
These are the results presented graphically, which I created using Microsoft Excel, captured with the built-in “snipping” tool in Microsoft Windows: :
The Pareto Line seems to curve most significantly above Hungary and the Netherlands, which have the most striking performance in terms of size of population and count of gold medals. Also – note how far back China ranks. They might have a lot of medals, but they also have the largest population in the world.
To me, this sort of analysis is much more interesting than the simple medal count that doesn’t account for population.
Here’s an easy way to review the system log of an EC2 Instance using Amazon Web Service (AWS):
See below for what your screen should look like.
The result will be a pop-up window displaying the system log:
I just migrated this Skere9 blog from a Drupal implementation at SiteGround to a WordPress instance at Amazon Web Services. I’ve been a customer of AWS for three years and finally decided to move this blog to the same platform where I’ve been doing other work. It’s cost-effective and AWS provides significantly greater control over numerous aspects of the deployment.
Note: I have one author – myself – and I did not migrate any comments with this move. I will add those later.
The Drupal version is 7.4.1. The WordPress version is 4.5.3.
To migrate the blog posts, I took the following steps.
INSERT INTO wp_posts ( id , post_author , post_date , post_date_gmt , post_content , post_title , post_status , comment_status , ping_status , post_name , post_modified , post_modified_gmt , post_parent , guid , menu_order , post_type ) SELECT a.nid+100 , 1 , FROM_UNIXTIME(a.created) , FROM_UNIXTIME(a.created) , b.body_value , a.title , 'publish' , 'open' , 'open' , CONCAT('dr-', a.nid+100) , FROM_UNIXTIME(a.changed) , FROM_UNIXTIME(a.changed) , 0 , CONCAT('http://skere9.com/?p=', a.nid+100) , 0 , 'post' FROM node a LEFT JOIN (field_data_body b) ON (a.nid = b.entity_id) WHERE type = 'blog' ORDER BY a.nid
A few important notes about the above code:
It all worked like a charm.
Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates. I can’t present the actual data I’m working with here, but this is an illustrative example. Consider the data set below.
SQL> SELECT * 2 FROM shop_orders; ID CATEGORY ITEM_N ---------- ---------- -------------------- 1 Office Stapler 2 Fruit Banana 3 Bread French Bread 4 Steak Ribeye 5 Fish Bass 6 Crackers Saltines 7 Fish Swordfish 8 Fruit Pear 9 Bread Rolls 10 Crackers Chips 11 Steak Prime Rib 12 Bakery Birthday Cake 13 Fish Salmon 14 Fish Flounder 15 Cleaners Detergent 16 Bakery Donuts 17 Office Ink Cartridges 18 Steak Filet Mignon 19 Vegetable Broccoli 20 Steak Flank 21 Fruit Raspberry 22 Sundries Shampoo 23 Steak Salisbury 24 Vegetable Radish 25 Vegetable Mushroom 26 Pharmacy Band-aids 27 Pharmacy Aspirin 28 Sundries Toothbrush 29 Sundries Shampoo 30 Cereal Bran 31 Cereal Corn Flakes 32 Cleaners Soap 33 Cereal Oatmeal 33 rows selected.
One glance at the data and you can see the category values repeat. But how often do they repeat? Which categories recur the most frequently? What if you needed to determine which categories have the greatest number of values, and which don’t?
To determine this, we can start with a GROUP BY, like this.
SQL> SELECT category, COUNT(*) CT 2 FROM shop_orders 3 GROUP BY category; CATEGORY CT ---------- ---------- Cereal 3 Fruit 3 Crackers 2 Vegetable 3 Office 2 Steak 5 Pharmacy 2 Fish 4 Bread 2 Cleaners 2 Sundries 3 Bakery 2 12 rows selected.
Now let’s move the GROUP BY to an inline view, and do a GROUP on the GROUP BY, like this.
SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n 2 FROM 3 ( SELECT a.category, COUNT(a.item_n) CT 4 FROM shop_orders a 5 GROUP BY a.category 6 ) b 7 GROUP BY b.ct 8 ORDER BY b.ct DESC; CT_CATEGORY CT_ITEM_N ----------- ---------- 1 5 1 4 4 3 6 2
Voila – there is one category that recurs five times, one recurs 4 times, four recur three times, and finally six categories have two values each.
In a relatively small data set like this, the answers might be obvious. I’ve been working with very large rows that number in the tens of millions and queries like this have been invaluable in confirming data shape and patterns.
If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.
DROP TABLE shop_orders; CREATE TABLE shop_orders ( id NUMBER , category VARCHAR2(15) , item_n VARCHAR2(20) ); INSERT INTO shop_orders VALUES ( 1, 'Office', 'Stapler'); INSERT INTO shop_orders VALUES ( 2, 'Fruit', 'Banana'); INSERT INTO shop_orders VALUES ( 3, 'Bread', 'French Bread'); INSERT INTO shop_orders VALUES ( 4, 'Steak', 'Ribeye'); INSERT INTO shop_orders VALUES ( 5, 'Fish', 'Bass'); INSERT INTO shop_orders VALUES ( 6, 'Crackers', 'Saltines'); INSERT INTO shop_orders VALUES ( 7, 'Fish', 'Swordfish'); INSERT INTO shop_orders VALUES ( 8, 'Fruit', 'Pear'); INSERT INTO shop_orders VALUES ( 9, 'Bread', 'Rolls'); INSERT INTO shop_orders VALUES (10, 'Crackers', 'Chips'); INSERT INTO shop_orders VALUES (11, 'Steak', 'Prime Rib'); INSERT INTO shop_orders VALUES (12, 'Bakery', 'Birthday Cake'); INSERT INTO shop_orders VALUES (13, 'Fish', 'Salmon'); INSERT INTO shop_orders VALUES (14, 'Fish', 'Flounder'); INSERT INTO shop_orders VALUES (15, 'Cleaners', 'Detergent'); INSERT INTO shop_orders VALUES (16, 'Bakery', 'Donuts'); INSERT INTO shop_orders VALUES (17, 'Office', 'Ink Cartridges'); INSERT INTO shop_orders VALUES (18, 'Steak', 'Filet Mignon'); INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli'); INSERT INTO shop_orders VALUES (20, 'Steak', 'Flank'); INSERT INTO shop_orders VALUES (21, 'Fruit', 'Raspberry'); INSERT INTO shop_orders VALUES (22, 'Sundries', 'Shampoo'); INSERT INTO shop_orders VALUES (23, 'Steak', 'Salisbury'); INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish'); INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom'); INSERT INTO shop_orders VALUES (26, 'Pharmacy', 'Band-aids'); INSERT INTO shop_orders VALUES (27, 'Pharmacy', 'Aspirin'); INSERT INTO shop_orders VALUES (28, 'Sundries', 'Toothbrush'); INSERT INTO shop_orders VALUES (29, 'Sundries', 'Shampoo'); INSERT INTO shop_orders VALUES (30, 'Cereal', 'Bran'); INSERT INTO shop_orders VALUES (31, 'Cereal', 'Corn Flakes'); INSERT INTO shop_orders VALUES (32, 'Cleaners', 'Soap'); INSERT INTO shop_orders VALUES (33, 'Cereal', 'Oatmeal'); COMMIT;
There you go! Have fun!
An unusual but very appropriate song for Memorial Day. From Belinda Carlisle, on the Arsenio Hall show, circa 1990-something, as she debuted her hit Summer Rain.
I can hear the whistle
Military train …
On this Memorial Day, let us not forget those who sacrificed all for our country. Nor let us forget those they left behind.
Here’s a quick way to generate dummy data for performing tests that require lots of record.
SQL> DROP TABLE lots_o_data; Table dropped. SQL> CREATE TABLE lots_o_data 2 ( id NUMBER 3 , name VARCHAR2(5) 4 , category VARCHAR2(26) 5 ); Table created. SQL> INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.'); 1 row created. SQL> INSERT INTO lots_o_data VALUES (2, 'More', 'Bunches and bunches.'); 1 row created. SQL> COMMIT; Commit complete. SQL> INSERT INTO lots_o_data 2 SELECT a.id, a.name, b.category 3 FROM lots_o_data a CROSS JOIN lots_o_data b; 4 rows created. SQL> COMMIT; Commit complete. SQL> INSERT INTO lots_o_data 2 SELECT a.id, a.name, b.category 3 FROM lots_o_data a CROSS JOIN lots_o_data b; 36 rows created. SQL> COMMIT; Commit complete. SQL> INSERT INTO lots_o_data 2 SELECT a.id, a.name, b.category 3 FROM lots_o_data a CROSS JOIN lots_o_data b; 1764 rows created. SQL> COMMIT; Commit complete. SQL> INSERT INTO lots_o_data 2 SELECT a.id, a.name, b.category 3 FROM lots_o_data a CROSS JOIN lots_o_data b; 3261636 rows created. SQL> COMMIT; Commit complete. SQL> SELECT COUNT(*) 2 FROM lots_o_data; COUNT(*) ---------- 3263442 SQL> SPOOL OFF
Note the following
If you need a lot of data for performance testing or capacity testing, this is an easy way to get however much you require.
I have been working on building a data lake environment in anticipation of a series of large data feeds I’m expecting soon and found this to be a useful approach to preparing the environment.
From the 2015 Country Music Association (CMA) Country Music Awards show, featuring Martina McBride and Michael W. Smith.
Merry Christmas everyone!
On September 25, 2013, the American racing team led by Larry Ellison retook the America’s Cup in spectacular fashion, after several days and dramatic events. Nobody could have predicted that the grand finish would end up happening in San Francisco Bay, within walking distance of Oracle Open World / JavaOne, and on the climactic “customer appreciation night” of that year’s conference. It was an amazing once-in-a-lifetime event.
KTVU’s Jana Katsuyama featured me in a soundbite speaking on behalf of the very proud attendees of Oracle Open World, see below.
At first, Jana asked me a series of questions about whether attendees were upset that Larry’s attention was divided between the conference and the race, and that he’d missed a keynote earlier in the day to go celebrate the victory.
I said – no way! The conference is great, everything is well organized and continuing as planned, and who could possibly predict such a stunning situation – good for Larry!
And Jana, to her credit, let the spirit of my comments convey in the edited version that broadcast that evening on the leading Bay area news channel KTVU.