Rio Mid-Games: Gold Medals Per Capita So Far

Four years ago I blogged about the fact that medal counts are often reported on a per-country basis without regard for each country’s population. I’ve always thought that approach was silly. The per-capita medal count is much more interesting to me.

So this morning I used Bing to get the latest count by country of gold medals at the ongoing Rio Olympics.  I included only those nations that have won two or more gold medals.

I then visited Wikipedia to get recent population counts for those nations.

I used all this data to compute each nation’s per-capita gold medal rate and ranked the nations in descending order.  The results are below.

Rank  Country         Per-Capita Ratio  Gold Medals     Population  Date Reported
   1  Jamaica         7.34418E-07                 2      2,723,246  31-Dec-14
   2  New Zealand     6.37423E-07                 3      4,706,450  16-Aug-16
   3  Hungary         6.10811E-07                 6      9,823,000  1-Jan-16
   4  Croatia         4.77251E-07                 2      4,190,669  31-Dec-15
   5  Netherlands     3.52389E-07                 6     17,026,640  16-Aug-16
   6  Australia       2.48343E-07                 6     24,160,100  16-Aug-16
   7  United Kingdom  2.45738E-07                16     65,110,000  30-Jun-15
   8  Switzerland     2.39762E-07                 2      8,341,600  31-Mar-16
   9  Greece          1.84196E-07                 2     10,858,018  1-Jan-15
  10  Cuba            1.77952E-07                 2     11,239,004  31-Dec-15
  11  Belgium         1.76637E-07                 2     11,322,674  1-Jun-16
  12  Italy           1.31871E-07                 8     60,665,551  1-Jan-16
  13  South Korea     1.18107E-07                 6     50,801,405  1-Jul-16
  14  Kazakhstan      1.12656E-07                 2     17,753,200  1-May-16
  15  Germany         1.10064E-07                 9     81,770,900  30-Sep-15
  16  France          1.04891E-07                 7     66,736,000  1-Jul-16
  17  Spain           8.61356E-08                 4     46,438,422  1-Jan-16
  18  United States   8.01855E-08                26    324,248,000  16-Aug-16
  19  North Korea     7.91108E-08                 2     25,281,000  1-Jul-16
  20  Russia          7.50345E-08                11    146,599,183  1-May-16
  21  Uzbekistan      6.33406E-08                 2     31,575,300  1-Jan-16
  22  Canada          5.53166E-08                 2     36,155,487  1-Apr-16
  23  Japan           5.51225E-08                 7    126,990,000  1-Jul-16
  24  Poland          5.20329E-08                 2     38,437,239  31-Dec-15
  25  Kenya           4.52934E-08                 2     44,156,577  1-Jul-15
  26  Colombia        4.09887E-08                 2     48,793,900  16-Aug-16
  27  Thailand        3.04279E-08                 2     65,729,098  31-Dec-15
  28  Iran            2.51689E-08                 2     79,463,100  16-Aug-16
  29  China           1.08836E-08                15  1,378,220,000  16-Aug-16
  30  Brazil          9.68457E-09                 2    206,514,000  16-Aug-16
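
For clarity on the math: the per-capita ratio is simply the gold medal count divided by the population.  Jamaica, for example: 2 / 2,723,246 ≈ 7.34418E-07.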

You could argue that some country populations and/or medal counts are too small to prove anything other than an interesting anomaly.  Or not.  But there’s no question that Hungary and the Netherlands are very impressive, each with six gold medals for an overall per-capita rate that’s dramatically higher than, for example, Great Britain, the United States, and especially China, which, in spite of its very high medal count, is actually very far behind the other nations on a per-capita basis.

These are the results presented graphically.  I created the chart in Microsoft Excel and captured it with the built-in Snipping Tool in Microsoft Windows:

The Pareto line curves most significantly above Hungary and the Netherlands, which have the most striking performance relative to population size and gold medal count.  Also, note how far back China ranks.  They may have a lot of medals, but they also have the largest population in the world.

To me, this sort of analysis is much more interesting than the simple medal count that doesn’t account for population.


Sources:

  • Gold Medal Count as of 8/16/2016: http://www.bing.com/search?pc=COSP&ptag=D073116-A855B9C56E1&form=CONBDF&conlogo=CT3335450&q=gold+medal+count+by+country
  • Country Population Data: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population

Amazon Web Services (AWS) – System Log

Here’s an easy way to review the system log of an EC2 instance using Amazon Web Services (AWS):

  • Navigate to the Management Console (the starting point of everything in AWS)
  • Click “EC2”
  • Click “Running Instances”
  • Choose the instance you’re interested in by clicking the empty square to its right and turning it solid blue
  • Click Actions -> Instance Settings -> Get System Log

See below for what your screen should look like.

The result will be a pop-up window displaying the system log:

That’s it!
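
As an aside: if you prefer the command line, the AWS CLI offers a rough equivalent of the console steps above.  This is a minimal sketch – the instance ID below is a placeholder, and depending on your CLI version the log may come back base64-encoded:

aws ec2 get-console-output --instance-id i-0123456789abcdef0 --query Output --output text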

MySQL Migration: from Drupal to WordPress

I just migrated this Skere9 blog from a Drupal implementation at SiteGround to a WordPress instance at Amazon Web Services.  I’ve been a customer of AWS for three years and finally decided to move this blog to the same platform where I’ve been doing other work.  It’s cost-effective and AWS provides significantly greater control over numerous aspects of the deployment.

Note:  I have one author – myself – and I did not migrate any comments with this move.  I will add those later.

The Drupal version is 7.4.1.  The WordPress version is 4.5.3.

To migrate the blog posts, I took the following steps.

  • I temporarily installed phpMyAdmin at the Drupal site to export the “node” and “field_data_body” tables from Drupal.  (Note: this could have also been done using MySQL directly.)
  • I installed phpMyAdmin at the WordPress site to import the Drupal tables.  (Note: this could also have been done using MySQL directly.)  I edited the phpMyAdmin config file to restrict access to my own IP address.
  • I executed the script below via phpMyAdmin at the WordPress site to pull data from the imported Drupal tables and insert that data into the WordPress database.
INSERT INTO wp_posts
(       id
      , post_author
      , post_date
      , post_date_gmt
      , post_content
      , post_title
      , post_status
      , comment_status
      , ping_status
      , post_name
      , post_modified
      , post_modified_gmt
      , post_parent
      , guid
      , menu_order
      , post_type
)
SELECT
        a.nid+100
      , 1
      , FROM_UNIXTIME(a.created)
      , FROM_UNIXTIME(a.created)
      , b.body_value
      , a.title
      , 'publish'
      , 'open'
      , 'open'
      , CONCAT('dr-', a.nid+100)
      , FROM_UNIXTIME(a.changed)
      , FROM_UNIXTIME(a.changed)
      , 0
      , CONCAT('http://skere9.com/?p=', a.nid+100)
      , 0
      , 'post'
FROM     node a LEFT JOIN field_data_body b
            ON (a.nid = b.entity_id)
WHERE    a.type = 'blog'
ORDER BY a.nid;

A few important notes about the above code:

  • I had fewer than 100 existing posts in the WordPress instance (it was a brand-new installation), which is why adding 100 to the Drupal node ID was enough to create unique identifiers.
  • I transformed the date values from UNIX timestamps (the Drupal source format) into datetime values (the WordPress target format) via FROM_UNIXTIME.
  • I chose to enter each value into the WordPress “post_name” field with a “dr-” prefix – for “Drupal”.  But of course that could have been anything, or nothing at all – no prefix is required here.
  • The “_gmt” fields in the target database are treated by my WordPress implementation as identical to local time.  GMT refers to Greenwich Mean Time, comparable to UTC.  I’m in the US Eastern time zone (I’m in Washington, DC), so presumably the offset should be five hours, but my existing WordPress installation isn’t treating the times differently, so neither did I: I made the “created” timestamp the same for the local and GMT entries, and did the same for the “changed” timestamp.

It all worked like a charm.
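
As a quick sanity check afterward (a hypothetical verification step, not part of the migration itself), the count of imported rows should match the Drupal post count, since every migrated post received the “dr-” prefix in post_name:

SELECT COUNT(*)
FROM   wp_posts
WHERE  post_name LIKE 'dr-%';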

GROUP BY on a GROUP BY

Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates.  I can’t present the actual data I’m working with here, but this is an illustrative example.  Consider the data set below.

SQL> SELECT   *
  2  FROM     shop_orders;

        ID CATEGORY   ITEM_N
---------- ---------- --------------------
         1 Office     Stapler
         2 Fruit      Banana
         3 Bread      French Bread
         4 Steak      Ribeye
         5 Fish       Bass
         6 Crackers   Saltines
         7 Fish       Swordfish
         8 Fruit      Pear
         9 Bread      Rolls
        10 Crackers   Chips
        11 Steak      Prime Rib
        12 Bakery     Birthday Cake
        13 Fish       Salmon
        14 Fish       Flounder
        15 Cleaners   Detergent
        16 Bakery     Donuts
        17 Office     Ink Cartridges
        18 Steak      Filet Mignon
        19 Vegetable  Broccoli
        20 Steak      Flank
        21 Fruit      Raspberry
        22 Sundries   Shampoo
        23 Steak      Salisbury
        24 Vegetable  Radish
        25 Vegetable  Mushroom
        26 Pharmacy   Band-aids
        27 Pharmacy   Aspirin
        28 Sundries   Toothbrush
        29 Sundries   Shampoo
        30 Cereal     Bran
        31 Cereal     Corn Flakes
        32 Cleaners   Soap
        33 Cereal     Oatmeal

33 rows selected.

One glance at the data and you can see that the category values repeat. But how often do they repeat? Which categories recur most frequently? What if you needed to determine which categories have the greatest number of values, and which have the fewest?

To determine this, we can start with a GROUP BY, like this.

SQL> SELECT   category, COUNT(*) CT
  2  FROM     shop_orders
  3  GROUP BY category;

CATEGORY       CT
---------- ----------
Cereal              3
Fruit               3
Crackers            2
Vegetable           3
Office              2
Steak               5
Pharmacy            2
Fish                4
Bread               2
Cleaners            2
Sundries            3
Bakery              2

12 rows selected.

Now let’s move the GROUP BY into an inline view, and do a GROUP BY on the GROUP BY, like this.

SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n
  2  FROM
  3    ( SELECT   a.category, COUNT(a.item_n) CT
  4  	 FROM	  shop_orders a
  5  	 GROUP BY a.category
  6    ) b
  7  GROUP BY b.ct
  8  ORDER BY b.ct DESC;

CT_CATEGORY  CT_ITEM_N
----------- ----------
          1          5
          1          4
          4          3
          6          2

Voila – there is one category that recurs five times, one that recurs four times, four that recur three times, and six categories with two values each.
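
As a quick sanity check on that output: (1 × 5) + (1 × 4) + (4 × 3) + (6 × 2) = 33, which matches the 33 rows in the original table.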

In a relatively small data set like this, the answers might be obvious. But I’ve been working with tables whose rows number in the tens of millions, and queries like this have been invaluable in confirming data shape and patterns.

If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.

DROP   TABLE shop_orders;
CREATE TABLE shop_orders
(   id       NUMBER
  , category VARCHAR2(15)
  , item_n   VARCHAR2(20)
);
INSERT INTO shop_orders VALUES ( 1, 'Office', 'Stapler');
INSERT INTO shop_orders VALUES ( 2, 'Fruit', 'Banana');
INSERT INTO shop_orders VALUES ( 3, 'Bread', 'French Bread');
INSERT INTO shop_orders VALUES ( 4, 'Steak', 'Ribeye');
INSERT INTO shop_orders VALUES ( 5, 'Fish',  'Bass');
INSERT INTO shop_orders VALUES ( 6, 'Crackers', 'Saltines');
INSERT INTO shop_orders VALUES ( 7, 'Fish', 'Swordfish');
INSERT INTO shop_orders VALUES ( 8, 'Fruit', 'Pear');
INSERT INTO shop_orders VALUES ( 9, 'Bread', 'Rolls');
INSERT INTO shop_orders VALUES (10, 'Crackers', 'Chips');
INSERT INTO shop_orders VALUES (11, 'Steak', 'Prime Rib');
INSERT INTO shop_orders VALUES (12, 'Bakery', 'Birthday Cake');
INSERT INTO shop_orders VALUES (13, 'Fish', 'Salmon');
INSERT INTO shop_orders VALUES (14, 'Fish', 'Flounder');
INSERT INTO shop_orders VALUES (15, 'Cleaners', 'Detergent');
INSERT INTO shop_orders VALUES (16, 'Bakery', 'Donuts');
INSERT INTO shop_orders VALUES (17, 'Office', 'Ink Cartridges');
INSERT INTO shop_orders VALUES (18, 'Steak', 'Filet Mignon');
INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli');
INSERT INTO shop_orders VALUES (20, 'Steak', 'Flank');
INSERT INTO shop_orders VALUES (21, 'Fruit', 'Raspberry');
INSERT INTO shop_orders VALUES (22, 'Sundries', 'Shampoo');
INSERT INTO shop_orders VALUES (23, 'Steak', 'Salisbury');
INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish');
INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom');
INSERT INTO shop_orders VALUES (26, 'Pharmacy', 'Band-aids');
INSERT INTO shop_orders VALUES (27, 'Pharmacy', 'Aspirin');
INSERT INTO shop_orders VALUES (28, 'Sundries', 'Toothbrush');
INSERT INTO shop_orders VALUES (29, 'Sundries', 'Shampoo');
INSERT INTO shop_orders VALUES (30, 'Cereal', 'Bran');
INSERT INTO shop_orders VALUES (31, 'Cereal', 'Corn Flakes');
INSERT INTO shop_orders VALUES (32, 'Cleaners', 'Soap');
INSERT INTO shop_orders VALUES (33, 'Cereal', 'Oatmeal');

COMMIT;

There you go! Have fun!

Generating Test Data

Here’s a quick way to generate dummy data for performing tests that require lots of records.

SQL> DROP   TABLE lots_o_data;

Table dropped.

SQL> CREATE TABLE lots_o_data
  2  (	 id	  NUMBER
  3    , name	  VARCHAR2(5)
  4    , category VARCHAR2(26)
  5  );

Table created.

SQL> INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.');

1 row created.

SQL> INSERT INTO lots_o_data VALUES (2, 'More',  'Bunches and bunches.');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

4 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

36 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

1764 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

3261636 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(*)
  2  FROM   lots_o_data;
  COUNT(*)
----------
   3263442


Note the following:

  • With only four INSERT … SELECT statements based on a Cartesian product, we generated over 3 million rows of data.  Each statement grows an n-row table to n + n² rows, so the totals progress from 2 to 6, 42, 1806, and finally 3,263,442.
  • A few more INSERT … SELECT statements could easily result in billions of rows of data.

If you need a lot of data for performance testing or capacity testing, this is an easy way to get however much you require.
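
One caveat: because each INSERT … SELECT copies existing rows, the id values repeat.  If your testing requires unique keys, here’s a quick Oracle-specific sketch (verify it suits your case) to renumber the rows after loading:

UPDATE lots_o_data SET id = ROWNUM;
COMMIT;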

I have been working on building a data lake environment in anticipation of a series of large data feeds I’m expecting soon and found this to be a useful approach to preparing the environment.

From Sept 25, 2013: KTVU featured me speaking for OOW

On September 25, 2013, the American racing team led by Larry Ellison retook the America’s Cup in spectacular fashion, after several days of dramatic events.  Nobody could have predicted that the grand finish would happen in San Francisco Bay, within walking distance of Oracle Open World / JavaOne, and on the climactic “customer appreciation night” of that year’s conference.  It was an amazing, once-in-a-lifetime event.

KTVU’s Jana Katsuyama featured me in a soundbite, speaking on behalf of the very proud attendees of Oracle Open World; see below.

2013_09_25_Americas_Cup_OHearn_KTVU from Steve OHearn on Vimeo.

At first, Jana asked me a series of questions about whether attendees were upset that Larry’s attention was divided between the conference and the race, and that he’d missed a keynote earlier in the day to go celebrate the victory.

I said – no way! The conference is great, everything is well organized and continuing as planned, and who could possibly predict such a stunning situation – good for Larry!

And Jana, to her credit, let the spirit of my comments come through in the edited version that aired that evening on KTVU, the leading Bay Area news channel.

See above!

Rumors of SQL’s Death Have Been Greatly Exaggerated

The “NoSQL” revolution is tremendous on many levels.  It’s practical, relevant, and useful.  There are plenty of business situations where an architecture optimized for limited, targeted processing of very large amounts of data simply doesn’t need the overhead of a relational tool and can benefit from specialized functionality optimized for select purposes.

But SQL is still critical to the core operations of any enterprise.  Contrary to what many newcomers might think, “NoSQL” in a given application doesn’t actually mean “no SQL” across the board in the enterprise.  A lot of us who are responsible for enterprise architectures read the expression “NoSQL” as really saying “Not Only SQL”.  After all, some NoSQL tools are actually built on a SQL platform.  Believe it or not, the traditional relational database is still the number one technology for handling the mission-critical core services of any professional organization.

However, this truth is apparently lost on some recent arrivals in the world of software development.

About a year ago I was invited to attend a conference in Maryland hosted by one of the leading big data vendors.  Those of us attending were primarily software developers, and we all spoke in terms of various Apache open source projects and their commercial counterparts.  We rarely uttered the word “Java”, yet we were all Java developers of one kind or another.  And those of us in the world of Java development know better about the benefits and relative importance of both NoSQL and SQL platforms.

But I’m describing the attendees.  The conversation I’m about to share was with one of the vendors who was exhibiting at the conference.  (Yes, a commercial vendor was hosting a show at which they invited other commercial vendors, it happens all the time.)  He was trying to sell his company’s product, of course, a product that was complementary to the host company’s core offerings.  His company was one of those now-familiar business models where some developers created a cool open source project that they promoted to the Apache Software Foundation, eventually winning full project status.  With that established, the developers formed a company to begin selling support for the open source ASF product, and also began building a proprietary alternative for sale.  There are dozens of companies doing this nowadays, as we all know – and it’s a great business model, I love it.  More power to them.

So here I was chatting with a sales rep hawking his particular product.  Suddenly he declared the following:

“Well really, relational databases are dead, nobody uses those any more.”

I stopped.  The first visual that popped into my head was of that time as a kid when one of your friends tells you that “nobody” does this or “everybody” does that.  But as you get older, you learn that your fellow six-year-olds don’t necessarily have a handle on what the whole world is doing.

I’m sure this guy was a smart guy.  And his company was certainly offering a valuable product.  But some perspective was in order.

So I said – really? The RDBMS is dead?  Let me ask you some questions. Don’t you all sell this product?

Him: Yes.

Me: Well – don’t you track sales in a database? Orders, items, linked to the catalog? You probably track orders in an RDBMS.

Him: Well … ok, yes, probably.

Me: And the customers placing those orders? You need data about any number of contact methods, like phone numbers, email addresses, where to ship, when to call, etc., all connected to the orders, sales records, etc.  Probably all structured for mass mailings by correct categories at the right time. Sounds like an RDBMS. Right?

Him: OK, probably, yes.

Me: So … financial data? Employee info and HR compliance, other regulatory issues? Is that going to work in a NoSQL environment?

Him: Good point, probably not.

Me: So the RDBMS is tracking company orders, items, products, customers, employees, suppliers, accounting, finance, human resources, marketing contacts, contracts with events like this conference, and all other data associated with dollars, the law, whatever must be accurately tracked in line with the rest. Even online blog posts are probably running on an RDBMS, right? So that leaves, what, web log files and online comments for marketing trend analysis?

Him: Well no, more than that … but I see your point.

Me: NoSQL has its place.  So does SQL.  Both are valuable.

Him: Agreed.

One down.

Lots more to go.

– Steve

P.S.  A huge thank you to Steve Feuerstein for encouraging this particular blog post, which is a direct result of a little exchange we had recently on this topic.  Thanks, Steve!

The latest happenings at the Baltimore Washington Java Meetup

I just sent out an email to the illustrious members of my Java User Group (JUG), aka the Baltimore Washington Java Meetup.  The text is below.  All those interested in Java technology are invited and welcome!

Well well well!  What an AMAZING night we had for our November meetup!  I’ve been remiss in not sending this sooner – things have been busy busy busy in Steve world – but you should know that November 10th was an amazing night.  We had about a billion people there – well, OK, perhaps it wasn’t a billion, but in all seriousness we maxed out the room – and it was all to hear the most excellent presentation by our very own rock star, Markus Dale: an overview of Amazon Web Services (AWS) and its Java API.  On hand were many illustrious members of our meetup, plus some additional celebrities, namely AWS’s Walt Whitecotton, John Peterson, and John Quarentello, who brought plenty of Amazon swag with them as giveaways and door prizes.

Speaking of the door prizes … the questions were:

1) Do you use (circle all that apply): (a) Ant, (b) Maven, (c) Jenkins, (d) Hudson?

Every answer was selected and there was a write-in or two for Gradle. But the hands-down leading choice was Maven.

This question was also posed at JavaOne to a room of about 200 developers.  By a show of hands, about half used Jenkins, half used Maven, and two people used Ant.

2) An annotation can be extended: (a) True (b) False

The correct answer is false! Now … should they be extendable?  Well that’s a different question altogether.  But in the meantime –

Congratulations to all the winners, especially Peter Stamboulis, who won the first-place prize, an Amazon Web Services gift card!

Most of all, a great big thank you and “bravo!” to Markus Dale, who clearly knows his stuff and did a fantastic edge-of-your-seat presentation on one of the hottest cutting-edge topics in the IT field today.

SO … our next meetup is … TONIGHT!  As we’ve been saying this fall, our December meetup would be non-standard – not a technical session – and that’s definitely the case this evening!  We are simply meeting for dinner at a great restaurant in Columbia (pay your own way); click here for details.

In 2016 we’ll resume with technical sessions and already have a few in the works:  Lambdas, data streams, new features in Java 9, our own participation in the Adopt-A-JSR effort, and more!

But for tonight, we’re having a fun holiday dinner (pay your own way!), and everyone is invited!

See you tonight!

– Steve

My SQL Expert Book and Practice Exams

I recently read in an online forum that the online bonus exam for my book, OCA Oracle Database SQL Certified Expert Exam Guide (Exam 1Z0-047), is “no longer available” at the Oracle Press website.  I’m inquiring to determine if it’s true and, if so, to see if we can get it restored.  If you’ve had a similar experience, please email me to let me know.

In the meantime:

  • I have created additional practice exams here:  http://www.databasetraining.com.  They are available at no charge and only require that you register an email address, so you can take the exams across multiple login sessions if you wish.
  • I am currently working on a revision to my book to upgrade it for Oracle 12c.  We’re adding lots of new features, and I’ll blog about those soon.  The second edition will be completed in early 2016.

In the meantime, if you’re looking for a practice exam, please check out DatabaseTraining.com!

Java, SQL, and the ORM Mismatch

I just can’t take it anymore.

There is an incorrect perception floating around out there in the Java world about the issue of mapping Java classes to SQL tables.  This issue falls within a larger topic often referred to as the object-relational mapping (ORM) mismatch.  This “mismatch” is based on the fact that Java is object-oriented and SQL is not, so using Java classes to represent SQL tables involves some issues and requires some clever work.  That is true.  But it has led to a major misperception that SQL is somehow “defective”.  It is not, yet this notion has crept its way into two books I’ve read lately – books by otherwise very smart people, published by major cutting-edge IT book publishers.

(For the record, no, neither publisher in question is my own publisher, the incomparable McGraw-Hill Education and their awesome imprint with Oracle Corporation, Oracle Press.)

I’m not going to quote either work verbatim, but the idea goes like this:

  • To support persistence (storing data), you map a Java class (e.g., “Customer”) to a corresponding SQL table (e.g., “CUSTOMERS”)
  • When creating two Java classes, it’s possible to design them with a many-to-many relationship to each other
  • SQL doesn’t support the creation of tables with many-to-many relationships
  • Therefore SQL is defective.

Uh – no.

Folks, the presence of a many-to-many relationship in your data model is a sign that you aren’t finished with your data model.

This isn’t a Java thing, or a SQL thing, it’s a real world thing.

It’s possible to build a defective data model in Java.  You can do a lot of stupid stuff in Java.  You can also build bad data models in SQL, and technically you actually can create a many-to-many relationship in SQL, but it won’t work.  That’s a good thing.  I’ll explain shortly with an example.

But first, a word about the classic mismatch.

THE ORM MISMATCH

The authors of these two books are Java developers first and foremost.  I get that, and I know where they’re coming from.  I’m a Java developer myself, and have been since Java was first introduced into the Oracle database as an alternative to PL/SQL, which I also love – so much so that my first book was the official Oracle Press certification exam guide OCP Developer PL/SQL Program Units Exam Guide.

But around the time that book hit the shelves, I began working with Java, which had become available as an in-database alternative to PL/SQL.  Soon thereafter, I began teaching Java.  In fact, this is me with the then-CEO of Sun Microsystems on the first day I launched my first-ever course in Java:

Scott McNealy, Steve O'Hearn, at the National Press Club

That was taken at the National Press Club.  McNealy was hysterically funny that day and a great guy.  He’s responsible for, and oversaw, a lot of great developments in Silicon Valley that reverberate to this day – like Java.

And back at that time, the first question about Java was: how do we get Java and SQL to interact?  That, after all, was Oracle’s whole point in embedding Java in the database.

Java has come a long, long way since then, and all for the better.  There is a well established and growing library of packages and tools for implementing various interactions between Java and SQL.  And at the heart of them all is often the same issue:

Java is object oriented, and SQL isn’t.

True.  So what?  Newcomers to Java act like this is some kind of ultimate nightmare scenario, a sign that SQL is out of sync with the world.  Nonsense.  SQL isn’t inherently object-oriented, but neither is your file system, and yet Java interacts with files just fine.  Now granted, the idea behind SQL is more sophisticated than that of a file system.  But on the other hand, SQL isn’t in the same category as a third-generation language (3GL) like Java; SQL is a fourth-generation language (4GL).  It might not support object-oriented dynamics, but neither does it clash with them.  This is demonstrably obvious – otherwise we wouldn’t have JDBC, the Java Persistence API (JPA), and Hibernate.  So the two can and do work together, and effectively.  But it takes a bit of work; SQL doesn’t necessarily provide built-in support for all object-oriented concepts, such as inheritance.  (Caveat: inheritance actually can be supported in data modeling – see this article – and there’s a way to do it within SQL itself, sort of, but whether it’s helpful or not is a different story, and perhaps the topic of a future blog post.  Or book.  Hm …)

Frankly, when I hear about the “mismatch” between Java and SQL, I think of the issue of transactions; that’s the only real issue to me.  The object life cycle and nature of Java introduce challenges in the way persistence to the database is best done in a multi-user/multi-threaded environment.

But let’s get back to topic and dispel this “SQL is defective” myth.

DATA MODELING

Data modeling is the act of representing real world business processes through diagrams of the things (“entities”) that comprise a real world process, and the relationships (or “associations”) among those entities.  The most obvious entities are easy to identify – customers, products, office buildings, vendors, cars, etc.  But the more abstract entities are not always so obvious – work schedules, change orders, reservations, that sort of thing.  This is where the importance of understanding many-to-many relationships can be very helpful.

AN EXAMPLE

I’m going to use an example taken from the airline industry, as I’ve been meeting lately with Lisa Ray, a lifelong aviation data expert closely involved with a series of legacy migrations in that area. (And for the record, she fully gets this.)

Let’s start with two obvious entities:

  • Planes – as in “individual planes”, not just types of planes.
  • Pilots – individual people.

Every one plane might be flown by more than one pilot.  Every one pilot will be qualified to fly more than one plane.  Sounds like a many-to-many relationship, right?  And as you might know, an entity-relationship diagram illustrates the “many” side of a relationship with the classic crow’s-foot line:

ERD Diagram A

So that’s our initial logical diagram for these two entities.

Now folks, this blog post is not going to be a complete tutorial on how to perform data modeling, but we will explore one key aspect of modeling: whenever you encounter a many-to-many relationship, you ALWAYS transform it using the following steps:

  • Add a third entity in between – “Entity_3” in our diagram below.
  • Create foreign keys within the new entity for the primary keys of the existing entities – in this case, PLANE_ID as a foreign key to the PLANES entity’s PLANE_ID primary key, and PILOT_ID as a foreign key to the PILOTS entity’s PILOT_ID primary key.
  • Establish “one-to-many” relationships between the new entity and the existing entities, using the “single line” to indicate the “one” side and the “crow’s foot” to indicate the “many” side of the relationship.

Like this:

ERD Diagram B

The reason we do that is simple:  the presence of a many-to-many relationship is a clear indicator that something else goes in between the two entities.  The question is not “if” something belongs there.  Something does belong there.  The only question is – what exactly is it?

In our example, it’s going to be something like a “roster”, or a “schedule”, or “flight assignments”, or “rentals”:

ERD Diagram C
Call it what you will, but something goes in between and it must be included in your data model.  That new entity will have its own attributes.  Perhaps:

  • A start and end date for the assignment period
  • A name of a key staff member authorized to approve the assignment

Who knows what might be there? But something is there, and the presence of the many-to-many relationship is your clue to its existence.

That third entity is often a bit more abstract.  You can see pilots and planes, but a “flight assignment” isn’t necessarily as obvious.  But it’s an important entity in the data model nonetheless.
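
Expressed in SQL, the resolved model looks something like the following minimal sketch.  The table and column names here are illustrative assumptions, not prescriptive:

CREATE TABLE planes
(   plane_id    NUMBER PRIMARY KEY
  , tail_number VARCHAR2(10)
);
CREATE TABLE pilots
(   pilot_id    NUMBER PRIMARY KEY
  , pilot_name  VARCHAR2(50)
);
-- The third entity: it resolves the many-to-many into two one-to-many relationships
CREATE TABLE flight_assignments
(   plane_id    NUMBER REFERENCES planes (plane_id)
  , pilot_id    NUMBER REFERENCES pilots (pilot_id)
  , start_date  DATE
  , end_date    DATE
  , approved_by VARCHAR2(50)
  , PRIMARY KEY (plane_id, pilot_id, start_date)
);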

One more point: this third entity isn’t some necessary crutch to get around a “SQL defect”.  I can’t believe what I’ve read in certain Java books that suggest such a thing.  No.  This third entity is a real-world “thing” that exists to enable the relationship between the other two, and it’s up to you to identify it.  Whether you figure it out or not, it exists nonetheless.

The rule is simple: no complete data model contains any “many-to-many” relationships.  If any are present, you aren’t done modeling yet.

So now that you understand this, imagine reading a Java manual in which someone explains that SQL is “defective” because it “doesn’t support many-to-many relationships”.

No.

Now you know!

O’Hearn is the Java leader of the only officially recognized JUG for the DC metro area.  He is also the author of the first-ever expert-level certification exam guide from Oracle Press, titled OCA Oracle Database SQL Certified Expert Exam Guide (Exam 1Z0-047).  The 2nd edition of his SQL Expert book, revised for Oracle 12c, is due out in 2016.