Python and SQL

Is it me?  Or is there a lot of buzz about Python lately?  Python is a great language for doing certain work quickly.  It supports a variety of software design approaches.  It is interpreted, and yet is object-oriented.  You can use it like an interactive calculator, and it comes with powerful math libraries.  It’s succinct – you can do a lot with fewer lines of code than is typical, but it can be used for large scale programming as well.  For these reasons and more, Python is growing in popularity, particularly in the fields of data analytics and data science.

It makes sense that it works well with SQL to provide data persistence. This blog post shows you how you can leverage existing SQL data stores with the power of Python.

It makes sense that it works well with SQL to provide data persistence. This blog post shows you how you can leverage existing SQL data stores with the power of Python.

Getting Ready

I downloaded and used the following software:

Python http://www.python.org 3.5.2 Popular software language!
SQLite http://sqlite.org 3.15.2 Embedded server-less SQL database
Oracle http://www.oracle.com 12c The world’s leading relational database management system
cx_Oracle http://cx-oracle.sourceforge.net/  5.2.1 Python extension module

I downloaded and installed Python and Oracle.  I also downloaded SQLite and cx_Oracle for use with Python.

Python and SQLite

SQLite is an embedded SQL tool.  It doesn’t have a separate server process, but is easily incorporated into applications. Here’s an example of Python code that uses SQLite, version 3:

import sqlite3
connection = sqlite3.connect("ships.db")
cursor = connection.cursor();
sql_statement = """CREATE TABLE ships
                  (    ship_id INTEGER
                     , ship_name TEXT);"""
cursor.execute(sql_statement)
sql_insert = """INSERT INTO ships VALUES(1, 'Codd One');"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships VALUES(2, 'Codd Two');"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships VALUES(3, 'Codd Three');"""
cursor.execute(sql_insert)
connection.commit()

The code above will successfully create a SQLite database file “ships.db”, then create table “ships” and then add three rows to it, and COMMIT the records.

In a subsequent script, you can query the same database.

import sqlite3
connection = sqlite3.connect("ships.db")
cursor = connection.cursor();
sql_statement = """SELECT ship_id, ship_name FROM ships;"""
cursor.execute(sql_statement)
print("fetchall:")
result = cursor.fetchall()
for r in result:
print(r)

The rows returned by the SELECT statement are processed within the “for” loop.  “Result” captures the result of the query, and “r” serves as a fetched value for each row.  In this case, each row represents an array of two values, and both values are displayed with the “print(r)” statement.

The output:

fetchall:
(1, 'Codd One')
(2, 'Codd Two')
(3, 'Codd Three')
>>>

Note:  the “>>>” symbol is the Python cursor, indicating the interactive system is awaiting the next instruction.

Python and Oracle

Now let’s use Python to send SQL statements to Oracle 12c. To establish the connection, we use the Python extension module “cx_Module”, available from SourceForge.net – see above for the links to all the software necessary to run these scripts.

Also – note that while SQL strings submitted directly to Oracle require semi-colons, SQL statements sent via cx_Oracle reject them.

In this example, we’ll do the same thing we did with SQLite, with one change: instead of breaking up the CREATE/INSERT and SELECT statements into two scripts, we’ll do it all in one script:

import cx_Oracle
connection = cx_Oracle.connect('username/password@localhost/orcl')
print("Oracle RDBMS version " + connection.version)

cursor = connection.cursor()

sql_statement = """DROP TABLE ships2"""
cursor.execute(sql_statement)

sql_statement = """CREATE TABLE ships2 
                   (   ship_id NUMBER
                     , ship_name VARCHAR2(20))"""
cursor.execute(sql_statement)

sql_insert = """INSERT INTO ships2 VALUES(1, 'Codd Crystal')"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships2 VALUES(2, 'Codd Elegance')"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships2 VALUES(3, 'Codd Champion')"""
cursor.execute(sql_insert)

connection.commit()

cursor.execute("SELECT ship_id, ship_name FROM ships2")
print("fetchall:")
result = cursor.fetchall()
for r in result:
print(r)

connection.close()

print("End of script, database connection closed.");

The output of this script is the same we saw earlier:

fetchall:
(1, 'Codd One')
(2, 'Codd Two')
(3, 'Codd Three')
>>>

Regardless of which database we use, the results are the same.

Closing Thoughts

In this blog post, we looked at how to use Python to execute SQL statements.  We created SQL objects, and sent data to the database, as well as queried data from the database.  We passed data to and from Python structures and set t he stage for any data manipulation we wish to do using these powerful tools.

Python is easy to use, and capable of performing complex math functions and data manipulation.  Using SQL to persist data for Python is an ideal combination for data analytics.

NOTE: This data and the accompanying scripts are available at my GitHub site: https://github.com/Skere9.

GROUP BY on a GROUP BY

Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates.  I can’t present the actual data I’m working with here, but this is an illustrative example.  Consider the data set below.

SQL> SELECT   *
  2  FROM     shop_orders;

        ID CATEGORY   ITEM_N
---------- ---------- --------------------
         1 Office     Stapler
         2 Fruit      Banana
         3 Bread      French Bread
         4 Steak      Ribeye
         5 Fish       Bass
         6 Crackers   Saltines
         7 Fish       Swordfish
         8 Fruit      Pear
         9 Bread      Rolls
        10 Crackers   Chips
        11 Steak      Prime Rib
        12 Bakery     Birthday Cake
        13 Fish       Salmon
        14 Fish       Flounder
        15 Cleaners   Detergent
        16 Bakery     Donuts
        17 Office     Ink Cartridges
        18 Steak      Filet Mignon
        19 Vegetable  Broccoli
        20 Steak      Flank
        21 Fruit      Raspberry
        22 Sundries   Shampoo
        23 Steak      Salisbury
        24 Vegetable  Radish
        25 Vegetable  Mushroom
        26 Pharmacy   Band-aids
        27 Pharmacy   Aspirin
        28 Sundries   Toothbrush
        29 Sundries   Shampoo
        30 Cereal     Bran
        31 Cereal     Corn Flakes
        32 Cleaners   Soap
        33 Cereal     Oatmeal

33 rows selected.

One glance at the data and you can see the category values repeat. But how often do they repeat? Which categories recur the most frequently? What if you needed to determine which categories have the greatest number of values, and which don’t?

To determine this, we can start with a GROUP BY, like this.

SQL> SELECT   category, COUNT(*) CT
  2  FROM     shop_orders
  3  GROUP BY category;

CATEGORY       CT
---------- ----------
Cereal              3
Fruit               3
Crackers            2
Vegetable           3
Office              2
Steak               5
Pharmacy            2
Fish                4
Bread               2
Cleaners            2
Sundries            3
Bakery              2

12 rows selected.

Now let’s move the GROUP BY to an inline view, and do a GROUP on the GROUP BY, like this.

SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n
  2  FROM
  3    ( SELECT   a.category, COUNT(a.item_n) CT
  4  	 FROM	  shop_orders a
  5  	 GROUP BY a.category
  6    ) b
  7  GROUP BY b.ct
  8  ORDER BY b.ct DESC;

CT_CATEGORY  CT_ITEM_N
----------- ----------
          1          5
          1          4
          4          3
          6          2

Voila – there is one category that recurs five times, one recurs 4 times, four recur three times, and finally six categories have two values each.

In a relatively small data set like this, the answers might be obvious. I’ve been working with very large rows that number in the tens of millions and queries like this have been invaluable in confirming data shape and patterns.

If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.

DROP   TABLE shop_orders;
CREATE TABLE shop_orders
(   id       NUMBER
  , category VARCHAR2(15)
  , item_n   VARCHAR2(20)
);
INSERT INTO shop_orders VALUES ( 1, 'Office', 'Stapler');
INSERT INTO shop_orders VALUES ( 2, 'Fruit', 'Banana');
INSERT INTO shop_orders VALUES ( 3, 'Bread', 'French Bread');
INSERT INTO shop_orders VALUES ( 4, 'Steak', 'Ribeye');
INSERT INTO shop_orders VALUES ( 5, 'Fish',  'Bass');
INSERT INTO shop_orders VALUES ( 6, 'Crackers', 'Saltines');
INSERT INTO shop_orders VALUES ( 7, 'Fish', 'Swordfish');
INSERT INTO shop_orders VALUES ( 8, 'Fruit', 'Pear');
INSERT INTO shop_orders VALUES ( 9, 'Bread', 'Rolls');
INSERT INTO shop_orders VALUES (10, 'Crackers', 'Chips');
INSERT INTO shop_orders VALUES (11, 'Steak', 'Prime Rib');
INSERT INTO shop_orders VALUES (12, 'Bakery', 'Birthday Cake');
INSERT INTO shop_orders VALUES (13, 'Fish', 'Salmon');
INSERT INTO shop_orders VALUES (14, 'Fish', 'Flounder');
INSERT INTO shop_orders VALUES (15, 'Cleaners', 'Detergent');
INSERT INTO shop_orders VALUES (16, 'Bakery', 'Donuts');
INSERT INTO shop_orders VALUES (17, 'Office', 'Ink Cartridges');
INSERT INTO shop_orders VALUES (18, 'Steak', 'Filet Mignon');
INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli');
INSERT INTO shop_orders VALUES (20, 'Steak', 'Flank');
INSERT INTO shop_orders VALUES (21, 'Fruit', 'Raspberry');
INSERT INTO shop_orders VALUES (22, 'Sundries', 'Shampoo');
INSERT INTO shop_orders VALUES (23, 'Steak', 'Salisbury');
INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish');
INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom');
INSERT INTO shop_orders VALUES (26, 'Pharmacy', 'Band-aids');
INSERT INTO shop_orders VALUES (27, 'Pharmacy', 'Aspirin');
INSERT INTO shop_orders VALUES (28, 'Sundries', 'Toothbrush');
INSERT INTO shop_orders VALUES (29, 'Sundries', 'Shampoo');
INSERT INTO shop_orders VALUES (30, 'Cereal', 'Bran');
INSERT INTO shop_orders VALUES (31, 'Cereal', 'Corn Flakes');
INSERT INTO shop_orders VALUES (32, 'Cleaners', 'Soap');
INSERT INTO shop_orders VALUES (33, 'Cereal', 'Oatmeal');

COMMIT;

There you go! Have fun!

Generating Test Data

Here’s a quick way to generate dummy data for performing tests that require lots of record.

SQL> DROP   TABLE lots_o_data;

Table dropped.

SQL> CREATE TABLE lots_o_data
  2  (	 id	  NUMBER
  3    , name	  VARCHAR2(5)
  4    , category VARCHAR2(26)
  5  );

Table created.

SQL> INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.');

1 row created.

SQL> INSERT INTO lots_o_data VALUES (2, 'More',  'Bunches and bunches.');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

4 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

36 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

1764 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  	 SELECT a.id, a.name, b.category
  3  	 FROM lots_o_data a CROSS JOIN lots_o_data b;

3261636 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(*)
  2  FROM   lots_o_data;
  COUNT(*)                                                               ----------                                                             3263442                                                                      
SQL> SPOOL OFF

Note the following

  • With only four INSERT … SELECT statements based on a Cartesian product, we were able to generate over 3 million rows of data.
  • A few more INSERT … SELECT statements could easily result in billions of rows of data.

If you need a lot of data for performance testing or capacity testing, this is an easy way to get however much you require.

I have been working on building a data lake environment in anticipation of a series of large data feeds I’m expecting soon and found this to be a useful approach to preparing the environment.

“We sacrificed a lot for this country …. don’t take it for granted.”

Ten years ago I attended the grand opening of the National World War II Memorial in Washington, DC.  I attended a “canteen” in the morning hosted by John Cosgrove, and then attended the formal opening ceremonies, followed by the opening of the memorial itself. While there, I watched as two World War II veterans as they met each other for the first time: Bob Wallace, and Gordie Eaves, both double-amputees as a result of their landing at Normandy in 1944.

I took the first photo of them (below) at 5:36 p.m.

The Washington Post interviewed them for an article that was published the next day. I met them afterwards and took the second picture (below) at 5:46 p.m in the evening.

In the words of Gordie: “I was in the ETO theatre as a Medic in the Infantry, across northern France into Germany/ Landed at Omaha beach at the same spot that Bob did on ‘D’ Day but much later on.”

The following paragraphs are what I wrote at the time about what happened next.

Right after I took this picture was a moment I’ll never forget. Gordie reached out to me and took my right hand, and pulled me in a bit closer. He and Bob both looked at me, and Gordie spoke, and with the most disarming frankness I think I’ve ever experienced, he said “We sacrificed a lot for this country. We won’t live much longer. It’s yours now”.

I was floored. I looked at them both, and Bob was looking just as intently at me as Gordie was. It was as if these two were completely like-minded even though they had just met fifteen minutes earlier.

Gordie continued about our country, and said something like “Don’t take it for granted. Take good care of it.”

I was speechless. And I’m never speechless. (You can probably get a sense of that from all my droning on within this web page.)

I didn’t know what to say, but I eventually coughed up a “yes, sir, I will, sir.”

I’ll never forget that moment as long as I live.

Thank you, Gordie and Bob.

I took dozens of photos that day and posted them all, along with a narrative of the day’s events, online at this web page:

National World War II Memorial
Dedication Weekend
May 2004

At that page you’ll see photos of many events, including pics of Doc Scantlin and Chou Chou, who performed that weekend.

Bob was from Arkansas, and Gordie lived in Florida. I stayed in touch with them in the years that followed. Bob passed away a few years later, and Gordie and I exchanged sorrowful notes about it.  Gordie passed away in 2007.

But their legacies live on, and I’ll never forget that day, nor Gordie’s words, which clearly spoke for them both:

“We sacrificed a lot for this country. We won’t live much longer. It’s yours now. Don’t take it for granted. Take good care of it.”

Insanity Island featured at the Liberty Alliance home page

Insanity Island featured at Liberty Alliance

The momentum behind Doug Hayden‘s Insanity Island is growing!  The latest example:  Liberty Alliance, a large collection of influential organizations and websites advocating traditional values and conservative principles, is featuring Insanity Island’s latest brilliant piece on its home page.  The latest Insanity Island cartoon is currently the number one “featured post” at Liberty Alliance’s home page.

You owe it to yourself to see the video for yourself, here it is:

It’s a full time job just keeping up with our Skere9 Stars, and this one is not exception!

DHS Warning About Java: Update

Several days ago I wrote a blog post about the warning from the U.S. Department of Homeland Security (DHS) about the use of Java.  DHS had gone so far as to advise users to disable Java in their browsers.

A few words of clarification:  as many of you surely know, Java is not JavaScript.  Those are totally different and unrelated languages, in spite of the unfortunately similar wording.

Second, by warning users to turn of Java in their browsers, DHS is essentially telling them to disable Java applets, which are the only form of Java programs that can run in a browser.  Applets are small programs that exist on websites, and that download to your browser, and execute on your own local computer, all within a container that is intended to prevent it from doing anything to your computer without your express permission.  Generally the only thing an applet is allowed to do is present data visually and accept typed or other forms of input from the end-user.  Frankly, that’s not how most Java-based systems work today.  Most Java-based systems consist of Java programs that run elsewhere – within mobile devices, or on servers in all sorts of forms.  Lots of websites use Java on the server side and never send applets to their end-users browsers.  So you may still be visiting a website that runs Java on the server side, and that never sends executable code to your browser, it probably only serves up completed web pages, and that’s fine, you’ll be safe insofar as the DHS warning is concerned.

So now it’s February 3, 2013 (Super Bowl Sunday incidentally), and yet – still no apparent conclusion to the Java situation.

Technology News Logo

The most recent article I can find right now is Taking the Java Bull by the Horns by Patrick Nelson at Technology News, published Jan. 31, 2013, and it says this:

even though Oracle has made some efforts to patch the flaws, DHS hasn’t lifted its warning … As of Jan. 22, 2013, the current version of Java is Version 7, Update 11. The latest version includes fixes for issues raised by DHS as well as other issues. It also sets security settings to “High.” … You may decide that it’s prudent to switch off Java altogether. New Java vulnerabilities are likely to be discovered, according to DHS’s Computer Emergency Readiness Team.

The article also includes step-by-step instructions for performing upgrades and adjusting security settings in your browser.

We’ll keep an eye on this.  I think many of us are so busy working in non-applet areas that we’re not all that concerned.  However, I know firsthand of one company that internally uses an applet-drive software tool for internal corporate communications, and they’ve recently made the call to shut it down until this issue is resolved.  It’s disruptive for sure.

Stay tuned.

Oracle and Java – A Good Brew

Scott McNealy, Steve O'Hearn, February 8, 2001, National Press ClubI first met Scott McNealy on Thursday afternoon, February 8, 2001, at about 12:15 in the afternoon.  Not that I was keeping track or anything.  But I was starting to teach a new Java course that night, based on curriculum I’d created, so the timing was particularly striking to me personally.  McNealy was the CEO of Sun Microsystems, creator of Java. McNealy was the featured speaker at the National Press Club’s luncheon that day.  I’d brought a couple of professional colleagues and friends to meet him.

The Java programming language was already important to Oracle systems, which was why my company was launching the new course.  Our company, db-Training, already taught coursework in Oracle development and database administration.  We were very early to recognize the significance Java would have to the Oracle database.  There was a key architectural similarity between the two products that was rare among the various competing products of the time.  I’ll explain.

The Oracle Relational Database Management System (RDBMS) wasn’t – and isn’t – just a database.  It’s an operating platform.  This fact cannot be overstated, especially to those who haven’t worked with Oracle.  The RDBMS stores data and houses software snippets, in a container that can operate on any operating system.  So if you created a database application in the Oracle RDBMS – and I did, quite a few of them – you could do so without a concern for what operating system your application may end up running on.  The trick was that you just moved the application into a version of the RDBMS designed for that new operating system.

But if you created a database application in, for example, traditional COBOL or FORTRAN or something else, you had to concern yourself with the operating system in which you were going to execute the program.  Is it Microsoft Windows?  Which version? Is it Unix?  Or something else – an Apple computer of some sort?  For each new environment, you would have to recompile your FORTRAN or COBOL or whatever program, being aware that the compilation may result in errors, even if it had compiled perfectly well on another operating system already.

With Oracle, you didn’t have to bother with that nonsense.  Create your app in the Oracle database, and you’re done – it’ll work on any operating system, you just have to move the app to the appropriate RDBMS.

However – you couldn’t use FORTRAN (well, ok, you could, but just go with me on this, you Pro*Fortran people).  You had to use an Oracle proprietary language called PL/SQL.

I love PL/SQL.  I love it so much, I wrote a book about it.

Then along came Java, and its Java Virtual Machine.   I think Oracle Corp was quick to realize that the JVM was to Java what the Oracle RDBMS was to database applications – a buffer between the Java program and the operating system, which meant that you could write a single application in Java, and then move it from operating system to operating system by swapping your existing JVM with one that fit the new operating system.  You didn’t have to recompile or rewrite code.  You just had to do a one-time installation of the JVM at the new location, that’s it.

Just like the Oracle RDBMS.

When Oracle acquired Sun Microsystems, a lot of conspiracy theorists rolled their eyes and figured Oracle would destroy Java somehow.  I was pretty confident Oracle wouldn’t do that.  It made good business sense for them to support Java.

I was there at Oracle Open World when Scott McNealy spoke.  You could tell he was impacted powerfully by the acquisition of Sun by Oracle.  I don’t think it was his preference, but rather an economic necessity.  I’m sure he would’ve kept Sun independent if possible. But since he had to sell, Oracle was the best way to go.

And now – history has shown that Oracle has been very good to Java, according to InfoWorld.

I could’ve told you that years ago.  Oracle needs Java.  It was a good fit, and still is.

 

 

More Praise for “Oracle SQL Expert” – Trent See

OCA Oracle Database SQL Certified Expert by Steve O'Hearn from Oracle PressI’ve been exchanging emails with a great Oracle professional named Trent See.  Trent originally emailed me back in October of 2010, but stupid me was so busy I never got around to writing back to most of the email I received on the book until recently.  So I only recently asked Trent for permission to quote him, and he just granted me that permission this week.

Here’s an excerpt of Trent’s outstanding original email:

“Thank you for writing the OCA Oracle Database SQL Certified Expert Exam Guide (Exam 1Z0-047) … I’m so glad I bought the book!  There is no way I would have been prepared to take THAT exam with my working knowledge … Thank you Steve!  I have learned more new SQL techniques reading your book in the past month and a half than I have over the past 9+ years!”- Oct. 12, 2010

I’ve added his quote to the praise page for the book.  Trent also provided a few items that I’ve included in the book’s Errata sheet, and I’m crediting him for the those.

Thank you Trent!  And welcome to the distinguished ranks of certified Oracle SQL experts!