Python and SQL

Is it me, or is there a lot of buzz about Python lately?  Python is a great language for doing certain work quickly.  It supports a variety of software design approaches.  It is interpreted, yet object-oriented.  You can use it like an interactive calculator, and it comes with powerful math libraries.  It’s succinct – you can do a lot with fewer lines of code than is typical – but it can be used for large-scale programming as well.  For these reasons and more, Python is growing in popularity, particularly in the fields of data analytics and data science.

It makes sense that it works well with SQL to provide data persistence. This blog post shows you how you can leverage existing SQL data stores with the power of Python.

Getting Ready

I downloaded and used the following software:

  • Python 3.5.2 – http://www.python.org – popular software language!
  • SQLite 3.15.2 – http://sqlite.org – embedded, server-less SQL database
  • Oracle Database 12c – http://www.oracle.com – the world’s leading relational database management system
  • cx_Oracle 5.2.1 – http://cx-oracle.sourceforge.net/ – Python extension module

I downloaded and installed Python and Oracle.  I also downloaded SQLite and cx_Oracle for use with Python.

Python and SQLite

SQLite is an embedded SQL database engine.  It doesn’t have a separate server process, but is easily incorporated into applications.  Here’s an example of Python code that uses SQLite, version 3:

import sqlite3
connection = sqlite3.connect("ships.db")
cursor = connection.cursor()
sql_statement = """CREATE TABLE ships
                  (    ship_id INTEGER
                     , ship_name TEXT);"""
cursor.execute(sql_statement)
sql_insert = """INSERT INTO ships VALUES(1, 'Codd One');"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships VALUES(2, 'Codd Two');"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships VALUES(3, 'Codd Three');"""
cursor.execute(sql_insert)
connection.commit()

The code above creates the SQLite database file “ships.db”, creates the table “ships”, adds three rows to it, and commits the records.
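The same inserts can be written more compactly with parameterized statements.  Here’s a sketch (using an in-memory database so it leaves no file behind) that binds each row tuple to “?” placeholders via executemany():

```python
import sqlite3

# An in-memory database for illustration; use "ships.db" to persist to disk.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE ships (ship_id INTEGER, ship_name TEXT)")

# executemany() binds each tuple to the "?" placeholders in turn,
# replacing the three separate execute() calls above.
rows = [(1, "Codd One"), (2, "Codd Two"), (3, "Codd Three")]
cursor.executemany("INSERT INTO ships VALUES (?, ?)", rows)
connection.commit()

print(cursor.execute("SELECT COUNT(*) FROM ships").fetchone()[0])  # → 3
```

Placeholders also spare you from quoting values by hand, which matters as soon as a ship name contains an apostrophe.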

In a subsequent script, you can query the same database.

import sqlite3
connection = sqlite3.connect("ships.db")
cursor = connection.cursor()
sql_statement = """SELECT ship_id, ship_name FROM ships;"""
cursor.execute(sql_statement)
print("fetchall:")
result = cursor.fetchall()
for r in result:
    print(r)

The rows returned by the SELECT statement are processed within the “for” loop.  “result” holds the full result set, and “r” takes the value of each fetched row in turn.  In this case, each row is a tuple of two values, and both values are displayed by the “print(r)” statement.

The output:

fetchall:
(1, 'Codd One')
(2, 'Codd Two')
(3, 'Codd Three')
>>>

Note:  the “>>>” symbol is the Python interactive prompt, indicating the interpreter is awaiting the next instruction.
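As an aside, fetchall() isn’t the only way to consume a result set.  The sqlite3 cursor is itself iterable, and fetchone() returns one row at a time.  Here’s a sketch, again using an in-memory database so it is self-contained:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE ships (ship_id INTEGER, ship_name TEXT)")
cursor.executemany("INSERT INTO ships VALUES (?, ?)",
                   [(1, "Codd One"), (2, "Codd Two"), (3, "Codd Three")])

# The cursor is iterable, so fetchall() is optional:
for ship_id, ship_name in cursor.execute("SELECT ship_id, ship_name FROM ships"):
    print(ship_id, ship_name)

# fetchone() returns a single row, or None when the result set is exhausted:
cursor.execute("SELECT ship_name FROM ships WHERE ship_id = ?", (2,))
print(cursor.fetchone())  # → ('Codd Two',)
```

Iterating the cursor directly avoids materializing the whole result set in memory, which matters for large tables.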

Python and Oracle

Now let’s use Python to send SQL statements to Oracle 12c.  To establish the connection, we use the Python extension module “cx_Oracle”, available from SourceForge.net – see above for the links to all the software necessary to run these scripts.

Also – note that while SQL statements submitted directly to Oracle (for example, through SQL*Plus) require semicolons, SQL statements sent via cx_Oracle reject them.
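If you keep a library of SQL scripts written with trailing semicolons, a small helper – a hypothetical name of my own, not part of cx_Oracle – can normalize a statement before it is handed to cursor.execute():

```python
def strip_trailing_semicolon(sql):
    """Remove a trailing semicolon (and surrounding whitespace) from a SQL
    string so text written for SQL*Plus can be sent through cx_Oracle."""
    return sql.strip().rstrip(";").rstrip()

print(strip_trailing_semicolon("SELECT * FROM ships2;"))  # → SELECT * FROM ships2
```

Note this naive sketch only handles a single statement; a script containing several semicolon-separated statements would need to be split first.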

In this example, we’ll do the same thing we did with SQLite, with one change: instead of breaking up the CREATE/INSERT and SELECT statements into two scripts, we’ll do it all in one script:

import cx_Oracle
connection = cx_Oracle.connect('username/password@localhost/orcl')
print("Oracle RDBMS version " + connection.version)

cursor = connection.cursor()

sql_statement = """DROP TABLE ships2"""
try:
    cursor.execute(sql_statement)
except cx_Oracle.DatabaseError:
    # The table won't exist on the first run; ignore the error.
    pass

sql_statement = """CREATE TABLE ships2 
                   (   ship_id NUMBER
                     , ship_name VARCHAR2(20))"""
cursor.execute(sql_statement)

sql_insert = """INSERT INTO ships2 VALUES(1, 'Codd Crystal')"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships2 VALUES(2, 'Codd Elegance')"""
cursor.execute(sql_insert)
sql_insert = """INSERT INTO ships2 VALUES(3, 'Codd Champion')"""
cursor.execute(sql_insert)

connection.commit()

cursor.execute("SELECT ship_id, ship_name FROM ships2")
print("fetchall:")
result = cursor.fetchall()
for r in result:
    print(r)

connection.close()

print("End of script, database connection closed.")

The output of this script:

fetchall:
(1, 'Codd Crystal')
(2, 'Codd Elegance')
(3, 'Codd Champion')
>>>

Regardless of which database we use, the approach is the same: connect, open a cursor, execute, and commit.
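One refinement worth mentioning: instead of embedding literal values in the INSERT strings, both drivers accept bind variables.  The sketch below uses sqlite3 so it runs without a database server, but the “:name” placeholder style shown is the same one cx_Oracle accepts (only the connect() call would differ):

```python
import sqlite3

# sqlite3 happens to support the same ":name" named-placeholder style
# that cx_Oracle uses, so the SQL text here is portable between the two.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE ships2 (ship_id INTEGER, ship_name TEXT)")

# Values are passed in a dictionary keyed by placeholder name.
cursor.execute("INSERT INTO ships2 VALUES (:id, :name)",
               {"id": 1, "name": "Codd Crystal"})
connection.commit()

print(cursor.execute("SELECT ship_name FROM ships2").fetchone()[0])  # → Codd Crystal
```

Bind variables also let Oracle reuse the parsed statement across executions, which is kinder to the shared pool than a stream of unique literals.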

Closing Thoughts

In this blog post, we looked at how to use Python to execute SQL statements.  We created SQL objects, sent data to the database, and queried data from the database.  We passed data to and from Python structures and set the stage for any data manipulation we wish to do using these powerful tools.

Python is easy to use, and capable of performing complex math functions and data manipulation.  Using SQL to persist data for Python is an ideal combination for data analytics.

NOTE: This data and the accompanying scripts are available at my GitHub site: https://github.com/Skere9.

Number one again!

I was pleased to see that my SQL Expert book is yet again listed as a number one best seller this afternoon by Amazon.com’s book ranking system for two of the book’s categories:

  • Computers and Technology -> Databases and Big Data -> Oracle
  • Computers and Technology -> Certification -> Oracle

Trailing are excellent books in their own right, including many authored or co-authored by Oracle ACE Directors: a brand new Oracle Public Cloud book co-authored by Charles Kim, Mark Rittman’s Business Intelligence book from Oracle Press, and Steve Feuerstein’s legendary PL/SQL book.

I’m busy working on a new book from Oracle Press, more on that in the very near future – stay tuned! In the meantime, the 047 book makes a great holiday present for anyone in your life who wants to become a certified SQL Expert!

Below is the screenshot from the “Databases and Big Data” category:

Oracle SQL Expert Number One on the Amazon Best Selling Charts 11/29/2016 for its category.

Larry Ellison, and “cloud computing” vs. “utility computing”

“Everyone doesn’t have their own well at their house … we tap into the water network, we tap into the electricity network … we get better service at a lower price by having our water come from a utility … we’ll experience the same economic advantages … as we get our data from an information utility.” – Larry Ellison, Oracle Open World, September 18, 2016, spoken during his keynote presentation: “The Cloud: A New Era of Utility Computing”

Don’t be misled.   Larry Ellison is not late to the Cloud Party.  Consider this quote from 1998:

… Larry Ellison, the founder and CEO of the Oracle Corporation, has frequently mused at what life would be like if common household appliances had the same complexity of maintenance as a PC … “Sorry, I can’t go out tonight, I’m staying home so I can upgrade my TV to version 7.0.” … Ellison’s argument … the PC must become as easy to use as any common household appliance.

In a recent presentation … Ellison was asked … if the network will be stable enough—won’t it crash from time to time? … Ellison’s response … what is the last thing that crashed on you: your telephone, or Windows … ? The audience roared with laughter, making the answer obvious …

The number of networks we already depend on is impressive: plumbing, electricity, highways, television, radio—all networks professionally run by others … Why should a computer user experience anything different?

— Excerpted from Chapter 27, “Oracle Web Application Server”, in the book “Oracle8 Server Unleashed” published by Macmillan Computer Publishing.

That’s a brilliant quote, isn’t it?  You better say ‘yes’ because I wrote that – all the way back in 1998.

And I wrote it because of Larry Ellison’s leadership.  Ellison, way ahead of his time, had already seen the potential of the World Wide Web and its underlying protocol, HTTP, all of which had been invented barely five years prior.  Ellison had already seen the vision and was leading the charge towards “utility computing”.

So I was thrilled to see Ellison keeping that phrase alive yesterday at OOW.  To me, “utility” computing is significantly more descriptive than the phrase “cloud” computing.  Clouds paint images of vague, nebulous, fragmented things somewhere far away, without specificity or form or shape.  But utility computing is much more descriptive of what is really intended: a ubiquitous workhorse that is both specific and dependable, scientifically complex at its core, cutting-edge in its usefulness, yet easy for everyone to access and use for common purposes.

Cloud computing is not just a “computer somewhere else”, as I’ve heard it described – that’s a funny one-liner and I’ve been guilty of laughing at it myself.  But cloud computing – utility computing – is significantly more than that.  It is a series of complex hardware, network, and software components, brought together in an easy-to-configure, easy-to-access, ever-present interface through which end users can quickly set up (provision) whatever resources they need to meet whatever business requirement is currently demanding attention, in a more cost-effective manner than ever before, by compartmentalizing the various desired components and optimizing their deployment.

It is true that Amazon Web Services has much of the market momentum at the moment, and has captured a lot of the headlines in the tech industry.

But the truth is that Oracle is still, to this moment, the unquestioned leader in all forms of serious professional business software applications, and it has been actively and purposely working for years to leverage the power of the network in support of business objectives.  The multi-year re-architecture of most of the world’s leading enterprise resource planning (ERP) applications is just one example of what they’ve done toward this goal.

This week’s Oracle Open World should be very eventful.

For Larry Ellison’s keynote last night, see this link, but recognize this link might not survive much longer than OOW 2016: https://www.oracle.com/openworld/on-demand/index.html

For an OOW YouTube channel highlight that might survive the link above, see here: https://youtu.be/S1p_TcS9bxk


Rio Mid-Games: Gold Medals Per Capita So Far

Four years ago I blogged about the fact that medal counts are often reported on a per-country basis without regard for the population of any country. I’ve always thought that approach to be silly. The per-capita medal count is much more interesting to me.

So this morning I used Bing to get the latest count by country of gold medals at the ongoing Rio Olympics.  I included only those nations that have won two or more gold medals.

I then visited Wikipedia to get recent population counts for those nations.

I used all this data to determine the per-capita win per nation and ranked them.  The results are below.

Ranking   Country   Per-Capita Ratio (Descending Order)   Gold Medal Count   Population   Date Population Reported
1 Jamaica 7.34418E-07 2 2,723,246 31-Dec-14
2 New Zealand 6.37423E-07 3 4,706,450 16-Aug-16
3 Hungary 6.10811E-07 6 9,823,000 1-Jan-16
4 Croatia 4.77251E-07 2 4,190,669 31-Dec-15
5 Netherlands 3.52389E-07 6 17,026,640 16-Aug-16
6 Australia 2.48343E-07 6 24,160,100 16-Aug-16
7 United Kingdom 2.45738E-07 16 65,110,000 30-Jun-15
8 Switzerland 2.39762E-07 2 8,341,600 31-Mar-16
9 Greece 1.84196E-07 2 10,858,018 1-Jan-15
10 Cuba 1.77952E-07 2 11,239,004 31-Dec-15
11 Belgium 1.76637E-07 2 11,322,674 1-Jun-16
12 Italy 1.31871E-07 8 60,665,551 1-Jan-16
13 South Korea 1.18107E-07 6 50,801,405 1-Jul-16
14 Kazakhstan 1.12656E-07 2 17,753,200 1-May-16
15 Germany 1.10064E-07 9 81,770,900 30-Sep-15
16 France 1.04891E-07 7 66,736,000 1-Jul-16
17 Spain 8.61356E-08 4 46,438,422 1-Jan-16
18 United States 8.01855E-08 26 324,248,000 16-Aug-16
19 North Korea 7.91108E-08 2 25,281,000 1-Jul-16
20 Russia 7.50345E-08 11 146,599,183 1-May-16
21 Uzbekistan 6.33406E-08 2 31,575,300 1-Jan-16
22 Canada 5.53166E-08 2 36,155,487 1-Apr-16
23 Japan 5.51225E-08 7 126,990,000 1-Jul-16
24 Poland 5.20329E-08 2 38,437,239 31-Dec-15
25 Kenya 4.52934E-08 2 44,156,577 1-Jul-15
26 Colombia 4.09887E-08 2 48,793,900 16-Aug-16
27 Thailand 3.04279E-08 2 65,729,098 31-Dec-15
28 Iran 2.51689E-08 2 79,463,100 16-Aug-16
29 China 1.08836E-08 15 1,378,220,000 16-Aug-16
30 Brazil 9.68457E-09 2 206,514,000 16-Aug-16

You could argue that some country populations and/or medal counts are too small to prove anything other than an interesting anomaly.  Or not.  But there’s no question that Hungary and the Netherlands are very impressive, each with six gold medals for an overall per-capita rate that’s dramatically higher than, for example, the United Kingdom, the United States, and especially China, which – in spite of its very high medal count – is actually very far behind the other nations on a per-capita basis.
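Since this is a data-analytics blog, it’s worth noting that the ranking itself is only a few lines of Python.  This sketch recomputes a handful of the per-capita figures from the table above (the full data set would work the same way):

```python
# (gold medals, population) for a few of the countries in the table above.
data = {
    "Jamaica": (2, 2723246),
    "Hungary": (6, 9823000),
    "United States": (26, 324248000),
    "China": (15, 1378220000),
}

# Sort country names by golds-per-person, highest first.
ranked = sorted(data, key=lambda c: data[c][0] / data[c][1], reverse=True)
for country in ranked:
    golds, pop = data[country]
    print("{0}: {1:.5e}".format(country, golds / pop))
```

Even on this subset, the sort reproduces the table’s ordering: Jamaica first, China last.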

These are the results presented graphically, which I created using Microsoft Excel and captured with the built-in “snipping” tool in Microsoft Windows:

The Pareto Line seems to curve most significantly above Hungary and the Netherlands, which have the most striking performance in terms of size of population and count of gold medals.  Also – note how far back China ranks.  They might have a lot of medals, but they also have the largest population in the world.

To me, this sort of analysis is much more interesting than the simple medal count that doesn’t account for population.


Sources:

  • Gold Medal Count as of 8/16/2016: http://www.bing.com/search?pc=COSP&ptag=D073116-A855B9C56E1&form=CONBDF&conlogo=CT3335450&q=gold+medal+count+by+country
  • Country Population Data: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population

Amazon Web Services (AWS) – System Log

Here’s an easy way to review the system log of an EC2 Instance using Amazon Web Service (AWS):

  • Navigate to the Management Console (the starting point of everything in AWS)
  • Click “EC2”
  • Click “Running Instances”
  • Choose the instance you’re interested in by clicking the empty square next to it and turning it solid blue
  • Click Actions -> Instance Settings -> Get System Log

See below for what your screen should look like.

The result will be a pop-up window displaying the system log:

That’s it!

MySQL Migration: from Drupal to WordPress

I just migrated this Skere9 blog from a Drupal implementation at SiteGround to a WordPress instance at Amazon Web Services.  I’ve been a customer of AWS for three years and finally decided to move this blog to the same platform where I’ve been doing other work.  It’s cost-effective and AWS provides significantly greater control over numerous aspects of the deployment.

Note:  I have one author – myself – and I did not migrate any comments with this move.  I will add those later.

The Drupal version is 7.4.1.  The WordPress version is 4.5.3.

To migrate the blog posts, I took the following steps.

  • I temporarily installed phpMyAdmin at the Drupal site to export the “node” and “field_data_body” tables from Drupal.  (Note: this could have also been done using MySQL directly.)
  • I installed phpMyAdmin at the WordPress site to import the Drupal tables.  (Note: this could have also been done using MySQL directly.) I edited the phpMyAdmin config file to restrict access to phpMyAdmin to my own IP address.
  • I executed the script below via phpMyAdmin at the WordPress site to pull data from the imported Drupal tables and insert that data into the WordPress database.
INSERT INTO wp_posts
(       id
      , post_author
      , post_date
      , post_date_gmt
      , post_content
      , post_title
      , post_status
      , comment_status
      , ping_status
      , post_name
      , post_modified
      , post_modified_gmt
      , post_parent
      , guid
      , menu_order
      , post_type
)
SELECT
        a.nid+100
      , 1
      , FROM_UNIXTIME(a.created)
      , FROM_UNIXTIME(a.created)
      , b.body_value
      , a.title
      , 'publish'
      , 'open'
      , 'open'
      , CONCAT('dr-', a.nid+100)
      , FROM_UNIXTIME(a.changed)
      , FROM_UNIXTIME(a.changed)
      , 0
      , CONCAT('http://skere9.com/?p=', a.nid+100)
      , 0
      , 'post'
FROM     node a LEFT JOIN (field_data_body b)
            ON (a.nid = b.entity_id)
WHERE    type = 'blog'
ORDER BY a.nid

A few important notes about the above code:

  • I had fewer than 100 existing posts in the WordPress instance – it was a brand new installation.  This is why adding 100 to the ID was enough to create unique identifiers.
  • I transformed the date values from UNIX timestamps to datetime format, using FROM_UNIXTIME(), as required by the source and target tables.
  • I chose to enter each value into the WordPress “post_name” field with a “dr-” prefix – for “Drupal”.  But of course that could have been anything, or nothing at all – no prefix is required here.
  • The “_gmt” fields in the target database are treated by my WordPress implementation as the same as the local time.  Presumably GMT refers to Greenwich Mean Time, comparable to UTC.  I’m in the US eastern time zone (Washington, DC), so the offset should be five hours, but my existing WordPress implementation isn’t treating the times differently.  Therefore I didn’t treat them differently either – I made the “created” timestamp the same for the local and GMT entries, and did the same for the “changed” timestamp.
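For the record, if you did want the “_gmt” columns offset correctly, the conversion is straightforward in Python.  This sketch assumes a fixed EST offset of UTC-5 (daylight saving time would make it -4, so a real migration should use a proper time-zone database):

```python
from datetime import datetime, timedelta, timezone

# A created/changed value as it comes out of FROM_UNIXTIME(), interpreted
# as US Eastern Standard Time (UTC-5).
eastern = timezone(timedelta(hours=-5))
local = datetime(2016, 7, 1, 12, 0, 0, tzinfo=eastern)

# The corresponding _gmt value is simply the same instant in UTC.
gmt = local.astimezone(timezone.utc)
print(gmt.strftime("%Y-%m-%d %H:%M:%S"))  # → 2016-07-01 17:00:00
```

MySQL can do the same shift server-side at INSERT time, which would avoid a second pass over the data.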

It all worked like a charm.

From Sept 25, 2013: KTVU featured me speaking for OOW

On September 25, 2013, the American racing team led by Larry Ellison retook the America’s Cup in spectacular fashion, after several days and dramatic events.  Nobody could have predicted that the grand finish would end up happening in San Francisco Bay, within walking distance of Oracle Open World / JavaOne, and on the climactic “customer appreciation night” of that year’s conference.  It was an amazing once-in-a-lifetime event.

KTVU’s Jana Katsuyama featured me in a soundbite speaking on behalf of the very proud attendees of Oracle Open World, see below.

2013_09_25_Americas_Cup_OHearn_KTVU from Steve OHearn on Vimeo.

At first, Jana asked me a series of questions about whether attendees were upset that Larry’s attention was divided between the conference and the race, and that he’d missed a keynote earlier in the day to go celebrate the victory.

I said – no way! The conference is great, everything is well organized and continuing as planned, and who could possibly predict such a stunning situation – good for Larry!

And Jana, to her credit, let the spirit of my comments convey in the edited version that broadcast that evening on the leading Bay area news channel KTVU.

See above!

Rumors of SQL’s Death Have Been Greatly Exaggerated

The “NoSQL” revolution is tremendous on many levels.  It’s practical, relevant, and useful.  There are plenty of business situations where an architecture optimized for limited, targeted processing of very large amounts of data simply doesn’t need the overhead of a relational tool and can benefit from specialized functionality optimized for select purposes.

But SQL is still critical to the core operations of any enterprise.  Contrary to what many newcomers might think, “NoSQL” in a given application doesn’t actually mean “No SQL” across the board in the enterprise.  A lot of us who are responsible for enterprise architectures think of the expression “NoSQL” as really saying “Not Only SQL”.  After all, some NoSQL tools are actually built on a SQL platform.  Believe it or not, the traditional relational database is still the number one technology for handling the mission-critical core services of any professional organization.

However, this truth is apparently lost on some recent arrivals in the world of software development.

About a year ago I was invited to attend a conference in Maryland of one of the leading big data vendors.  Those of us attending were primarily software developers, and we all spoke in terms of various Apache open source projects and their commercial counterparts.  We rarely uttered the word “Java”, yet we were all Java developers of one kind or another.  And those of us in the world of Java development know better about the benefits and relative importance of both NoSQL and SQL platforms.

But I’m describing the attendees.  The conversation I’m about to share was with one of the vendors who was exhibiting at the conference.  (Yes, a commercial vendor was hosting a show at which they invited other commercial vendors, it happens all the time.)  He was trying to sell his company’s product, of course, a product that was complementary to the host company’s core offerings.  His company was one of those now-familiar business models where some developers created a cool open source project that they promoted to the Apache Software Foundation, eventually winning full project status.  With that established, the developers formed a company to begin selling support for the open source ASF product, and also began building a proprietary alternative for sale.  There are dozens of companies doing this nowadays, as we all know – and it’s a great business model, I love it.  More power to them.

So here I was chatting with a sales rep hawking his particular product.  Suddenly he declared the following:

“Well really, relational databases are dead, nobody uses those any more.”

I stopped.  The first visual that popped into my head was of that time as a kid when one of your friends tells you that “nobody” does this or “everybody” does that.  But as you get older, you learn that your fellow six-year-olds don’t necessarily have a handle on what the whole world is doing.

I’m sure this guy was a smart guy.  And his company was certainly offering a valuable product.  But some perspective was in order.

So I said – really? The RDBMS is dead?  Let me ask you some questions. Don’t you all sell this product?

Him: Yes.

Me: Well – don’t you track sales in a database? Orders, items, linked to the catalog? You probably track orders in an RDBMS.

Him: Well … ok, yes, probably.

Me: And the customers placing those orders? You need data about any number of contact methods, like phone numbers, email addresses, where to ship, when to call, etc., all connected to the orders, sales records, etc.  Probably all structured for mass mailings by correct categories at the right time. Sounds like an RDBMS. Right?

Him: OK, probably, yes.

Me: So … financial data? Employee info and HR compliance, other regulatory issues? Is that going to work in a NoSQL environment?

Him: Good point, probably not.

Me: So the RDBMS is tracking company orders, items, products, customers, employees, suppliers, accounting, finance, human resources, marketing contacts, contracts with events like this conference, and all other data associated with dollars, the law, whatever must be accurately tracked in line with the rest. Even online blog posts are probably running on an RDBMS, right? So that leaves, what, web log files and online comments for marketing trend analysis?

Him: Well no, more than that … but I see your point.

Me: NoSQL has its place.  So does SQL.  Both are valuable.

Him: Agreed.

One down.

Lots more to go.

– Steve

P.S.  A huge thank you to Steve Feuerstein for encouraging this particular blog post, which is a direct result of a little exchange we had recently on this topic.  Thanks, Steve!

The latest happenings at the Baltimore Washington Java Meetup

I just sent out an email to the illustrious members of my Java User Group (JUG), aka the Baltimore Washington Java Meetup.  The text is below.  All those interested in Java technology are invited and welcome!

Well well well!  What an AMAZING night we had for our November meetup!  I’ve been remiss to have not sent this sooner – things have been busy busy busy in Steve world – but you should know that November 10th was an amazing night.  We had about a billion people there – well, ok, perhaps it wasn’t a billion, but in all seriousness we maxed out the room – and it was all to hear the most excellent presentation by our very own rock star, Markus Dale, and his Overview of Amazon Web Services (AWS) and its Java API.  On hand were many illustrious members of our meetup, plus some additional celebrities, namely AWS’s Walt Whitecotton, John Peterson, and John Quarentello, who brought plenty of Amazon swag with them as giveaways and door prizes.

Speaking of the door prizes … the questions were:

1) Do you use (circle all that apply) (a) Ant, (b) Maven, (c) Jenkins, (d) Hudson?

Every answer was selected and there was a write-in or two for Gradle. But the hands-down leading choice was Maven.

This question was also posed at JavaOne to a room of about 200 developers.  By a show of hands, about half used Jenkins, half used Maven, and two people used Ant.

2) An annotation can be extended: (a) True (b) False

The correct answer is false! Now … should they be extendable?  Well that’s a different question altogether.  But in the meantime –

Congratulations to all the winners, especially Peter Stamboulis, who won the first-place prize, a gift card for Amazon Web Services!

Most of all, a great big thank you and “bravo!” to Markus Dale, who clearly knows his stuff and did a fantastic edge-of-your-seat presentation on one of the hottest cutting-edge topics in the IT field today.

SO … our next meetup is … TONIGHT!  As we’ve been saying this fall, our December meetup would be non-standard, not a technical session, and that’s definitely the case this evening! We are simply meeting for dinner at a great restaurant in Columbia – pay your own way – click here for details.

In 2016 we’ll resume with technical sessions and already have a few in the works:  Lambdas, data streams, new features in Java 9, our own participation in the Adopt-A-JSR effort, and more!

But for tonight, we’re having a fun holiday dinner (pay your own way!), and everyone is invited!

See you tonight!

– Steve