This is a brief video (<10 minutes) I created a while back to present the technical concepts behind the SQL GROUP BY clause. It just surpassed 13,000 hits and has 70 thumbs up.
Author: Skere9Admin
Python and SQL
Is it me, or is there a lot of buzz about Python lately? Python is a great language for doing certain work quickly, and it supports a variety of software design approaches. It is interpreted, yet object-oriented. You can use it like an interactive calculator, and it comes with powerful math libraries. It's succinct: you can do a lot with fewer lines of code than is typical, but it can be used for large-scale programming as well. For these reasons and more, Python is growing in popularity, particularly in the fields of data analytics and data science.
It makes sense that it works well with SQL to provide data persistence. This blog post shows you how you can leverage existing SQL data stores with the power of Python.
Getting Ready
I downloaded and used the following software:
| Software | Source | Version | Description |
| --------- | ------ | ------- | ----------- |
| Python | http://www.python.org | 3.5.2 | Popular programming language |
| SQLite | http://sqlite.org | 3.15.2 | Embedded, server-less SQL database |
| Oracle | http://www.oracle.com | 12c | The world's leading relational database management system |
| cx_Oracle | http://cx-oracle.sourceforge.net/ | 5.2.1 | Python extension module for Oracle |
I downloaded and installed Python and Oracle. I also downloaded SQLite and cx_Oracle for use with Python.
Python and SQLite
SQLite is an embedded SQL tool. It doesn’t have a separate server process, but is easily incorporated into applications. Here’s an example of Python code that uses SQLite, version 3:
```python
import sqlite3

connection = sqlite3.connect("ships.db")
cursor = connection.cursor()

sql_statement = """CREATE TABLE ships
                   ( ship_id   INTEGER
                   , ship_name TEXT);"""
cursor.execute(sql_statement)

sql_insert = """INSERT INTO ships VALUES(1, 'Codd One');"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships VALUES(2, 'Codd Two');"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships VALUES(3, 'Codd Three');"""
cursor.execute(sql_insert)

connection.commit()
```
The code above creates the SQLite database file "ships.db", creates the table "ships", adds three rows to it, and commits the records.
In a subsequent script, you can query the same database.
```python
import sqlite3

connection = sqlite3.connect("ships.db")
cursor = connection.cursor()

sql_statement = """SELECT ship_id, ship_name FROM ships;"""
cursor.execute(sql_statement)

print("fetchall:")
result = cursor.fetchall()
for r in result:
    print(r)
```
The rows returned by the SELECT statement are processed within the "for" loop. "result" captures the full result of the query, and "r" holds each fetched row in turn. Each row is a tuple of two values, and both values are displayed by the "print(r)" statement.
The output:
```
fetchall:
(1, 'Codd One')
(2, 'Codd Two')
(3, 'Codd Three')
>>>
```
Note: the ">>>" symbol is the Python interactive prompt, indicating the interpreter is awaiting the next instruction.
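As a side note, sqlite3 also accepts "?" parameter placeholders, which are safer and more convenient than embedding values directly in the SQL strings. Here is a minimal sketch of the same ships example using executemany() and an in-memory database (":memory:" is a sqlite3 convention that avoids creating a file on disk):

```python
import sqlite3

# An in-memory database keeps this sketch self-contained.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE ships (ship_id INTEGER, ship_name TEXT)")

# executemany() binds each tuple to the "?" placeholders, so we do not
# build a separate INSERT string for every row.
rows = [(1, "Codd One"), (2, "Codd Two"), (3, "Codd Three")]
cursor.executemany("INSERT INTO ships VALUES (?, ?)", rows)
connection.commit()

cursor.execute("SELECT ship_id, ship_name FROM ships ORDER BY ship_id")
for r in cursor.fetchall():
    print(r)
```

The placeholder form also protects against SQL injection when the inserted values come from user input.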
Python and Oracle
Now let's use Python to send SQL statements to Oracle 12c. To establish the connection, we use the Python extension module "cx_Oracle", available from SourceForge.net – see above for the links to all the software necessary to run these scripts.
Also, note that while SQL statements submitted directly to Oracle require semicolons, SQL statements sent via cx_Oracle must omit them.
In this example, we’ll do the same thing we did with SQLite, with one change: instead of breaking up the CREATE/INSERT and SELECT statements into two scripts, we’ll do it all in one script:
```python
import cx_Oracle

connection = cx_Oracle.connect('username/password@localhost/orcl')
print("Oracle RDBMS version " + connection.version)
cursor = connection.cursor()

# Drop any previous copy of the table; ignore the error raised
# if the table does not yet exist.
try:
    cursor.execute("""DROP TABLE ships2""")
except cx_Oracle.DatabaseError:
    pass

sql_statement = """CREATE TABLE ships2
                   ( ship_id   NUMBER
                   , ship_name VARCHAR2(20))"""
cursor.execute(sql_statement)

sql_insert = """INSERT INTO ships2 VALUES(1, 'Codd Crystal')"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships2 VALUES(2, 'Codd Elegance')"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships2 VALUES(3, 'Codd Champion')"""
cursor.execute(sql_insert)

connection.commit()

cursor.execute("SELECT ship_id, ship_name FROM ships2")
print("fetchall:")
result = cursor.fetchall()
for r in result:
    print(r)

connection.close()
print("End of script, database connection closed.")
```
The output of this script follows the same pattern we saw earlier, this time with the ships2 rows:

```
fetchall:
(1, 'Codd Crystal')
(2, 'Codd Elegance')
(3, 'Codd Champion')
>>>
```
Regardless of which database we use, the code pattern and the shape of the results are the same.
Closing Thoughts
In this blog post, we looked at how to use Python to execute SQL statements. We created SQL objects, sent data to the database, and queried data from the database. We passed data to and from Python structures and set the stage for any data manipulation we wish to do using these powerful tools.
Python is easy to use, and capable of performing complex math functions and data manipulation. Using SQL to persist data for Python is an ideal combination for data analytics.
NOTE: This data and the accompanying scripts are available at my GitHub site: https://github.com/Skere9.
Number one again!
I was pleased to see that my SQL Expert book is yet again listed as a number one best seller this afternoon by Amazon.com’s book ranking system for two of the book’s categories:
- Computers and Technology -> Databases and Big Data -> Oracle
- Computers and Technology -> Certification -> Oracle
Trailing are excellent books in their own right, including many authored or co-authored by Oracle ACE Directors: a brand new Oracle Public Cloud book co-authored by Charles Kim, Mark Rittman's Business Intelligence book from Oracle Press, and Steve Feuerstein's legendary PL/SQL book.
I’m busy working on a new book from Oracle Press, more on that in the very near future – stay tuned! In the meantime, the 047 book makes a great holiday present for anyone in your life who wants to become a certified SQL Expert!
Below is the screenshot from the “Databases and Big Data” category:
A curious new book … ?
A new book from a major technical publisher …
Amazon Web Services (AWS) – System Log
Here's an easy way to review the system log of an EC2 instance using Amazon Web Services (AWS):
- Navigate to the Management Console (the starting point of everything in AWS)
- Click “EC2”
- Click “Running Instances”
- Choose the instance you’re interested in by clicking the empty square to its right and turning it solid blue
- Click Actions -> Instance Settings -> Get System Log
See below for what your screen should look like.
The result will be a pop-up window displaying the system log:
That’s it!
GROUP BY on a GROUP BY
Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates. I can’t present the actual data I’m working with here, but this is an illustrative example. Consider the data set below.
```
SQL> SELECT *
  2  FROM shop_orders;

        ID CATEGORY   ITEM_N
---------- ---------- --------------------
         1 Office     Stapler
         2 Fruit      Banana
         3 Bread      French Bread
         4 Steak      Ribeye
         5 Fish       Bass
         6 Crackers   Saltines
         7 Fish       Swordfish
         8 Fruit      Pear
         9 Bread      Rolls
        10 Crackers   Chips
        11 Steak      Prime Rib
        12 Bakery     Birthday Cake
        13 Fish       Salmon
        14 Fish       Flounder
        15 Cleaners   Detergent
        16 Bakery     Donuts
        17 Office     Ink Cartridges
        18 Steak      Filet Mignon
        19 Vegetable  Broccoli
        20 Steak      Flank
        21 Fruit      Raspberry
        22 Sundries   Shampoo
        23 Steak      Salisbury
        24 Vegetable  Radish
        25 Vegetable  Mushroom
        26 Pharmacy   Band-aids
        27 Pharmacy   Aspirin
        28 Sundries   Toothbrush
        29 Sundries   Shampoo
        30 Cereal     Bran
        31 Cereal     Corn Flakes
        32 Cleaners   Soap
        33 Cereal     Oatmeal

33 rows selected.
```
One glance at the data and you can see the category values repeat. But how often do they repeat? Which categories recur the most frequently? What if you needed to determine which categories have the greatest number of values, and which don’t?
To determine this, we can start with a GROUP BY, like this.
```
SQL> SELECT category, COUNT(*) CT
  2  FROM shop_orders
  3  GROUP BY category;

CATEGORY           CT
---------- ----------
Cereal              3
Fruit               3
Crackers            2
Vegetable           3
Office              2
Steak               5
Pharmacy            2
Fish                4
Bread               2
Cleaners            2
Sundries            3
Bakery              2

12 rows selected.
```
Now let’s move the GROUP BY to an inline view, and do a GROUP on the GROUP BY, like this.
```
SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n
  2  FROM
  3  ( SELECT a.category, COUNT(a.item_n) CT
  4    FROM shop_orders a
  5    GROUP BY a.category
  6  ) b
  7  GROUP BY b.ct
  8  ORDER BY b.ct DESC;

CT_CATEGORY  CT_ITEM_N
----------- ----------
          1          5
          1          4
          4          3
          6          2
```
Voilà: one category recurs five times, one recurs four times, four recur three times, and six categories have two values each.
In a relatively small data set like this, the answers might be obvious. I've been working with very large tables whose rows number in the tens of millions, and queries like this have been invaluable in confirming data shape and patterns.
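The aggregate-of-aggregates pattern is easy to experiment with from Python as well. Below is a minimal sketch using the built-in sqlite3 module and a small invented subset of the data; it is the same GROUP BY on a GROUP BY, just run against an in-memory SQLite database instead of Oracle:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE shop_orders (id INTEGER, category TEXT, item_n TEXT)")
cursor.executemany(
    "INSERT INTO shop_orders VALUES (?, ?, ?)",
    [(1, "Fish", "Bass"), (2, "Fish", "Salmon"), (3, "Fruit", "Pear"),
     (4, "Fruit", "Banana"), (5, "Steak", "Ribeye")],
)

# Inner query: items per category.  Outer query: how many categories
# share each item count -- the GROUP BY on a GROUP BY.
cursor.execute("""
    SELECT COUNT(b.category) AS ct_category, b.ct AS ct_item_n
    FROM ( SELECT a.category, COUNT(a.item_n) AS ct
           FROM shop_orders a
           GROUP BY a.category ) b
    GROUP BY b.ct
    ORDER BY b.ct DESC""")
result = cursor.fetchall()
print(result)  # [(2, 2), (1, 1)]: two categories have two items, one has one
```

SQLite accepts the inline view unchanged; only the Oracle-specific column types from the CREATE TABLE script need to be swapped for SQLite's.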
If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.
```sql
DROP TABLE shop_orders;

CREATE TABLE shop_orders
( id       NUMBER
, category VARCHAR2(15)
, item_n   VARCHAR2(20)
);

INSERT INTO shop_orders VALUES ( 1, 'Office',    'Stapler');
INSERT INTO shop_orders VALUES ( 2, 'Fruit',     'Banana');
INSERT INTO shop_orders VALUES ( 3, 'Bread',     'French Bread');
INSERT INTO shop_orders VALUES ( 4, 'Steak',     'Ribeye');
INSERT INTO shop_orders VALUES ( 5, 'Fish',      'Bass');
INSERT INTO shop_orders VALUES ( 6, 'Crackers',  'Saltines');
INSERT INTO shop_orders VALUES ( 7, 'Fish',      'Swordfish');
INSERT INTO shop_orders VALUES ( 8, 'Fruit',     'Pear');
INSERT INTO shop_orders VALUES ( 9, 'Bread',     'Rolls');
INSERT INTO shop_orders VALUES (10, 'Crackers',  'Chips');
INSERT INTO shop_orders VALUES (11, 'Steak',     'Prime Rib');
INSERT INTO shop_orders VALUES (12, 'Bakery',    'Birthday Cake');
INSERT INTO shop_orders VALUES (13, 'Fish',      'Salmon');
INSERT INTO shop_orders VALUES (14, 'Fish',      'Flounder');
INSERT INTO shop_orders VALUES (15, 'Cleaners',  'Detergent');
INSERT INTO shop_orders VALUES (16, 'Bakery',    'Donuts');
INSERT INTO shop_orders VALUES (17, 'Office',    'Ink Cartridges');
INSERT INTO shop_orders VALUES (18, 'Steak',     'Filet Mignon');
INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli');
INSERT INTO shop_orders VALUES (20, 'Steak',     'Flank');
INSERT INTO shop_orders VALUES (21, 'Fruit',     'Raspberry');
INSERT INTO shop_orders VALUES (22, 'Sundries',  'Shampoo');
INSERT INTO shop_orders VALUES (23, 'Steak',     'Salisbury');
INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish');
INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom');
INSERT INTO shop_orders VALUES (26, 'Pharmacy',  'Band-aids');
INSERT INTO shop_orders VALUES (27, 'Pharmacy',  'Aspirin');
INSERT INTO shop_orders VALUES (28, 'Sundries',  'Toothbrush');
INSERT INTO shop_orders VALUES (29, 'Sundries',  'Shampoo');
INSERT INTO shop_orders VALUES (30, 'Cereal',    'Bran');
INSERT INTO shop_orders VALUES (31, 'Cereal',    'Corn Flakes');
INSERT INTO shop_orders VALUES (32, 'Cleaners',  'Soap');
INSERT INTO shop_orders VALUES (33, 'Cereal',    'Oatmeal');

COMMIT;
```
There you go! Have fun!
Generating Test Data
Here’s a quick way to generate dummy data for performing tests that require lots of records.
```
SQL> DROP TABLE lots_o_data;

Table dropped.

SQL> CREATE TABLE lots_o_data
  2  ( id       NUMBER
  3  , name     VARCHAR2(5)
  4  , category VARCHAR2(26)
  5  );

Table created.

SQL> INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.');

1 row created.

SQL> INSERT INTO lots_o_data VALUES (2, 'More', 'Bunches and bunches.');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

4 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

36 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

1764 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

3261636 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(*)
  2  FROM lots_o_data;

  COUNT(*)
----------
   3263442

SQL> SPOOL OFF
```
Note the following:
- With only four INSERT … SELECT statements based on a Cartesian product, we were able to generate over 3 million rows of data.
- A few more INSERT … SELECT statements could easily result in billions of rows of data.
If you need a lot of data for performance testing or capacity testing, this is an easy way to get however much you require.
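If you want to try the doubling trick without an Oracle instance, here is a sketch using Python's built-in sqlite3 module. Each pass inserts the table's Cartesian product with itself, so a table of n rows grows to n + n² rows; this version stops after three passes (2 → 6 → 42 → 1,806 rows) to keep the run quick:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE lots_o_data (id INTEGER, name TEXT, category TEXT)")
cursor.execute("INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.')")
cursor.execute("INSERT INTO lots_o_data VALUES (2, 'More', 'Bunches and bunches.')")
connection.commit()

# Each pass inserts the Cartesian product of the table with itself,
# so the row count grows as n + n*n per iteration.
for _ in range(3):
    cursor.execute("""INSERT INTO lots_o_data
                      SELECT a.id, a.name, b.category
                      FROM lots_o_data a CROSS JOIN lots_o_data b""")
    connection.commit()

count = cursor.execute("SELECT COUNT(*) FROM lots_o_data").fetchone()[0]
print(count)  # 1806
```

A fourth pass would take the table past three million rows, matching the SQL*Plus session above.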
I have been building a data lake environment in anticipation of a series of large data feeds I'm expecting soon, and I found this a useful approach to preparing the environment.
Too much is happening
Ok, so listen … there’s no time to waste. Things are happening way too fast.
First, this site is about a bunch of people I know – family members, friends, and acquaintances of mine – who are outrageously talented in the most amazingly diverse ways imaginable. I want to brag about them all. And I will. And I guess that’s all you need to know for now.
More later as we get going. And I’m telling you that now so I can blab out the next post …
Hammer, hammer, saw, saw ….
Nail, nail, nail, nail. Gerhunk. WHAM-O! Click.
The Skere9 Wiki is live
The Skere9 wiki is now alive and well – for more information visit the Wiki page.