This is a brief video (<10 minutes) I created a while back to present the technical concepts behind the SQL GROUP BY clause. It just surpassed 13,000 hits and has 70 thumbs up.
Author: Skere9Admin
Python and SQL
Is it me, or is there a lot of buzz about Python lately? Python is a great language for doing certain work quickly, and it supports a variety of software design approaches. It is interpreted, yet object-oriented. You can use it like an interactive calculator, and it comes with powerful math libraries. It's succinct: you can do a lot with fewer lines of code than is typical, but it can be used for large-scale programming as well. For these reasons and more, Python is growing in popularity, particularly in the fields of data analytics and data science.
It makes sense that it works well with SQL to provide data persistence. This blog post shows you how you can leverage existing SQL data stores with the power of Python.
Getting Ready
I downloaded and used the following software:
| Software | Source | Version | Description |
| --------- | ------ | ------- | ----------- |
| Python | http://www.python.org | 3.5.2 | Popular programming language |
| SQLite | http://sqlite.org | 3.15.2 | Embedded, server-less SQL database |
| Oracle | http://www.oracle.com | 12c | The world's leading relational database management system |
| cx_Oracle | http://cx-oracle.sourceforge.net/ | 5.2.1 | Python extension module for Oracle |
I downloaded and installed Python and Oracle. I also downloaded SQLite and cx_Oracle for use with Python.
Python and SQLite
SQLite is an embedded SQL tool. It doesn’t have a separate server process, but is easily incorporated into applications. Here’s an example of Python code that uses SQLite, version 3:
```python
import sqlite3

connection = sqlite3.connect("ships.db")
cursor = connection.cursor()

sql_statement = """CREATE TABLE ships
                   ( ship_id   INTEGER
                   , ship_name TEXT);"""
cursor.execute(sql_statement)

sql_insert = """INSERT INTO ships VALUES(1, 'Codd One');"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships VALUES(2, 'Codd Two');"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships VALUES(3, 'Codd Three');"""
cursor.execute(sql_insert)

connection.commit()
```
The code above creates the SQLite database file "ships.db", creates the table "ships", adds three rows to it, and commits the records.
In a subsequent script, you can query the same database.
```python
import sqlite3

connection = sqlite3.connect("ships.db")
cursor = connection.cursor()

sql_statement = """SELECT ship_id, ship_name FROM ships;"""
cursor.execute(sql_statement)

print("fetchall:")
result = cursor.fetchall()
for r in result:
    print(r)
```
The rows returned by the SELECT statement are processed within the "for" loop. "result" captures the full result of the query, and "r" holds each fetched row in turn. Each row is a tuple of two values, and both values are displayed by the "print(r)" statement.
The output:
```
fetchall:
(1, 'Codd One')
(2, 'Codd Two')
(3, 'Codd Three')
>>>
```
Note: the ">>>" symbol is the Python interactive prompt, indicating the interpreter is awaiting the next instruction.
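As a side note, sqlite3 also accepts "?" parameter placeholders, which are safer and more convenient than embedding values directly in the SQL strings. Here is a minimal sketch of the same ships example using executemany() and an in-memory database (":memory:" is a sqlite3 convention that avoids creating a file on disk):

```python
import sqlite3

# An in-memory database keeps this sketch self-contained.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE ships (ship_id INTEGER, ship_name TEXT)")

# executemany() binds each tuple to the "?" placeholders, so we do not
# build a separate INSERT string for every row.
rows = [(1, "Codd One"), (2, "Codd Two"), (3, "Codd Three")]
cursor.executemany("INSERT INTO ships VALUES (?, ?)", rows)
connection.commit()

cursor.execute("SELECT ship_id, ship_name FROM ships ORDER BY ship_id")
for r in cursor.fetchall():
    print(r)
```

The placeholder form also protects against SQL injection when the inserted values come from user input.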
Python and Oracle
Now let's use Python to send SQL statements to Oracle 12c. To establish the connection, we use the Python extension module "cx_Oracle", available from SourceForge.net – see above for the links to all the software necessary to run these scripts.
Also, note that while SQL statements submitted directly to Oracle require semicolons, SQL statements sent via cx_Oracle must omit them.
In this example, we’ll do the same thing we did with SQLite, with one change: instead of breaking up the CREATE/INSERT and SELECT statements into two scripts, we’ll do it all in one script:
```python
import cx_Oracle

connection = cx_Oracle.connect('username/password@localhost/orcl')
print("Oracle RDBMS version " + connection.version)
cursor = connection.cursor()

# Drop any previous copy of the table; ignore the error raised
# if the table does not yet exist.
try:
    cursor.execute("""DROP TABLE ships2""")
except cx_Oracle.DatabaseError:
    pass

sql_statement = """CREATE TABLE ships2
                   ( ship_id   NUMBER
                   , ship_name VARCHAR2(20))"""
cursor.execute(sql_statement)

sql_insert = """INSERT INTO ships2 VALUES(1, 'Codd Crystal')"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships2 VALUES(2, 'Codd Elegance')"""
cursor.execute(sql_insert)

sql_insert = """INSERT INTO ships2 VALUES(3, 'Codd Champion')"""
cursor.execute(sql_insert)

connection.commit()

cursor.execute("SELECT ship_id, ship_name FROM ships2")
print("fetchall:")
result = cursor.fetchall()
for r in result:
    print(r)

connection.close()
print("End of script, database connection closed.")
```
The output of this script follows the same pattern we saw earlier, this time with the ships2 rows:

```
fetchall:
(1, 'Codd Crystal')
(2, 'Codd Elegance')
(3, 'Codd Champion')
>>>
```
Regardless of which database we use, the code pattern and the shape of the results are the same.
Closing Thoughts
In this blog post, we looked at how to use Python to execute SQL statements. We created SQL objects, sent data to the database, and queried data from the database. We passed data to and from Python structures and set the stage for any data manipulation we wish to do using these powerful tools.
Python is easy to use, and capable of performing complex math functions and data manipulation. Using SQL to persist data for Python is an ideal combination for data analytics.
NOTE: This data and the accompanying scripts are available at my GitHub site: https://github.com/Skere9.
Number one again!
I was pleased to see that my SQL Expert book is yet again listed as a number one best seller this afternoon by Amazon.com’s book ranking system for two of the book’s categories:
- Computers and Technology -> Databases and Big Data -> Oracle
- Computers and Technology -> Certification -> Oracle
Trailing are excellent books in their own right, including many authored or co-authored by Oracle ACE Directors: a brand new Oracle Public Cloud book co-authored by Charles Kim, Mark Rittman's Business Intelligence book from Oracle Press, and Steve Feuerstein's legendary PL/SQL book.
I’m busy working on a new book from Oracle Press, more on that in the very near future – stay tuned! In the meantime, the 047 book makes a great holiday present for anyone in your life who wants to become a certified SQL Expert!
Below is the screenshot from the “Databases and Big Data” category:
A curious new book … ?
A new book from a major technical publisher …
Amazon Web Services (AWS) – System Log
Here's an easy way to review the system log of an EC2 instance using Amazon Web Services (AWS):
- Navigate to the Management Console (the starting point of everything in AWS)
- Click “EC2”
- Click “Running Instances”
- Choose the instance you’re interested in by clicking the empty square to its right and turning it solid blue
- Click Actions -> Instance Settings -> Get System Log
See below for what your screen should look like.
The result will be a pop-up window displaying the system log:
That’s it!
GROUP BY on a GROUP BY
Here’s something I’ve been working on lately – a number of metrics across large data sets where I want to determine aggregates of aggregates. I can’t present the actual data I’m working with here, but this is an illustrative example. Consider the data set below.
```
SQL> SELECT *
  2  FROM shop_orders;

        ID CATEGORY   ITEM_N
---------- ---------- --------------------
         1 Office     Stapler
         2 Fruit      Banana
         3 Bread      French Bread
         4 Steak      Ribeye
         5 Fish       Bass
         6 Crackers   Saltines
         7 Fish       Swordfish
         8 Fruit      Pear
         9 Bread      Rolls
        10 Crackers   Chips
        11 Steak      Prime Rib
        12 Bakery     Birthday Cake
        13 Fish       Salmon
        14 Fish       Flounder
        15 Cleaners   Detergent
        16 Bakery     Donuts
        17 Office     Ink Cartridges
        18 Steak      Filet Mignon
        19 Vegetable  Broccoli
        20 Steak      Flank
        21 Fruit      Raspberry
        22 Sundries   Shampoo
        23 Steak      Salisbury
        24 Vegetable  Radish
        25 Vegetable  Mushroom
        26 Pharmacy   Band-aids
        27 Pharmacy   Aspirin
        28 Sundries   Toothbrush
        29 Sundries   Shampoo
        30 Cereal     Bran
        31 Cereal     Corn Flakes
        32 Cleaners   Soap
        33 Cereal     Oatmeal

33 rows selected.
```
One glance at the data and you can see the category values repeat. But how often do they repeat? Which categories recur the most frequently? What if you needed to determine which categories have the greatest number of values, and which don’t?
To determine this, we can start with a GROUP BY, like this.
```
SQL> SELECT category, COUNT(*) CT
  2  FROM shop_orders
  3  GROUP BY category;

CATEGORY           CT
---------- ----------
Cereal              3
Fruit               3
Crackers            2
Vegetable           3
Office              2
Steak               5
Pharmacy            2
Fish                4
Bread               2
Cleaners            2
Sundries            3
Bakery              2

12 rows selected.
```
Now let’s move the GROUP BY to an inline view, and do a GROUP on the GROUP BY, like this.
```
SQL> SELECT COUNT(b.category) ct_category, b.ct ct_item_n
  2  FROM
  3  ( SELECT a.category, COUNT(a.item_n) CT
  4    FROM shop_orders a
  5    GROUP BY a.category
  6  ) b
  7  GROUP BY b.ct
  8  ORDER BY b.ct DESC;

CT_CATEGORY  CT_ITEM_N
----------- ----------
          1          5
          1          4
          4          3
          6          2
```
Voilà: one category recurs five times, one recurs four times, four recur three times, and six categories have two values each.
In a relatively small data set like this, the answers might be obvious. I've been working with very large tables whose rows number in the tens of millions, and queries like this have been invaluable in confirming data shape and patterns.
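The aggregate-of-aggregates pattern is easy to experiment with from Python as well. Below is a minimal sketch using the built-in sqlite3 module and a small invented subset of the data; it is the same GROUP BY on a GROUP BY, just run against an in-memory SQLite database instead of Oracle:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE shop_orders (id INTEGER, category TEXT, item_n TEXT)")
cursor.executemany(
    "INSERT INTO shop_orders VALUES (?, ?, ?)",
    [(1, "Fish", "Bass"), (2, "Fish", "Salmon"), (3, "Fruit", "Pear"),
     (4, "Fruit", "Banana"), (5, "Steak", "Ribeye")],
)

# Inner query: items per category.  Outer query: how many categories
# share each item count -- the GROUP BY on a GROUP BY.
cursor.execute("""
    SELECT COUNT(b.category) AS ct_category, b.ct AS ct_item_n
    FROM ( SELECT a.category, COUNT(a.item_n) AS ct
           FROM shop_orders a
           GROUP BY a.category ) b
    GROUP BY b.ct
    ORDER BY b.ct DESC""")
result = cursor.fetchall()
print(result)  # [(2, 2), (1, 1)]: two categories have two items, one has one
```

SQLite accepts the inline view unchanged; only the Oracle-specific column types from the CREATE TABLE script need to be swapped for SQLite's.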
If you wish to experiment with these SQL statements, here is the code to create the sample table and sample data.
```sql
DROP TABLE shop_orders;

CREATE TABLE shop_orders
( id       NUMBER
, category VARCHAR2(15)
, item_n   VARCHAR2(20)
);

INSERT INTO shop_orders VALUES ( 1, 'Office',    'Stapler');
INSERT INTO shop_orders VALUES ( 2, 'Fruit',     'Banana');
INSERT INTO shop_orders VALUES ( 3, 'Bread',     'French Bread');
INSERT INTO shop_orders VALUES ( 4, 'Steak',     'Ribeye');
INSERT INTO shop_orders VALUES ( 5, 'Fish',      'Bass');
INSERT INTO shop_orders VALUES ( 6, 'Crackers',  'Saltines');
INSERT INTO shop_orders VALUES ( 7, 'Fish',      'Swordfish');
INSERT INTO shop_orders VALUES ( 8, 'Fruit',     'Pear');
INSERT INTO shop_orders VALUES ( 9, 'Bread',     'Rolls');
INSERT INTO shop_orders VALUES (10, 'Crackers',  'Chips');
INSERT INTO shop_orders VALUES (11, 'Steak',     'Prime Rib');
INSERT INTO shop_orders VALUES (12, 'Bakery',    'Birthday Cake');
INSERT INTO shop_orders VALUES (13, 'Fish',      'Salmon');
INSERT INTO shop_orders VALUES (14, 'Fish',      'Flounder');
INSERT INTO shop_orders VALUES (15, 'Cleaners',  'Detergent');
INSERT INTO shop_orders VALUES (16, 'Bakery',    'Donuts');
INSERT INTO shop_orders VALUES (17, 'Office',    'Ink Cartridges');
INSERT INTO shop_orders VALUES (18, 'Steak',     'Filet Mignon');
INSERT INTO shop_orders VALUES (19, 'Vegetable', 'Broccoli');
INSERT INTO shop_orders VALUES (20, 'Steak',     'Flank');
INSERT INTO shop_orders VALUES (21, 'Fruit',     'Raspberry');
INSERT INTO shop_orders VALUES (22, 'Sundries',  'Shampoo');
INSERT INTO shop_orders VALUES (23, 'Steak',     'Salisbury');
INSERT INTO shop_orders VALUES (24, 'Vegetable', 'Radish');
INSERT INTO shop_orders VALUES (25, 'Vegetable', 'Mushroom');
INSERT INTO shop_orders VALUES (26, 'Pharmacy',  'Band-aids');
INSERT INTO shop_orders VALUES (27, 'Pharmacy',  'Aspirin');
INSERT INTO shop_orders VALUES (28, 'Sundries',  'Toothbrush');
INSERT INTO shop_orders VALUES (29, 'Sundries',  'Shampoo');
INSERT INTO shop_orders VALUES (30, 'Cereal',    'Bran');
INSERT INTO shop_orders VALUES (31, 'Cereal',    'Corn Flakes');
INSERT INTO shop_orders VALUES (32, 'Cleaners',  'Soap');
INSERT INTO shop_orders VALUES (33, 'Cereal',    'Oatmeal');

COMMIT;
```
There you go! Have fun!
Generating Test Data
Here’s a quick way to generate dummy data for performing tests that require lots of records.
```
SQL> DROP TABLE lots_o_data;

Table dropped.

SQL> CREATE TABLE lots_o_data
  2  ( id       NUMBER
  3  , name     VARCHAR2(5)
  4  , category VARCHAR2(26)
  5  );

Table created.

SQL> INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.');

1 row created.

SQL> INSERT INTO lots_o_data VALUES (2, 'More', 'Bunches and bunches.');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

4 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

36 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

1764 rows created.

SQL> COMMIT;

Commit complete.

SQL> INSERT INTO lots_o_data
  2  SELECT a.id, a.name, b.category
  3  FROM lots_o_data a CROSS JOIN lots_o_data b;

3261636 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(*)
  2  FROM lots_o_data;

  COUNT(*)
----------
   3263442

SQL> SPOOL OFF
```
Note the following:
- With only four INSERT … SELECT statements based on a Cartesian product, we were able to generate over 3 million rows of data.
- A few more INSERT … SELECT statements could easily result in billions of rows of data.
If you need a lot of data for performance testing or capacity testing, this is an easy way to get however much you require.
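If you want to try the doubling trick without an Oracle instance, here is a sketch using Python's built-in sqlite3 module. Each pass inserts the table's Cartesian product with itself, so a table of n rows grows to n + n² rows; this version stops after three passes (2 → 6 → 42 → 1,806 rows) to keep the run quick:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE lots_o_data (id INTEGER, name TEXT, category TEXT)")
cursor.execute("INSERT INTO lots_o_data VALUES (1, 'Lotso', 'Wow this is a lot of data.')")
cursor.execute("INSERT INTO lots_o_data VALUES (2, 'More', 'Bunches and bunches.')")
connection.commit()

# Each pass inserts the Cartesian product of the table with itself,
# so the row count grows as n + n*n per iteration.
for _ in range(3):
    cursor.execute("""INSERT INTO lots_o_data
                      SELECT a.id, a.name, b.category
                      FROM lots_o_data a CROSS JOIN lots_o_data b""")
    connection.commit()

count = cursor.execute("SELECT COUNT(*) FROM lots_o_data").fetchone()[0]
print(count)  # 1806
```

A fourth pass would take the table past three million rows, matching the SQL*Plus session above.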
I have been building a data lake environment in anticipation of a series of large data feeds I'm expecting soon, and I found this a useful approach to preparing the environment.
Too much is happening
Ok, so listen … there’s no time to waste. Things are happening way too fast.
First, this site is about a bunch of people I know – family members, friends, and acquaintances of mine – who are outrageously talented in the most amazingly diverse ways imaginable. I want to brag about them all. And I will. And I guess that’s all you need to know for now.
More later as we get going. And I’m telling you that now so I can blab out the next post …
Hammer, hammer, saw, saw ….
Nail, nail, nail, nail. Gerhunk. WHAM-O! Click.
The Skere9 Wiki is live
The Skere9 wiki is now alive and well – for more information visit the Wiki page.