SQL on Jupyter Notebook

Running SQL Queries on Jupyter Notebook

For far too long, many people have based their interest only on the concept of learning Excel as the only essential technical skill for data analytics, but there's more to writing Excel Functions and creating conditional tables; there is the world of SQL.

What is SQL?

SQL is the acronym for Structured Query Language; it operates through declarative statements and creates new databases, queries them, and maintains the data we have in them.

SQL has many functions applied in databases, most especially in relational databases stretching from its data can be accessed directly, it is easy to understand. The most important ones such as it perform complex calculations, and it's ability to query large datasets much more than Excel.

In SQL, we have SQL-based software such as MySQL and PostgreSQL, where one can create a database, alter it, and perform various queries on the same database.

What can you do with SQL?

The possibilities are endless. SQL can efficiently work with any database system to communicate and manage large amounts of data. This goes beyond basic spreadsheets like Excel,, which are limited in how much data they can efficiently sort. In contrast, SQL can work with millions of pieces of data. This database language is imperative for data analysis and data-based machine learning. Uniquely, SQL can handle data in tables and allows data to be both compiled and manipulated. This makes data analysis more efficient. Since it is query-based, SQL can easily collect data based on one categorical need or from a query. For example, a company can easily use SQL-programmed databases to run a report of how many of its customers spent $1000 or more on their products. SQL can perform various aggregate functions such as averages, standard deviations, counts, etc. SQL also generates descriptive statistics such as mean, median, and mode. Since most database management systems support SQL, you can quickly move among different database management systems after learning to program. SQL can also be embedded in other programming languages, making it a great first language to learn.

SQL is easy to learn.

SQL is a natural language that feels natural to write because it is similar to English. It is also a free, open-source language, so anyone can access it and learn from free resources. SQL can also be known through intensive boot camps in various settings and flexible schedule options, so a degree is not needed to start using SQL in the workforce language is considered a "declarative language." Unlike imperative languages like Java, SQL demands the data required, and the system determines how to deliver the request. Less coding is necessary, making the SQL programming process simpler for most users to both perform and read the programming language. Learnsql.com boasts that SQL hides all of the data processing from the user, making the process of gathering and analyzing data; as a result, programming with SQL is less congested than with other languages. The natural flow and straightforward writing make programming with SQL friendlier to use than other languages.

What is a Jupyter Notebook?

Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and text. Most data analysts use Jupyter Notebook as their analytical notebook and use programming languages such as Python, R, and Julia.

Jupyter Notebook also has functionalities for data analytics/science, such as:

1. Data Organization and Cleaning

One of the most tedious tasks for a data scientist is the work of cleaning and organizing data. A time-consuming process, data cleaning requires that your data be organized so that the portions of the dataset you do not want can be easily identified and removed before you can begin using the data to make suggestions or predictions. Since the Jupyter Notebook is compatible with multiple tools, it can streamline this data organization and cleaning process.

2. Data Visualization and Sharing

Once you have cleaned, organized, and analyzed your data, Jupyter Notebook can visualize and share data. Jupyter Notebook allows users to compile all aspects of a data project in one place, making it easier to show the entire project process to your intended audience. Through the web-based application, users can create data visualizations and other components of a project to share with others via the platform. There are multiple ways that you can share a data project both within and outside of Jupyter Notebook.

3. Teaching Data Science Skills

Based on all of the previous uses for Jupyter Notebook, it is apparent that Jupyter Notebook is primarily used to teach data science skills. It is common to see Jupyter Notebooks used within the classroom because it allows students to collaborate on data science and computer programming projects. In addition, it can be used to practice important data science and programming skills, such as data analysis and organization. Through the sharing and visualization component, Jupyter Notebook is an excellent tool for having students show their work on a project from start to finish, as well as learning how to identify and understand where they went right or wrong when writing code.

Now the problem most data analysts have had with Jupyter is not knowing the necessary library that will aid in running SQL queries in the notebook format; Here, I present sqlite3(fugue_sql)

How to run SQL Queries on Jupyter Notebook?

Many people have fallen in love with Python for data analysis/science and love the Jupyter environment but have a problem with the proficiency of running SQL queries on Jupyter.

To run SQL queries on datasets; there are a couple of ways to do that, either by using BigQuery or fugue_sql.

But one of the easiest ways to achieve this is using pandasql.

Steps to get it running:

  • Use either:
pip install -U pandasql OR pip install -U pandasql
  • Import the Library
from pandasql import sqldf

SELECT Column FROM dataset WHERE this = 0

To print results

print(sqldf(q,locals())) OR print(sqldf(q,global))

The Pros and Cons of fugue_sql

As much as it's practical, it does have its pros and cons.

Pros:

  • It allows you to run queries on your dataset.

  • You can also use pandas functions such as .head() or .describe().

Cons:

  • It only allows you to run three basic queries; that is, it doesn't support your Advanced SQL queries.

Conclusion

These have indeed been an eye opener into the world of running SQL queries on Jupyter Notebook; there are still other ways of achieving this, such as using Google Big Query, but this has been found easier to understand and practice in one's dataset. You can use Python codes(Pandas) on the result of your SQL queries.