Monday, September 12, 2022

SQL Series- Window Functions

 SQL Window Functions


The SQL window functions are also frequently used in everyday work by data scientists and data analysts, especially when querying particularly large datasets. Their usage can not only make the SQL code faster but also clearer and easier to understand by others. However, because of the complicated syntax, arguably less intuitive than with other SQL constructions, it takes practice to master using SQL window functions in queries.

The aggregate functions, you know that already, take values from multiple rows and return one value. Yes, this is data aggregation. The most popular aggregate functions are SUM(), COUNT(), AVG(), MIN(), and MAX().They are often used in the GROUP BY clause. That way, data can be aggregated on multiple columns, which extends the aggregate functions’ analytical possibilities. They still don’t retain the individual rows that are the basis for aggregation.

Formally speaking, window functions use values from multiple rows to produce values for each row separately and retain the individual rows. What distinguishes window from other SQL functions, namely aggregate and scalar functions, is the keyword OVER, used to define a portion of rows the function should consider as inputs to produce an output. This portion of rows is called a ‘window’, hence the name of this family of functions.

List of SQL Window Functions


There is no official division of the SQL window functions into categories but nevertheless, they are frequently being divided into two or three types. The most basic classification splits the SQL window functions into aggregate and built-in functions.

The main characteristic of the aggregate window functions is that they reuse the existing simple aggregate functions (such as COUNT() or SUM()) while changing the way in which the aggregation is defined and the format of the results. Meanwhile, the built-in functions have new names and cannot be used in another context.

The built-in SQL window functions are then sometimes divided into two different types: the ranking and value functions. The ranking window functions are used to simply assign numbers to the existing rows according to some requirements. On the other hand, the value window functions are used to copy values from other rows to other rows within the defined windows.

So they are broadly divided into three categories:

• Aggregate Window Functions

• Ranking Window Functions 

• Value Window Function


Let’s set up our approach to solving the SQL window functions interview questions and see how it can be applied to the questions testing your window functions knowledge.

How to Approach SQL Window Functions Interview Questions?

The only thing that makes up the ‘right’ approach to solving the questions is making it structured. All other things are less relevant. The approach will vary from person to person and will be based on how you think and what approach you feel comfortable with. The approach that is proven successful consists of the following steps.

 1. Exploring the dataset
 2. Identifying columns for solving the question 
 3. Writing out the SQL Query logic
 4. Querying

1. Exploring the Dataset

The interview questions usually have a concrete dataset you should
use to come up with a solution. The data is set up in a way that it tests certain SQL concepts. Always take some time to get to know the data types and inspect data for duplicates, NULL values, or missing values. This all influences your code and the functions you have to use. If there is more than one table, inspect how the tables are interconnected, how you could join them, and which JOIN you should use.

This is not always possible, but when it is, preview the data itself. It will help you get a clearer picture and maybe find something you could have missed by only looking at the column names and data types. 

2. Identifying Columns for Solving the Question

Less is not more, but it’s usually enough! Most questions will give you more data than you need to write a solution. Use this step to get rid of the columns you don’t need and write down the columns you do.

 3. Writing Out the SQL query Logic 

This literally means how it sounds: divide the code into logical blocks and write them down as steps. Usually, the ‘logic’ refers to the functions you’ll use. Number the steps, write down the function, and shortly describe how and why you’ll use it. You can also write a pseudo code if you want, which you will just fill in with the actual data from the interview question. As you’re writing the query logic, use this step to check with the interviewer to see if you’re moving in the right direction. Once you have the query logic written out, querying will be almost a technicality. 

4. Querying 

Most of the thinking was done in the previous steps. This will make query writing much easier because you’ll have most of the issues thought out already. Now you can focus on writing efficient query and have more bandwidth to tackle anything that comes up. 

Lets discuss each function in details on next posts




No comments:

Post a Comment

Spark- Window Function

  Window functions in Spark ================================================ -> Spark Window functions operate on a group of rows like pa...