For which situation would you use a group function?

The GROUP BY statement in SQL arranges rows with identical values into groups, usually so that an aggregate function can be applied to each group. That is, if a particular column has the same value in several rows, those rows are placed in one group. Important points:

  • The GROUP BY clause is used with the SELECT statement.
  • In the query, the GROUP BY clause is placed after the WHERE clause.
  • In the query, the GROUP BY clause is placed before the ORDER BY clause, if one is used.
  • In the query, the GROUP BY clause is placed before the HAVING clause.
  • Conditions on groups are placed in the HAVING clause.

Syntax:

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;

function_name: Name of the function used, for example SUM() or AVG().
table_name: Name of the table.
condition: Condition used.

The examples use two sample tables: Employee, which has at least the columns NAME and SALARY, and Student, which has at least the columns SUBJECT and YEAR.

Example:

Group by single column: Grouping by a single column places all rows that have the same value in that column into one group. Consider the query below:

SELECT NAME, SUM(SALARY) FROM Employee GROUP BY NAME;

In the result of this query, rows with duplicate NAME values are grouped under the same NAME, and the corresponding SALARY is the sum of the SALARY values of those rows. The SUM() function of SQL is used here to calculate the sum.
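The single-column grouping can be tried out with a minimal sketch that uses Python's built-in sqlite3 module. The rows inserted here are hypothetical sample data, not the original Employee table, and an ORDER BY clause is added only to make the row order predictable.

```python
import sqlite3

# Hypothetical sample data; the original Employee table is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO Employee (NAME, SALARY) VALUES (?, ?)",
    [
        ("Harsh", 2000),
        ("Dhanraj", 3000),
        ("Ashish", 1500),
        ("Harsh", 3500),
        ("Ashish", 1500),
    ],
)

# One output row per distinct NAME; SUM() adds up the salaries inside each group.
totals_by_name = conn.execute(
    "SELECT NAME, SUM(SALARY) FROM Employee GROUP BY NAME ORDER BY NAME"
).fetchall()
print(totals_by_name)
conn.close()
```

Each name appears once in the result, however many rows it had in the table.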

Group by multiple columns: Grouping by multiple columns, for example GROUP BY column1, column2, places all rows that have the same values in both column1 and column2 into one group. Consider the query below:

SELECT SUBJECT, YEAR, COUNT(*) FROM Student GROUP BY SUBJECT, YEAR;

In the result, students with the same SUBJECT and the same YEAR are placed in one group, while those who share only the SUBJECT but not the YEAR belong to different groups. Here the table has been grouped by two columns.
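A minimal sketch of grouping by two columns, again using Python's sqlite3 module. The Student rows and the NAME column are hypothetical, chosen so that some students share both SUBJECT and YEAR and others share only SUBJECT.

```python
import sqlite3

# Hypothetical sample data; the original Student table is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (NAME TEXT, SUBJECT TEXT, YEAR INTEGER)")
conn.executemany(
    "INSERT INTO Student (NAME, SUBJECT, YEAR) VALUES (?, ?, ?)",
    [
        ("Harsh", "English", 1),
        ("Pratik", "English", 1),
        ("Ravi", "English", 2),
        ("Ashish", "Mathematics", 1),
        ("Dhanraj", "Mathematics", 1),
        ("Ramesh", "Mathematics", 2),
    ],
)

# A group is formed for every distinct (SUBJECT, YEAR) pair.
students_per_group = conn.execute(
    "SELECT SUBJECT, YEAR, COUNT(*) FROM Student "
    "GROUP BY SUBJECT, YEAR ORDER BY SUBJECT, YEAR"
).fetchall()
print(students_per_group)
conn.close()
```

English students from year 1 and year 2 end up in separate groups, because a group requires a match on both columns.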

HAVING Clause

The WHERE clause places conditions on individual rows, but what if we want to place conditions on groups? This is where the HAVING clause comes in. We can use HAVING to decide which groups become part of the final result set. Also, aggregate functions such as SUM() and COUNT() cannot be used in the WHERE clause, so any condition that uses them must go in the HAVING clause.

Syntax:

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition
ORDER BY column1, column2;

function_name: Name of the function used, for example SUM() or AVG().
table_name: Name of the table.
condition: Condition used.

Example:

SELECT NAME, SUM(SALARY) FROM Employee GROUP BY NAME HAVING SUM(SALARY)>3000;

Only one of the three groups appears in the result set, as it is the only group whose sum of SALARY is greater than 3000. The HAVING clause is used here because the condition applies to groups, not to individual rows.
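The effect of HAVING on groups can be sketched with Python's sqlite3 module. The salaries below are hypothetical values picked so that exactly one of three groups has a total above 3000.

```python
import sqlite3

# Hypothetical sample data; the original Employee table is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO Employee (NAME, SALARY) VALUES (?, ?)",
    [
        ("Harsh", 2000),
        ("Dhanraj", 3000),
        ("Ashish", 1500),
        ("Harsh", 3500),
        ("Ashish", 1500),
    ],
)

# Groups are built first; HAVING then keeps only groups whose total exceeds 3000.
high_totals = conn.execute(
    "SELECT NAME, SUM(SALARY) FROM Employee "
    "GROUP BY NAME HAVING SUM(SALARY) > 3000"
).fetchall()
print(high_totals)
conn.close()
```

The groups with a total of exactly 3000 are dropped, because the condition is a strict "greater than".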

This article is contributed by Harsh Agarwal.

If you are already familiar with the HAVING clause, you may find yourself in a situation and not know whether to use WHERE or HAVING. You may even be asking yourself – if we can use WHERE, why does the HAVING clause even exist?

The HAVING clause is frequently used with GROUP BY because it refines the output by removing groups that do not satisfy a certain condition.

Internalizing the corresponding syntax will help us explain the difference between the two keywords, so let’s begin there.

HAVING needs to be inserted between the GROUP BY and ORDER BY clauses. In addition, HAVING is like WHERE but applied to the GROUP BY block.

The difference can be better understood with an example.

Side note: We will be working with the ‘employees’ database, so if you haven’t downloaded it, make sure to check out our tutorial explaining it. 

Providing a Case in Point

First, on some occasions, an identical result could be obtained by implementing the same condition, either with the WHERE or HAVING clause.

For instance, if we select all employees hired after the 1st of January 2000 in MySQL, the retrieved table will be the same whether we use WHERE or HAVING.

If we are using WHERE, the code will be:

SELECT * FROM employees WHERE hire_date > '2000-01-01';

And if we decide to use the HAVING clause, we can simply replace WHERE:

SELECT * FROM employees HAVING hire_date > '2000-01-01';

The results are the same.

The Main Advantage of the HAVING Clause over the WHERE Clause

Well, the main distinction between the two clauses is that HAVING can be applied for subsets of aggregated groups, while in the WHERE block, this is forbidden. In simpler words, after HAVING, we can have a condition with an aggregate function, while WHERE cannot use aggregate functions within its conditions.

An Example

Assume we want to extract a list with all first names that appear more than 250 times in the “employees” table.

If we try to set this condition in the WHERE clause, Workbench won't flag a mistake while we type, because the statement is syntactically valid.

The error message appears only when we try to execute the query, and it is a very eloquent one: invalid use of group function.
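The failure can be reproduced in a small sketch with Python's sqlite3 module. Note the assumptions: the table contents are hypothetical, the threshold is lowered from 250 to 1 to suit the tiny table, and SQLite words its error differently from MySQL's "invalid use of group function", so the sketch only prints whatever message the engine returns.

```python
import sqlite3

# Hypothetical miniature stand-in for the "employees" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees (first_name, hire_date) VALUES (?, ?)",
    [("Anna", "1999-05-01"), ("Anna", "2000-02-10"), ("Boris", "2001-03-15")],
)

error_message = None
try:
    # An aggregate function inside WHERE is rejected by the database engine.
    conn.execute(
        "SELECT first_name, COUNT(first_name) AS names_count "
        "FROM employees WHERE COUNT(first_name) > 1 GROUP BY first_name"
    )
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
conn.close()
```

The statement is rejected before any rows are returned, which mirrors the behaviour described for MySQL.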

Fixing the Error

We can change the keyword to HAVING and add the line of code in the right place, just after the GROUP BY statement.

SELECT first_name, COUNT(first_name) as names_count FROM employees GROUP BY first_name HAVING COUNT(first_name) > 250 ORDER BY first_name;

Now, let’s re-run the query.

We retrieved 193 names that can be encountered more than 250 times in the “employees” table.

WHERE or HAVING?

So, when is the right time to use WHERE and when should you use HAVING?

It is not so hard to decide. Let’s go over the problem we just solved again, “Extract all first names that appear more than 250 times in the “employees” table”. You must first spot the phrase “250 times”. It leads to counting the number of times something appears in the data table. And counting numbers is executed through the COUNT() function, isn’t it?

COUNT() is an aggregate function.

You must use HAVING, not WHERE.

It is as simple as that. The same logic must be applied anytime an aggregate function is required for the solution of your task.

Summary of WHERE and HAVING

It is important to decide correctly whether to use WHERE or HAVING in a given situation.

WHERE allows us to set conditions that refer to subsets of individual rows. These conditions are applied before re-organizing the output into groups.

Once the rows that satisfy the WHERE conditions are chosen, they progress in the data retrieval process and can be grouped by distinct values recorded in a certain field or fields.

Only at this point can the output be further filtered with a condition specified in the HAVING clause.

Finally, you could sort the records of the final list through the ORDER BY clause.
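The order of these stages can be observed in a minimal sketch with Python's sqlite3 module. The table contents and the threshold of 1 are hypothetical; the three queries add WHERE and then HAVING step by step so that the effect of each stage is visible.

```python
import sqlite3

# Hypothetical miniature stand-in for the "employees" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees (first_name, hire_date) VALUES (?, ?)",
    [
        ("Anna", "1998-03-01"),
        ("Anna", "1999-06-01"),
        ("Anna", "2000-01-15"),
        ("Boris", "1999-02-01"),
        ("Boris", "1998-07-01"),
        ("Clara", "1997-09-09"),
    ],
)

# Stage 0: no filtering, every row is counted in its group.
all_counts = conn.execute(
    "SELECT first_name, COUNT(first_name) FROM employees "
    "GROUP BY first_name ORDER BY first_name"
).fetchall()

# Stage 1: WHERE removes individual rows before the groups are formed.
after_where = conn.execute(
    "SELECT first_name, COUNT(first_name) FROM employees "
    "WHERE hire_date > '1999-01-01' "
    "GROUP BY first_name ORDER BY first_name"
).fetchall()

# Stage 2: HAVING removes whole groups after the counts are computed.
after_having = conn.execute(
    "SELECT first_name, COUNT(first_name) FROM employees "
    "WHERE hire_date > '1999-01-01' "
    "GROUP BY first_name HAVING COUNT(first_name) > 1 ORDER BY first_name"
).fetchall()

print(all_counts, after_where, after_having)
conn.close()
```

The counts shrink after WHERE because fewer rows reach the grouping step, while HAVING leaves the counts unchanged and only discards groups.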

Solving a Task

To reinforce your understanding of the data retrieval process, let’s see an example containing both a WHERE and a HAVING condition.

The task is to extract a list of all names that are encountered fewer than 200 times. Let the data refer to people hired after the 1st of January 1999 only.

Let’s create the query, step-by-step.

Clearly, we must select the first names and the number of times a first name is encountered, renaming the second selection as “names_count”.

The second thing to do is designate the table we will retrieve data from – “employees”.

The code will look like this:

SELECT first_name, COUNT(first_name) AS names_count FROM employees;

What to Use

Should we only use WHERE or HAVING, or both keywords, while setting our conditions?

Well, there are two conditions to satisfy to solve our problem.

First Condition

One is that the names must be encountered fewer than 200 times. “200 times” immediately means we must use COUNT(), which will count the number of times a certain first name appears in the “employees” data table.

COUNT() is an aggregate function, so, as we said earlier, it must go with HAVING.

Second Condition

The other condition to satisfy is general: “All the rows extracted must be of people who were hired after the 1st of January 1999.” This condition refers to all individual rows in the “employees” table. No specific aggregate function must be applied. Therefore, this condition must go with the WHERE clause.

Important: Between the WHERE and the HAVING blocks, we shouldn’t forget to insert the GROUP BY segment. We must group by “first_name”, not by some other field, since our task requires us to aggregate our output by the number of times a certain first name is encountered.

Actually, let’s order the output by the “first_name” in descending order.

The Final Solution

The whole query can be written in the following way:

SELECT first_name, COUNT(first_name) AS names_count FROM employees WHERE hire_date > '1999-01-01' GROUP BY first_name HAVING COUNT(first_name) < 200 ORDER BY first_name DESC;

The query worked and solved our problem: we have a list of the first names that appear fewer than 200 times among the people hired after the 1st of January 1999, together with the number of times each one appears.
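The complete query can be exercised in a small sketch with Python's sqlite3 module. The rows are hypothetical, and because the table is tiny the threshold is scaled down from 200 to 3; both values are passed as parameters (named hired_after and max_count here purely for illustration).

```python
import sqlite3

# Hypothetical miniature stand-in for the "employees" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees (first_name, hire_date) VALUES (?, ?)",
    [
        ("Anna", "1999-03-01"),
        ("Anna", "2000-05-05"),
        ("Anna", "2001-01-01"),
        ("Boris", "1999-04-04"),
        ("Boris", "2000-06-06"),
        ("Boris", "1998-01-01"),
        ("Clara", "2002-02-02"),
        ("Dan", "1995-05-05"),
    ],
)

hired_after = "1999-01-01"  # row-level condition, belongs in WHERE
max_count = 3               # scaled-down stand-in for the 200 used in the text

rare_names = conn.execute(
    "SELECT first_name, COUNT(first_name) AS names_count "
    "FROM employees "
    "WHERE hire_date > ? "
    "GROUP BY first_name "
    "HAVING COUNT(first_name) < ? "
    "ORDER BY first_name DESC",
    (hired_after, max_count),
).fetchall()
print(rare_names)
conn.close()
```

Dan disappears at the WHERE stage, Anna at the HAVING stage, and the remaining names are returned in descending alphabetical order.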

A Few Observations

The function “COUNT(first_name)” has been applied twice – once in the SELECT statement and once in the HAVING block.

Multiple Use

On the screen, the two phrases seem distant from one another. However, in terms of logic, they are close. Normally, you would like the aggregate function you have attached to a specific condition to be seen in your output, too. So, it makes sense to see the same phrase twice.

On the other hand, “first_name” appears in the query three times. Once, because you group by this field. A second time because grouping by a field requires stating the column name in the SELECT block. And the third time, at the bottom of the query, because we order the data by the same selection.

Looking out for the Order

It might seem you are using “first_name” and COUNT() too often within the query. Well, this is fine. This is not always going to be the case, but it is okay if it happens. Therefore, you must master the query structure well; otherwise, you will easily fall into traps, inducing you to make mistakes while coding.

Another Peculiarity of the HAVING Keyword

In addition, as an exercise, you could try to place the “hire_date” condition in the HAVING clause, instead of in the WHERE clause, but you’ll get an error.

You saw that HAVING can contain a condition such as “hire_date” greater than the 1st of January 2000. In another query, it worked properly with an aggregate function.

Important: Both situations are feasible in MySQL, but in the query above you cannot simply add the “hire_date” condition next to the aggregated one in the HAVING clause. Once the rows are grouped by “first_name”, a non-aggregated HAVING condition can only refer to a grouped or selected column, and “hire_date” is neither. Although this seems to be a minor detail, it is a pitfall you should be aware of: keep conditions on individual rows in the WHERE block and conditions on aggregates in the HAVING block.

When to Use the HAVING Clause

To sum up this tutorial, remember the simple rule:

If you need to work with aggregate functions, use GROUP BY and HAVING. And if you need to apply general conditions, use WHERE.

You will agree it seems there are too many clauses and tricks to remember about the SELECT statement, right? SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and their corresponding features.

Don’t worry – as you read more tutorials and practice in the exercises attached to the lessons, you will start using these statements of the data manipulation language freely.

So, it may come to you as no surprise that there are other powerful statements in SQL. Feel free to dive into the unknown world of the LIMIT statement.


What is the use of group function?

Use the group functions with attribute groups that return multiple rows of data or a single-row attribute group that has been configured for historical data collection. With the exception of COUNT, which can be used by all attribute types, group functions are available only for numeric attributes.

What are some example of group functions?

The types of group functions (also called aggregate functions) are:

  • AVG, which calculates the average of the specified columns in a set of rows
  • COUNT, which calculates the number of rows in a set
  • MAX, which calculates the maximum
  • MIN, which calculates the minimum
  • STDDEV, which calculates the standard deviation
  • SUM, which calculates the sum
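Most of these functions can be tried on a single column with Python's sqlite3 module, as in the minimal sketch below. The salary values are hypothetical, and STDDEV is left out because SQLite does not provide it as a built-in function.

```python
import sqlite3

# Hypothetical salary values used only to exercise the aggregate functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO Employee (NAME, SALARY) VALUES (?, ?)",
    [("Ashish", 1500), ("Harsh", 2000), ("Dhanraj", 2500)],
)

# Without GROUP BY, the whole table is treated as one group and one row is returned.
summary = conn.execute(
    "SELECT AVG(SALARY), COUNT(SALARY), MAX(SALARY), MIN(SALARY), SUM(SALARY) "
    "FROM Employee"
).fetchone()
print(summary)
conn.close()
```

Adding a GROUP BY clause to the same query would return one such summary row per group instead of one for the whole table.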

What is the use of GROUP BY functions in SQL?

The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result set by one or more columns.

Which of the following can be used only with group functions?

The functions MAX, MIN and AVG can be used as GROUP BY functions.