GROUP BY, HAVING, ORDER BY
This page covers aggregation and sorting in loq SQL.
ORDER BY
Sort results by one or more columns.
Basic Sorting
-- Ascending (default)
SELECT * FROM users.csv ORDER BY name
-- Explicit ascending
SELECT * FROM users.csv ORDER BY name ASC
-- Descending
SELECT * FROM users.csv ORDER BY age DESCMultiple Columns
Sort by multiple columns (first column takes priority):
-- Sort by city, then by name within each city
SELECT * FROM users.csv ORDER BY city ASC, name ASC
-- Sort by department, then salary descending within each
SELECT * FROM employees.csv ORDER BY department, salary DESCSorting by Expressions
Sort by calculated values:
SELECT name, price, quantity, price * quantity AS total
FROM orders.csv
ORDER BY price * quantity DESCSorting by Alias
Use column aliases in ORDER BY:
SELECT name, price * quantity AS total
FROM orders.csv
ORDER BY total DESCSorting by Position
Reference columns by position (1-based):
SELECT name, age, city FROM users.csv
ORDER BY 2 DESC -- Sort by age (column 2)NULL Ordering
NULL values sort first in ascending, last in descending:
-- NULLs first
SELECT * FROM users.csv ORDER BY email ASC
-- NULLs last
SELECT * FROM users.csv ORDER BY email DESCAggregate Functions
Aggregate functions compute a single value from multiple rows.
COUNT
Count rows:
-- Count all rows
SELECT COUNT(*) FROM users.csv
-- Count non-NULL values
SELECT COUNT(email) FROM users.csv
-- Count with alias
SELECT COUNT(*) AS total_users FROM users.csvSUM
Sum numeric values:
SELECT SUM(amount) AS total_sales FROM orders.csv
-- Sum with condition
SELECT SUM(amount) AS total
FROM orders.csv
WHERE status = 'completed'AVG
Calculate average:
SELECT AVG(salary) AS avg_salary FROM employees.csv
-- Rounded average
SELECT ROUND(AVG(salary), 2) AS avg_salary FROM employees.csvMIN / MAX
Find minimum or maximum:
SELECT
MIN(price) AS lowest_price,
MAX(price) AS highest_price
FROM products.csv
-- With dates
SELECT
MIN(created_date) AS first_order,
MAX(created_date) AS last_order
FROM orders.csvGROUP_CONCAT
Concatenate values from multiple rows:
SELECT department, GROUP_CONCAT(name) AS employees
FROM employees.csv
GROUP BY departmentOutput:
department,employees
Engineering,Alice,Bob,Carol
Marketing,David,EveMultiple Aggregates
Combine multiple aggregate functions:
SELECT
COUNT(*) AS total,
SUM(amount) AS sum,
AVG(amount) AS avg,
MIN(amount) AS min,
MAX(amount) AS max
FROM orders.csvGROUP BY
Group rows with the same values in specified columns.
Basic Grouping
SELECT city, COUNT(*) AS count
FROM users.csv
GROUP BY cityOutput:
city,count
New York,150
Chicago,89
Los Angeles,120Multiple Grouping Columns
SELECT city, status, COUNT(*) AS count
FROM users.csv
GROUP BY city, statusGrouping with Aggregates
SELECT
department,
COUNT(*) AS employees,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees.csv
GROUP BY departmentGrouping by Expressions
-- Group by year
SELECT
EXTRACT('year', order_date) AS year,
COUNT(*) AS orders,
SUM(total) AS revenue
FROM orders.csv
GROUP BY EXTRACT('year', order_date)
-- Group by first letter
SELECT
SUBSTR(name, 1, 1) AS initial,
COUNT(*) AS count
FROM users.csv
GROUP BY SUBSTR(name, 1, 1)All Non-Aggregated Columns Must Be in GROUP BY
Rule
Any column in SELECT that's not in an aggregate function must be in GROUP BY.
-- Correct
SELECT city, status, COUNT(*)
FROM users.csv
GROUP BY city, status
-- Incorrect (status not in GROUP BY)
SELECT city, status, COUNT(*)
FROM users.csv
GROUP BY cityHAVING
Filter groups after aggregation. Use HAVING with aggregate functions; use WHERE for row-level filtering.
Basic HAVING
SELECT city, COUNT(*) AS count
FROM users.csv
GROUP BY city
HAVING COUNT(*) > 100HAVING vs WHERE
SELECT city, COUNT(*) AS count
FROM users.csv
WHERE status = 'active' -- Filter rows BEFORE grouping
GROUP BY city
HAVING COUNT(*) > 10 -- Filter groups AFTER groupingMultiple HAVING Conditions
SELECT department, AVG(salary) AS avg_salary
FROM employees.csv
GROUP BY department
HAVING COUNT(*) >= 5
AND AVG(salary) > 50000HAVING with Different Aggregates
SELECT category, SUM(sales) AS total_sales
FROM products.csv
GROUP BY category
HAVING COUNT(*) > 10
AND AVG(price) > 50
ORDER BY total_sales DESCCombining Everything
A complete aggregation query:
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price,
SUM(quantity) AS total_stock,
MIN(price) AS min_price,
MAX(price) AS max_price
FROM products.csv
WHERE status = 'active'
GROUP BY category
HAVING COUNT(*) >= 5
AND AVG(price) > 10
ORDER BY total_stock DESC
LIMIT 10Common Patterns
Top N by Group
Find top items in each category:
SELECT category, product, sales
FROM (
SELECT
category,
product,
sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
FROM products.csv
)
WHERE rn <= 3Percentage of Total
SELECT
category,
SUM(sales) AS category_sales,
ROUND(SUM(sales) * 100.0 / (SELECT SUM(sales) FROM sales.csv), 2) AS pct
FROM sales.csv
GROUP BY category
ORDER BY pct DESCRunning Totals (with Window Functions)
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions.csvCount with Conditions
SELECT
department,
COUNT(*) AS total,
SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) AS active,
SUM(CASE WHEN status = 'inactive' THEN 1 ELSE 0 END) AS inactive
FROM employees.csv
GROUP BY departmentPivot-Style Aggregation
SELECT
product,
SUM(CASE WHEN month = 'Jan' THEN sales ELSE 0 END) AS jan,
SUM(CASE WHEN month = 'Feb' THEN sales ELSE 0 END) AS feb,
SUM(CASE WHEN month = 'Mar' THEN sales ELSE 0 END) AS mar
FROM monthly_sales.csv
GROUP BY productExecution Order
Understanding execution order helps write correct queries:
- FROM - Identify data source
- WHERE - Filter rows
- GROUP BY - Group rows
- HAVING - Filter groups
- SELECT - Compute output columns
- DISTINCT - Remove duplicates
- ORDER BY - Sort results
- LIMIT - Restrict output rows
This is why:
- WHERE can't use aliases defined in SELECT
- HAVING can use aggregates, WHERE can't
- ORDER BY can use aliases from SELECT
Best Practices
1. Always Use Aliases for Aggregates
-- Good
SELECT city, COUNT(*) AS count FROM users.csv GROUP BY city
-- Less clear
SELECT city, COUNT(*) FROM users.csv GROUP BY city2. Use WHERE for Row Filtering, HAVING for Groups
-- Correct: filter rows, then groups
SELECT city, COUNT(*) AS count
FROM users.csv
WHERE status = 'active'
GROUP BY city
HAVING COUNT(*) > 10
-- Incorrect: using HAVING for row filtering (works but slower)
SELECT city, COUNT(*) AS count
FROM users.csv
GROUP BY city
HAVING status = 'active' -- This won't work!3. Consider Query Order for Performance
Filter as early as possible:
-- Better: filter before grouping
SELECT city, COUNT(*) FROM users.csv
WHERE created_date > '2024-01-01'
GROUP BY city
-- Worse: grouping all data, then filtering (if possible)
-- (actually this is how you'd have to do it with aggregates)Related Documentation
- SQL Basics - SELECT, WHERE, LIMIT
- Window Functions - Advanced aggregation
- Aggregate Functions - Function reference