Performance tuning in SQL queries

Is your application running very slow?
Are your database calls taking a lot of time?
Are your database calls timing out?

Well, many developers test in an environment where the number of records is low in database tables and they do not find the performance issues in their database calls.
However as soon as they push their code to the production environment or the case when their database starts growing, they start seeing the issues.

So let’s see how can we at least try our best to make sure our SQL queries can handle most cases.

1) Use “SELECT fields” instead of “SELECT *”

Always avoid select * in your SQL queries. First of all the select * is slower than select field names and then you will be loading some unnecessary columns also.

Example use something like:
SELECT FirstName, LastName, City, State from Users

2) Prefer using inner joins compared to where clause

Technically where clause and inner join will fetch the same data.
We use where clause, like “Select table1.column1, table1.column2, table1.column3, table2.column1 from TableA table1, TableB table2 where table1.column1= table2.column1”

In this query if TableA has 100 rows and TableB has 100 rows, then it will first do a cartesian join generating 100*100 rows and then apply the filter.

An efficient way will be to use inner join
Example :”Select table1.column1, table1.column2, table1.column3, table2.column1 from TableA table1 inner join TableB table2 on table1.column1= table2.column1”

3) Use wildcards at the end only

Many of us use wildcards like “Select Location from TableA where Location like ‘%New%’ ”
This will pull out unexpected data like “knew, renew”

So instead it is recommended to use a wildcard at the end only

“Select Location from TableA where Location like ‘New%’ ”

4)Index all the predicates in JOIN, WHERE, ORDER BY and GROUP BY clauses

Generally, there are more read calls than write calls, and it makes sense to index tables for reading calls. So have index for mainly main clauses like, order by, group up and on conditions.

5) Try to use UNION ALL in place of UNION

In UNION ALL, we need to have the same columns name and it just add the results from two table sets, however, in UNION , it allows us to have different column names also, which slows it down.

For Example Use

SELECT table1.column1, table1.column2,
FROM TableA
UNION ALL
SELECT table2.column1, table2.column2,
FROM TableB

Instead of:

SELECT table1.column1, table1.column2,
FROM TableA
UNION
SELECT table2.column1
FROM TableB

6) Avoid wrapping indexed columns with functions

If you apply functions on indexes columns, then they are not helpful and should be avoided. For example, I have indexed by createdate column and I am trying to apply a function on that,
it doesn’t use indexing and do a full table scan

For example :
SELECT table1.column1 from TableA WHERE YEAR(createdate) = ‘2019’
Here YEAR function overrides the index.

7) Optimize using UNION ALL for or conditions

So many times, we have situations where we use or conditions to get data which matches either of conditions

For example
SELECT table1.column1, table1.column2 FROM TableA where table1.column1 =’TEST’ or table1.column2 =’TEST2′

This query will make the query to scan full table even if we have indexes

So the alternative approach is to use UNION ALL
SELECT table1.column1, table1.column2 FROM TableA where table1.column1 =’TEST’ UNION ALL SELECT table1.column1, table1.column2 FROM TableA where table1.column2 =’TEST2′

This is the fastest approach if you don’t care about duplicates, otherwise, we can use UNION only which will remove duplicates.

8) Use Query Caching

Most databases offer caching services and can be helpful to optimize application performance.
It will cache the query with resultset and will update the cache when the data gets changed.
You can check if the database has this feature enabled by running the command
“show variables like ‘have_query_cache’”
We can also update the value of this cache settings in mysql config file, (‘/etc/mysql/my.cnf’ or ‘/etc/mysql/mysql.conf.d/mysqld.cnf’)

Recommended:
query_cache_type=1
query_cache_size = 10M
query_cache_limit=256k

9) Optimum use of indexes

Databases can use a maximum of one index at a time for a single query, so we need to have had indexes rightly designed.
The order matters in indexes

Lets see below query
SELECT table1.column1 , table1.column2 , table1.column3 , table1.column4
from TableA table1
where table1.column1 = ‘TEST’
and table1.column2 = ‘TEST1’
and table1.column3 > 10
ORDER BY table1.column3 ASC;

So here, we can add our index like,
TableA (column1, column2, column3)

10) Avoid multiple like statements and instead, use FTS

Consider we have a search operation which actually searches for match in various columns. Here we will end up writing a query like below:

Select table1.column1 , table1.column2 , table1.column3 , table1.column4
from TableA table1
where table1.column1 like ‘%SEARCH STRING%’
or table1.column2 like ‘%SEARCH STRING%’
or table1.column3 like ‘%SEARCH STRING%’

This will make our query perform very slow. Here, we can use FTS Index and FTS functions.
First, let’s add an index to the table.
ALTER TableA news ADD FULLTEXT (column1, column2, column3);

Now, we can run our query using FTS
SELECT table1.column1, table1.column2 FROM TableA
WHERE MATCH (column1, column2, column3) AGAINST (‘Search string’ )

Another feature of FTS is the score points where it also tells you about relevancy. So

SELECT table1.column1, table1.column2 FROM TableA
WHERE MATCH (column1, column2, column3) AGAINST (‘Search string’ ) as score order by score desc

11) Avoid Distinct if possible

Generally distinct is an expensive operation and can be avoided by selecting more columns. Distinct actually removes duplicate column data but can return inaccurate data presentation. It also increases the sorts.
A better way is to select more columns so that you get unique rows can accordingly work with the data.
For example :
Select distinct firstname , lastname from Users

can we rewritten as

Select distinct firstname , lastname, city, state from Users

12) Use appropriate Data Types

Use proper datatypes for data. For example, use text for large data and timestamp for time.
If possible, try to avoid nchar and nvarchar and use char and varchar.

13) Regularly verify indexes

Many times we have created indexes based on an SQL query and sometimes we are not using that functionality any more. We remove the unused code from our codebase, but we always forget to remove the index for that query. Verify if that index is not getting used and then remove it. Also, try to prefer indexes on integer columns over text columns.

14) Avoid using prefix “sp_” for stored procedures.

15) Use where clause instead of Having

Having clause is used as a filter once all rows are fetched and working on data can be an expensive call. Try to use where instead for this kind of filters.

For Example:
SELECT name ,city, state from users
GROUP BY name
HAVING city != ‘jammu’ AND AND state != ‘jammu’

can be rewritten as

SELECT name ,city, state from users
WHERE city != ‘jammu’
AND state != ‘jammu’
GROUP BY name;

Commonly Asked Questions on sql

What are the advantages/disadvantages of Indexes?
Advantages: Avoids the Full table scan and thus faster retrieval of data. It also improves the Execution plan of the database.
Disadvantages: Inserts and update statements get slow. It also takes additional disk space.

What are the reasons for slow databases?
Low memory to manage high traffic load.
Dependency on external storage like SATA disk

What is the use of explain?
The explain plan shows the complete flow and output of the query and including subqueries. It can also highlight the problems in the queries.

What are the different types of sql commands?
Data Definition Language: Used to create tables and defines the table structure.
Data Manipulation Language: Refers to insert , update, search and delete of data in the tables.
Data Control Language: DCL is used to control access to the data in the database, like grant and revoke.
Transaction Control Language: Refers to transaction management, like Commit, Rollback, Begin etc.