Thursday, 1 August 2013

MySQL performance tuning

My company is using MySQL on large scale. By large scale I consider databases with more than 1,5 million tables where each table has several million records. Achieving desired performance certainly requires heavy duty hardware components but, then again, if database engine is not tuned to use hardware properly, adding more powerful hardware won't have any effect.

So, by digging through documentation, changing few variables and performing some experiments, on some databases I was able to improve database performance (speed / response time) by 12% - 45% and reduce HDD I/O by 50% without touching hardware at all.

So, what I did and what I played with?

Well, firstly, as you may know, MySQL has many database storage engine implementations to choose from. Most famous ones are surely MyISAM, InnoDB, Memory and BerkeleyDB. However, on the market there are many many more. So, first step was to use proper db engine for each database table. Since database I was tuning had to be really fast and responsive as well as able to do as much parallel processing as possible, I narrowed down my choice to two storage engines - InnoDB and Memory. I used InnoDB storage engine for most tables and Memory engine for few most frequently used tables (determined by application behavior and database stats) that could fit in RAM memory.

ALTER TABLE table_name ENGINE = 'InnoDB';

Only keep in mind that data in Memory tables will be lost after each server restart and that maximum size of table is limited by max_heap_table_size variable value.

Secondly, after observing cached query hit ratio, I decided to turn query caching off.

query_cache_size = 0

Why? Many people believe that caching is the best performance booster there is. However, this is not the the true. In many cases, query cache can even degrade database performance. You may ask how is that possible. In order to understand why is that so, we need to understand how MySQL query cache works. Unlike some database management systems like MSSQL or Oracle, MySQL DOES NOT cache execution plans but FULL RESULT SETS. This means that query cache hits will occur only and only if:
a) you are executing exactly the same query with exactly same conditions many times
b) data has not changed in meantime.

So, every time when any data in table that is used by cached query changes, this will cause query to be re-cached, even if data change would not effect result. This takes time. Actually, this takes more time than normally executing query since apart from query execution additional operations must be performed. Therefore, in practice, unless your data is not changing so fast (for example blog), and unless cached query hit ratio is over 1:10, it is not recommended to use query cache since it does not bring any performance benefits.

Finally, I tuned up couple of InnoDB engine variables. Practice has showed that you actually need to customize only few engine variables to achieve optimal performance. Many people tend to play with to many variables and change many values. Well, practice showed that most variables are really best left with default values since they have already been optimized for best MySQL performance, except few that I will mention now.

First variable is surely innodb_buffer_pool_size.This is one of the variables that can really effect database performance. I will skip theory behind it and just tell you that recommended size is between 50 and 80% of available RAM memory.

Second one is innodb_log_file_size.Again, I will skip theory behind it and just tell you that recommended size is 25% of innodb_buffer_pool_size.

Third one is innodb_log_buffer_size.Here, recommended size is between 6 and 16 MB.

There are few more variables that you can also tune up but that require certain hardware predispositions. For example, if your server has fast disks (SSD disks or  HDD array) with large I/O capacity, make sure to correct value of innodb_io_capacity variable. Default value is 200 which was speed limit for older HDDs. However, if you use SSD disks or arrays that are capable of more IOPS than you should increase size to be let's say up to 80% of  HDD IOPS speed. For example we're using one system with capacity of 56000 IOPS but, since it is used by multiple servers, I've increased value of this variable to "only" 1500. Also, if you are using storage system that is performing data caching and which has controller with battery you can set  innodb_flush_method variable to O_DIRECT. Doing so you will avoid double caching (by both MySQL engine and HDD array) which will speed up database even more. Finally, if your database contains large number of tables and application using database is accessing large number of them, consider increasing value of innodb_open_files variable.

No comments: