MySQL – optimization of performance, indexes

  Database, Web Hosting

So even for beginners in this sphere for better understanding about database index what it is and how it works, let us indicate an analogy in everyday life. Imagine that you are in a public library and need to find a book with a particular name (which you know). You have 2 options – go through the whole library, all the shelves and read every book by one name at a time. But it will obviously take a very long time.

The second possibility is to walk to the register, which in libraries are drawers in which contains tickets with the basic information of all books. Walk up to the register, where are sorted tickets by book name, look for the drawer with the initial letter, search and find a book and ticket in alphabetical order quickly. On this ticket, you will find a room and shelf which contains searched book and you can go to get searched book. This second method is obviously very effective when readers are often looking for books by title or author, it’s worth creating purpose registers for this.

And this is the same in the database.

What is a database index

Indexes are data structures that allow fast retrieval of records indexed by items. Let’s take a table in which each record is uniquely identified by some ID (such as integer) and we want to find a record with a specific number, like the following command:

SELECT * FROM books WHERE ID=1234

If there is no set index at column ID MySQL must completely pass the entire data record after record and read each ID. It is relatively challenging because beyond entry ID it may include more extensive data and MySQL has to read all of it from the data file from disk into memory and analyze. In addition, data in the data file are completely disordered (eg stored in the order in which they were entered into the database), and so there is no possibility to work like that if we get a higher value than the searched we have no longer to search the rest of database.

For these cases, it is necessary to create an index entry for ID. This means that in a separate file is stored properly structured list of all IDs (In this case, specifically the B-tree, which is in the language of mathematics and computer science means search tree which preserving the minimal depth), which appear in the table and they shall indicate by link, where is the record with this ID in data file. The index is much smaller (since it contains only the ID and a pointer to the data file), and moreover is directly adapted to fast search.

Indexes are not used only in finding of a particular value of one item, but may also be used to search for the interval.

Mixed index

Over multiple items is, of course, possible to create multiple indexes for each item. There is another possibility. If you frequently search for an entry by multiple criteria simultaneously, you can create more than one index column of the table at the same time. For example, if there is frequently searching for books by year and type, then appropriate SELECT query could look like this:

SELECT * FROM books WHERE year=2007 And type='monograf'

During the search is always used only a maximum of one index. If we had a separate index on a column for a year, and second index on a column type then only one of them will be used(the choice between them depends on the implementation, usually is done some prediction that one is likely to be more effective). If there will be used index for the year then it will find quickly all records in the index to the same year 2007 edition. These, however, will have to load all of the data from file to and go through them one by one and find out which of them are monographs, because it is not possible to recognize it from the index. Although we have spared time during retrieving the records in the table from the disk and we limited only to the books in 2007, but it’s still not perfect. If you are looking for by these two criteria very often, it is necessary to improve it.

Now comes mixed index over column year and type (in this order). In the mixed index is searched by both values ​​simultaneously, respectively as first searches in the first column and then followed by another index (B-tree), that corresponds to the specified value of the first attribute and the second seeks to attribute. That index is already complicated and less efficient but compared to reading data from the disk still much faster.

If we have a mixed index according to the example above we do not have to always search by all columns that are in it, but we can use either the “left” part of index. Mixed index release year + type can be used to search even just by release year, all types will be taken. You can not do that on the contrary, that according to this index, search by type and skip release year.

Full-text indexes

Special chapters are full-text indexes. These used to search in large text databases by keywords. When you create a full-text index on any column and analyze the words and their frequency, by this is assigned their importance. The more times a given word occurs in a text, it is likely that this text deals with the topic that the word expresses. Full-text indexes are certainly not as simple as it depends on the specific implementation and other enhancements.

So we have built index, containing words and their importance (relevance) in each record. During the search, enter one or more keywords that are found in the index, and the results are usually sorting by relevance descending.

Full-text indexes in MySQL are only available when using the MyISAM table type.

Primary and unique index

An unique index is such index, which excludes the occurrence of one value in the column multiple times. As first it can ensure that by mistake we do not put into the database one record with unique indication more times, as second the database system can better optimize the index structure because there is not possible that one value of items in the index would refer to multiple records in a data file.

The primary index is called a unique index on a column for a table whose value uniquely identifies a row (record) and has always (can be NULL).

How to correctly choose indexes

Coming up with the correct indexes on the table is often difficult. Someone might get an idea to do the indexes of all columns. In this case, however, there has been a misunderstanding of the problems. The higher number of indexes cause the most voluminous file in which they are stored, there are more reading blocks of data from disk and the more memory to store. But it is not only one problem of large indexes. Also, it is necessary to think about that when we adding records to a table (by using the INSERT INTO) or is changing (UPDATE, DELETE)it must be classified, change, or delete all indexes, which is not a trivial operation (sometimes it needs totally change the whole index).

When thinking up of indexes therefore necessary to think about the following things:

  • what is the ratio between queries and updates – if there are many queries and few updates, they could be more indexes, because they will not be replaced so often
  • according to which columns there will be most often searching, eventually which combinations will be used
  • if is possible a little reformulation of SQL queries for better indexes creation

It should be also thought about that the indexes for the integer values ​​are certainly much faster and less bulky than the indexes of text fields (here is also a complication that the text strings may have a variable length and are more complicated constructs the appropriate B-tree).

The choice of indexes is very individual and it is necessary to analyze ways to use the database as first.

Děkujeme za zpětnou vazbu!