Usually applications run similar queries against database, but using different variable parameters in them. For example, fetch some data by its ID number in a table. What we want is to know, how frequently this query was executed at all. By removing the variable data from all queries we can then group them and count how many times they appear in the log file. Thus, from two queries SELECT NAME FROM USER WHERE ID=223 and SELECT NAME FROM USER WHERE ID=223 we will get one pattern SELECT NAME FROM USER WHERE ID={} two times.
As there may be variable data of different types we also wish to remove quoted strings (considering all possible escaping techniques), multiline and single line comments, unsignificant whitespaces, siquential variables like ID IN (1 ,2 ,3), etc.
That's exactly the reason, why this utility has been developed - to know, which queries are the most frequently run on database server. The more popular the query is, the more performance may be gained by optimizing it.
For slow query formats there are available additional options to use for sorting by a different then number of queries criteria. As slow query log provides the information for each query about the time it was running, the time it was locked, the number of rows totally examined and finally returned. MyProfi allowes to sort the statistics by any of these field, and even by maximum, total or average numbers for each pattern.
Query type statistics is sometimes useful for appropriate database server configuration and tuning. For example, if UPDATE queries are the most frequently run against database, a developer may consider restructurizing tables to reduce the use of indexes, as too many indexes in tables make its update slower. Or otherwise, if SELECT queries are popular, it may be a reason to allocate more memory buffer space, etc.
MyProfi understands several input file formats. Beside general query log format, slow query log format is also supported.
As of version 5.1.x of MySQL server, both the general and slow query logs may be configured to be writen to mysql system table. By setting an appropriate option in MyProfi or providing input file with .csv extension we can tell the parser to treat input files as csv formatted.
Moreover, for slow query log you can output an additional statistics.
No matter how huge your input file is, the parser will never load whole content into memory, but reads the file chunk by chunk (though, you might be going to wait for "some time" while this huge file is being processed).
You can choose to output statistics only for certain type of queries: SELECT, UPDATE, DELETE, etc. This may be useful, for instance, when you are going to check the most popular queries execution plan using EXPLAIN. In this case non-SELECT queries will be just useless.
Output only first N patters in sorted statistics.