Saturday, September 27, 2008

clover DB Tool

clover.ETL and clover.GUI are ETL tools meant for developing data transformations and data integration applications. They are based on Java technology and therefore platform independent and resource-efficient. clover.ETL is an open source project, released under both LGPL and commercial license. clover.GUI is free for non-commercial purposes.

http://www.cloveretl.org

clover.ETL

clover.ETL is an Open Source, Java based data integration framework which can be used to transform data.

Clover.ETL is released under dual-license:

commercial
LGPL License.

Feature Highlights

internally represents all character data as Unicode - any character from any codepage can be represented - ASCII, LATIN, ASIAN, etc.
converts data from & to common character sets (ASCII, UTF-8, ISO-8859-1,ISO-8859-2, etc)
contains palette of more than 40 specialized transformation components
nativelly supports all major industry standard databases (Oracle, MS SQL, DB2, Infromix, Sybase) and several open source variants (MySQL, PostgreSQL). Other databases are supported through JDBC layer
reads & writes XML data/ Excel (XLS) data/ variable length data (CSV) / fix-length data and several other not so common formats (dBase/FoxPro/FlashFiler)
supports remote reading/writing of data through FTP/SFTP/HTTP/HTTPS protocols, also works with ZIPed & GZIPed data
runs on 32bit & 64bit platforms - Windows, Linux, AIX, Solaris, HP-UX, AS/400 and many others

More Features

data records are internally handled as a variable-length data structures - can handle very large records (theoretical limit 2³¹)
fields can have default values, text formats, NULL value checkers
contains connectors to other systems through standard protocols: middleware - JMS, LDAP, SOAP
transformation of data is performed by independent components, each running as an independent thread - can utilize more than one CPU (is very scalable)
framework implements so called pipeline-parallelism - when data record is processed by component, it is immediately sent to the following component for additional processing.
contains interpreter of specialized transformation language - CTL
transformations (mappings) of data can be also coded directly in Java or any other scripting language
metadata describing structures of data (records) and transformations can be read/written from/to XML
works with common Source Code Management Systems - CVS, SVN, SourceSafe
can be easily extended by custom components

clover.ETL Overview

More detailed information on clover.ETL's functionality can be found in the clover.ETL Overview slides (in PDF format).

MySQL Query Profiler

Using the New MySQL Query Profiler

One of the great things about MySQL is the superior innovation model that’s used to deliver database server software. Rather than relying solely on internal engineers who create and maintain a piece of software (as in a traditional software company), MySQL partners with the millions of active users across the world who take advantage of the open source model and daily extend the MySQL server to do new and pioneering things. These innovations can then be submitted to MySQL AB, tested, validated, and rolled into the database server so everyone can benefit from the creativity of the very active MySQL community.

An example of this model in action is the recent release of a new SQL diagnostic facility that assists in the debugging and analysis of running SQL - the SQL Profiler. The new profiler became available in the 5.0.37 version of the MySQL Community Server and was created and submitted by Jeremy Cole of Proven Scaling (http://www.provenscaling.com/).

Let's take a look at this new diagnostic utility Jeremy developed and see how it can help you create high-performing SQL code as well as assist in troubleshooting existing queries that aren't providing the response times you'd like.

The Best Way to Diagnose Performance Problems

When it comes to overall performance, it's important to remember that the #1 contributor is always a good database design. The second highest contributor to performance is well-tuned SQL code. Some try and switch the two in priority, but this is a mistake because a bad design has the potential to simply drown even the best-tuned SQL code (e.g. you can't get index access in your SQL queries if the physical index design is poorly done). But make no mistake - bad SQL code that's introduced into an otherwise well-running database can make a real train wreck of things.

So how do you go about analyzing database performance? There are three forms of performance analysis that are used to troubleshoot and tune database systems:

Bottleneck analysis - focuses on answering the questions: What is my database server waiting on; what is a user connection waiting on; what is a piece of SQL code waiting on?
Workload analysis - examines the server and who is logged on to determine the resource usage and activity of each.
Ratio-based analysis - utilizes a number of rule-of-thumb ratios to gauge performance of a database, user connection, or piece of code.

Of the three, bottleneck analysis is the most efficient approach in terms of fast problem resolution. By determining where the server, a user connection, or set of SQL code is spending its time, you can then work to eliminate the discovered blockages, increase throughput, and reduce overall response times. Unfortunately, this hasn't always been easy in MySQL because the server hasn't traditionally exposed the type of wait-based and resource statistics to quickly uncover bottlenecks.

But a better day has come along in MySQL 5.0.37 with the inclusion of the SQL Profiling utility. This interactive diagnostic aid allows a developer or DBA to gain insight into where a set of SQL code spends its time and the resources it is using. The best way to see the power of this new utility is to walk through some examples of how it's used and see the clear benefits it supplies, so let's do just that.

Getting started with the SQL Profiler

The SQL Profiler is built into the database server and can be dynamically enabled/disabled via the MySQL client utility. To begin profiling one or more SQL queries, simply issue the following command:

mysql> set profiling=1;

Query OK, 0 rows affected (0.00 sec)

Two things happen once you issue this command. First, any query you issue from this point on will be traced by the server with various performance diagnostics being created and attached to each distinct query. Second, a memory table named profiling is created in the INFORMATION_SCHEMA database for your particular session (not viewable by any other MySQL session) that stores all the SQL diagnostic results. This table remains persistent until you disconnect from MySQL at which point it is destroyed.

Now, simply execute a SQL query:

mysql> select count(*) from client where broker_id=2;

+----------+

| count(*) |

+----------+

|      200 |

+----------+

1 row in set (0.00 sec)

Once the query completes, you can issue the following command to view the SQL profiles that have currently been stored for you:

mysql> show profiles;

+----------+------------+-----------------------------------------------+

| Query_ID | Duration   | Query                                         |

+----------+------------+-----------------------------------------------+

|        0 | 0.00007300 | set profiling=1                               |

|        1 | 0.00044700 | select count(*) from client where broker_id=2 |

+----------+------------+-----------------------------------------------+

2 rows in set (0.00 sec)

You get a quick summary of all your captured SQL plus the total duration that the query took to complete. To get the same diagnostic info, you can also query the memory table that holds your statistical information:

mysql> select sum(duration) from information_schema.profiling where query_id=1;

+---------------+

| sum(duration) |

+---------------+

|      0.000447 |

+---------------+

1 row in set (0.00 sec)

I'll show you why it's good to be able to query the memory-based profiling table later in this article, but for now, let's concentrate on other SHOW commands that you can use to get more detailed diagnostic info about one or more queries that you've profiled. The most basic command is one that lists the steps a profiled query went through to satisfy your SQL request, along with each step's time:

mysql> show profile for query 1;

+--------------------+------------+

| Status             | Duration   |

+--------------------+------------+

| (initialization)   | 0.00006300 |

| Opening tables     | 0.00001400 |

| System lock        | 0.00000600 |

| Table lock         | 0.00001000 |

| init               | 0.00002200 |

| optimizing         | 0.00001100 |

| statistics         | 0.00009300 |

| preparing          | 0.00001700 |

| executing          | 0.00000700 |

| Sending data       | 0.00016800 |

| end                | 0.00000700 |

| query end          | 0.00000500 |

| freeing items      | 0.00001200 |

| closing tables     | 0.00000800 |

| logging slow query | 0.00000400 |

+--------------------+------------+

15 rows in set (0.00 sec)

You can also just issue SHOW PROFILE and exclude the identification of a specific profile number to see the very last profile you created.

Regardless of how you obtain the output, you can see this data is extremely valuable in that it allows you to see where your query spends its time during execution. This is done for all types of queries, and not just SELECT statements as this example shows:

mysql> alter table t engine=myisam;

Query OK, 112050 rows affected (0.64 sec)

Records: 112050  Duplicates: 0  Warnings: 0

mysql> show profiles;

+----------+------------+-----------------------------------------------+

| Query_ID | Duration   | Query                                         |

+----------+------------+-----------------------------------------------+

|        0 | 0.00007300 | set profiling=1                               |

|        1 | 0.00044700 | select count(*) from client where broker_id=2 |

|        2 | 0.00003400 | set profiling=0                               |

|        3 | 0.00007400 | set profiling=1                               |

|        4 | 0.63789700 | alter table t engine=myisam                   |

|        5 | 0.00004000 | set profiling=0                               |

+----------+------------+-----------------------------------------------+

6 rows in set (0.00 sec)

mysql> show profile for query 4;

+----------------------+------------+

| Status               | Duration   |

+----------------------+------------+

| (initialization)     | 0.00002900 |

| checking permissions | 0.00000800 |

| init                 | 0.00004000 |

| Opening table        | 0.00009400 |

| System lock          | 0.00000500 |

| Table lock           | 0.00000700 |

| setup                | 0.00004200 |

| creating table       | 0.00195800 |

| After create         | 0.00010900 |

| copy to tmp table    | 0.52264500 |

| rename result table  | 0.11289400 |

| end                  | 0.00004600 |

| query end            | 0.00000700 |

| freeing items        | 0.00001300 |

+----------------------+------------+

14 rows in set (0.00 sec)

So as you can see in the above profile, the ALTER TABLE statement spends the bulk of its time in the temporary table copy step. Armed with this type of information, you have more insight into the hoops your query is jumping through from start to finish, and therefore, you can then work to tune your queries to help eliminate any identified bottlenecks.

There's more information than just duration that you can get from your profiles - for example, CPU usage (which, unfortunately, is not available on all platforms; the below comes from Linux):

mysql> show profile cpu for query 4;

+----------------------+------------+------------+------------+

| Status               | Duration   | CPU_user   | CPU_system |

+----------------------+------------+------------+------------+

| (initialization)     | 0.00002900 | 0.00000000 | 0.00000000 |

| checking permissions | 0.00000800 | 0.00000000 | 0.00000000 |

| init                 | 0.00004000 | 0.00000000 | 0.00000000 |

| Opening table        | 0.00009400 | 0.00100000 | 0.00000000 |

| System lock          | 0.00000500 | 0.00000000 | 0.00000000 |

| Table lock           | 0.00000700 | 0.00000000 | 0.00000000 |

| setup                | 0.00004200 | 0.00000000 | 0.00000000 |

| creating table       | 0.00195800 | 0.00000000 | 0.00100000 |

| After create         | 0.00010900 | 0.00000000 | 0.00000000 |

| copy to tmp table    | 0.52264500 | 0.55591600 | 0.04199300 |

| rename result table  | 0.11289400 | 0.00199900 | 0.00000000 |

| end                  | 0.00004600 | 0.00000000 | 0.00000000 |

| query end            | 0.00000700 | 0.00000000 | 0.00000000 |

| freeing items        | 0.00001300 | 0.00000000 | 0.00000000 |

+----------------------+------------+------------+------------+

14 rows in set (0.00 sec)

Other parameters that can be passed to the SHOW PROFILE command include:

ALL - displays all information
BLOCK IO - displays counts for block input and output operations
CONTEXT SWITCHES - displays counts for voluntary and involuntary context switches
IPC - displays counts for messages sent and received
MEMORY - is not currently implemented
PAGE FAULTS - displays counts for major and minor page faults
SOURCE - displays the names of functions from the source code, together with the name and line number of the file in which the function occurs
SWAPS - displays swap counts

Using the Profiler for Tuning Queries

Let's take a look at a quick example of how the new profiler can be a help to tune an inefficient query. I don't know about you, but I've always been just so-so at being able to really use EXPLAIN's to troubleshoot queries. But the new profiler makes things pretty nice and easy.

I have a MySQL database that's used to track investment activity for a small brokerage house. There's a view in the database that's used to get a quick report of client accounts over a million dollars:

mysql> select * from v_client_portfolio_high;

+-----------+-------------------+------------------+-----------------+

| client_id | client_first_name | client_last_name | portfolio_value |

+-----------+-------------------+------------------+-----------------+

|         5 | ABNER             | ROSSELLETT       |      1252115.50 |

|       500 | CANDICE           | BARTLETT         |      1384877.50 |

+-----------+-------------------+------------------+-----------------+

2 rows in set (3.73 sec)

Now a query running under four seconds isn't necessarily a bad thing, but I wonder if things could be made a little bit better. So let's profile the view and see what we can discover:

mysql> set profiling=1;

Query OK, 0 rows affected (0.00 sec)

mysql> select * from v_client_portfolio_high;

+-----------+-------------------+------------------+-----------------+

| client_id | client_first_name | client_last_name | portfolio_value |

+-----------+-------------------+------------------+-----------------+

|         5 | ABNER             | ROSSELLETT       |      1252115.50 |

|       500 | CANDICE           | BARTLETT         |      1384877.50 |

+-----------+-------------------+------------------+-----------------+

2 rows in set (4.01 sec)

mysql> set profiling=0;

Query OK, 0 rows affected (0.00 sec)

mysql> show profiles;

+----------+------------+-----------------------------------------------+

| Query_ID | Duration   | Query                                         |

+----------+------------+-----------------------------------------------+

|        0 | 0.00007300 | set profiling=1                               |

|        1 | 0.00044700 | select count(*) from client where broker_id=2 |

|        2 | 0.00003400 | set profiling=0                               |

|        3 | 0.00007400 | set profiling=1                               |

|        4 | 0.63789700 | alter table t engine=myisam                   |

|        5 | 0.00004000 | set profiling=0                               |

|        6 | 0.00007600 | set profiling=1                               |

|        7 | 4.01965600 | select * from v_client_portfolio_high         |

|        8 | 0.00003500 | set profiling=0                               |

+----------+------------+-----------------------------------------------+

Now at first, I'm tempted to issue a SHOW PROFILE for query 7, which is my view query, but instead let's see just how many lines of diagnostic data the profiler has collected for me:

mysql> select count(*) from information_schema.profiling where query_id=7;

+----------+

| count(*) |

+----------+

|    74734 |

+----------+

Hello! The profiler generated over 74,000 lines of output for me, which might be hard to wade through using the normal SHOW PROFILE command (even though a LIMIT option is provided). Instead, let's use a SQL query against the profiling table to see what our query did:

mysql> select min(seq) seq,state,count(*) numb_ops,

    -> round(sum(duration),5) sum_dur, round(avg(duration),5) avg_dur,

    -> round(sum(cpu_user),5) sum_cpu, round(avg(cpu_user),5) avg_cpu

    -> from information_schema.profiling

    -> where query_id = 7

    -> group by state

    -> order by seq;

+-------+----------------------+----------+---------+---------+---------+---------+

| seq   | state                | numb_ops | sum_dur | avg_dur | sum_cpu | avg_cpu |

+-------+----------------------+----------+---------+---------+---------+---------+

|     0 | (initialization)     |        1 | 0.00004 | 0.00004 | 0.00000 | 0.00000 |

|     1 | Opening tables       |        1 | 0.00023 | 0.00023 | 0.00000 | 0.00000 |

|     2 | System lock          |        1 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

|     3 | Table lock           |        1 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

|     4 | checking permissions |        1 | 0.00010 | 0.00010 | 0.00000 | 0.00000 |

|     5 | optimizing           |        4 | 0.00004 | 0.00001 | 0.00000 | 0.00000 |

|     6 | statistics           |        4 | 0.00007 | 0.00002 | 0.00100 | 0.00025 |

|     7 | preparing            |        4 | 0.00005 | 0.00001 | 0.00000 | 0.00000 |

|     8 | Creating tmp table   |        1 | 0.00003 | 0.00003 | 0.00000 | 0.00000 |

|     9 | executing            |    37352 | 0.16631 | 0.00000 | 0.05899 | 0.00000 |

|    10 | Copying to tmp table |        1 | 0.00006 | 0.00006 | 0.00000 | 0.00000 |

|    15 | Sending data         |    37353 | 3.85151 | 0.00010 | 3.72943 | 0.00010 |

| 74717 | Sorting result       |        1 | 0.00112 | 0.00112 | 0.00100 | 0.00100 |

| 74719 | removing tmp table   |        2 | 0.00003 | 0.00001 | 0.00000 | 0.00000 |

| 74721 | init                 |        1 | 0.00002 | 0.00002 | 0.00000 | 0.00000 |

| 74727 | end                  |        1 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

| 74728 | query end            |        1 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |

| 74729 | freeing items        |        1 | 0.00002 | 0.00002 | 0.00000 | 0.00000 |

| 74730 | closing tables       |        2 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

| 74733 | logging slow query   |        1 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |

+-------+----------------------+----------+---------+---------+---------+---------+

Looking at the diagnostics above, I can see that my view query spends the bulk of its time in the Sending data step, with that step and the executing step experiencing over 37,000 operations. Why would this be the case? Let's take a look at the underlying view definition and see what the actual query looks like:

create view v_client_portfolio_high

    client_id,

    client_first_name,

    client_last_name,

    portfolio_value

as

select  b.client_id,

        b.client_first_name,

        b.client_last_name,

       (select sum(number_of_units * price)

from client_transaction c

where c.action = 'buy' and

c.client_id = b.client_id) -

(select sum(number_of_units * price)

from client_transaction d

where d.action = 'sell' and

d.client_id = b.client_id) portfolio_value

from    client_transaction a,

        client b

where   a.client_id = b.client_id

group by b.client_id,

        b.client_first_name,

        b.client_last_name

having portfolio_value > 1000000

The view does some normal column grabs from the underlying base tables, but then uses some inline views to calculate a client's overall portfolio value. The inline views/subselects are obviously what's causing the large number of data sends and executions. Let's eliminate just one of the subselects by pulling it up into the main query and if that makes a difference. We'll drop and recreate the view (with a little clean up on NULL values that may affect the computed outcome), profile the new view's performance, and then examine the diagnostic output:

mysql> drop view v_client_portfolio_high;

Query OK, 0 rows affected (0.00 sec)

mysql> create view v_client_portfolio_high

    -> (

    ->     client_id,

    ->     client_first_name,

    ->     client_last_name,

    ->     portfolio_value

    -> )

    -> as

    -> select  b.client_id,

    ->  b.client_first_name,

    ->  b.client_last_name,

    ->  sum(number_of_units * price)  -

    ->  case

    ->          (select sum(number_of_units * price)

    ->          from client_transaction d

    ->          where d.action = 'sell' and

    ->          d.client_id = b.client_id)

    ->  when NULL then 0

    ->  else

    ->          (select sum(number_of_units * price)

    ->          from client_transaction d

    ->          where d.action = 'sell' and

    ->          d.client_id = b.client_id)

    ->  end portfolio_value

    -> from    client_transaction a,

    ->  client b

    -> where   a.client_id = b.client_id and

    ->  action = 'buy'

    -> group    by b.client_id,

    ->  b.client_first_name,

    ->  b.client_last_name

    -> having   portfolio_value > 1000000;

Query OK, 0 rows affected (0.00 sec)

mysql> set profiling=1;

Query OK, 0 rows affected (0.00 sec)

mysql> select * from v_client_portfolio_high;

+-----------+-------------------+------------------+-----------------+

| client_id | client_first_name | client_last_name | portfolio_value |

+-----------+-------------------+------------------+-----------------+

|         5 | ABNER             | ROSSELLETT       |      1252115.50 |

|       500 | CANDICE           | BARTLETT         |      1384877.50 |

+-----------+-------------------+------------------+-----------------+

2 rows in set (0.47 sec)

mysql> set profiling=0;

Query OK, 0 rows affected (0.00 sec)

mysql> show profiles;

+----------+------------+-----------------------------------------------+

| Query_ID | Duration   | Query                                         |

+----------+------------+-----------------------------------------------+

|        0 | 0.00007300 | set profiling=1                               |

|        1 | 0.00044700 | select count(*) from client where broker_id=2 |

|        2 | 0.00003400 | set profiling=0                               |

|        3 | 0.00007400 | set profiling=1                               |

|        4 | 0.63789700 | alter table t engine=myisam                   |

|        5 | 0.00004000 | set profiling=0                               |

|        6 | 0.00007600 | set profiling=1                               |

|        7 | 4.01965600 | select * from v_client_portfolio_high         |

|        8 | 0.00003500 | set profiling=0                               |

|        9 | 0.00007500 | set profiling=1                               |

|       10 | 0.45292700 | select * from v_client_portfolio_high         |

|       11 | 0.00003800 | set profiling=0                               |

+----------+------------+-----------------------------------------------+

12 rows in set (0.00 sec)

mysql> select min(seq) seq,state,count(*) numb_ops,

    -> round(sum(duration),5) sum_dur, round(avg(duration),5) avg_dur,

    -> round(sum(cpu_user),5) sum_cpu, round(avg(cpu_user),5) avg_cpu

    -> from information_schema.profiling

    -> where query_id = 10

    -> group by state

    -> order by seq;

+-------+----------------------+----------+---------+---------+---------+---------+

| seq   | state                | numb_ops | sum_dur | avg_dur | sum_cpu | avg_cpu |

+-------+----------------------+----------+---------+---------+---------+---------+

|     0 | (initialization)     |        1 | 0.00004 | 0.00004 | 0.00000 | 0.00000 |

|     1 | Opening tables       |        1 | 0.00024 | 0.00024 | 0.00000 | 0.00000 |

|     2 | System lock          |        1 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

|     3 | Table lock           |        1 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

|     4 | checking permissions |        1 | 0.00012 | 0.00012 | 0.00100 | 0.00100 |

|     5 | optimizing           |        4 | 0.00005 | 0.00001 | 0.00000 | 0.00000 |

|     6 | statistics           |        4 | 0.00007 | 0.00002 | 0.00000 | 0.00000 |

|     7 | preparing            |        4 | 0.00005 | 0.00001 | 0.00000 | 0.00000 |

|     8 | Creating tmp table   |        1 | 0.00003 | 0.00003 | 0.00000 | 0.00000 |

|     9 | executing            |    11194 | 0.04983 | 0.00000 | 0.01800 | 0.00000 |

|    10 | Copying to tmp table |        1 | 0.00008 | 0.00008 | 0.00000 | 0.00000 |

|    15 | Sending data         |    11195 | 0.39853 | 0.00004 | 0.36794 | 0.00003 |

| 22401 | Sorting result       |        1 | 0.00375 | 0.00375 | 0.00400 | 0.00400 |

| 22403 | removing tmp table   |        2 | 0.00005 | 0.00002 | 0.00000 | 0.00000 |

| 22405 | init                 |        1 | 0.00002 | 0.00002 | 0.00000 | 0.00000 |

| 22411 | end                  |        1 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

| 22412 | query end            |        1 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |

| 22413 | freeing items        |        1 | 0.00002 | 0.00002 | 0.00000 | 0.00000 |

| 22414 | closing tables       |        2 | 0.00001 | 0.00001 | 0.00000 | 0.00000 |

| 22417 | logging slow query   |        1 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |

+-------+----------------------+----------+---------+---------+---------+---------+

20 rows in set (0.44 sec)

Much better! Although more tuning on the view can likely be done, this simple change has knocked the Sending data and the executing step counts down by over two-thirds, with the end result being an 88% reduction in overall response time. The profiler was very useful in this case as this particular query shows that even though an individual step in the SQL execution process may be handled very quickly (the average duration column in the above query), having that same step executed over and over again can put a major dent in total query performance.

Some Last Thoughts on the Profiler

Here are some other things to keep in mind when using the new SQL profiler:

Profiling is initially turned off for each session.

By default, you can store 15 different query profiles for your session. You can increase this up to 100 by setting the profiling_history_size session variable. Certain diagnostics rely on operating system support for the getrusage() system call, so you may see NULL values for some statistics if you're on a platform that doesn't support this function.

There were a few bugs in the initial release of the profiler. The first centered around a problem that developed if you issued a SHOW PROFILES command before enabling profiling. Another concerned duration statistics that were not accurately paired up for each SQL execution step. All these have been fixed now and will be in the next Community server source drop and binary.

For more information on the SQL profiler, you can review the documentation found in the MySQL 5.0 manual at: http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html.

The new SQL profiler is a nice addition to the DBA's and developer's toolbelt and is a great example of the vibrant MySQL Community (in this case, Jeremy Cole) in action. I can't wait to see what they come up with next!

Profiling Perl

· Listen Print Subscribe to Perl.com

Profiling Perl

by Simon Cozens
June 24, 2004

Everyone wants their Perl code to run faster. Unfortunately, without understanding why the code is taking so long to start with, it’s impossible to know where to start optimizing it. This is where “profiling” comes in; it lets us know what our programs are doing.

We’ll look at why and how to profile programs, and then what to do with the profiling information once we’ve got it.

Why Profile?

There’s nothing worse than setting off a long-running Perl program and then not knowing what it’s doing. I’ve recently been working on a new, mail-archiving program for the perl.org mailing lists, and so I’ve had to import a load of old email into the database. Here’s the code I used to do this:

    use File::Slurp;

    use Email::Store "dbi:SQLite:dbname=mailstore.db";

    use File::Find::Rule;

    for (File::Find::Rule->file->name(qr/\d+/)->in("perl6-language")) {

        Email::Store::Mail->store(scalar read_file($_));

It’s an innocent little program — it looks for all the files in the perl6-language directory whose names are purely numeric (this is how messages are stored in an ezmlm archive), reads the contents of the files into memory with File::Slurp::read_file, and then uses Email::Store to put them into a database. You start it running, and come back a few hours later and it’s done.

All through, though, you have this nervous suspicion that it’s not doing the right thing; or at least, not doing it very quickly. Sure there’s a lot of mail, but should it really be taking this long? What’s it actually spending its time doing? We can add some print statements to help us feel more at ease:

    use File::Slurp;

    use Email::Store "dbi:SQLite:dbname=mailstore.db";

    use File::Find::Rule;

    print "Starting run...\n";

    $|++;

    for (File::Find::Rule->file->name(qr/\d+/)->in("perl6-language")) {

        print "Indexing $_...";

        Email::Store::Mail->store(scalar read_file($_));

        print " done\n";

Now we can at least see more progress, but we still don’t know if this program is working to full efficiency, and the reason for this is that there’s an awful lot going on in the underlying modules that we can’t immediately see. Is it the File::Find::Rule that’s taking up all the time? Is it the storing process? Which part of the storing process? By profiling the code we’ll identify, and hopefully smooth over, some of the bottlenecks.

Simple Profiling

The granddaddy of Perl profiling tools is Devel::DProf. To profile a code run, add the -d:DProf argument to your Perl command line and let it go:

    % perl -d:DProf store_archive

The run will now take slightly longer than normal as Perl collects and writes out information on your program’s subroutine calls and exits, and at the end of your job, you’ll find a file called tmon.out in the current directory; this contains all the profiling information.

A couple of notes about this:

It’s important to control the length of the run; in this case, I’d probably ensure that the mail archive contained about ten or fifteen mails to store. (I used seven in this example.) If your run goes on too long, you will end up processing a vast amount of profiling data, and not only will it take a lot time to read back in, it’ll take far too long for you to wade through all the statistics. On the other hand, if the run’s too short, the main body of the processing will be obscured by startup and other “fixed costs.”
The other problem you might face is that Devel::DProf, being somewhat venerable, occasionally has problems keeping up on certain recent Perls, (particularly the 5.6.x series) and may end up segfaulting all over the place. If this affects you, download the Devel::Profiler module from CPAN, which is a pure-Perl replacement for it.

The next step is to run the preprocessor for the profiler output, dprofpp. This will produce a table of where our time has been spent:

  Total Elapsed Time = 13.89525 Seconds

    User+System Time = 9.765255 Seconds

  Exclusive Times

  %Time ExclSec CumulS #Calls sec/call Csec/c  Name

   24.1   2.355  4.822     38   0.0620 0.1269  File::Find::_find_dir

   20.5   2.011  2.467  17852   0.0001 0.0001  File::Find::Rule::__ANON__

   7.82   0.764  0.764    531   0.0014 0.0014  DBI::st::execute

   4.73   0.462  0.462  18166   0.0000 0.0000  File::Spec::Unix::splitdir

   2.92   0.285  0.769    109   0.0026 0.0071  base::import

   2.26   0.221  0.402    531   0.0004 0.0008  Class::DBI::transform_sql

   2.09   0.204  0.203   8742   0.0000 0.0000  Class::Data::Inheritable::__ANON__

   1.72   0.168  0.359  18017   0.0000 0.0000  Class::DBI::Column::name_lc

   1.57   0.153  0.153  18101   0.0000 0.0000  Class::Accessor::get

   1.42   0.139  0.139     76   0.0018 0.0018  Cwd::abs_path

The first two lines tell us how long the program ran for: around 14 seconds, but it was actually only running for about 10 of those — the rest of the time other programs on the system were in the foreground.

Next we have a table of subroutines, in descending order of time spent; perhaps surprisingly, we find that File::Find and File::Find::Rule are the culprits for eating up 20% of running time each. We’re also told the number of “exclusive seconds,” which is the amount of time spent in one particular subroutine, and “cumulative seconds.” This might better be called “inclusive seconds,” since it’s the amount of time the program spent in a particular subroutine and all the other routines called from it.

From the statistics above, we can guess that File::Find::_find_dir itself took up 2 seconds of time, but during its execution, it called an anonymous subroutine created by File::Find::Rule, and this subroutine also took up 2 seconds, making a cumulative time of 4 seconds. We also notice that we’re making an awful lot of calls to File::Find::Rule, splitdir, and some Class::DBI and Class::Accessor routines.

What to Do Now

Now we have some profiling information, and we see a problem with File::Find::Rule. “Aha,” we might think, “Let’s replace our use of File::Find::Rule with a simple globbing operation, and we can shave 4 seconds off our runtime!”. So, just for an experiment, we try it:

    use File::Slurp;

    use Email::Store "dbi:SQLite:dbname=mailstore.db";

    $|=1;

    for (<perl6-language/archive/0/*>) {

        next unless /\d+/;

        print "$_ ...";

        Email::Store::Mail->store(scalar read_file($_));

        print "\n";

Now this looks a bit better:

 Total Elapsed Time = 9.559537 Seconds

   User+System Time = 5.329537 Seconds

 Exclusive Times

 %Time ExclSec CumulS #Calls sec/call Csec/c  Name

  13.1   0.703  0.703    531   0.0013 0.0013  DBI::st::execute

  5.54   0.295  0.726    109   0.0027 0.0067  base::import

  5.52   0.294  0.294  18101   0.0000 0.0000  Class::Accessor::get

  3.45   0.184  1.930  19443   0.0000 0.0001  Class::Accessor::__ANON__

  3.13   0.167  0.970    531   0.0003 0.0018  DBIx::ContextualFetch::st::_untain

                                              t_execute

  3.10   0.165  1.324   1364   0.0001 0.0010  Class::DBI::get

  2.98   0.159  0.376    531   0.0003 0.0007  Class::DBI::transform_sql

  2.61   0.139  0.139     74   0.0019 0.0019  Cwd::abs_path

  2.23   0.119  0.119   8742   0.0000 0.0000  Class::Data::Inheritable::__ANON__

  2.06   0.110  0.744   2841   0.0000 0.0003  Class::DBI::__ANON__

  1.95   0.104  0.159   2669   0.0000 0.0001  Class::DBI::ColumnGrouper::group_cols

Now to be honest, I would never have guessed that removing File::Find::Rule would shave 4 seconds off my code run. This is the first rule of profiling: You actually need to profile before optimizing, because you never know where the hotspots are going to turn out to be. We’ve also exercised the second rule of profiling: Review what you’re using. By using another technique instead of File::Find::Rule, we’ve reduced our running time by a significant amount.

This time, it looks as though we’re doing reasonably well — the busiest thing is writing to a database, and that’s basically what this application does, so that’s fair enough. There’s also a lot of busy calls that are to do with Class::DBI, and we know that we use Class::DBI as a deliberate tradeoff between convenience and efficiency. If we were being ruthlessly determined to make this program faster, we’d start looking at using plain DBI instead of Class::DBI, but that’s a tradeoff I don’t think is worth making at the moment.

This is the third rule of profiling: Hotspots happen. If you got rid of all the hotspots in your code, it wouldn’t do anything. There are a certain reasonable number of things that your program should be doing for it to be useful, and you simply can’t get rid of them; additionally there are any number of tradeoffs that we deliberately or subconsciously make in order to make our lives easier at some potential speed cost — for instance, writing in Perl or C instead of machine code.

From Exclusive to Inclusive

The default report produced by dprofpp is sorted by exclusive subroutine time, and is therefore good at telling us about individual subroutines that are called a lot and take up disproportionate amounts of time. This can be useful, but it doesn’t actually give us an overall view of what our code is doing. If we want to do that, we need to move from looking at exclusive to looking at inclusive times, and we do this by adding the -I option to dprofpp. This produces something like this:

 Total Elapsed Time = 9.559537 Seconds

   User+System Time = 5.329537 Seconds

 Inclusive Times

 %Time ExclSec CumulS #Calls sec/call Csec/c  Name

  83.8   0.009  4.468      7   0.0013 0.6383  Email::Store::Mail::store

  80.8   0.061  4.308     35   0.0017 0.1231  Module::Pluggable::Ordered::__ANON

__

  46.3       -  2.472      3        - 0.8239  main::BEGIN

  43.4       -  2.314      7        - 0.3306  Mail::Thread::thread

  43.4       -  2.314      7        - 0.3305  Email::Store::Thread::on_store

  36.2   0.184  1.930  19443   0.0000 0.0001  Class::Accessor::__ANON__

  28.9   0.006  1.543    531   0.0000 0.0029  Email::Store::Thread::Container::_

                                              _ANON__

  27.3   0.068  1.455    105   0.0006 0.0139  UNIVERSAL::require

This tells us a number of useful facts. First, we find that 84% of the program’s runtime is spent in the Email::Store::Mail::store subroutine and its descendants, which is the main, tight loop of the program. This means, quite logically, that 16% is not spent in the main loop, and that’s a good sign — this means that we have a 1-second fixed cost in starting up and loading the appropriate modules, and this will amortize nicely against a longer run than 10 seconds. After all, if processing a massive amount of mail takes 20 minutes, the first 1-second startup becomes insignificant. It means we can pretty much ignore everything outside the main loop.

We also find that threading the emails is costly; threading involves a lot of manipulation of Email::Store::Thread::Container objects, which are database backed. This means that a lot of the database stores and executes that we saw in the previous, exclusive report are probably something to do with threading. After all, we now spend 2 seconds out of our 4 seconds of processing time on threading in Mail::Thread::thread, and even though we only call this seven times, we do 531 things with the container objects. This is bad.

Now, I happen to know (because I wrote the module) that Email::Store::Thread::Container uses a feature of Class::DBI called autoupdate. This means that while we do a lot of fetches and stores that we could conceivably do in memory and commit to the database once we’re done, we instead hit the database every single time.

So, just as an experiment, we do two things to optimize Email::Store::Thread::Container. First, we know that we’re going to be doing a lot of database fetches, sometimes of the same container multiple times, so we cache the fetch. We turn this:

    sub new {

        my ($class, $id) = @_;

        $class->find_or_create({ message => $id });

Into this:

    my %container_cache = ();

    sub new {

        my ($class, $id) = @_;

        $container_cache{$id}

            ||= $class->find_or_create({ message => $id });

This is a standard caching technique, and will produce another tradeoff: we trade memory (in filling up %container_cache with a bunch of objects) for speed (in not having to do as many costly database fetches).

Then we turn autoupdate off, and provide a way of updating the database manually. The reason we wanted to turn off autoupdate is that because all these containers form a tree structure (since they represent mails in a thread which, naturally, form a tree structure), it’s a pain to traverse the tree and update all the containers once we’re done.

However, with this cache in place, we know that we already have a way to get at all the containers in one go: we just look at the values of %container_hash, and there are all the objects we’ve used. So we can now add a flush method:

    sub flush {

        (delete $container_cache{$_})->update for keys %container_cache;

This both empties the cache and updates the database. The only remaining problem is working out where to call flush. If we’re dealing with absolutely thousands of emails, it might be worth calling flush after every store, or else %container_hash will get huge. However, since we’re not, we just call flush in an END block to catch the container objects before they get destroyed by the garbage collector:

    END { Email::Store::Thread::Container->flush; }

Running dprofpp again:

 Total Elapsed Time = 7.741969 Seconds

   User+System Time = 3.911969 Seconds

 Inclusive Times

 %Time ExclSec CumulS #Calls sec/call Csec/c  Name

  65.4       -  2.559      7        - 0.3656  Email::Store::Mail::store

  62.9   0.014  2.461     35   0.0004 0.0703  Module::Pluggable::Ordered::__ANON

__

  56.2   0.020  2.202      3   0.0065 0.7341  main::BEGIN

  31.8   0.028  1.247    105   0.0003 0.0119  UNIVERSAL::require

  29.4   0.004  1.150      7   0.0006 0.1642  Email::Store::Entity::on_store

  22.7   0.025  0.890    100   0.0003 0.0089  Class::DBI::create

  21.0   0.031  0.824    100   0.0003 0.0082  Class::DBI::_create

  18.3   0.235  0.716    109   0.0022 0.0066  base::import

  15.1       -  0.594    274        - 0.0022  DBIx::ContextualFetch::st::execute

  15.1       -  0.592      7        - 0.0846  Mail::Thread::thread

  15.1       -  0.592      7        - 0.0845  Email::Store::Thread::on_store

We find that we’ve managed to shave another second-and-a-half off, and we’ve also swapped a per-mail cost (of updating the threading containers every time) to a once-per-run fixed cost (of updating them all at the end of the run). This has taken the business of threading down from two-and-a-half seconds per run to half a second per run, and it means that 35% of our running time is outside the main loop; again, this will amortize nicely on large runs.

We started with a program that runs for 10 seconds, and now it runs for 4. Through judicious use of the profiler, we’ve identified the hotspots and eliminated the most troublesome ones. We’ve looked at both exclusive and inclusive views of the profiling data, but there are still a few other things that dprofpp can tell us. For instance, the -S option gives us a call tree, showing what gets called from what. These trees can be incredibly long and tedious, but if the two views we’ve already looked at haven’t identified potential trouble spots, then wading through the tree might be your only option.

Writing your Own Profiler

At least, that is, if you want to use dprofpp; until yesterday, that was the only way of reading profiling data. Yesterday, however, I released Devel::DProfPP, which provides an event-driven interface to reading tmon.out files. I intended to use it to write a new version of dprofpp because I find the current profiler intolerably slow; ironically, though, I haven’t profiled it yet.

Anyway, Devel::DProfPP allows you to specify callbacks to be run every time the profiling data shows Perl entering or exiting a subroutine, and provides access to the same timing and call stack information used by dprofpp.

So, for instance, I like visualization of complicated data. I’d prefer to see what’s calling what as a graph that I can print out and pore over, rather than as a listing. So, I pull together Devel::DProfPP and the trusty Graphviz module, and create my own profiler:

 use GraphViz;

 use Devel::DProfPP;

 my $graph = GraphViz->new();

 my %edges = ();

 Devel::DProfPP->new(enter => sub {

     my $pp = shift;

     my @stack = $pp->stack;

     my $to = $stack[-1]->sub_name;

     my $from = @stack > 1 ? $stack[-2]->sub_name : “MAIN BODY”;

     $graph->add_edge($from => $to) unless $edges{$from.” -> “.$to}++;

 })->parse;

 print $graph->as_png;

Every time we enter a subroutine, we look at the call stack so far. We pick the top frame of the stack, and ask for its subroutine name. If there’s another subroutine on the stack, we take that off too; otherwise we’re being called from the main body of the code. Then we add an edge on our graph between the two subroutines, unless we’ve already got one. Finally, we print out the graph as a PNG file for me to print out and stick on the wall.

There are any number of other things you can do with Devel::DProfPP if the ordinary profiler doesn’t suit your needs for some reason; but as we’ve seen, just judicious application of profiling and highlighting hotspots in your code can cut the running time of a long-running Perl program by 50% or so, and can also help you to understand what your code is spending all its time doing.