Friday, December 30, 2016

Disconnect a PHP PDO connection to RDBMS

PHP's PDO connection management is a bit arcade. It does not offer an explicit way to disconnect from database. Here is the PHP's documentation:

http://php.net/manual/en/pdo.connections.php

Inside which it will tell you to set the variable that holds the PDO connection object to null. However the catch is you also need to set all other objects which might hold reference to the PDO object to null as well, and it can be cumbersome as expressed in the comments section of that page. Here is an example: $dbh = new PDO('mysql:host=localhost;dbname=test', $user, $pass); // use the connection here, note the side effect of // a new $sth reference to the PDO $sth = $dbh->query('SELECT * FROM foo'); // and now we're done; close both $sth and $dbh ! $sth = null; // this is not obvious $dbh = null; Enjoy!

Monday, December 12, 2016

dos2unix

The text file I got from a colleague is full of ^M and crashes my parser running on Mac and Linux. This is due to the very unfortunate colleague is still stuck with a Windows machine, and the line break is coded differently on these systems.

I tried to download the dos2unix util for Mac and installed it smoothly, however it does not work for some reason, the ^M's are still there after running it. Bummer! Short of time, I have to roll my own solution. It is yet another one-liner:

$ cat dos2unix #!/bin/bash sed -i 's/^M/\n/g' $1

However the main challenge is to type in ^M character correctly inside the script. It is NOT a literal ^ and a literal M, but rather a single character.

  • In vi, you type: ctrl-v then ctrl-m
  • In emacs, you type: ctrl-q then ctrl-m
ctrl-v means you hold down the ctrl key and v key at the same time.

Enjoy!

[Update 2016-12-21]:

Coming back to the problem, upon further inspection on the problematic file, using $ od -c filename It turns out the line break is only represented as \r, and I guess that's why the regular dos2unix does not work, which expects \r\n. Knowing this, another fix come to mind (without having to deal with typing ^M character): $ sed -i -e 's/\r/\n/g' filename Also my docker infested colleague suggested another general solution for running standard dos2unix when it works, without having to download and install the dos2unix in the hosting environment: $ docker run --rm -it -v `pwd`:/data/ alpine dos2unix /data/filename

This assumes the filename is in the `pwd`, and you have docker installed on the hosting environment. This basically launch a minimal docker machine and use the dos2unix that comes with it. However it won't solve my specific file with non traditional line break.

Enjoy!

Find out the Perl modules installed and their versions

It is actually just a one-liner:
$ whichpm.sh module::name perl_bin_path It prints out all the Perl modules accessible by the particular Perl interpreter, and then the version of the given module.
$ cat whichpm.sh #!/bin/bash echo 'print map { sprintf( "%20s : %s\n", $_, $INC{$_} ) } sort keys %INC; print "\n '$1' version : $'$1'::VERSION\n\n"' | $2 "-M$1" $ ./whichpm.sh LWP::UserAgent /usr/bin/perl Carp.pm : /System/Library/Perl/5.18/Carp.pm LWP.pm : /Library/Perl/5.18/LWP.pm LWP/UserAgent.pm : /Library/Perl/5.18/LWP/UserAgent.pm Storable.pm : /System/Library/Perl/5.18/darwin-thread-multi-2level/Storable.pm Time/Local.pm : /System/Library/Perl/5.18/Time/Local.pm URI.pm : /System/Library/Perl/Extras/5.18/URI.pm strict.pm : /System/Library/Perl/5.18/strict.pm vars.pm : /System/Library/Perl/5.18/vars.pm warnings.pm : /System/Library/Perl/5.18/warnings.pm LWP::UserAgent version : 6.13

The reason for the 2nd argument (perl interpreter) is that there could be multiple Perl interpreters on a system and they tend to have different version and also the supporting modules.

Of course, to make it a reusable script, you will have to add help message, input argument checking, etc.

Enjoy!

Thursday, December 1, 2016

Batch inserts to SQLite

I have a need to insert hundreds of thousands of rows to a SQLite database. Initially I tried to insert one row at a time, but the speed was very slow, even with a prepared statement. Using the multiple-rows insert statement significantly increased the performance.

Here is the PHP code sample:

Option 1 (Slow) $stmt = $dbh->prepare("insert into wordindex (key, value) values (:k, :v)"); foreach ($data as $k => $v) { $stmt->execute(array(':k' => $k, ':v' => $v)); } Option 2 (Fast, up to 100 times)

uses the SQL syntax like the following : insert into table (col1, col2) value (a1, a2), (b1, b2), (c1, c2), ... Here is what the PHP code looks like: $batchSize = 500; $chunks = array_chunk($data, $batchSize, TRUE); foreach($chunks as $chunk) { $sql = "insert into wordindex (key, value) values "; $pairs = array(); foreach($chunk as $k => $v) { $pairs[] = "('$k','$v')"; } $sql .= implode(',',$pairs); $dbh->query($sql); } Since we are not using SQL parameter binding in option 2, we need to pay attention to values for $k and $v, to avoid SQL injection.

Enjoy!