Thursday, January 20, 2011

gettimeofday() issues

gettimeofday() returns the NUMBER of microseconds, not a six-digit fraction of a second past the decimal point. So to make it work with BC math, the microseconds need to be sprintf'd to normalize them to six digits, right-justified with leading zeros. Here is the new code for both functions, now tested for seconds rollover and on 32/64-bit platforms:


function make_comb_uuid(){
    uuid_create(&$v4);
    uuid_make($v4, UUID_MAKE_V4);
    uuid_export($v4, UUID_FMT_STR, &$v4String);
    $var=gettimeofday(FALSE);
    return substr($v4String,0,24).substr(bcdechex($var['sec'].sprintf("%06d", $var['usec'])),0,12);
}

function bcdechex($dec) {
    // PHP_INT_SIZE is in bytes: 8 on 64-bit builds, 4 on 32-bit ones.
    // On 64-bit PHP the native integer is wide enough for sec.usec, so plain dechex() works.
    if(PHP_INT_SIZE >= 8){
        return dechex($dec);
    } else {
        $last = bcmod($dec, 16);
        $remain = bcdiv(bcsub($dec, $last), 16);

        if($remain == 0) {
            return dechex($last);
        } else {
            return bcdechex($remain).dechex($last);
        }
    }
}
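To see why the padding matters, here is a tiny sketch of my own (illustrative values only) comparing the naive concatenation with the sprintf'd one across a seconds rollover:

$samples = array(
    array('sec' => 10, 'usec' => 999999),
    array('sec' => 11, 'usec' => 5),
);

foreach ($samples as $t) {
    $naive  = $t['sec'] . $t['usec'];                   // "10999999", then "115"
    $padded = $t['sec'] . sprintf("%06d", $t['usec']);  // "10999999", then "11000005"
    echo "naive=$naive padded=$padded\n";
}

Without the padding, the later timestamp produces a SMALLER number, so the hex suffix of the UUID stops increasing with time; with the padding it keeps increasing the way the comb technique needs.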

Problems with the comb_uuid function on 32-bit systems

The previous post that I wrote was tested on a 64-bit system. It was good, I swear it!

Then I uploaded it to an m1.small Amazon server, and the rightmost hex digits in the output of the comb_uuid function stuck at 7fffffff. Ahhhh, the joys are overflowing ;-)

So I got onto the PHP site and found a great little piece of code for arbitrary-length decimal-to-hex and hex-to-decimal conversion.




Here is what's necessary to make this work. If you wanted to get really fancy, and I will soon, a test for the size of an integer should be done inside the bcdechex function to avoid the bcmath functions where they aren't needed.

I found a good note on the PHP site. This does it on either a 32- or 64-bit system:

(credit to the person who wrote it): http://www.php.net/manual/en/ref.bc.php#99130


function bcdechex($dec) {
    $last = bcmod($dec, 16);
    $remain = bcdiv(bcsub($dec, $last), 16);

    if($remain == 0) return dechex($last);
    else return bcdechex($remain).dechex($last);
}

function make_comb_uuid(){
    uuid_create(&$v4);
    uuid_make($v4, UUID_MAKE_V4);
    uuid_export($v4, UUID_FMT_STR, &$v4String);
    $var=gettimeofday(FALSE);
    return substr($v4String,0,24).substr(bcdechex($var['sec'].$var['usec']),0,12);
}
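A quick way to see the overflow (a sketch I added; the exact behavior depends on your platform): on 32-bit PHP, dechex() casts its argument to a native integer, which clamps at PHP_INT_MAX, while bcdechex() chews through the decimal string with bcmath and never overflows.

$dec = '1295555555123456'; // sec concatenated with usec, about 16 decimal digits

// 32-bit PHP clamps the cast at PHP_INT_MAX (2147483647), so this prints 7fffffff
echo dechex($dec), "\n";

// bcdechex() divides the decimal string by 16 with bcmath, so it gives the
// full answer (49a4d08a66100) on either platform
echo bcdechex($dec), "\n";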

Sunday, January 2, 2011

Using UUIDs for Primary Keys

There are advantages to using UUIDs for primary keys. Look at Amazon and others; they are doing so. When exposing database ids in URLs, it allows hiding of:



A/ The number of users, items, types, etc.
B/ The rate of GROWTH of your site's users, items, types, etc.
C/ The SEQUENCE of your site's user, item, and type ids. This slows down the ability to scrape your site's data, especially through your API if you have one.



But UUIDs/GUIDs are fairly random, when made right, for certain types. See the Wikipedia article on them at: http://en.wikipedia.org/wiki/Universally_unique_identifier (ALSO DONATE TO WIKIPEDIA TO KEEP THEM AROUND). There are 5 types of them, and the FREE specifications are in an RFC; use a search engine and look up "UUID RFC". I believe it is RFC 4122: http://tools.ietf.org/html/rfc4122. That is not one of the easier RFCs to read (but not as bad as the iCal ones either ;-)



Version 4 is what is needed for the most randomness. Randomness does the best job of preventing guessing and scraping of ids. However, it plays havoc with doing large amounts of inserts into databases for new records, since the index pages for the primary key will be randomly accessed as well. This works the database REALLY hard. It can be THIRTY times slower to use UUIDs than 4-byte, 32-bit unsigned integer primary (surrogate) keys.

See the discussion of that here, with performance tests:

http://www.informit.com/articles/printerfriendly.aspx?p=25862

Since most indexes use the lowest bits of a column value for hashing into index tables, if the lowest bits are close in value, or the same, for a number of sequential operations, then the same index table page will remain in memory, virtually erasing any issues with using UUIDs. The author of the previous link came up with a database function that does that by substituting the time in microseconds, in HEX, for the last 12 characters of the UUID. The remaining 23 or so characters provide the randomness.

Since I use PHP for most of my work, I needed a PHP function that would do the same. The following works well. On my 64-bit, 2.4 GHz, 4-core Ubuntu machine at home, creating one of these values and writing it to a variable, then creating a new one and writing it into the same variable, 1x10^6 times, took on average only 3 milliseconds TOTAL for the whole MILLION. So it's not much of a penalty :-) Even using SSDs (Solid State Discs) for the database, that amount of delay per MILLION would be below the noise level.

Here is the code:

/* requires installation of ubuntu 'php5_uuid' module
* see this URL: http://www.informit.com/articles/printerfriendly.aspx?p=25862
* else, use a search engine and look up "comb uuid", or "sequential uuid"
*
* returns a guid with the last 12 chars representing the HEX value of time
* to allow better clustering of database index pages and faster performance
* while still using UUIDs
*
*/
function make_comb_uuid(){
    uuid_create(&$v4);
    uuid_make($v4, UUID_MAKE_V4);
    uuid_export($v4, UUID_FMT_STR, &$v4String);
    $var=gettimeofday();
    return substr($v4String,0,24).substr(dechex($var['sec'].$var['usec']),0,12);
}
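For what it's worth, here is a rough usage sketch of my own (the table and column names are made up, not from the article): since the first 24 characters keep the UUID's hyphen layout and the last 12 are hex, the result is still a string Postgres will accept as a uuid value.

// hypothetical table: create table items (id uuid primary key, name text);
$dbh = Doctrine_Manager::getInstance()->getCurrentConnection()->getDbh();

$stmt = $dbh->prepare("insert into items (id, name) values (?, ?)");
$stmt->execute(array(make_comb_uuid(), 'first item'));

// the leading 24 characters are random per row, but the trailing 12 hex characters
// increase with time, which is what keeps the index page activity clustered
echo make_comb_uuid(), "\n";
echo make_comb_uuid(), "\n";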

Saturday, December 18, 2010

Printing RFCs

I am on the Linux/Ubuntu platform. Neither 'gedit' nor 'firefox' (in both the *.txt and *.html formats) would print RFCs in a way that put one page of content on one printed page.

RFCs are still laid out for the dot-matrix-printer days.

BUT, OpenOffice 'Writer' (and probably Microsoft Word) respect the embedded ASCII page separator and properly print the document. It even worked out of the box with the default (or that chosen for *.txt formats) font type and size.

So if you need a few pages out of an RFC:
Open it in the *.txt format from the IETF website,
Save it to the file system,
Open it in OpenOffice Writer
Open 'Print Preview' and explore to find the pages you want,
Go back to 'Print', select the pages you want,
Print them!

Thursday, July 15, 2010

Testing Symfony Apps With PHPUnit

[Alt-Title: Or It's NEVER as Easy As the Tutorials Show]

The Symfony development cycle is done on a local machine (as with probably many other frameworks). This saves the long upload times, and allows a programmer/developer to work on a project even while on vacation without Internet . . . . uh, right. It is much faster. Another major benefit is not interfering with the production site . . . at all, not even in a subdomain or a separate database.

Part of that cycle is to drop and recreate the database for every little change, especially when doing testing. It means a clean slate, no interference between tests.

Symfony2 will be switching from its own home-brewed 'lime' testing framework to PHPUnit, according to Dustin Whittle in a presentation he gave at a PHP Meetup a week ago. They WILL be extending it to recover what they will lose from lime, gaining both from in-house efforts and from the community efforts at the PHPUnit group.

So even though I'm currently using Symfony 1.4.4, I thought I would start my transition to Doctrine 2 by using PHP Unit on my current project (the subject of this blog). Both Symfony 1.4.1 and PHP Unit are current products, and I thought that I wouldn't have to deal with 'first adopter' problems. Sigh, wishful thinking.

Let me preface by saying that I prefer and use Postgresql for the database. It may not have anything to do with the lesson learned and described here, but then maybe it does. I'll leave it for the audience to comment upon using their experiences.

So, running Ubuntu, Apache, Postgres, PHP, Symfony, and Doctrine, I built a moderately complex ERD/Schema. The first lesson?

(LESSON-1)
ALWAYS use substitute, BIG INT, primary keys. Why?
  • They are small compared to strings.
  • If the data changes, the indexes don't have to, (data as primary keys forces the database to recalculate indexes if the data changes.)
  • Symfony 1.x.x, and even Symfony 2/Doctrine 2 (according to a page at http://www.doctrine-project.com/ that I read), will not yet support a sequence with a multi-column, composite key. Just using a multi-column key would probably make the Doctrine ORM work harder and give it more chances to not do exactly what you had in mind.

Using a substitute primary key means that uniqueness on the DATA column(s) that would otherwise be the primary key must then be enforced with an additional index on that/those column(s). So maybe the extra writing to the index still goes on. My bad. If you really want to learn about the 'Big Fight' surrounding substitute integer primary keys, use your favorite search engine and read about it. But it REALLY is better with web frameworks, trust me.

Having gotten past the Symfony/Doctrine framework not being able to create a model on composite keys involving a sequence, I have made great progress. In fact, I'm at the testing point on the main datagram/ERD/Schema in relation to user input, searching, and output pages. Yeehaw.

So, back to PHPUnit/Symfony. I set up a class inheriting from the main PHPUnit class 'PHPUnit_Framework_TestCase'. All methods whose names begin with 'test...' are executed when the class is run by a CLI executable of PHPUnit, or when the execute() method of the class is called. They SEEM to run in the order they are defined in the class, but I've only written two methods so far.
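For reference, a bare-bones sketch of such a class (the model and method names here are mine, purely illustrative, and assume the Symfony/Doctrine environment was bootstrapped before PHPUnit runs):

class ItemModelTest extends PHPUnit_Framework_TestCase
{
    protected function setUp()
    {
        // reset the fixtures here -- see the lessons below about how (not) to do that
    }

    public function testItemCanBeSaved()
    {
        $item = new Item();
        $item['name'] = 'widget';
        $item->save();

        $this->assertNotNull($item['id']);
    }

    public function testItemNameIsRequired()
    {
        // appears to run after the previous method, in definition order
        $this->assertTrue(true); // placeholder assertion
    }
}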

[LESSON-2]
Use exception tests everywhere in your PHPUnit-derived class while doing PHPUnit/Symfony testing.
[LESSON-3]
Build fixtures (preset conditions) and tests for them INCREMENTALLY.

One reason: the line number that PHPUnit reports with an error is the line IT called, not where the error actually is. And the error stack is not that accurate either. Incremental building/testing (they should be done together) lets you know, MOST of the time, where the error is.

Other reasons: inline echoes come out of PHPUnit out of order relative to its own messages and output. PHPUnit also seems NOT to output PHP errors, or to stop at 'exit()', but keeps going past them. So it's better to build small pieces, take small steps, blah blah.
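One way I hedge against the misleading line numbers (just a sketch; the helper and model names are hypothetical) is to wrap the risky part of a test in try/catch and fail with the real message and location:

public function testFixturesLoad()
{
    try {
        $this->resetDatabase(); // whatever your own fixture-reset helper is
        $count = Doctrine_Query::create()->from('Item i')->count();
        $this->assertEquals(3, $count);
    } catch (Exception $e) {
        // surfaces the real file and line instead of the line PHPUnit called
        $this->fail($e->getMessage() . ' at ' . $e->getFile() . ':' . $e->getLine());
    }
}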

Did I do that at first? NO! That's why I can tell you it's easier doing so now :-)

NOW . . . The main lesson in this post.

[LESSON-4]
It's *NOT* a good idea to drop the database for every test. Reasons?

  • It SEEMS as if the create-database code in Postgres or Doctrine returns before it's actually done. In fact, as I write this, I seem to remember having read somewhere that dropping/creating the database is NOT transactional. I can tell you from personal experience that it SEEEEEEMS that way for me. I would get errors about columns not existing, or creating tables would fail, or foreign key violations, or other errors that didn't make sense. The datasets I'm using are tiny, and I KNOW that there's no chance of foreign key violations, and which columns exist or don't.
  • It takes a LONG time to drop a database and recreate one. Something like 7 seconds on my machine at home. If I am running 25-100 fixtures with several assertions each, that would be a long time, counting the data tests that will happen when the tests get much larger towards the end.
  • It takes a long time to ADD test data if it's any size at all, but this also affects my eventual solution - partially.
What to do otherwise? Delete the data in the tables, reset the sequences/auto-incrementing columns, and reload the data. The database stays in existence. Several things have to be taken into account, though.

Delete from the tables in an order that does not cause foreign key violations, from the lowest 'grandchild' table up to the highest parent. If your data is too complex for that, drop all the constraints, then recreate them before repopulating the test data. If you have large datasets, and your database supports it, TRUNCATE the tables; it essentially erases the contents of the table files without deleting the files. Finally, if your datasets are large, or you've already run many tests against the database, force your database to scan the tables and consolidate the records into a smaller file; different databases call it different things (in Postgres it's VACUUM). If you do TRUNCATE, you may not need that scan.
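If you go the Postgres TRUNCATE route, one statement can handle the child/parent ordering and the sequence resets at the same time. A sketch with made-up table names (RESTART IDENTITY needs Postgres 8.4 or later, so check your version):

$dbh = Doctrine_Manager::getInstance()->getCurrentConnection()->getDbh();

// CASCADE follows the foreign keys down to the child tables for you;
// RESTART IDENTITY resets the tables' sequences back to their start values
$dbh->query("truncate table parent_table, child_table restart identity cascade");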

What did I try with Symfony/Doctrine/PHPUnit *BEFORE* resorting to doing something more conventional/pre-framework-era-like?

1/ Before each PHPUnit test, I ran the command-line reset of the project using one of PHP's host-command invocation functions, exec(). I did it like this:

Executing it from outside of PHP while inside of the program:

// the following was an attempt to let Postgres drop the database;
// Postgres will not do so while there are connections to the database.
Doctrine_Manager::getInstance()->getCurrentConnection()->close();

exec('./symfony doctrine:build --all --and-load --no-confirmation');

This drops the database, recreates it, recreates the tables, then the constraints, then loads the data, and does not ask for confirmation of anything. Using that on the command line (not from within a program) is the normal command-line, user-land approach. From within the program it did NOT avoid any of the timing issues, and always resulted in strange errors.

2/ Before each PHPUnit test, I ran the Symfony task from WITHIN Symfony by doing this:

Using a Task from inside of Symfony:

$optionsArray = array();
$argumentsArray = array();

$optionsArray[] = "--all";
$optionsArray[] = "--and-load";
$optionsArray[] = "--no-confirmation";

$task = new sfDoctrineBuildTask($configuration->getEventDispatcher(), new sfFormatter());
$task->run($argumentsArray, $optionsArray);


This had slightly different errors . . . . sometimes.


SOOOOOOOooooooo, since I've been in construction, I less frequently sit there and try to figure out what the right thing is supposed to be, or what the designers were really trying to get us to do. I JUST GET IT WORKING. I knew that doing it the straight-SQL way would work, so that's what I did.


[LESSON-5]Doctrine's 'rawSqlblah' functions aren't really 'raw' or that transparent.


BUT, I did find a post that showed how to do it, on the symfony framework forum: General discussion => [HOWTO] True RAW SQL in Doctrine. The thing is to avoid mixing PDO, or even lower-level code, with Symfony/Doctrine code, because you can REALLY screw up the multi-level transaction code inside of Doctrine. And it is STILL simpler, even using raw SQL vs the ORM, to use the Doctrine connection. Your code has to be database-specific in some cases, though. Here's how to do it for Postgres:

// multiple statements are not possible in prepared queries (used by default)
$sql = array();

// removes every record, if the constraints allow
$sql[] = "delete from table_one";
// sets the sequence back to its starting value, usually 1
$sql[] = "alter sequence table_one_id_seq restart";

// removes every record, if the constraints allow
$sql[] = "delete from table_two";
// sets the sequence back to its starting value, usually 1
$sql[] = "alter sequence table_two_id_seq restart";

$doctrine = Doctrine_Manager::getInstance()->getCurrentConnection()->getDbh();

foreach ($sql as $do) {
    $doctrine->query($do);
}

// Still possible, and EASIER, to load the (default) fixture file using external commands:
// Symfony command-line commands must always be run from the project root
chdir(sfConfig::get('sf_root_dir'));
exec('./symfony doctrine:data-load');

So, this post covered 5 lessons about using Symfony with PHPUnit that YOU don't have to learn the hard way if you don't want to:

[Alt-Title: Or It's NEVER as Easy As the Tutorials Show]
(LESSON-1)
ALWAYS use substitute, BIG INT, primary keys. Why?
[LESSON-2]
Use exception tests everywhere in your PHPUnit-derived class while doing PHPUnit/Symfony testing.
[LESSON-3]
Build fixtures (preset conditions) and tests for them INCREMENTALLY.
[LESSON-4]
It's *NOT* a good idea to drop the database for every test.
[LESSON-5]
Doctrine's 'rawSqlblah' functions aren't really 'raw' or that transparent. See this link about the subject and a better way to do it: symfony framework forum: General discussion => [HOWTO] True RAW SQL in Doctrine


Site references:
http://www.phpunit.de/
http://www.symfony-project.com/
http://www.doctrine-project.com/

[NOTES]
1/ So why was I trying to use a sequence/auto-incrementing column in combination with the primary key columns? There were/are 8 integer columns, fed from 4 other tables, that need to be unique. To use THOSE as the foreign keys in the table that is a child of that table, I really needed to let them occur multiple times in that child table. BUT I wanted to be able to search for them more easily and have LOTS easier database code to write, especially if I eventually go to C++ for some things for speed. (Please, it does happen :-) There's more to it than that, proprietary to the site, that I can't divulge.

Wednesday, June 16, 2010

SUCCESS at doing Bulk operations in Symfony

Well, after fighting my way through many pages, and many object methods that SEEMED like they would give me the opportunity to enclose a bulk load operation in a transaction, I finally found out how to do it.

DON'T get the PDO object from the Doctrine Library. Using it directly prevents Doctrine's 'Transaction' tree from keeping track of things. The PDO object might also not be connected yet depending on your context.

A good site for better laid-out, and maybe more complete, documentation on Doctrine is: http://www.tig12.net/downloads/apidocs/symfony/lib/plugins/sfDoctrinePlugin/lib/vendor/doctrine/Doctrine/Doctrine_Manager.class.html#det_methods_connection

(That's a specific link to a specific item. I'll leave it as a reader's exercise to find their way to the home page of the site ;-) )

Anyway, reading through the documentation on that site, and examples of transactions for Symfony or Doctrine, I FINALLY found the crucial details.

A/ Use an instance of Doctrine_Connection to handle transactions external to the normal ORM statements that might be done for bulk loading.

B/ HOW TO GET THAT Doctrine_Connection instance while in a connected state.

And, drum roll please, this is how you do it:

$connection=Doctrine_Manager::connection(); // Doctrine_Manager is static.
//read link above to get info on how it automatically connects

$connection->beginTransaction();

//do bulk load stuff here, orrrrrrrrr
//roll your own save around all the elements of a complex schema save.
//For example.
//Books<->Authors database
//Authors can have multiple books
//Books can have multiple authors
//Three tables/objects in database
//Books, BookAuthors, Authors.
//To get them all to save at once:

$book = new Book();
// add two authors to the book (illustrative only -- the exact syntax depends on your schema,
// e.g. a many-to-many 'Authors' relation through the BookAuthors table)
$book->Authors[0]->name = 'george';
$book->Authors[1]->name = 'dude';
$book->save();

//$book now has two authors.
//Internal code to Book (and/or Author) that you have to write collects all the
//authors for books, or books for authors, and saves to all three tables, in between
//the 'beginTransaction()' above and the commit, like below.

$connection->commit();

Simplistic and effective, but not complete. To do it right, you would put the transaction inside the Book->save() method, inside a try/catch block, to catch database exceptions. See http://www.php.net/try and read all the relevant links for 'catch', 'Exception', and 'throw'.
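A rough sketch of what that try/catch might look like (my illustration; the save internals are glossed over):

$connection = Doctrine_Manager::connection();

try {
    $connection->beginTransaction();

    // save the book plus its authors and the BookAuthors join rows here
    $book->save();

    $connection->commit();
} catch (Exception $e) {
    // undo every write made since beginTransaction(), then let the caller know
    $connection->rollback();
    throw $e;
}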

Hope this helps you.

EXTRA TIP.

Use arrays for data input and output as much as possible to save overhead

Set all variables to NULL at end of loops to help memory management and garbage collection in PHP.
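A small sketch of both tips together (the query and model names are made up); Doctrine can hydrate straight to arrays, which skips building heavyweight record objects:

$rows = Doctrine_Query::create()
    ->from('Book b')
    ->execute(array(), Doctrine_Core::HYDRATE_ARRAY); // plain nested arrays, no objects

foreach ($rows as $row) {
    // ... do something with $row ...
}

$row  = null;  // drop the last row reference
$rows = null;  // drop the whole result set so PHP can reclaim the memory sooner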

Monday, June 14, 2010

Using the database in Tasks in Symfony

The latest Symfony (1.4.1 as of 2010-06-13, using doctrine 1.2) has a pretty good task skeleton generator.

But first, the definition of a task. For Symfony, a task is something that can be executed using the PHP CLI (Command Line Interface). It is also something that can be executed from a cron script (a good thing to know).
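For example (the path and schedule here are invented, and this assumes the 'cron' namespace I mention at the end of the post), a crontab entry for such a task could look like:

# run the cron:taskman task every night at 02:00
0 2 * * * cd /home/project_dir && php symfony cron:taskman --env=prod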

The key part is that the skeleton Symfony creates uses the whole Symfony environment, including the Doctrine ORM, filters, all sorts of things. However, the skeleton, like a LOT of things in Symfony, is not well documented. Or it's documented only in a tutorial. Hence this blog article on the database connection.

Here is the code that Symfony generated for me:

The command line issued while in the project directory
~$ ./symfony generate:task taskman
>> task Creating "/home/project_dir/lib/ta...taskmanTask.class.php" task file
~$

The resultant file in /home/project_dir/lib/task/taskmanTask.class.php
<?php

class taskmanTask extends sfBaseTask
{
  protected function configure()
  {
    // // add your own arguments here
    // $this->addArguments(array(
    //   new sfCommandArgument('my_arg', sfCommandArgument::REQUIRED, 'My argument'),
    // ));

    $this->addOptions(array(
      new sfCommandOption('application', null, sfCommandOption::PARAMETER_REQUIRED, 'The application name'),
      new sfCommandOption('env', null, sfCommandOption::PARAMETER_REQUIRED, 'The environment', 'dev'),
      new sfCommandOption('connection', null, sfCommandOption::PARAMETER_REQUIRED, 'The connection name', 'doctrine'),
      // add your own options here
    ));

    $this->namespace = '';
    $this->name = 'taskman';
    $this->briefDescription = '';
    $this->detailedDescription = <<<EOF
The [taskman|INFO] task does things.
Call it with:

  [php symfony taskman|INFO]
EOF;
  }

  protected function execute($arguments = array(), $options = array())
  {
    // initialize the database connection
    $databaseManager = new sfDatabaseManager($this->configuration);
    $connection = $databaseManager->getDatabase($options['connection'])->getConnection();

    // add your code here
  }
}
Notice these excerpted lines:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();

The big deal is that it happens in the context of the task class, which inherits from sfBaseTask. So that's where the '$this->configuration' argument comes from. This configuration contains the database connection details from /home/project_dir/config/databases.yml, among other things.

So the variable $databaseManager, an instance of sfDatabaseManager, already has all the connections defined in databases.yml set up in it. 'You should know that', right? ;-) You should also know how the connections in your databases.yml are named, right? Yeah, I thought not.

What's not really apparent from the generated code is that $options['connection'] is how you feed in the name of the database connection as defined in databases.yml. Here is my simple (for now) databases.yml (mangled to remove important details):

# You can find more information about this file on the symfony website:
# http://www.symfony-project.org/reference/1_4/en/07-Databases

all:
  doctrine:
    class: sfDoctrineDatabase
    param:
      dsn:      pgsql:host=localhost;dbname=dbname
      username: semi_administrative_name
      password: separate_password_for_each_connection_username

I don't remember how I generated this yml file; I think it was created automatically as a Doctrine one upon project generation. Anyway, 'doctrine' is the name of the connection.

There are two ways you could feed this into the statement:
'WAY ONE'
$connection = $databaseManager->getDatabase('doctrine')->getConnection();

'WAY TWO'
feed the name 'doctrine' to the task on the command line as an option, like:
{from your project directory}
./symfony namespace:taskman --connection=doctrine
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();
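Once the connection is initialized, normal Doctrine calls work inside execute(). A sketch (the 'Item' model and its column are made up for illustration):

protected function execute($arguments = array(), $options = array())
{
    // initialize the database connection
    $databaseManager = new sfDatabaseManager($this->configuration);
    $connection = $databaseManager->getDatabase($options['connection'])->getConnection();

    // from here on, the usual Doctrine queries and models are available
    $items = Doctrine_Query::create()
        ->from('Item i')
        ->where('i.created_at < ?', date('Y-m-d'))
        ->execute();

    $this->logSection('taskman', sprintf('found %d old items', count($items)));
}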


Now, three other things to consider to round out using the databases in Symfony tasks:

A/ You can investigate using an ORM-aware base class for your task, like:

class taskmanTask extends sfDoctrineTask
vs
class taskmanTask extends sfBaseTask
Some hints on how that might be useful are here:
http://librosweb.es/symfony_1_2_en/capitulo16/using_symfony_outside_of_a_web_context.html

B/ You need to declare a namespace inside of the generated task file. Symfony will automatically scan all tasks, cache them, and therefore know the namespace declared inside the file. For example:

change:
$this->namespace = '';
to:
$this->namespace = 'cron'; // I use this for my cron running scripts

C/ I forgot what it was; if I remember, I'll edit this.

Anyway, I'm passing on this, probably incomplete, info on using databases in tasks.

PS SOMEDAY I HAVE TO LEARN HOW TO KEEP INDENTED LINES IN CODE HERE. The .yml file has various indentions. See your own file.