Thursday, January 20, 2011

gettimeofday() issues

gettimeofday() returns the NUMBER of microseconds as an integer, not the 6 digits of microseconds past the decimal point, so any leading zeros are lost. To make it work with BC Math, the microseconds need to be run through sprintf to normalize them to 6 digits, right-justified and zero-padded. Here is the new code for both functions. Now tested for seconds rollover and on 32/64-bit platforms:


function make_comb_uuid(){
uuid_create(&$v4);
uuid_make($v4, UUID_MAKE_V4);
uuid_export($v4, UUID_FMT_STR, &$v4String);
$var=gettimeofday(FALSE);
return substr($v4String,0,24).substr(bcdechex($var['sec'].
sprintf("%06d", $var['usec'])),0,12);
}

function bcdechex($dec) {
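if(PHP_INT_SIZE >= 8){ // PHP_INT_SIZE is in bytes (8 on 64-bit builds), where native dechex() can hold the 16-digit value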
return dechex($dec);
} else {
$last = bcmod($dec, 16);
$remain = bcdiv(bcsub($dec, $last), 16);

if($remain == 0) {
return dechex($last);
} else {
return bcdechex($remain).dechex($last);
}
}
}
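
To see why the zero padding matters, here is a small sketch with made-up clock readings. The helper name make_stamp and the sample values are illustrative only and are not part of the functions above. Without padding, a reading taken just after a seconds rollover produces a shorter digit string, and therefore a smaller number, than the reading taken just before it:

```php
<?php
// Hypothetical helper: join seconds and microseconds into one decimal string,
// with or without zero padding of the microseconds.
function make_stamp($sec, $usec, $pad) {
    return $sec . ($pad ? sprintf("%06d", $usec) : $usec);
}

// Made-up readings on either side of a seconds rollover.
$before = array('sec' => 1295500000, 'usec' => 999999);
$after  = array('sec' => 1295500001, 'usec' => 42);

// Unpadded: 16 digits before the rollover, only 12 after it.
echo make_stamp($before['sec'], $before['usec'], false), "\n";
echo make_stamp($after['sec'], $after['usec'], false), "\n";

// Padded: both are 16 digits, so they sort in time order.
echo make_stamp($before['sec'], $before['usec'], true), "\n";
echo make_stamp($after['sec'], $after['usec'], true), "\n";
```

With padding, every stamp has the same width, so comparing two stamps as numbers (or as strings) agrees with comparing the times they came from.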

Problems with the comb_uuid function on 32-bit systems

The previous post that I wrote was tested on a 64-bit system. It was good, I swear it!

Then I uploaded it to an Amazon m1.small server, and the rightmost hex digits in the output of the make_comb_uuid function stuck at 7fffffff. Ahhhh, the joys are overflowing ;-)
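
A likely explanation, assuming the string-to-integer conversion saturates at the platform maximum: the 16-digit decimal string is far larger than the biggest signed 32-bit integer, 2147483647, whose hex form is 7fffffff. The sketch below checks that without depending on the integer size of the machine it runs on. The helper name exceeds_int32 and the sample readings are made up for illustration.

```php
<?php
// Hypothetical helper: does a non-negative decimal string exceed 2^31 - 1?
// It compares digit strings, so the answer is the same on 32- and 64-bit PHP.
function exceeds_int32($decimal) {
    $max = '2147483647';
    $decimal = ltrim($decimal, '0');
    if (strlen($decimal) !== strlen($max)) {
        return strlen($decimal) > strlen($max);
    }
    return strcmp($decimal, $max) > 0;
}

var_dump(exceeds_int32('1295500000123456')); // made-up seconds + microseconds reading
var_dump(exceeds_int32('1295500000'));       // the seconds alone
echo dechex(2147483647), "\n";               // hex form of the 32-bit maximum
```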

So, I got onto the PHP site and found a great little piece of code for arbitrary-length decimal-to-hex and hex-to-decimal conversion.




Here is what's necessary to make this work. If you wanted to get real fancy, and I will soon, a test for the size of an integer should be done inside the bcdechex function to avoid the bcmath functions when they are not needed.

Found a good note on the PHP site. This works on either a 32-bit or a 64-bit system:

(Credit to the person who wrote it): http://www.php.net/manual/en/ref.bc.php#99130


function bcdechex($dec) {
$last = bcmod($dec, 16);
$remain = bcdiv(bcsub($dec, $last), 16);

if($remain == 0) return dechex($last);
else return bcdechex($remain).dechex($last);
}

function make_comb_uuid(){
uuid_create(&$v4);
uuid_make($v4, UUID_MAKE_V4);
uuid_export($v4, UUID_FMT_STR, &$v4String);
$var=gettimeofday(FALSE);
return substr($v4String,0,24).substr(bcdechex($var['sec'].$var['usec']),0,12);
}
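
Where the bcmath extension is not installed, the same decimal-to-hex conversion can be sketched in plain PHP by doing long division on the decimal string. This is an alternative approach, not the code from the PHP manual note. The name str_dechex is made up, the input is assumed to contain only digits, and intdiv() and the type declarations need PHP 7 or later.

```php
<?php
// Hypothetical helper: convert a non-negative decimal string of any length
// to lowercase hex without bcmath. Assumes $dec contains only digits 0-9.
function str_dechex(string $dec): string {
    $dec = ltrim($dec, '0');
    if ($dec === '') {
        return '0';
    }
    $hex = '';
    while ($dec !== '') {
        $quotient = '';
        $remainder = 0;
        // Long division of the decimal string by 16, one digit at a time.
        for ($i = 0, $n = strlen($dec); $i < $n; $i++) {
            $current = $remainder * 10 + (int) $dec[$i];
            $quotient .= intdiv($current, 16);
            $remainder = $current % 16;
        }
        // Each remainder is the next hex digit, least significant first.
        $hex = dechex($remainder) . $hex;
        $dec = ltrim($quotient, '0');
    }
    return $hex;
}

echo str_dechex('255'), "\n";
echo str_dechex('1295500000123456'), "\n"; // made-up seconds + microseconds reading
```

Every intermediate value stays below 160, so this behaves the same on 32-bit and 64-bit builds.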

Sunday, January 2, 2011

Using UUIDs for Primary Keys

There are advantages to using UUIDs for primary keys. Look at Amazon and others; they are doing so. When exposing database ids in URLs, it allows hiding of:



A/ The number of users, items, types, etc.
B/ The rate of GROWTH of your site's users, items, types, etc.
C/ The SEQUENCE of your site's user, item, and type ids. This slows down the ability to scrape your site's data, especially using your API if you have one.



But UUIDs/GUIDs are fairly random, when made right, for certain versions. See the Wikipedia article on them at: http://en.wikipedia.org/wiki/Universally_unique_identifier (ALSO DONATE TO WIKIPEDIA TO KEEP THEM AROUND). There are 5 versions of them, and the FREE specification is in an RFC; use a search engine and look up "UUID RFC". I believe that it is RFC 4122: http://tools.ietf.org/html/rfc4122. That is not one of the easier RFCs to read (but not as bad as the iCal ones either ;-)



Version 4 is what is needed for the most randomness. Randomness does the best job of preventing guessing and scraping of ids. However, it plays havoc with doing a large amount of inserts into databases for new records, since the index pages for the primary key will be randomly accessed also. This works the database REALLY hard. It can be THIRTY times slower to use UUIDs than 4-byte (32-bit) unsigned integer surrogate primary keys.

See the discussion of that here, with performance tests:

http://www.informit.com/articles/printerfriendly.aspx?p=25862

Since most indexes use the lowest bits of a column value for hashing into index tables, if the lowest bits are close in value, or the same for a number of sequential operations, then the same index table page will remain in memory, virtually erasing any issues with using UUIDs. The author of the article linked above came up with a database function that would do that, by substituting the time in microseconds, in hex, for the last 12 characters of the UUID. The first 24 characters (hyphens included) provide the randomness.
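
The layout described above can be sketched without any UUID extension. This is only an illustration under stated assumptions: the name comb_uuid_sketch is made up, random_bytes() (PHP 7 or later) stands in for a real version 4 generator and the version and variant bits are not set, and a 64-bit PHP build is assumed so the timestamp fits in a native integer.

```php
<?php
// Hypothetical illustration of the COMB layout: 20 random hex digits in the
// first four groups, 12 time-derived hex digits in the last group.
// Assumes 64-bit PHP; does NOT set the UUID version or variant bits.
function comb_uuid_sketch(int $sec, int $usec): string {
    // 10 random bytes give the 20 hex digits of the random part.
    $random = bin2hex(random_bytes(10));
    // Seconds followed by zero-padded microseconds, as one integer.
    $stamp = (int) ($sec . sprintf('%06d', $usec));
    // Keep the 12 leading hex digits of the timestamp.
    $tail = substr(str_pad(dechex($stamp), 12, '0', STR_PAD_LEFT), 0, 12);
    return substr($random, 0, 8) . '-' . substr($random, 8, 4) . '-'
        . substr($random, 12, 4) . '-' . substr($random, 16, 4) . '-' . $tail;
}

$now = gettimeofday();
echo comb_uuid_sketch($now['sec'], $now['usec']), "\n";
```

Ids generated close together in time share the leading digits of the final group, which is the property the index locality argument relies on.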

Since I use PHP for most of my work, I needed a PHP function that would do that. The following works well. On my 64-bit, 2.4 GHz, 4-core Ubuntu machine at home, just creating one of these values, writing it to a variable, and then creating a new one and writing it into the same variable, 1x10^6 times, took on average only 3 milliseconds TOTAL for all MILLION. So it's not much of a penalty :-) Even using SSDs (Solid State Disks) for the database, that amount of delay for 1 MILLION would be below the noise level.
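
A rough way to repeat this kind of measurement is sketched below. The harness name time_generator is made up, and the closure is only a stand-in generator so the snippet runs without the uuid extension; pass 'make_comb_uuid' instead to time the real function. Results will vary from machine to machine.

```php
<?php
// Hypothetical timing harness: $generator is any callable that returns an id.
// Returns the elapsed wall-clock time in seconds.
function time_generator(callable $generator, int $iterations): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $id = $generator(); // overwrite the same variable on every pass
    }
    return microtime(true) - $start;
}

// Stand-in generator (an assumption, not the function from this post).
$standIn = function () {
    return bin2hex(random_bytes(16));
};

$iterations = 100000;
$elapsed = time_generator($standIn, $iterations);
printf("%d ids in %.1f ms\n", $iterations, $elapsed * 1000);
```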

Here is the code:

/* requires installation of the Ubuntu 'php5-uuid' package
* see this URL: http://www.informit.com/articles/printerfriendly.aspx?p=25862
* else, use a search engine and look up "comb uuid", or "sequential uuid"
*
* returns a guid with the last 12 chars representing the HEX value of time
* allows better clustering of database index pages and faster performance
* while still using UUIDs
*
*/
function make_comb_uuid(){
uuid_create(&$v4);
uuid_make($v4, UUID_MAKE_V4);
uuid_export($v4, UUID_FMT_STR, &$v4String);
$var=gettimeofday();
return substr($v4String,0,24).substr(dechex($var['sec'].$var['usec']),0,12);