Wednesday, June 16, 2010

SUCCESS at doing Bulk operations in Symfony

Well, after fighting my way through many pages, and many object methods that SEEMED like they would give me the opportunity to enclose a bulk load operation in a transaction, I finally found out how to do it.

DON'T get the PDO object from the Doctrine Library. Using it directly prevents Doctrine's 'Transaction' tree from keeping track of things. The PDO object might also not be connected yet depending on your context.

A good site for better laid out, and maybe better filled out documetation on Doctrine is: http://www.tig12.net/downloads/apidocs/symfony/lib/plugins/sfDoctrinePlugin/lib/vendor/doctrine/Doctrine/Doctrine_Manager.class.html#det_methods_connection

(That's a specific link to a specific item. I'll leave it as a reader's exercise to find their way to the home page of the site ;-) )

Anyway, reading through the documentation on that site, and examples of transactions for Symfony or Doctrine, I FINALLY found the crucial details.

A/ Use an instance of Doctrine_Connection to handle transactions external to the normal ORM statements that might be done for bulk loading.

B/ HOW TO GET THAT Doctrine_Connection instance while in a connected state.

And, drum roll please,this is how you do it:

$connection=Doctrine_Manager::connection(); // Doctrine_Manager is static.
//read link above to get info on how it automatically connects

$connection->beginTransaction();

//do bulk load stuff here, orrrrrrrrr
//roll your own save around all the elements of a complex schema save.
//For example.
//Books<->Authors database
//Authors can have multiple books
//Books can have multiple authors
//Three tables/objects in database
//Books, BookAuthors, Authors.
//To get them to save all at one

$book=new Book();
$book['author']='george';
$book['author']='dude';
$book->save();

//$book now has two authors.
//Internal code to books (and/or authors) that you have to write collects all the
//authors for books, or books for authors, and saves to all three tables, in between
// 'beginTransaction()' like above, and a 'commitTransaction()', like below.

$connection->beginTransaction();

Simplistic, and effective, but not complete. To do it right, you would put transaction inside of Book->save() method inside a try/catch block to catch database exceptions. See http://www.php.net/try and read all relevant links for 'catch', 'Exception', 'throw'

How this helps you.

EXTRA TIP.

Use arrays for data input and output as much as possible to save overhead

Set all variables to NULL at end of loops to help memory management and garbage collection in PHP.

Monday, June 14, 2010

Using the database in Tasks in Symfony

The latest Symfony (1.4.1 as of 2010-06-13, using doctrine 1.2) has a pretty good task skeleton generator.

But first, the definition of a task. For Symfony, a task is something that can be executed using PHP CLI (Command Line Interpreter). This is also something that can be executed from a cron script (a good thing to know).

The key part is, that the skeleton Symfony creates uses the whole Symfony environment, including the Doctrine ORM, filters, all sorts of things. However, the skeleton, like a LOT of things in Symfony, is not well documented. Or, it's documented in only a tutorial. Hence, this blog article on the database connection.

Here is the code that my use of symfony generated:

The command line issued while in the project directory
~$ ./symfony generate:task taskman
>> task Creating "/home/project_dir/lib/ta...taskmanTask.class.php" task file
~$

The resultant file in /home/project_dir/lib/task/taskmanTask.php
<?php

class taskmanTask extends sfBaseTask
{
protected function configure()
{
// // add your own arguments here
// $this->addArguments(array(
// new sfCommandArgument('my_arg', sfCommandArgument::REQUIRED, 'My argument'),
// ));

$this->addOptions(array(
new sfCommandOption('application', null, sfCommandOption::PARAMETER_REQUIRED, 'The application name'),
new sfCommandOption('env', null, sfCommandOption::PARAMETER_REQUIRED, 'The environment', 'dev'),
new sfCommandOption('connection', null, sfCommandOption::PARAMETER_REQUIRED, 'The connection name', 'doctrine'),
// add your own options here
));

$this->namespace = '';
$this->name = 'taskman';
$this->briefDescription = '';
$this->detailedDescription = <<<EOF
The [taskman|INFO] task does things.
Call it with:

[php symfony taskman|INFO]
EOF;
}

protected function execute($arguments = array(), $options = array())
{
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();

// add your code here
}
}
Notice these excerpted lines:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();

The big deal is that it happens in the context of the task class, inheriting the BaseTask class. So that's where the '$this->configuration' argument comes from. This configuration contains the database connection details for /home/project_dir/config/databases.yml, among other things.

So the variable $databaseManager, and instance of sfDatabaseManager, already has set up in it all your connections defined in databases.yml. 'You should know that', right? ;-) You should also know how the dbases in your databases are named, right? Yeah, I thought not.

It's not really apparent from the code generated byt $options['connection'] is how you feed in the name of the database connection as defined in databases.yml. Here is my simple,for now, databases.yml (mangled to remove important details):

# You can find more information about this file on the symfony website:
# http://www.symfony-project.org/reference/1_4/en/07-Databases

all:
doctrine:
class: sfDoctrineDatabase
param:
dsn: pgsql:host=localhost;dbname=dbname
username: semi_administrative_name
password: separate_password_for_each_connection_username

I don't remember how I generated this yml file, I think it was automatic as a Doctrine one upon project generation. Anyway, 'doctrine' is the name of the connection.

There are two ways you could feed this into the statement:
'WAY ONE'
$connection = $databaseManager->getDatabase('doctrine')->getConnection();

'WAY TWO'
feed the name 'doctrine' to the task on the command line as an option, like:
{from your project directory}
./symfony namespace:taskmanTask.php --connection=doctrine
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();


Now, three other things to consider to round out using the databases in Symfony tasks:

A/ You can investigate using sfOrmTask as the base class for your task, like:

class taskmanTask extends sfDoctrineTask
vs
class taskmanTask extends sfBaseTask
Some hints on how that might be useful are here:
http://librosweb.es/symfony_1_2_en/capitulo16/using_symfony_outside_of_a_web_context.html

B/ You need to declare a namespace inside of the generated task file. Symfony will automatically scan all tasks and cache them and therefore will know the namespace declared inside the file. For example:

change:
$this->namespace = '';
to:
$this->namespace = 'cron'; // I use this for my cron running scripts

C/ I forgot, what it was, if I remember, I'll edit this.

Anyway, passing this, probably incomplete, info on using databases in tasks.

PS SOMEDAY I HAVE TO LEARN HOW TO KEEP INDENTED LINES IN CODE HERE. The .yml file has various indentions. See your own file.