Saturday, August 22, 2009

Implementing JSON REST with BLOBs in PHP - The practical way.

I have retrenched somewhat (if my memory of what I've been writing serves me correctly).

I have decided to skip the use of the PUT method until PHP supports it better.

I would like to see full $_GET, $_POST, $_PUT, $_DELETE, $_COOKIE, $_OPTION, $_HEAD, $_REQUEST, etc. request body support for the appropriate methods in PHP. From what I've read, this is within the full specs of the HTTP protocol. I'm pursuing it with the PHP internals group. If I can't get it there, I will eventually hack PHP myself or pay someone to do it.

Unfortunately, for now, the only way to have PHP parse fields and files sent in the body of a request (here carrying JSON objects or arrays) is via the POST method.

So, since I don't want to spend the time creating my own message body parser for the non-POST methods (PHP already has one for the POST method), I will only use the following:

[note: if a method is in front of the URL, it is the (ACTUAL METHOD) ]

[note: this does not use 'pretty URLs'. This will be implemented later, using symfony. I will post a new update of what that looks like when I have it working. Self written code or other MVC frameworks can also implement URL rewriting/pretty URLs]

[Note: any error in any object, whether one or several are submitted, causes all objects to be rejected, similar to a database transaction]
=================================================
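
The all-or-nothing rule in the note above can be sketched roughly as follows. This is only an illustration in Python (the server in this project is PHP), and the names `validate_object` and `accept_batch` are hypothetical. The single check shown, that a new object must not carry an id, stands in for whatever per-collection validation is really needed.

```python
import json

def validate_object(obj, creating):
    """Return a list of error strings for one decoded JSON object."""
    errors = []
    if not isinstance(obj, dict):
        errors.append("not a JSON object")
    elif creating and "id" in obj:
        # Assumption: this is the only rule checked in this sketch.
        errors.append("id is not allowed in a new object")
    return errors

def accept_batch(json_text, creating=True):
    """Accept every object in the payload, or none of them."""
    decoded = json.loads(json_text)
    # A single object and an array of objects are handled the same way.
    objects = decoded if isinstance(decoded, list) else [decoded]
    problems = {}
    for index, obj in enumerate(objects):
        errors = validate_object(obj, creating)
        if errors:
            problems[index] = errors
    if problems:
        # One bad object rejects the whole submission, like a rolled-back transaction.
        return [], problems
    return objects, {}
```

The second return value maps the position of each rejected object to its errors, so the client can see which object sank the batch.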

----------------------------------------------
GET, i.e. READ of C.R.U.D.
----------------------------------------------
(nothing in body of request, only in URL)
----------------------------------------------
{the collection's initial page}
....http://www.website.tld/collection-name/

{the collection's schema}
....http://www.website.tld/collection-name/?schema=TRUE

{blank object for editing and submitting as new object}
....http://www.website.tld/collection-name/?new=TRUE

{the collection paged}
....http://www.website.tld/collection-name/?start=start_id

{the collection search function}
....http://www.website.tld/collection-name/search/parameter/value/parameter/value etc

{a single JSON entity}
....http://www.website.tld/collection-name/id/?id=xxx
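
As a rough client-side illustration of the GET forms listed above, here is a small sketch in Python. The helper names (`collection_url`, `search_url`, `entity_url`) are made up for this example, and the parameter names in the usage lines are placeholders.

```python
from urllib.parse import quote, urlencode

BASE = "http://www.website.tld"  # placeholder host from the listing above

def collection_url(collection, **query):
    """Initial page, schema, blank object, or paged view of a collection."""
    url = f"{BASE}/{quote(collection)}/"
    return f"{url}?{urlencode(query)}" if query else url

def search_url(collection, criteria):
    """criteria is a list of (parameter, value) pairs, placed in the path."""
    parts = []
    for name, value in criteria:
        parts += [quote(str(name), safe=""), quote(str(value), safe="")]
    return f"{BASE}/{quote(collection)}/search/" + "/".join(parts)

def entity_url(collection, entity_id):
    """A single JSON entity, addressed by its id as a GET variable."""
    return f"{BASE}/{quote(collection)}/id/?" + urlencode({"id": entity_id})

print(collection_url("collection-name", schema="TRUE"))
print(search_url("collection-name", [("color", "red"), ("size", "10")]))
print(entity_url("collection-name", 42))
```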





---------------------------------------------
POST, i.e. Create of C.R.U.D. (parameters only in body)
----------------------------------------------
(No id field in URL or allowed in new objects)
----------------------------------------------
{a new object}
....http://www.website.tld/collection-name/
........a single JSON object in POST variable named "JSON[]" or "JSON"
........individual BLOB fields in POST variable named for field
............"blob-fieldA", "blob-fieldB", etc. or "blob-fieldA[]", "blob-fieldB[]"

{a set of new objects}
....http://www.website.tld/collection-name/
........a single JSON object in POST variable named "JSON[]" or "JSON",
............but the contents is a JSON array of JSON objects
........individual BLOB fields per JSON object listed as an adjacent set,
............ sets listed same order as objects
............each in POST variable array named for field
............"blob-fieldA[]", "blob-fieldB[]"

{a set of new objects}
....http://www.website.tld/collection-name/
........multiple JSON objects each in a POST variable named "JSON[]"
........individual BLOB fields per JSON object
............each in POST variable array named for field
............"blob-fieldA[]", "blob-fieldB[]"
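
To make the last POST form concrete, here is a sketch of how a client might assemble the multipart/form-data body: one "JSON[]" part per object, then one "blob-fieldA[]" part per object in the same order. It is written in Python with only the standard library, the function name `build_multipart` and the generated file names are hypothetical, and it is not tested against the real server. Note that PHP exposes parts that carry a filename through $_FILES rather than $_POST.

```python
import json
import uuid

def build_multipart(json_objects, blobs):
    """json_objects: list of dicts.
    blobs: {field name: [bytes, ...]}, one entry per object, same order."""
    boundary = uuid.uuid4().hex
    chunks = []
    for obj in json_objects:
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        chunks.append(b'Content-Disposition: form-data; name="JSON[]"\r\n\r\n')
        chunks.append(json.dumps(obj).encode("utf-8") + b"\r\n")
    for field, payloads in blobs.items():
        for position, payload in enumerate(payloads):
            chunks.append(f"--{boundary}\r\n".encode("ascii"))
            chunks.append(
                f'Content-Disposition: form-data; name="{field}[]"; '
                f'filename="{field}-{position}.bin"\r\n'.encode("ascii")
            )
            chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
            # The raw bytes go in untouched: no base64 step is needed here.
            chunks.append(payload + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, b"".join(chunks)

content_type, body = build_multipart(
    [{"title": "first"}, {"title": "second"}],
    {"blob-fieldA": [b"\x00\x01\x02", b"\x03\x04"]},
)
```

The returned content type goes in the Content-Type header of the POST, and the body is sent as is.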



----------------------------------------------
PUT, i.e. UPDATE of C.R.U.D (also known as edit)
----------------------------------------------
(parameters only in body, except id field is required
as GET variable in URL for single edited object)
[NOTE: partial updates allowed. Only fields present in
submitted object will be changed. To set a field
capable of being NULL to NULL, set field in
submitted object equal to NULL, no quotes]
----------------------------------------------
{a single edited object}
......(POST)http://www.website.tld/collection-name/id/?id=xxx&_method=PUT
........a single JSON object in POST variable named "JSON[]" or "JSON"
........individual BLOB fields (as desired) in POST variable named for field
............"blob-fieldA", "blob-fieldB", etc. or "blob-fieldA[]", "blob-fieldB[]"

{a set of edited objects}
....(POST)http://www.website.tld/collection-name/?_method=PUT
........a single JSON object in POST variable named "JSON[]" or "JSON",
............but the contents is a JSON array of JSON objects
........individual BLOB fields per JSON object listed as an adjacent set,
............ sets listed same order as objects
............each in POST variable array named for field
............"blob-fieldA[]", "blob-fieldB[]"

{a set of edited objects}
....(POST)http://www.website.tld/collection-name/?_method=PUT
........multiple JSON objects each in a POST variable named "JSON[]"
........individual BLOB fields per JSON object
............each in POST variable array named for field
............"blob-fieldA[]", "blob-fieldB[]"
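
The partial update rule in the note at the top of this PUT section could look roughly like this. It is a sketch in Python with a hypothetical function name, `apply_partial_update`. Ignoring any id inside the submitted object is my assumption here, since the id comes from the URL and users should not set primary keys.

```python
import json

def apply_partial_update(stored, submitted_json):
    """Return a copy of `stored` with only the submitted fields changed."""
    submitted = json.loads(submitted_json)
    updated = dict(stored)
    for field, value in submitted.items():
        if field == "id":
            # Assumption: the primary key is taken from the URL, never changed here.
            continue
        # An unquoted JSON null decodes to None, which sets the field to NULL.
        updated[field] = value
    return updated

stored = {"id": 5, "title": "old", "note": "keep", "caption": "x"}
print(apply_partial_update(stored, '{"title": "new", "caption": null}'))
```

Fields that are absent from the submitted object ("note" in this example) keep their stored values.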



----------------------------------------------
DELETE, i.e. DELETE of C.R.U.D
----------------------------------------------
(parameters only in body, except id field is required
as GET variable in URL for single deleted object)
----------------------------------------------
{a single deleted object}
......(DELETE)http://www.website.tld/collection-name/id/?id=xxx

{a single deleted object}
....(POST)http://www.website.tld/collection-name/id/?id=xxx&_method=DELETE

{a set of deleted objects}
....(POST)http://www.website.tld/collection-name/?_method=DELETE
........a single JSON object in POST variable named "JSON[]" or "JSON",
............but the contents is a JSON array of JSON Object ids
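
Since PUT and DELETE both arrive as an actual POST with a `_method` GET variable, the server has to work out the intended method before dispatching. A minimal sketch of that decision, in Python and with a hypothetical function name (`effective_method`):

```python
from urllib.parse import parse_qs, urlsplit

OVERRIDABLE = {"PUT", "DELETE"}

def effective_method(actual_method, url):
    """Resolve the intended CRUD method from the real method and the URL."""
    actual = actual_method.upper()
    if actual != "POST":
        # Only an actual POST may be overridden; a real DELETE stays DELETE.
        return actual
    query = parse_qs(urlsplit(url).query)
    override = query.get("_method", [""])[0].upper()
    return override if override in OVERRIDABLE else "POST"

print(effective_method("POST", "http://www.website.tld/collection-name/id/?id=7&_method=PUT"))
```

Restricting the override to an actual POST keeps a plain GET link from ever changing or deleting anything.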

Sunday, August 2, 2009

2nd of 4 slides


This is the GETting of JSON objects which contain BLOBS using REST. It's really no different from visiting a page that embeds a remote file, cached on a separate file server to take load off the main server. Think of two things:

YouTube video objects on any web page that has them.
Pornography video objects.

An older, more familiar application is web acceleration by Akamai.com. Remember your browser "Waiting for http://www.akamai.com/....."?

1st of 4 'Slides' showing REST with BLOBS


As a primer, BLOBS (Binary Large OBjects) are anything that is large and binary. Binary means that it will contain bytes with values of 0x00 and non-printing control characters. This messes up string processing software everywhere in the chain of sending anything on the internet, unless the data is base32 encoded, base64 encoded, or transmitted between multipart/MIME boundaries. Types of files that qualify are: images, videos, executables, other system files, 'binary' data files, database backups, and others.

As I have discussed previously, I did not want to send BLOBS using base64 encoding to or from a JSON based server in my current project. The base64 strings take time to encode/decode and use more bandwidth in transmission (the real bottleneck in a web application), since every 3 bytes of binary become 4 characters of text. base64 encoded BLOBS may be the 'correct' and designed way (per email with the JSON RFC author), but it's not the way I want to do it. So I borrowed from Amazon and others on the web and decided to do a 'Hybrid Server'. Look up the current web statistics and you will see that the Russian made 'nginx' (Engine X) server is taking over from the Light HTTP server. The links that I found through Google suggest that it is much faster and the 'wave of the future'.
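
The bandwidth cost of base64 is easy to check. Here is a quick sketch in Python; the function name `base64_overhead` and the sample size are arbitrary choices for the illustration.

```python
import base64
import os

def base64_overhead(raw):
    """Ratio of base64 text length to the raw binary length."""
    return len(base64.b64encode(raw)) / len(raw)

blob = os.urandom(300_000)  # stand-in for an image or video file
print(f"{base64_overhead(blob):.2f}")  # prints 1.33
```

So a base64 encoded BLOB is about a third larger on the wire than the same file sent as raw bytes, before counting the time to encode and decode it.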

Also, the PUT method of the HTTP protocol (used for Update of JSON objects) has no multipart support in PHP, so sending BLOBS using the PUT method from the user/app and receiving them at the server is a laborious coding issue. Probably slow too, since I code in PHP and don't want to dig into C. My partner's app on the iPhone is in Objective-C, so presumably it could be done efficiently on that platform, but why spend the time to do that?

A Hybrid server is a single domain (in this project) or even subdomain that does different functions on different machines. In this case, the regular PHP applications will be running on a standard (but optimised) LAPP machine (Linux/Apache/Postgres/PHP). Instead of storing the BLOB files in string or binary format in the Postgres database, and requiring all four elements of the machine to process and serve the large files, they will be put onto a machine running 'nginx'. The files will be permission protected by a very simple PHP script running against the same database as the regular Apache machine. Only permissions will be run there. The actual serving will be done by piping files directly to the user out of the filesystem.

One extra possibility would be live (then cached) translation of the files. If a file was stored as a video.mov, it could be requested as a video.avi, for example.

This posting on this blog contains the first of four 'slides' on the logic of the protocol. The permission system is left for a later posting. Probably it will be digest authentication. The succeeding posts will probably have smaller explanations, each with one 'slide'. The slides will be in order of 'CRUD'<---->POST/GET/PUT/DELETE. So here is POST:

Saturday, August 1, 2009

PS on dual server REST server

When I can, I will submit a simple line drawing and text description of the transaction on the dual server REST blob application.

Discovery/Reinventing the Wheel

After much research, I came to the conclusion that JSON is great for
what I want to do except for one thing - binary images. (This research also showed the reason why there are so many 'partial' adoptions of JSON.)

Binary files as fields in JSON objects are especially a problem during PUT operations. In all the server script languages and in Apache itself, there is access to GET and POST parameters. There is no such mechanism in PUT. One has to write a script to create the boundaries for multipart PUTted files in your own application, AND a script on the SERVER to manually divide up the body of the PUT based on the boundaries to claim them. In scripting languages, this is very slow.

Of course, it could be sent as base64 encoded strings, which is what
the designer of JSON had in mind, but that is EVEN SLOWER than scanning for boundary markers in a POST style body using a script language. (These are all assumptions on my part; I have not tested them.)

So my approach is similar to Amazon's S3 storage server:
http://www.anyexample.com/programming/php/uploading_files_to_amazon_s... and to Yahoo's email attachment option (upload files for storage and virus scan, then send the body of the email).

I will have the user POST binary files to a separate subdomain, served by Apache-Light (a separate process on a separate, very fast server), and the server will return a 'bucket', or random file name (protected by permission headers between client and host, i.e. header cookies).

The application on a browser or other client (an iPhone non-browser app, for
example) will then populate the fields of the JSON with the returned
server/filename, as well as the usual binary file related stuff, i.e. title, type, etc.
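
As a rough sketch of that second step, here is how a client might fill in a BLOB field once the file server has answered. It is in Python, and everything named here is hypothetical: the function `attach_blob_reference`, the field layout, the subdomain, and the returned file name are placeholders, not a settled format.

```python
import json

def attach_blob_reference(obj, field, server, filename, title, mime_type):
    """Return a copy of the JSON object with the BLOB field pointing at the stored file."""
    linked = dict(obj)
    linked[field] = {
        "server": server,      # subdomain the file was POSTed to
        "filename": filename,  # random name returned by that server
        "title": title,
        "type": mime_type,
    }
    return linked

photo = attach_blob_reference(
    {"caption": "Beach"},
    "blob-fieldA",
    "files.website.tld",   # placeholder subdomain
    "k3j9x0q2.jpg",        # placeholder for the returned random name
    "beach.jpg",
    "image/jpeg",
)
print(json.dumps(photo))
```

The JSON object that travels to the main server then carries only a short reference, never the binary itself.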

Applications GETting JSON objects, or PUTting Objects will get or send binary files the same way.

And also, I don't like the semantics of PUTting to originate a JSON
object. So I will be restricting PUT to the Update only of standard
CRUD, (Create, Read, Update, Delete). I don't want users setting
primary key fields.

So, I hope this saves other people some of the effort that I went through.

The next post, sometime today, will explain what I found out about PUT/DELETE methods in HTTP on Apache @ A2Hosting (I use them because of good support and Postgres databases). To properly use REST, and not use kludgy URLs using both POST and GET values at the same time, (totally doable, but not clean), PUT and DELETE are necessary complements to the standard POST and GET methods.