Drupal migration tips: handlers, field collections, process and preprocess

Advanced Drupal developers have more than one way to migrate a web site into Drupal, besides copy-pasting content, of course. In this blog post, however, I will concentrate on the Migrate module. Migrate is a powerful module. It's not universal, but it proves very helpful for manageable, scripted migration into Drupal. It is also well documented - you can read about its usage and best practices in the Drupal Migrate documentation. In this post, I will share a number of tips and observations that I wish I had known when I first started with it.

1. Handlers

The Migrate module maps items from specified sources (such as an XML file or a MySQL table) to the corresponding Drupal destinations. These can be fields, users, taxonomy terms, or entities like files, field collections, and nodes. While the mapping looks quite easy in a given migration class, there is a lot going on under the hood: the mapped data has to be applied to the Drupal destination in the correct way, and it is the handlers that tell Migrate how to apply it.

Figure: sketch of Druplicon among ancient tools.

Out of the box, the Migrate module has the necessary basic handlers - they are located in the plugins/destinations folder of the module. These handlers implement destination interfaces, upon which you can also build your own handlers. The bundled handlers cover quite a few basic elements, including file, node, entity, menu, term, user, and comment.

Sometimes, however, these handlers are not enough. For example, when migrating images, the basic file handler is not enough. When migrating a YouTube video into the Media module, you will need a different URL parser than for a Blip.tv video. Before you start subclassing the file destination class yourself, you may want to check out the Migrate Extras module, which already has a good pack of additional handlers, including the media handlers mentioned above. Even if you don't find your exotic handler there, you can use its handlers as examples for writing your own.
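To give a feel for why each video provider needs its own URL parser, here is a minimal sketch in plain PHP. It is not the code used by Migrate Extras or the Media module; the function name example_youtube_video_id() is hypothetical, and it only assumes the two common YouTube URL shapes (youtube.com/watch?v=ID and youtu.be/ID). A Blip.tv address has a different structure, so this parser simply gives up on it.

```php
<?php
/**
 * Hypothetical helper (not part of Migrate, Migrate Extras, or Media):
 * extract the video ID from a YouTube URL.
 *
 * Returns NULL when the URL is not a recognized YouTube address.
 */
function example_youtube_video_id($url) {
  $parts = parse_url($url);
  if ($parts === FALSE || empty($parts['host'])) {
    return NULL;
  }
  $host = strtolower($parts['host']);

  // Short links look like http://youtu.be/<id>.
  if ($host === 'youtu.be') {
    $id = isset($parts['path']) ? trim($parts['path'], '/') : '';
    return $id !== '' ? $id : NULL;
  }

  // Regular links look like http://www.youtube.com/watch?v=<id>.
  $youtube_hosts = array('youtube.com', 'www.youtube.com');
  if (in_array($host, $youtube_hosts, TRUE) && isset($parts['query'])) {
    parse_str($parts['query'], $query);
    if (isset($query['v']) && is_string($query['v']) && $query['v'] !== '') {
      return $query['v'];
    }
  }

  // Anything else (for example a Blip.tv URL) needs a different parser.
  return NULL;
}
```

A custom handler could call a helper like this before handing the value to the destination, and fall back to another parser when it returns NULL.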

2. Field collections

Field Collection is a very handy module that allows you to organize collections of fields into separate entities for reuse. But because field collections are separate entities, you can no longer just import the data into the node's fields. If you try to do so, an error will result, since the node itself does not have those fields - they belong to a field collection entity, and it is the field collection entity that belongs to the node. Hence, you have to migrate the fields into field collections first, and then migrate the field collections into nodes. It makes the brain boil, but computers can handle it.

Figure: diagram of field collections. Field collections are entities themselves, which complicates the migration of data into the node's fields.

There is a handler for field collections, but it is not included in any module yet. You will need to get it in the form of a patch from the discussion thread (scroll down the thread to get a newer version). If you look at the code, you will see that the handler extends the base entity migration handler.

There is an unpleasant limitation to the Field Collection handler: it does not do well with multi-value field collections. (It should work fine with single-value field collections that contain multi-value fields, though.) This limitation comes from the architecture of the Migrate module itself, which maps a single source to a single destination. There are work-arounds, and although they are imperfect, they can get the job done.

Figure: screenshot of Migrate message output. The Field Collection handler does not do well with multi-value field collections due to a limitation of the Migrate module's architecture. There are imperfect work-arounds to get the job done.

The work-around for multi-value field collections is to set a global variable that indicates whether the current migration is a multi-value field collection one. Then, the field collection source code needs to be edited in the import() function to read that variable and, if it is set, to create a new field collection item in the same node rather than updating the existing one. Make sure your field collection is set to "multi-value".
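The idea behind this work-around can be shown with a toy model in plain PHP. This is not the code of the patch: the flag name example_fc_multi_value and the function example_import_item() are hypothetical, and a plain array stands in for the field collection items attached to one node. It only illustrates the branch that the edited import() would take.

```php
<?php
// Hypothetical flag name; the post does not name the actual variable.
$GLOBALS['example_fc_multi_value'] = TRUE;

/**
 * Toy stand-in for the handler's import() step.
 *
 * $host_items models the field collection items already attached to a node;
 * $values models the mapped data of one source row.
 */
function example_import_item(array $host_items, array $values) {
  if (!empty($GLOBALS['example_fc_multi_value'])) {
    // Multi-value run: always attach a new item to the same node.
    $host_items[] = $values;
  }
  else {
    // Default behaviour: update the single existing item in place.
    $host_items[0] = isset($host_items[0])
      ? array_merge($host_items[0], $values)
      : $values;
  }
  return $host_items;
}

$items = array();
$items = example_import_item($items, array('label' => 'first'));
$items = example_import_item($items, array('label' => 'second'));
// With the flag set, $items now holds two entries; without it, only one.
```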

The drawback of this approach is that you will see the actual number of successfully migrated items only in the Drupal message shown immediately after the migration; the main table will show fewer items than have actually been processed. If you migrate all items, only some (the distinct ones) will be shown. This is due to the MySQL mapping handler of the Migrate module, which maps only distinct items. There is currently no work-around for it, as it is an architectural limitation. If you watch the immediate results, though, you will see how many items have really been imported.

3. Preprocessing and post-processing the data

During migrations, the data in the source frequently differs in format from the data that you need to apply to the destination. For instance, your taxonomy terms can be stored in the form of "businesses/shops", indicating the parent and the child terms. Or your image tags can appear as custom token insertions in content, which you will have to convert into acceptable image markup before migrating. In these cases, you need to preprocess the data. The function handling pre-processing is prepareRow(stdClass $row), where $row is an object containing the source query results.

Example 1:

function prepareRow($row) {
  // Process taxonomy items
  // ...
}

In our case, because the categories came in the format Parent_term/Child_term, we had to create the structure with the Drupal API, rather than passing it to the standard Migrate module taxonomy term handler. But this is the benefit of pre-processing - you can work on the data before it is used by the Migrate module.
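As a rough illustration of this kind of pre-processing, the sketch below splits a Parent_term/Child_term value in plain PHP. It is not the code from our migration: the function names, the "category" source column, and the parent_term/child_term properties are all assumptions, and creating the actual terms through the Drupal API is left out.

```php
<?php
/**
 * Hypothetical helper: split "Parent_term/Child_term" into clean term names.
 */
function example_split_term_path($path) {
  $names = array();
  foreach (explode('/', $path) as $name) {
    $name = trim($name);
    if ($name !== '') {
      $names[] = $name;
    }
  }
  return $names;
}

/**
 * Stand-in for a migration's prepareRow(); "category" is an assumed column.
 */
function example_prepare_row(stdClass $row) {
  $path = isset($row->category) ? $row->category : '';
  $names = example_split_term_path($path);

  // Expose the pieces as extra row properties for later steps to use.
  $row->parent_term = isset($names[0]) ? $names[0] : NULL;
  $row->child_term = isset($names[1]) ? $names[1] : NULL;
  return TRUE;
}

$row = new stdClass();
$row->category = 'businesses/shops';
example_prepare_row($row);
// $row->parent_term is now "businesses" and $row->child_term is "shops".
```

In a real migration class, the same logic would sit inside prepareRow() itself, followed by whatever Drupal API calls are needed to create or look up the parent and child terms.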

In some other cases, you may want to insert additional data into the migrated entities that requires you to know the entity id. For example, you may need to create a redirect preserving the old path, or link the new node to another one. In this case, you need post-processing. Post-processing is handled by the function complete($entity, stdClass $row). $entity is an object containing the imported entity (a file, a node, and so on), and $row is an object containing the source query results that were used for migrating into this entity.

Example 2:

function complete($entity, stdClass $row) {
  // 301 Redirect URL
  $this->save_old_url($entity->nid, $row->url);
}

Here, we pass the old url from the "url" field of the source table, together with the new entity's id (in this case it's a node, so we pass $entity->nid), to a custom function that saves them together in a separate reference field.
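For readers who want to see the whole shape of such a post-processing step, here is a self-contained sketch in plain PHP. It is an assumption-laden stand-in, not our actual code: the class name ExampleArticleMigration is hypothetical, the class does not extend the Migrate module's Migration class as a real migration would, and an in-memory array takes the place of the reference field that save_old_url() writes to.

```php
<?php
/**
 * Stand-in for a migration class. A real one would extend the Migrate
 * module's Migration class; this one is standalone so it can run anywhere.
 */
class ExampleArticleMigration {
  // Assumption: an in-memory array stands in for the reference field.
  public $oldUrls = array();

  /**
   * Post-processing step: runs once the entity has an id.
   */
  public function complete($entity, stdClass $row) {
    // Skip rows that have no old url or entities that were not saved.
    if (!empty($entity->nid) && !empty($row->url)) {
      $this->save_old_url($entity->nid, $row->url);
    }
  }

  /**
   * Hypothetical body for the custom function: remember the old url by nid.
   */
  protected function save_old_url($nid, $url) {
    $this->oldUrls[$nid] = $url;
  }
}

$migration = new ExampleArticleMigration();

$entity = new stdClass();
$entity->nid = 42;

$row = new stdClass();
$row->url = 'old/path/to/article';

$migration->complete($entity, $row);
// $migration->oldUrls now maps 42 to "old/path/to/article".
```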

As you can see, both the pre-processing and post-processing functions are quite straightforward and easy to use, and they can serve as a powerful tool to compensate for the Migrate module's limits.

A note to the reader

If you find this post helpful and have your own observation that you wish you had known about the Migrate module, or if you have a comment or a helpful suggestion to add to this article, add it in the comments! If we collect some good stuff, we can make one more post out of it, with due attribution.
