How I learned to stop worrying and love custom migration classes

When I got sick of banging my head against the migration-shaped wall the other day, I hadn't got very far with migrating content from Drupal 6.

Migrate Upgrade was working fairly well, up to a point.

Gallery nodes had been migrated, but without their addresses, which is hardly surprising, seeing as the D6 site was using the Location module, and I've decided to go with Geolocation and Address for the new site.

Exhibition nodes had been migrated, but without the node references to galleries. There's an issue with a patch on drupal.org for this, but after applying the patch, files weren't being migrated.

Time to get stuck in and help fix the patch, I thought. But the trouble is that we're dealing with a moving target. With the release of Drupal 8.1.0, the various migrate modules are all changing as well, and core patches shouldn't be against the 8.0.x branch anymore. It's all too easy to imagine that updating to the latest versions of things will solve all your problems. But often you just end up with a whole new set of problems, and it's hard to figure out where the problem is, in among so much change.

Luckily, by the time I'd done a bit of fiddling with the theme, somebody else had made some progress on the entity reference migration patch, so when I revisited the migrations, having applied the new version of the patch, the exhibitions were being connected to the galleries correctly.

One problem I faced was that the migration would often fail with the error "MySQL server has gone away". With some help from drupal.org, I learned that this wouldn't be so bad if the tables used InnoDB. Converting one of the suggestions into a quick script to update all the Drupal 6 tables really helped, although copying the my.cnf settings killed my MySQL completely for some reason. Yet another reminder to keep backups when you're changing things.
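The script itself isn't reproduced here, but the idea can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the function name example_innodb_statements() is made up, and it only builds the ALTER TABLE statements, leaving you to run them against the D6 database however you prefer.

```php
<?php

/**
 * Build ALTER TABLE statements that convert tables to InnoDB.
 *
 * Hypothetical helper for illustration only. The table names would
 * normally come from running SHOW TABLES against the Drupal 6 database.
 *
 * @param string[] $tables
 *   The names of the tables to convert.
 *
 * @return string[]
 *   One SQL statement per table.
 */
function example_innodb_statements(array $tables) {
  $statements = array();
  foreach ($tables as $table) {
    // Quote the identifier with backticks, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    $statements[] = 'ALTER TABLE ' . $quoted . ' ENGINE=InnoDB;';
  }
  return $statements;
}
```

Passing in array('node', 'term_node') gives back two statements, one for each table. Take a database dump before running anything like this.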

Having read some tutorials, and done some migrations in Drupal 6 and 7, I was trying to tweak the data inside the prepareRow method in my custom migration class. The thing I didn't get for ages was that my class was only picked up once the migration was defined through the Migrate Plus module, so not only did the module have to be enabled, but the migration definition .yml file names needed to start with migrate_plus.migration rather than migrate.migration.

Once I'd made that change, the prepareRow method fired as expected, and from there it was relatively straightforward to get the values out of the old database, even in the more complex migrations like getting location data from another table and splitting it into two fields.

As an example, here's the code of the prepareRow method in the GalleryNode migration class:

/**
 * {@inheritdoc}
 */
public function prepareRow(Row $row) {
  if (parent::prepareRow($row) === FALSE) {
    return FALSE;
  }

  // Make sure that URLs have a protocol.
  $website = $row->getSourceProperty('field_website');
  if (!empty($website)) {
    $url = $website[0]['url'];
    $website[0]['url'] = _gallerymigrations_website_protocol($url);
    $row->setSourceProperty('field_website', $website);
  }

  // Get the location data from the D6 database.
  $nid = $row->getSourceProperty('nid');
  $location = $this->getLocation($nid);

  // Set up latitude and longitude for use with geolocation module.
  $geolocation = $this->prepareGeoLocation($location->latitude, $location->longitude);
  $row->setSourceProperty('field_location', $geolocation);

  $address = $this->prepareAddress($location);
  $row->setSourceProperty('field_address', $address);

  // The parent implementation has already run at the top of this method.
  return TRUE;
}

The methods called by this are all fairly similar, with a switch to the D6 database followed by a query - here's an example:

/**
 * Get the location for this node from the D6 database.
 *
 * @param int $nid
 *   The node ID of the gallery.
 *
 * @return object
 *   The database row for the location.
 */
protected function getLocation($nid) {
  // Switch connection to access the D6 database.
  \Drupal\Core\Database\Database::setActiveConnection('d6');
  $db = \Drupal\Core\Database\Database::getConnection();

  $query = $db->select('location_instance', 'li');
  $query->join('location', 'l', 'l.lid = li.lid');
  $query->condition('li.nid', $nid);
  $query->fields('l', array(
    'name',
    'street',
    'additional',
    'city',
    'province',
    'postal_code',
    'country',
    'latitude',
    'longitude',
    'source',
  ));

  $result = $query->execute();

  // Revert to the default database connection.
  \Drupal\Core\Database\Database::setActiveConnection();

  $data = array();
  foreach ($result as $row) {
    $data[] = $row;
  }

  // There should be only one row, so return that.
  return $data[0];
}

I get the feeling that if I was following the "proper" object-oriented approach, I'd be doing this using a process plugin, as suggested by this tutorial from Advomatic. But this does the job, and the code doesn't feel all that dirty.
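The helpers that prepareRow calls to tidy up the values aren't shown in this post. As a rough sketch of the kind of thing they do, here are standalone versions. All three function names are hypothetical, and the mapping from Location columns to Address and Geolocation properties is my assumption about how the fields line up, not the code from the migration class.

```php
<?php

/**
 * Make sure a URL has a protocol (hypothetical implementation).
 */
function example_website_protocol($url) {
  $url = trim($url);
  // Leave empty values and URLs that already have a scheme untouched.
  if ($url === '' || preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
    return $url;
  }
  return 'http://' . $url;
}

/**
 * Build a value for a Geolocation field (assumed properties: lat, lng).
 */
function example_prepare_geolocation($latitude, $longitude) {
  return array(
    'lat' => (float) $latitude,
    'lng' => (float) $longitude,
  );
}

/**
 * Build a value for an Address field from a D6 location row.
 *
 * Assumption: Location stores lower-case country codes, while Address
 * expects upper-case ones. Province codes may need more work than this.
 */
function example_prepare_address($location) {
  return array(
    'country_code' => strtoupper($location->country),
    'administrative_area' => $location->province,
    'locality' => $location->city,
    'postal_code' => $location->postal_code,
    'address_line1' => $location->street,
    'address_line2' => $location->additional,
    'organization' => $location->name,
  );
}
```

Keeping these as plain functions with no database access makes them easy to check on their own before running a full migration.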

Another lesson I learned the hard way is that when you're adding fields from other sources inside the prepareRow method, you also need to remember to add those fields into the .yml file.
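To make that concrete, here's roughly what the relevant part of a migration definition might look like. Treat it as a sketch: the file name, migration ID, source plugin ID and bundle name are all made up for illustration, and only the field names come from the prepareRow code above.

```yaml
# Hypothetical file: config/install/migrate_plus.migration.gallery_node.yml
id: gallery_node
label: Gallery nodes
source:
  # Assumed ID of the custom source plugin containing prepareRow().
  plugin: gallery_node
process:
  title: title
  type:
    plugin: default_value
    # Assumed machine name of the content type on the new site.
    default_value: gallery
  field_website: field_website
  # Properties added with setSourceProperty() in prepareRow() still need
  # a mapping here, otherwise they never reach the destination.
  field_location: field_location
  field_address: field_address
destination:
  plugin: entity:node
```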

Feeling pleased with myself that I'd managed to migrate the location data, I decided to jump down the rabbit hole of working on integration between Geolocation and Address modules, even though I'd already said I didn't need to do it. Why do developers do that? I can see how difficult a project manager's job can be sometimes. Thankfully, the integration (at least for the needs of this site) can be a fairly shallow and simple job with a few lines of JavaScript, so I've put a patch up for review.

In my day job, I'm a great believer in breaking tasks down as far as possible so that code can be reviewed in small branches and small commits. But when you're working on your own project, it's easy to jump around from task to task as the mood takes you. You can't be bothered with creating branches for every ticket - after all, who's going to review your code? Half the time, you can't even be bothered creating tickets - you're the product owner, and the backlog is largely in your head.

That butterfly tendency, plus the number of patches I'm applying to core and contributed modules, means that my local site has far more uncommitted change than I'd normally be comfortable with. Using git change lists in PhpStorm has really helped me to cope with the chaos.

On the subject of patches, I've finally got round to trying out Dave Reid's patch tool - it's working really well so far.

This process has reinforced in my mind the value of testing things like migrations on a small sample set. Thankfully, the Drupal 6 version of the Admin Views module lets you bulk delete nodes and taxonomy terms - I couldn't face tweaking the migration while running multiple iterations of importing 3828 terms.

Which reminds me, Xdebug is great, but remember to disable it once you've finished with it. Otherwise using the site in your VM will be slow, and as Joel Spolsky says, when things run slowly, you get distracted and your productivity suffers. Humans are not good at multitasking, especially when those tasks are complex and unfamiliar.

And when we try to multitask, we don't think straight. I've just spent an hour debugging something that should just work, because the logic in my taxonomy term source plugin was based on a piece of confusion that now seems obvious and stupid. For reference, in Drupal 6, the 'term_node' table connects nodes with the taxonomy terms they're tagged with, and vid refers to the node revision ID, whereas the 'taxonomynode' table connects terms with their related taxonomy node, and vid refers to the vocabulary ID.

The bad news is that the mappings from nodes to taxonomy terms aren't being migrated properly - for some strange reason they're being registered correctly, but all the rows are being ignored.

The good news is that the work in progress is now online for the world to see. For one thing, it's easier to do cross-browser testing that way, rather than faffing around with virtual machines and proxy tunnels and all that sort of nonsense.

So please, have a look, and if you spot any bugs, let me know by creating an issue on the project board.
