Multithreaded script to mirror AWS S3 buckets

This script copies the contents of one Amazon S3 bucket to another, including between accounts (as long as your access key has the right permissions). It uses forks (http://search.cpan.org/~rybskej/forks-0.34/lib/forks.pm) instead of threads, because the HTTP library seemed to be unstable with a large number of threads (it randomly segfaulted with many workers) but works fine with forks. Note that forks can use a lot of memory unless your OS does copy-on-write memory allocation.

Replace the aws_access_key_id, aws_secret_access_key, frombucket and tobucket values with your own. Start with a smaller number of workers until you've verified how your OS handles a large number of running Perl processes. The script itself doesn't need much bandwidth, or a particularly low-latency connection to AWS: it effectively divides the time a single-threaded mirror would take by the number of workers. Using 600 workers, it reduced the time needed to mirror a large bucket from days to minutes. It will print a . for every file transferred. In general it won't stop if an error occurs (e.g. permission denied), so check afterwards that the full number of files copied correctly.

#!/usr/bin/perl

use warnings;
use strict;

use forks;
use Thread::Queue;

use Data::Dumper;
use Net::Amazon::S3;

$| = 1;

my $s3config = {
  aws_access_key_id     => 'xxxxxxxxxxxxxxxxxxxx',
  aws_secret_access_key => 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  retry                 => 1,
};
my $config = {
  frombucket            => 'bucket1',
  tobucket              => 'bucket2',
  workers               => 100,
};

my $s3 = Net::Amazon::S3->new($s3config);

my @list;
my $bucketlist = $s3->bucket($config->{frombucket})->list_all or die $s3->err . ": " . $s3->errstr;
push(@list, $_->{key}) for (@{$bucketlist->{keys}});

my $q = Thread::Queue->new();
my @workers;

for (1..$config->{workers}) {
  push @workers, async {
    my $s3 = Net::Amazon::S3->new($s3config);
    my $bucket = $s3->bucket($config->{tobucket});
    while (defined(my $key = $q->dequeue())) {
      $bucket->copy_key($key, sprintf('/%s/%s', $config->{frombucket}, $key));
      print ".";
    }
  }
}

$q->enqueue(@list);
$q->enqueue(undef) for @workers;
$_->join for @workers;
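
Since the script doesn't stop on individual copy errors, one simple check is to compare key counts in the two buckets once all workers have joined. A minimal sketch, assuming it is appended to the end of the script above (it reuses the $s3 and $config variables already defined there, and assumes list_all succeeds for both buckets):

```perl
# Hypothetical verification step: re-list both buckets and compare key counts.
my $from = $s3->bucket($config->{frombucket})->list_all
  or die $s3->err . ": " . $s3->errstr;
my $to   = $s3->bucket($config->{tobucket})->list_all
  or die $s3->err . ": " . $s3->errstr;

printf "\ncopied %d of %d keys\n",
  scalar @{ $to->{keys} }, scalar @{ $from->{keys} };
```

If the counts differ, re-running the script is safe: copy_key simply overwrites keys that already exist in the destination bucket.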