I wrote this little PHP script to mirror a directory structure from a webserver to Amazon S3. It can be used for backups or, as I use it, to put files into a CloudFront-enabled bucket.
To use this you need an account at Amazon AWS. I’ll assume that you already have that.
The script uses the Amazon S3 Class for PHP by Donovan Schönknecht to communicate with Amazon.
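The class exposes static methods for the handful of S3 operations the script needs. A minimal sketch of how it gets wired up (the include file name, the exact method signatures, and the `$localFile`/`$remoteKey` variables are assumptions here and may differ between versions of the class; it also needs real AWS credentials to run):

```php
<?php
require_once 's3.php'; // Donovan Schönknecht's S3 class

// Registers the credentials for all subsequent static calls.
$s3 = new S3('YOUR-ACCESS-KEY', 'YOUR-SECRET-KEY');

// The three operations a mirror run is built from:
// upload one file, list the bucket, delete an object.
S3::putObject(S3::inputFile($localFile), 'YOUR-BUCKET-NAME', $remoteKey,
              S3::ACL_PUBLIC_READ);
$contents = S3::getBucket('YOUR-BUCKET-NAME');  // array: key => metadata
S3::deleteObject('YOUR-BUCKET-NAME', $remoteKey);
```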
Download
Version 0.1: s3-mirror-for-php-0.1.zip
Instructions
Unpack the files from the ZIP to your server. The ZIP contains an empty directory bucket-source, which is the default source for the mirroring process. You can change that to any other directory if you want.
Open the file "mirror.php" in your favourite editor and find the "Config" section near the top of the file.
$accessKey = 'YOUR-ACCESS-KEY';
$secretKey = 'YOUR-SECRET-KEY';
$bucketName = 'YOUR-BUCKET-NAME';
Enter your Amazon AWS details here.
$sourceDir = 'bucket-source/';
This is the local (on your webserver) directory that is used as mirroring source.
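Internally, the mirroring step has to walk this directory recursively and turn every file into a bucket key (a relative path with forward slashes). A minimal sketch of that step using PHP's SPL iterators; the helper name `listSourceFiles` is mine, not from the script:

```php
<?php
// Return all files below $sourceDir as sorted bucket keys
// (paths relative to $sourceDir, with forward slashes).
function listSourceFiles($sourceDir)
{
    $keys = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            // Strip the source-directory prefix to get the bucket key.
            $relative = substr($file->getPathname(), strlen($sourceDir));
            $keys[] = str_replace(DIRECTORY_SEPARATOR, '/', ltrim($relative, '/\\'));
        }
    }
    sort($keys);
    return $keys;
}
```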
$cacheDuration = 3600 * 24 * 30;
This setting is important if you want to use CloudFront as a content delivery network. Each file uploaded to Amazon gets a "far future Expires header" to speed up page loads for returning visitors and to reduce your traffic costs. Browsers will only retrieve each file once within the given time period (in seconds). The default is 30 days.
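For reference, the Expires value is just an HTTP date computed from `$cacheDuration` at upload time. A sketch of how such a header could be built; the resulting array would then be handed to the class's putObject() as its request-headers argument (the exact argument position depends on the class version):

```php
<?php
$cacheDuration = 3600 * 24 * 30; // 30 days, in seconds

// HTTP dates must be in GMT, e.g. "Thu, 01 Jan 2026 00:00:00 GMT".
$expires = gmdate('D, d M Y H:i:s', time() + $cacheDuration) . ' GMT';

// Raw request headers to attach to the upload.
$requestHeaders = array('Expires' => $expires);
```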
$fileAcl = S3::ACL_PUBLIC_READ;
This is the ACL that is applied to newly uploaded files. S3::ACL_PUBLIC_READ is the one you'll need for CloudFront usage. To keep your files private (for example for a simple backup), use S3::ACL_PRIVATE.
Pitfalls
Before we execute the script for the first time, let me tell you about some "pitfalls" you might not expect, just to avoid confusion and loss of data:
- This script mirrors the source directory to S3. That also includes DELETING all files that are in the S3 bucket but not in the source directory, even data that you might have put there earlier and want to keep. To avoid any problems, use a new, empty bucket, or download copies of all the files you want to keep into your source directory!
- ACL rules and Expires headers are only applied to newly uploaded files; existing files are not changed. So if you decide to increase the cache period to 60 days but already have files uploaded with 30 days, those files will keep their 30-day header.
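The first pitfall follows directly from how a mirror run is computed: anything local but not remote gets uploaded, and anything remote but not local gets deleted. A minimal sketch of that comparison (the helper name is mine; the real script may additionally compare sizes or timestamps to detect changed files):

```php
<?php
// Compare the local key list against the bucket's key list and
// return what has to be uploaded and what has to be deleted.
function mirrorDiff(array $localKeys, array $remoteKeys)
{
    return array(
        'upload' => array_values(array_diff($localKeys, $remoteKeys)),
        'delete' => array_values(array_diff($remoteKeys, $localKeys)),
    );
}
```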
Run the script
Once configuration is done, simply open the script mirror.php in your browser. Note that error handling is minimal.
Hi Dominik,
Thanks for the script. It’s very useful for us.
I found that for getBucket($bucketname, null, null, 100000), it's better to leave the maxKeys argument as null instead of assigning it a value. With maxKeys set, the class does not do the recursive bucket-info retrieval and returns only around 1000 records. So getBucket($bucketname) should do.
this only works for 2000 files then stops. 🙁
That might be possible. Amazon returns the list of files stored in S3 in "paged" lists, probably with 2000 files per page. I'll have to check that.
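If the class version in use does not page through results itself, the usual fix is to loop with the marker argument, passing the last key of each page as the starting point for the next request. A hedged sketch against getBucket() as used above (page size and signature are assumptions; it needs live credentials to run):

```php
<?php
// Fetch the complete object list one page at a time.
$allObjects = array();
$marker = null;
do {
    $page = S3::getBucket($bucketName, null, $marker, 1000);
    if ($page === false) {
        break;                 // request failed
    }
    $allObjects += $page;      // array keys are the object names
    $keys = array_keys($page);
    $marker = end($keys);      // resume after the last key returned
} while (count($page) == 1000);
```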
Great script! Can be easily combined with other backup programs.
Just an FYI to anyone looking to use this script: you should get the latest s3.php file from GitHub; you can find it in the tpyo/amazon-s3-php-class repository. If you don't, mirror.php will throw folder errors.
Shilling for my project: you might also check out s3s3mirror. It's an open source command line tool that mirrors buckets (and much more) very, very quickly: https://github.com/cobbzilla/s3s3mirror