So I mentioned before about a WordPress plugin I've written for a friend. Turns out, she's moved the target a bit, so the plugin isn't really what she needs any longer, but it's still a nice bit of work (IMHO), so I went ahead and submitted it to the WordPress community and created a site for it. It's called WP Simple Seller and I'm trying to stay true to the name. There are a few e-commerce plugins already available, all of which do more than I needed and fail to do at least one thing I needed. Of course, the basics are that you should put an item for sale on your blog, and subsequently sell it. Duh. But the problem spec called for there to be multiple sellers, each with individual PayPal (or whatever) accounts. And, the blog owner should not have to get involved in the sale process at all, other than approving the seller to sell on the site. No current plugin had that capability, that I could find. So that's what I set out to build, and I'm pretty happy with the results. There ARE a few steps to getting it all set up, but that's to be expected. Of course, my first release had a bug which probably threw a few people off, so earlier today I fixed the bug and updated the plugin to version 1.1, which should be available on the WordPress plugins site by now.
I released version 0.4 of NArchivist, my cloud-based backup/archiving solution, about a week ago. I should have made a blog entry about it then, but I've been tied up with other things. Plus, I wasn't entirely sure I wasn't going to have to make another quick-fix release. This is largely because I gutted the encryption sub-system to satisfy two requirements. First, it needed to support streaming of very large files to and from the data stores, and second, it had to work on my older CentOS5 servers. Version 0.3 was built with a tool called m2secret, which I used to encrypt the files I was backing up. Ok, no worries, except I found out the hard way that m2secret needed Python 2.5 and, alas, my CentOS5 servers only had Python 2.4. Upgrading wasn't really a good option. Plus, m2secret operates by reading the entire file into memory, then encrypting it, then sending it to the target. Fine, for small(-ish) files, but for larger files this could be a real problem. I knew this going in, but wanted a quick solution that I could fix later. Bumping into the Python 2.4/2.5 problem made it later. So I fixed it. This has pushed back my Windows support a bit, and my creation of a web-based interface for the system. But that'll be along fairly soon. No choice, really; I need the thing myself.
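For anyone wondering what "streaming" buys you here, below is a minimal sketch of the chunk-at-a-time idea. It is not NArchivist's actual code. The function name `stream_transform`, the 64 KB chunk size, and the use of a `zlib` compressor as the per-chunk transform are all assumptions made for illustration; the compressor merely stands in for a cipher, since the Python standard library doesn't ship one. The point is only that memory use stays at roughly one chunk, no matter how big the file is.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from NArchivist


def stream_transform(src, dst, transform, finish, chunk_size=CHUNK_SIZE):
    """Copy src to dst one chunk at a time, passing each chunk through transform.

    `transform` handles one chunk of bytes; `finish` returns any trailing
    bytes the transform still holds once the input is exhausted.
    Returns the number of input bytes read.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        dst.write(transform(chunk))
    dst.write(finish())
    return total


# Stand-in transform: a streaming compressor instead of a real cipher.
compressor = zlib.compressobj()
source = io.BytesIO(b"some file contents " * 50000)
target = io.BytesIO()
bytes_read = stream_transform(source, target, compressor.compress, compressor.flush)
```

A real cipher object with the same "feed me a chunk, then finalize" shape could be dropped in where the compressor is, and `src` and `dst` could just as well be an open file and a network upload.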
Well, after a bunch of non-NArchivist work, I finally got back to my current pet project (though I'm really itching to get back to my REAL pet project, the TurtolCMS), and now NArchivist, my backup software, will not only back files up, but it can actually RESTORE them as well. One would hope that would be a standard feature. I've just "released" version 0.3 on SourceForge.
New features include:
- A basic desktop client from which you can select files to restore. Of course, any file you would overwrite via a restore is first backed up (if necessary).
- RackSpace CloudFiles support, so now you can store your backups to two truly distinct clouds, thus making your data a bit more secure.
- After every backup session it backs up its own database. You might think this would only be needed after a backup session in which files were uploaded, but since we also record "last-seen" data to catch file deletions, the database goes up every time. Versioned, of course.
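The "last-seen" idea from the last item can be sketched roughly like this. This is an illustration only, not the real NArchivist schema: the `files` table, its columns, and both function names are made up for the example. Every file found during a session gets stamped with that session's number, so anything still carrying an older stamp must have disappeared. It also shows why the database changes on every run, even when nothing gets uploaded.

```python
import sqlite3


def record_session(db, session_id, paths):
    """Stamp every path seen during this session with the session id."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, last_seen INTEGER NOT NULL)"
    )
    db.executemany(
        "INSERT OR REPLACE INTO files (path, last_seen) VALUES (?, ?)",
        [(path, session_id) for path in paths],
    )
    db.commit()


def deleted_since(db, session_id):
    """Return paths that were not seen in the given session (or any later one)."""
    rows = db.execute(
        "SELECT path FROM files WHERE last_seen < ? ORDER BY path",
        (session_id,),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
record_session(db, 1, ["a.txt", "b.txt"])
record_session(db, 2, ["a.txt"])   # b.txt has vanished from disk
print(deleted_since(db, 2))        # ['b.txt']
```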
Next, I'll have to make the website a bit more complete. Which is to say complete at all. After that, I'll try to get it running on Windows, and flesh out the web-based interface I started.
I've just put up the first "version" of my new backup software, NArchivist, on SourceForge. It's not available as a regular download just yet, as there's too much to do to make it ready for a casual install (even tho "casual" by SourceForge standards usually means pretty technical). So you can get it from the source repository. See the project summary page for the gritty-kitty details, or you can start with my lame-o-rama project home page if you're in for a larf.
Stuff It Does
- It takes files from one or more "locations" on your filesystem(s), and copies them and their metadata (stuff like ownership, permissions, etc) to one or more "targets" in the cloud. For now, I only support Amazon S3, so if you want more than one target, you need to either create more than one S3 bucket or more than one AWS/S3 account. Or you can add support for other storage targets. Go ahead, it'll be fun.
- When a file changes, it backs it up again, and continues to do so until it reaches a configurable minimum number of copies. Basic versioning.
- File names are obscured. Actually, they're mapped via database to other, cryptographically-generated names. Session names are not obscured, however.
- It keeps a database of each of these operations, so we know where all our stuff is.
- There is a basic, text-mode installation/configuration program. This doesn't install the software, per se. That'll be next. This just allows you to bootstrap the thing, create the database, and manage the target, location and backup parameters. Very rudimentary.
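The name-obscuring item in the list above could look something like the sketch below. To be clear, this is a guess at the general shape, not the real thing: the `names` table, the 16-byte random token, and the function names are all assumptions. The idea is that the cloud target only ever sees a random name, and the local database is the only place that knows which real file it belongs to.

```python
import secrets
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the hypothetical name-mapping database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS names ("
        "real_name TEXT PRIMARY KEY, stored_name TEXT UNIQUE NOT NULL)"
    )
    return db


def stored_name_for(db, real_name):
    """Return the obscured name for a file, creating one on first use."""
    row = db.execute(
        "SELECT stored_name FROM names WHERE real_name = ?", (real_name,)
    ).fetchone()
    if row:
        return row[0]
    stored = secrets.token_hex(16)  # 32 hex characters of random noise
    db.execute(
        "INSERT INTO names (real_name, stored_name) VALUES (?, ?)",
        (real_name, stored),
    )
    db.commit()
    return stored


def real_name_for(db, stored_name):
    """Map an obscured name back to the real one (None if unknown)."""
    row = db.execute(
        "SELECT real_name FROM names WHERE stored_name = ?", (stored_name,)
    ).fetchone()
    return row[0] if row else None
```

One consequence worth spelling out: lose the database and the names on the target become meaningless, so the mapping database itself has to be backed up too.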
Stuff It Doesn't Do But Will
In no particular order:
- Sessions ("runs" of the backup routine) are named and kept separate on the targets, but the files from all locations are intermingled within the session. I'd like to change this. Ok, truth be told, since I'm using cloud storage, I really don't know how things are "mingled" at all. It's a black box. But for my peace of mind, I'd like to make keys like {session-key}/{location-key}/{file-key} rather than the current {session-key}/{file-key}. And I will, very shortly. Be warned if you try the software that the next version will lose all your existing backups for that reason.
- Backup is useless without restore, and I don't have a utility to do that yet.
- It doesn't back up the database, and must.
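The {session-key}/{location-key}/{file-key} layout from the first item in the list above is simple enough to sketch. The helper names here are hypothetical and nothing below is taken from NArchivist itself. The nice side effect of the three-part key is that everything from one location in one session shares a common prefix, which is handy on stores that can list objects by prefix (S3 can).

```python
def object_key(session_key, location_key, file_key):
    """Build a storage key of the form session/location/file."""
    parts = (session_key, location_key, file_key)
    for part in parts:
        if not part or "/" in part:
            raise ValueError("key parts must be non-empty and contain no '/'")
    return "/".join(parts)


def location_prefix(session_key, location_key):
    """Prefix shared by every file of one location within one session."""
    return object_key(session_key, location_key, "x")[:-1]


def split_key(key):
    """Break a storage key back into (session, location, file)."""
    session_key, location_key, file_key = key.split("/")
    return session_key, location_key, file_key


key = object_key("session-0042", "home-dirs", "9f2c41aa")
print(key)                                         # session-0042/home-dirs/9f2c41aa
print(location_prefix("session-0042", "home-dirs"))  # session-0042/home-dirs/
```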
More to come.
I'm really good at tangents. Not the line-intersects-circle type (particularly), but the here-I-am-working-on-my-favorite-software-project-when-I-start-a-new-one-instead type. For quite a while now, I've been unhappy with the data backup solution I've been using on my servers. Years, really. And recently I added a Windows server, which just made it worse. Windows has a way of doing that. In any case, the problem really isn't one of platform, or software usability or performance. It's about horizon and rotation. The problem is philosophy.
Once upon a time
In the olden days, when men were real men and used tape cartridges for backup (some with as much as 4GB of storage), we would do "tape rotation." The mainframe folks, being the lazy sots they always were, had a robot do it for them, but that's another tangent. The simplest rotation scheme uses one tape for everything. Simple, sure, but since you only have the one tape, if it goes south, so does your data. And once it fills up, you have an even bigger problem: erase and start over, or get another tape. That's a biggie. So then we did "round-robin" wherein we used a different tape for each day of the week. Wow. Progress. Same problem, only it took longer to realize.
The amount of time you can keep a copy of data on a backup medium is called the "Backup Horizon" and is perhaps the most important and yet least designed-for attribute of backup solutions through the ages. Attempts were made, to be sure, but they had to deal with the fact that tapes cost money, often a lot of money.
Somebody at some point came up with the "father-son" rotation strategy, wherein you used four tapes, one for each of Monday - Thursday (the world was simpler when nobody worked on weekends) and four more tapes, one for each Friday in the month. Each Friday, you'd do a Full backup (all data gets copied to tape), and then Monday thru Thursday you'd do either an Incremental (all files changed or added since the last day's backup go to tape) or a Differential (all files changed or added since the last Full backup go to tape). This extended the apparent (not real, more below) backup horizon to one month (give or take). Adding a grandfather level to this (twelve "monthly" tapes) gives an apparent backup horizon of one year.
Enter reality
But the truth is, it doesn't really work that way, for many important reasons. The first is that data changes more rapidly than the rotation can effectively deal with. If an important file is created on Monday, it gets put on the Monday tape. But then if it's deleted on Tuesday, it's basically forgotten. Let's say you need to recover it from tape on Wednesday: no worries. But try that again the following Tuesday. No joy. Tape's been overwritten.
Ok, so maybe you stack your diffs (more than one backup session per tape). When that runs out is a function not of how long you want to retain your data, but of how full the tapes get. If you put a no-overwrite policy in place, you end up with a mess of tapes with strange names like "Monday #4c" which in no way relate to anything meaningful.
Many backup plans are not so much data retention plans as disaster recovery plans. If something goes horribly wrong, you get back to almost where you were, but not quite. But hey, that's what disaster means, yeah?
But a bigger issue is that if any one tape goes south, that data is just plain gone. You may or may not have redundant copies somewhere. Good luck finding and verifying them, in any case. And since the more you use a backup tape, the closer you get to tape failure, the system is inadvertently designed to lose exactly the kind of data you will most likely want to recover - the highly volatile, transient stuff. The document you accidentally cut from but forgot to paste back into and didn't notice since you were moving a paragraph from page 10 to page 225.
As disk space began to seriously eclipse tape capacity, and as data became more decentralized (no longer on the corporate servers), these backup solutions started having trouble keeping up. And fell into disuse. Yeah, sure, all the big shops have backup server farms now, but it's not as easy for the smaller shops. So we tend to use the "a copy is a backup" philosophy. And it ain't.
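The Monday-file problem is easy to reproduce in a toy simulation of the father-son rotation. This is a deliberately simplified sketch and every name in it is made up: tapes are just sets of file names, an "incremental" only catches files that are new since the previous backup (it ignores edits), and a fifth Friday in a month reuses Friday tape 1.

```python
from datetime import date, timedelta

WEEKDAY_TAPES = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def tape_for(day):
    """Return the tape label used on `day`, or None on weekends."""
    weekday = day.weekday()  # Monday is 0
    if weekday < 4:
        return WEEKDAY_TAPES[weekday]
    if weekday == 4:
        # Assumption: Fridays are numbered by week of the month, and a
        # fifth Friday wraps around to tape 1.
        return "Friday #%d" % (((day.day - 1) // 7) % 4 + 1)
    return None


def run_rotation(start, days, files_on):
    """Simulate the rotation; return the state of every tape after each backup day."""
    tapes = {}
    previous = set()
    history = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        label = tape_for(day)
        if label is None:
            continue
        current = set(files_on(day))
        if label.startswith("Friday"):
            tapes[label] = current             # Full: everything that exists
        else:
            tapes[label] = current - previous  # Incremental: new files only
        previous = current
        history[day] = {name: set(files) for name, files in tapes.items()}
    return history


def recoverable(tapes, filename):
    """True if any tape still holds a copy of the file."""
    return any(filename in files for files in tapes.values())


monday = date(2024, 1, 1)  # a Monday


def files_on(day):
    # The file exists on Monday only; it is deleted on Tuesday.
    return {"report.doc"} if day == monday else set()


history = run_rotation(monday, 14, files_on)
print(recoverable(history[monday + timedelta(days=2)], "report.doc"))  # Wednesday: True
print(recoverable(history[monday + timedelta(days=8)], "report.doc"))  # next Tuesday: False
```

The file never makes it onto a Friday tape because it's gone by the time the Full runs, so its only copy lives on the Monday tape until the following Monday's backup overwrites it.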
Return of the King (or at least he's started the journey)
The only backup solution I've ever seen that DID get it right was called "Network Archivist" from a now-defunct company called Palindrome. Their offices were only a few miles from my current house, and just down the block from my former data center. Not that that has any relevance, since at the time I was using NA, I lived something like 40 miles away and didn't even have a data center. So because I need a better backup solution for my own servers and those of some clients, and all the offerings on the market now are of the just-copy-it-somewhere-and-call-it-a-backup variety...