I released version 0.4 of NArchivist, my cloud-based backup/archiving solution, about a week ago. I should have made a blog entry about it then, but I've been tied up with other things. Plus, I wasn't entirely sure I wasn't going to have to make another quick-fix release.

This is largely because I gutted the encryption sub-system to satisfy two requirements. First, it needed to support streaming of very large files to and from the data stores, and second, it had to work on my older CentOS5 servers.

Version 0.3 was built with a tool called m2secret, which I used to encrypt the files I was backing up. Ok, no worries, except I found out the hard way that m2secret needed Python 2.5 and, alas, my CentOS5 servers only had Python 2.4. Upgrading wasn't really a good option. Plus, m2secret operates by reading the entire file into memory, then encrypting it, then sending it to the target. Fine, for small(-ish) files, but for larger files this could be a real problem. I knew this going in, but wanted a quick solution that I could fix later. Bumping into the Python 2.4/2.5 problem made it later. So I fixed it.

This has pushed back my Windows support a bit, and my creation of a web-based interface for the system. But that'll be along fairly soon. No choice, really; I need the thing myself.
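The difference between "read the whole file, then encrypt" and streaming comes down to processing fixed-size chunks, so memory use stays flat no matter how big the file is. Here is a minimal sketch of that idea in modern Python. It is not NArchivist's actual code: the function names, the chunk size and the `transform` callback (a stand-in for whatever incremental cipher is used) are all assumptions made for illustration.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from NArchivist


def stream_chunks(source, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_copy(source, target, transform, chunk_size=CHUNK_SIZE):
    """Pass each chunk through `transform` and write the result to `target`.

    `transform` is a hypothetical stand-in for an incremental cipher step.
    Returns the number of input bytes processed.
    """
    total = 0
    for chunk in stream_chunks(source, chunk_size):
        target.write(transform(chunk))
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Identity transform: only one chunk is held in memory at a time.
    src = io.BytesIO(b"x" * 200000)
    dst = io.BytesIO()
    print(stream_copy(src, dst, lambda chunk: chunk))
```

Any object with a `read(size)` method works as `source`, so the same loop can sit between a local file and a cloud upload, or the other way around for a restore.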
I just put up a site at espositoholdings.com. I've had the domain for a while now. I got it on a whim, because I thought it would be kinda funny, the first step in building a massive empire of web properties. One of my cousins does graphic design as a hobby (he's quite good, really, when he can put the time in), so he whipped up a logo for the thing, also because he thought it was kinda funny. It's this:

[logo image]

So I had this domain and this logo just sitting there. I had this idea that each of the little squares should hold a logo for one of the "web properties" and it would animate in some way when you moused over it. A couple of months ago, I came across a neat little effect on Chris Coyier's CSS Tricks site where moving your mouse over a "black" image "reveals" the portion the mouse is over. Which reminded me of my earlier notion, but it still took me until today to jump in and actually try to get it done.

I started off with the techniques in Chris' article, but it wasn't quite what I wanted. Let's examine what he did, and how I differed, shall we?

The basics are a set of regions, defined by CSS to have a particular size and position over the image, with jQuery used to catch the mouse hover event over each region and then turn on and off the correct "revealed" image. Chris' method was to create five different images, one for each "state." There was the regular "nothing highlighted" image, and since he had four "hot spots," there were four more images, each with three parts black and one revealed. Since each element in his image was basically the same width, and all lined up horizontally, the CSS placement was really pretty simple.

My Way or the Highway
I didn't really want to create four different versions of my logo (I have only three "hot-spots" rather than the four Chris has), partly out of laziness, partly because I wanted to be able to swap those images out whenever I felt like it. So I set about to define the area above each square in the logo which would be my hotspot. This was pretty simple to calculate, and a bit of trial-and-error got it looking OK. Then I created the small (54x54) logos which would appear magically inside their respective squares. These are PNGs with transparent backgrounds, so they look natural when revealed.

The HTML markup is a dead ripoff of Chris's, as is most of the CSS and jQuery code. But I made a couple of changes. First, Chris used absolute positioning for his elements, but since they stack next to each other, that wasn't such a big deal, and he could give them each the same basic styling. Where he had just one selector for those elements:

    .home-roll-box {
        position: absolute;
        z-index: 1000;
        display: block;
        height: 334px;
        top: 0;
        width: 25%;
    }

I made a generic one, and then picked out each area using the clever Attribute Selector technique (boy, that Chris Coyier is a handy guy!), thus:

    .rollover {
        display: block;
        height: 54px;
        position: absolute;
        width: 54px;
        z-index: 1000;
    }
    .rollover[id="tcms"] {
        top: 18px;
        margin-left: 72px;
    }
    .rollover[id="narchivist"] {
        top: 74px;
        margin-left: 18px;
    }
    .rollover[id="openingup"] {
        top: 128px;
        margin-left: 72px;
    }

That got my rollover areas positioned, but since I have three images to fade in and out in three different locations, and they are all smaller than the main logo image, I couldn't just superimpose them on the main logo.
So a bit more JS does the job:

    $(".rollover").each(function (i) {
        // move the logo into the corresponding rollover area
        var logo = "#" + $(this).attr("id") + "_logo";
        $(logo).css("top", $(this).css("top"));
        $(logo).css("margin-left", $(this).css("margin-left"));
    });

This walks through each "rollover" area, grabs its top position and left margin, and then applies them to the corresponding image.

I like the results, but of course that's always subjective. I'm open to other opinions.
Well, after a bunch of non-NArchivist work, I finally got back to my current pet project (though I'm really itching to get back to my REAL pet project, the TurtolCMS), and now NArchivist, my backup software, will not only back files up, but it can actually RESTORE them as well. One would hope that would be a standard feature.

I've just "released" version 0.3 on SourceForge. New features include:

- A basic desktop client from which you can select files to restore. Of course, any file you would overwrite via a restore is first backed up (if necessary).
- RackSpace CloudFiles support, so now you can store your backups to two truly distinct clouds, thus making your data a bit more secure.
- After every backup session it backs up its own database. You might think this would only be needed after a backup session in which files were uploaded, but since we also record "last-seen" data to catch file deletions, the database goes up every time. Versioned, of course.
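To see why the database changes even when nothing is uploaded, here is a small sketch of "last-seen" bookkeeping. It is an illustration only: the in-memory dictionary and both function names are assumptions, and NArchivist's real schema is not shown in this post.

```python
def record_session(last_seen, session_id, paths_found):
    """Mark every path found on disk during this session as seen in it."""
    for path in paths_found:
        last_seen[path] = session_id


def deleted_paths(last_seen, session_id):
    """Return known paths that were not seen during `session_id`."""
    return sorted(path for path, seen in last_seen.items() if seen != session_id)


if __name__ == "__main__":
    last_seen = {}
    record_session(last_seen, 1, ["a.txt", "b.txt"])
    record_session(last_seen, 2, ["a.txt"])  # b.txt has gone missing
    print(deleted_paths(last_seen, 2))
```

Every session rewrites the last-seen value for every file it finds, so the stored data differs after each run whether or not a single byte was uploaded.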
Next, I'll have to make the website a bit more complete. Which is to say complete at all. After that, I'll try to get it running on Windows, and flesh out the web-based interface I started.
I've just put up the first "version" of my new backup software, NArchivist, on SourceForge. It's not available as a regular download just yet, as there's too much to do to make it ready for a casual install (even tho "casual" by SourceForge standards usually means pretty technical). So you can get it from the source repository. See the project summary page for the gritty-kitty details, or you can start with my lame-o-rama project home page if you're in for a larf.

Stuff It Does

- It takes files from one or more "locations" on your filesystem(s), and copies them and their metadata (stuff like ownership, permissions, etc) to one or more "targets" in the cloud. For now, I only support Amazon S3, so if you want more than one target, you need to either create more than one S3 bucket or more than one AWS/S3 account. Or you can add support for other storage targets. Go ahead, it'll be fun.
- When a file changes, it backs it up again, and continues to do so until it reaches a configurable minimum number of copies. Basic versioning.
- File names are obscured. Actually, they're mapped via database to other, cryptographically-generated names. Session names are not obscured, however.
- It keeps a database of each of these operations, so we know where all our stuff is.
- There is a basic, text-mode installation/configuration program. This doesn't install the software, per se. That'll be next. This just allows you to bootstrap the thing, create the database, and manage the target, location and backup parameters. Very rudimentary.
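The name-obscuring item in the list above can be sketched roughly as follows. This is one plausible way to do it, not the actual NArchivist implementation: the table layout, the function names and the use of random hex tokens from Python's `secrets` module are all assumptions.

```python
import secrets
import sqlite3


def open_name_map(path=":memory:"):
    """Open (or create) the database holding the real-to-stored name mapping."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS name_map ("
        " real_name TEXT PRIMARY KEY,"
        " stored_name TEXT UNIQUE NOT NULL)"
    )
    return conn


def obscured_name(conn, real_name):
    """Return the stored name for a file, generating one on first use."""
    row = conn.execute(
        "SELECT stored_name FROM name_map WHERE real_name = ?", (real_name,)
    ).fetchone()
    if row:
        return row[0]
    stored = secrets.token_hex(16)  # 32 hex characters of random data
    conn.execute(
        "INSERT INTO name_map (real_name, stored_name) VALUES (?, ?)",
        (real_name, stored),
    )
    conn.commit()
    return stored


def real_name_for(conn, stored_name):
    """Reverse lookup, as a restore would need; None if the name is unknown."""
    row = conn.execute(
        "SELECT real_name FROM name_map WHERE stored_name = ?", (stored_name,)
    ).fetchone()
    return row[0] if row else None
```

Because the stored names are random rather than derived from the real ones, the mapping lives only in the database, and losing the database means losing the ability to tell which stored object is which file.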
Stuff It Doesn't Do But Will

In no particular order:

- Sessions ("runs" of the backup routine) are named and kept separate on the targets, but the files from all locations are intermingled within the session. I'd like to change this. Ok, truth be told, since I'm using cloud storage, I really don't know how things are "mingled" at all. It's a black box. But for my peace of mind, I'd like to make keys like {session-key}/{location-key}/{file-key} rather than the current {session-key}/{file-key}. And I will, very shortly. Be warned if you try the software that the next version will lose all your existing backups for that reason.
- Backup is useless without restore, and I don't have a utility to do that yet.
- It doesn't back up the database, and must.
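The two key layouts from the first item above are easy to compare in code. This is a sketch only; `object_key` and `keys_under` are hypothetical helper names, not functions from NArchivist.

```python
def object_key(session_key, file_key, location_key=None):
    """Build a storage key in the current or the planned layout.

    Without a location key: {session-key}/{file-key}            (current)
    With a location key:    {session-key}/{location-key}/{file-key} (planned)
    """
    if location_key is None:
        parts = [session_key, file_key]
    else:
        parts = [session_key, location_key, file_key]
    return "/".join(parts)


def keys_under(keys, session_key, location_key):
    """Select the keys belonging to one location within one session."""
    prefix = "%s/%s/" % (session_key, location_key)
    return [key for key in keys if key.startswith(prefix)]
```

With the planned layout, everything from one location shares a key prefix, so it can be picked out without consulting the database. It also shows why old backups become unreachable: a key built the old way no longer matches what the new code looks for.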
More to come.
I'm really good at tangents. Not the line-intersects-circle type (particularly), but the here-I-am-working-on-my-favorite-software-project-when-I-start-a-new-one-instead type.

For quite a while now, I've been unhappy with the data backup solution I've been using on my servers. Years, really. And recently I added a Windows server, which just made it worse. Windows has a way of doing that. In any case, the problem really isn't one of platform, or software usability or performance. It's about horizon and rotation. The problem is philosophy.

Once upon a time
In the olden days, when men were real men and used tape cartridges for backup (some with as much as 4GB of storage), we would do "tape rotation." The mainframe folks, being the lazy sots they always were, had a robot do it for them, but that's another tangent. The simplest rotation scheme looks like this:

[diagram: single-tape rotation]

Simple, sure, but since you only have the one tape, if it goes south, so does your data. And once it fills up, you have an even bigger problem: erase and start over, or get another tape. That's a biggie.

So then we did "round-robin" wherein we used a different tape for each day of the week. Wow. Progress. Same problem, only it took longer to realize.

The amount of time you can keep a copy of data on a backup medium is called the "Backup Horizon" and is perhaps the most important and yet least designed-for attribute of backup solutions through the ages. Attempts were made, to be sure, but they had to deal with the fact that tapes cost money, often a lot of money.

Somebody at some point came up with the "father-son" rotation strategy, wherein you used four tapes, one for each of Monday - Thursday (the world was simpler when nobody worked on weekends) and four more tapes, one for each Friday in the month. Each Friday, you'd do a Full backup (all data gets copied to tape), and then Monday thru Thursday you'd do either an Incremental (all files changed or added since the last day's backup go to tape) or a Differential (all files changed or added since the last Full backup go to tape). This extended the apparent (not real, more below) backup horizon to one month (give or take). Adding a grandfather level to this (twelve "monthly" tapes) gives an apparent backup horizon of one year.

Enter reality
But the truth is, it doesn't really work that way, for many important reasons. The first is that data changes more rapidly than the rotation can effectively deal with. If an important file is created on Monday, it gets put on the Monday tape. But then if it's deleted on Tuesday, it's basically forgotten. Let's say you need to recover it from tape on Wednesday: no worries. But try that again the following Tuesday. No joy. Tape's been overwritten.

Ok, so maybe you stack your diffs (more than one backup session per tape). When that runs out is a function not of how long you want to retain your data, but how full the tapes get. If you put a no-overwrite policy in place, you end up with a mess of tapes with strange names like "Monday #4c" which in no way relates to anything meaningful.

Many backup plans are not so much data retention plans as disaster recovery plans. If something goes horribly wrong, you get back to almost where you were, but not quite. But hey, that's what disaster means, yeah?

But a bigger issue is that if any one tape goes south, that data is just plain gone. You may or may not have redundant copies somewhere. Good luck finding and verifying them, in any case. And since the more you use a backup tape, the closer you get to tape failure, the system is inadvertently designed to lose exactly the kind of data you will most likely want to recover - the highly volatile, transient stuff. The document you accidentally cut from but forgot to paste back into and didn't notice since you were moving a paragraph from page 10 to page 225.

As disk space began to seriously eclipse tape capacity, and as data became more decentralized (no longer on the corporate servers), these backup solutions started having trouble keeping up. And fell into disuse. Yeah, sure, all the big shops have backup server farms now, but it's not as easy for the smaller shops. So we tend to use the "a copy is a backup" philosophy. And it ain't.
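The Monday/Tuesday example above can be checked with a toy simulation of a father-son rotation using incrementals. It rests on simplifying assumptions that are mine, not the post's: day 0 is a Monday, each tape holds exactly one session, the four Friday tapes are reused every four weeks, and all function names are made up for illustration.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def tape_label(day_index):
    """Tape used on a given day (day 0 is a Monday); None on weekends."""
    weekday = day_index % 7
    if weekday < 4:
        return WEEKDAYS[weekday]
    if weekday == 4:
        # Assumed: the four Friday tapes cycle every four weeks.
        return "Friday #%d" % (day_index // 7 % 4 + 1)
    return None


def run_backups(activity, last_day):
    """Replay backups for days 0..last_day and return {tape label: files on it}.

    `activity` maps a day index to (files changed that day, files on disk at
    backup time). Unlisted days have no changes and keep the previous disk state.
    """
    tapes = {}
    on_disk = set()
    pending = set()  # changes since the last backup of any kind
    for day in range(last_day + 1):
        changed, on_disk = activity.get(day, (set(), on_disk))
        pending |= changed
        label = tape_label(day)
        if label is None:
            continue
        if label.startswith("Friday"):
            tapes[label] = set(on_disk)      # full backup
        else:
            tapes[label] = pending & on_disk  # incremental
        pending = set()
    return tapes


def recoverable(name, tapes):
    """True if any tape still holds a copy of the file."""
    return any(name in files for files in tapes.values())
```

Feeding it a file created on day 0 and deleted on day 1 shows the copy surviving through the first week and vanishing once the Monday tape is reused on day 7, because the Friday full backup ran after the deletion and never saw the file.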
Return of the King (or at least he's started the journey)
The only backup solution I've ever seen that DID get it right was called "Network Archivist" from a now-defunct company called Palindrome. Their offices were only a few miles from my current house, and just down the block from my former data center. Not that that has any relevance, since at the time I was using NA, I lived something like 40 miles away and didn't even have a data center. So because I need a better backup solution for my own servers and those of some clients, and all the offerings on the market now are of the just-copy-it-somewhere-and-call-it-a-backup variety...