I've just put up the first "version" of my new backup software, NArchivist, on SourceForge. It's not available as a regular download just yet, as there's too much to do to make it ready for a casual install (even tho "casual" by SourceForge standards usually means pretty technical). So you can get it from the source repository. See the project summary page for the gritty-kitty details, or you can start with my lame-o-rama project home page if you're in for a larf.

Stuff It Does

- It takes files from one or more "locations" on your filesystem(s), and copies them and their metadata (stuff like ownership, permissions, etc.) to one or more "targets" in the cloud. For now, I only support Amazon S3, so if you want more than one target, you need to either create more than one S3 bucket or more than one AWS/S3 account. Or you can add support for other storage targets. Go ahead, it'll be fun.
- When a file changes, it backs it up again, and continues to do so until it reaches a configurable minimum number of copies. Basic versioning.
- File names are obscured. Actually, they're mapped via database to other, cryptographically-generated names. Session names are not obscured, however.
- It keeps a database of each of these operations, so we know where all our stuff is.
- There is a basic, text-mode installation/configuration program. This doesn't install the software, per se. That'll be next. This just allows you to bootstrap the thing, create the database, and manage the target, location and backup parameters. Very rudimentary.
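The name-obscuring and versioning bullets above could be sketched roughly like this. It's only an illustration, not NArchivist's actual code: the `file_map` table, the function names, and the use of a random token as the "cryptographically-generated" name are all assumptions made for the example.

```python
import secrets
import sqlite3


def open_catalog(path=":memory:"):
    """Create a (hypothetical) catalog table mapping real paths to obscured keys."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_map ("
        " obscured_key TEXT PRIMARY KEY,"
        " session TEXT NOT NULL,"
        " real_path TEXT NOT NULL)"
    )
    return db


def record_backup(db, session, real_path):
    """Generate a random key for one backed-up copy and remember what it was."""
    obscured = secrets.token_hex(16)  # 32 hex characters, unrelated to the real name
    db.execute(
        "INSERT INTO file_map VALUES (?, ?, ?)", (obscured, session, real_path)
    )
    db.commit()
    # The session name stays readable; only the file name is obscured.
    return f"{session}/{obscured}"


def versions_of(db, real_path):
    """List every stored key for a path, oldest first (basic versioning)."""
    rows = db.execute(
        "SELECT session, obscured_key FROM file_map"
        " WHERE real_path = ? ORDER BY rowid",
        (real_path,),
    )
    return [f"{session}/{key}" for session, key in rows]
```

Without the database, the keys on the target tell you nothing about what the files are, which is the point, and also why the database itself needs backing up.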
Stuff It Doesn't Do But Will

In no particular order:

- Sessions ("runs" of the backup routine) are named and kept separate on the targets, but the files from all locations are intermingled within the session. I'd like to change this. Ok, truth be told, since I'm using cloud storage, I really don't know how things are "mingled" at all. It's a black box. But for my peace of mind, I'd like to make keys like {session-key}/{location-key}/{file-key} rather than the current {session-key}/{file-key}. And I will, very shortly. Be warned if you try the software that the next version will lose all your existing backups for that reason.
- Backup is useless without restore, and I don't have a utility to do that yet.
- It doesn't back up the database, and must.
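To make the key layout change from the first item above concrete, here's a small sketch of the two schemes. The function names are made up for illustration; only the {session-key}/{location-key}/{file-key} and {session-key}/{file-key} shapes come from the description above.

```python
from collections import defaultdict


def current_key(session_key, file_key):
    """Today's layout: every location's files share one session prefix."""
    return f"{session_key}/{file_key}"


def planned_key(session_key, location_key, file_key):
    """Planned layout: the location sits between session and file."""
    return f"{session_key}/{location_key}/{file_key}"


def group_by_location(keys):
    """Group planned-layout keys by (session, location) using only the key text."""
    groups = defaultdict(list)
    for key in keys:
        session_key, location_key, file_key = key.split("/", 2)
        groups[(session_key, location_key)].append(file_key)
    return dict(groups)
```

With the planned layout, a plain prefix listing on the target is enough to see which files belong to which location; with the current one, that question can only be answered from the database.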
More to come.
I'm really good at tangents. Not the line-intersects-circle type (particularly), but the here-I-am-working-on-my-favorite-software-project-when-I-start-a-new-one-instead type.

For quite a while now, I've been unhappy with the data backup solution I've been using on my servers. Years, really. And recently I added a Windows server, which just made it worse. Windows has a way of doing that. In any case, the problem really isn't one of platform, or software usability or performance. It's about horizon and rotation. The problem is philosophy.

Once upon a time
In the olden days, when men were real men and used tape cartridges for backup (some with as much as 4GB of storage), we would do "tape rotation." The mainframe folks, being the lazy sots they always were, had a robot do it for them, but that's another tangent. The simplest rotation scheme looks like this:

Simple, sure, but since you only have the one tape, if it goes south, so does your data. And once it fills up, you have an even bigger problem: erase and start over, or get another tape. That's a biggie.

So then we did "round-robin" wherein we used a different tape for each day of the week. Wow. Progress. Same problem, only it took longer to realize.

The amount of time you can keep a copy of data on a backup medium is called the "Backup Horizon" and is perhaps the most important and yet least designed-for attribute of backup solutions through the ages. Attempts were made, to be sure, but they had to deal with the fact that tapes cost money, often a lot of money.

Somebody at some point came up with the "father-son" rotation strategy, wherein you used four tapes, one for each of Monday - Thursday (the world was simpler when nobody worked on weekends) and four more tapes, one for each Friday in the month. Each Friday, you'd do a Full backup (all data gets copied to tape), and then Monday thru Thursday you'd do either an Incremental (all files changed or added since the last day's backup go to tape) or a Differential (all files changed or added since the last Full backup go to tape). This extended the apparent (not real, more below) backup horizon to one month (give or take). Adding a grandfather level to this (twelve "monthly" tapes) gives an apparent backup horizon of one year.

Enter reality
But the truth is, it doesn't really work that way, for many important reasons. The first is that data changes more rapidly than the rotation can effectively deal with. If an important file is created on Monday, it gets put on the Monday tape. But then if it's deleted on Tuesday, it's basically forgotten. Let's say you need to recover it from tape on Wednesday: no worries. But try that again the following Tuesday. No joy. Tape's been overwritten.

Ok, so maybe you stack your diffs (more than one backup session per tape). When that runs out is a function not of how long you want to retain your data, but how full the tapes get. If you put a no-overwrite policy in place, you end up with a mess of tapes with strange names like "Monday #4c" which in no way relates to anything meaningful.

Many backup plans are not so much data retention plans as disaster recovery plans. If something goes horribly wrong, you get back to almost where you were, but not quite. But hey, that's what disaster means, yeah?

But a bigger issue is that if any one tape goes south, that data is just plain gone. You may or may not have redundant copies somewhere. Good luck finding and verifying them, in any case. And since the more you use a backup tape, the closer you get to tape failure, the system is inadvertently designed to lose exactly the kind of data you will most likely want to recover - the highly volatile, transient stuff. The document you accidentally cut from but forgot to paste back into and didn't notice since you were moving a paragraph from page 10 to page 225.

As disk space began to seriously eclipse tape capacity, and as data became more decentralized (no longer on the corporate servers), these backup solutions started having trouble keeping up. And fell into disuse. Yeah, sure, all the big shops have backup server farms now, but it's not as easy for the smaller shops. So we tend to use the "a copy is a backup" philosophy. And it ain't.
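The Monday-file example from the start of this section is easy to play out in a few lines. This is just a toy simulation under my own simplifying assumptions: one tape per weekday, each night's run overwrites that weekday's tape with whatever is on disk at the time, and the dates and file name are invented.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def tapes_after(start, days, files_on_disk):
    """Contents of each weekday tape after `days` consecutive days from `start`.

    files_on_disk(day) returns the set of files present when that day's
    backup runs. Each run overwrites the tape labelled with that weekday.
    """
    tapes = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        if day.weekday() < len(WEEKDAYS):  # nobody works on weekends
            tapes[WEEKDAYS[day.weekday()]] = set(files_on_disk(day))
    return tapes


def recoverable(tapes, name):
    """Labels of the tapes that still hold a copy of `name`."""
    return sorted(label for label, files in tapes.items() if name in files)


# A file created on a Monday and deleted on Tuesday before the backup runs.
monday = date(2010, 1, 4)


def files_on_disk(day):
    return {"report.doc"} if day == monday else set()


print(recoverable(tapes_after(monday, 3, files_on_disk), "report.doc"))  # through Wednesday
print(recoverable(tapes_after(monday, 9, files_on_disk), "report.doc"))  # through next Tuesday
```

The first call finds the file on the Monday tape; the second finds nothing, because the following Monday's run has already overwritten it. The real horizon for that file was one week, no matter how many tapes are in the drawer.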
Return of the King (or at least he's started the journey)
The only backup solution I've ever seen that DID get it right was called "Network Archivist" from a now-defunct company called Palindrome. Their offices were only a few miles from my current house, and just down the block from my former data center. Not that that has any relevance, since at the time I was using NA, I lived something like 40 miles away and didn't even have a data center. So because I need a better backup solution for my own servers and those of some clients, and all the offerings on the market now are of the just-copy-it-somewhere-and-call-it-a-backup variety...
The problems with being, as I so often am, distracted by too many interests and projects, are many. I thought about working another clause into that first sentence so to further demonstrate my fractured lifestyle, but I got thinking about something else. Anyway, this is just a very quick entry to point out that I really should take more time (a) to update this blog and (2) to just do things correctly in the first place so that when I change a password (for example) it doesn't fubar something else (i.e.) this blog which shouldn't have relied on that particular password anyway. Sorry for the downtime. I suck.
Ok, I broke my promise to announce the latest-n-greatest TurtolCMS here. I did do the release (some while ago, now), but didn't make an announcement. Color me lazy, or perhaps just overly busy. And the book review will be forthcoming, some day.

Just last night, my cousin Rob asked me something along the lines of "have you ever been thinking about a blog entry for so long that you just couldn't get it done?" He was talking about his own blog, mind, but hit the nail on the head for me. So here's me getting it done, finally. I alluded, in my previous entry, to another blog I would reference for a starting point. That reference is below, but first, a little history.

A little history

Turtol, my much lamented previous endeavor, was originally started to be a hosting company, with 37signals-like aspirations to build some cool web-based applications. Prior to Turtol, I had Yet Another Startup, and after I exited that, I took about 18 months off to be a Dad and work on my 50-some-year-old house and whatnot. Some of my previous clients expressed an interest in my continuing to support their web sites (and so forth), so I put together a tiny little hosting operation, just to service them. It gave me something to do to keep my hand in technology and was just self-sustaining; it made me no money, but I didn't care, because it wasn't supposed to.

After a while, some friend or other heard about my hosting, asked if he could have an account, and of course I put that together. Then, of course, he asked for some changes, and I did that. Then another guy came along, and pretty soon I needed to hack together some software to manage it all (web-based, of course), and keep me from actually running a hosting company so I could continue to play with my kids and generally be lazy.

Then Mike came calling, having heard about what I had put together, and needing a site for his then-current employer. He asked what I'd charge.
I didn't know, because I didn't charge, except for those few who had sparked the whole she-bang in the first place. So I put together pricing and Mike's company cut me a check and all was good. But then Mike said something like "wow, that's like the best control panel I've ever used" and went on to tell me how bad most others were and how his pal Libor (who I hadn't, to that point, met) was always hating having to deal with them and how none of his designer friends even COULD use them. I kinda knew that, since I'd looked at using them before I decided to just build my own. Swell, I said. Thanks, I said. And that was that. Until Mike asked "what would it take for me to resell your services?" The idea, originally, was that I'd run the servers (and develop the control panel further) and he'd sell the service. Several scenarios were suggested. Then Mike had an epiphany about the pricing structure and a few lunches later, Turtol was born. The original business plan was we'd spend some time making the control panel totally kick ass and Libor (who had come on board by then) would help us bootstrap by bringing in ten or so of his freelance designer friends who each had at least a couple of current and upcoming sites needing hosting. But what would become the critical piece of the puzzle was largely an accident born of my frustration with some of the work I was doing.
I've been thinking a bit recently about the title and ostensible purpose of this blog. Now, mind you, this blog is largely a cathartic exercise on my part, not by any means an important bit of... well, anything. But still, I like things to have some sort of order, consistency and meaning.

Originally, I started this thing talking about the various facets of life in an Open Source company. It was intended to be very much my personal expression of my personal philosophy, centered around how that philosophy related to running my business. With a bit of other hijinx thrown in whenever I got bored. Since then, of course, the business itself has gone away, leaving in its wake the TurtolCMS software project. And this blog (or at least its tagline) with no clear meaning.

I'm no longer running an open source business. Yes, as I've hinted, I intend to, and have several irons in the fire (both iron and fire being Open Source, naturally). But I'm not, at present, actually running an open source company. I'm managing an Open Source project, but that (at present) drives no income, and without income there really can't be said to be any business.

So, do I evolve the direction of this thing to follow along with whatever I'm doing right now? Could be boring, but really I guess I'm doing that already, as I've had to post something from time to time, just to brain-dump. Eventually, I suppose, it will evolve back into a blog about my (next) Open Source business. But how long before that happens? Will I still care? Or do I revolutionize it entirely into... what else? I don't think the intarweb needs another site that links to other sites for the sake of it. I'm just not into that. Not enough catharsis.
So, coming up: Another book review (actually, two, but combined into one) not at all related to software, Open Source or Business, just because I wanna; and a post commenting on a story run on another (more widely read) blog, but not just so I can link to it; and the announcement of another release of the TurtolCMS, any day now.