A scan of archives shows that lots of scientific papers aren't backed up.
Back when scientific publications came in paper form, libraries played a key role in ensuring that knowledge didn't disappear. Copies went out to so many libraries that any failure—a publisher going bankrupt, a library getting closed—wouldn't put us at risk of losing information. But, as with anything else, scientific content has gone digital, which has changed what's involved with preservation.
Organizations have devised systems that should provide options for preserving digital material. But, according to a recently published survey, lots of digital documents aren't consistently showing up in the archives that are meant to preserve them. And that puts us at risk of losing academic research—including science paid for with taxpayer money.
The risk here is that, ultimately, we may lose access to some academic research. As Eve phrases it, knowledge gets expanded because we're able to build upon a foundation of facts that we can trace back through a chain of references. If we start losing those links, then the foundation gets shakier. Archiving comes with its own set of challenges: It costs money, it has to be organized, consistent means of accessing the archived material need to be established, and so on.
But, to an extent, we're failing at the first step. "An important point to make," Eve writes, "is that there is no consensus over who should be responsible for archiving scholarship in the digital age."
A somewhat related issue is ensuring that people can find the archived material—the issue that DOIs were designed to solve.
Conclusion
There is a certain disparity between problems and features here: I personally can do without most of the features but do not like living with the problems. Additionally, backup is a must-have, but also not something one comes into contact with often, as the processes themselves are automated at least to the point that I as a user only call a script (e.g. connect USB drive, call script, disconnect; a sketch of such a wrapper follows below). From that point of view, most of the tools’ advantages are largely uninteresting as long as there are no problems!
This is an unfortunate situation with backup tools in general, which may be one of the reasons why there are so few good tools to choose from :)
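For context, the kind of wrapper script meant above need not be anything more elaborate than the following sketch. The mount point, repository path and source directory are placeholders, and Borg stands in for whichever tool is actually used:

    #!/bin/sh
    # Hypothetical wrapper: mount the USB drive, run the backup, unmount again.
    # Assumes an /etc/fstab entry for /mnt/backupdrive and an existing repository.
    set -e
    mount /mnt/backupdrive
    borg create /mnt/backupdrive/repo::"$(date +%Y-%m-%d)" /home
    umount /mnt/backupdrive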
Without further delay, the following table summarizes the findings by recalling the greatest issues observed for the respective tools:
Tool      Problems
Borg      – very slow, especially for initial backups
JMBB      – very slow restore
          – no deduplication
          – no files above 8 GiB
Kopia     – no Unix pipes/special files support
          – large caches in Data-Test
          – rather large backup sizes
Bupstash  – large file numbers in a single directory
My conclusion from this is that Bupstash is the most viable candidate. There are still some rough edges, but given that it is the newest of the tools tested, that is to be expected.
Traditional backup tools can mostly be subdivided by the following characteristics:
- file-based vs. image-based
  Image-based solutions make sure everything is backed up, but are potentially difficult to restore on other (less powerful) hardware. Additionally, creating images with traditional tools like dd requires the disk being backed up to be unmounted (to avoid consistency issues). This makes image-based backups better suited for filesystems that support advanced operations like snapshots or zfs send-style images that contain a consistent snapshot of the data of interest. Among file-based tools there is a further distinction between tools that exactly replicate the source file structure on the backup target (e.g. rsync or rdiff-backup) and tools that use an archive format to store backup contents (tar). A brief sketch of this distinction follows the list.
- networked vs. single-host
  Networked solutions allow backing up multiple hosts and, to some extent, allow for centralized administration. Traditionally, a dedicated client has to be installed on all machines to be backed up. Networked solutions can be pull-based (the server fetches backups from the clients) or push-based (the client sends backups to the server). Single-host solutions consist of a single tool that is invoked to back up data from the current host to a target storage. As this target storage can be a network target, the distinction between networked and single-host solutions is not exactly clear-cut.
- incremental vs. full
  Traditionally, tools either make an actual 1:1 copy (full backup) or copy “just the differences”, which can mean anything from “copy all changed files” to “copy changes from within files”. Incremental schemes allow multiple backup states to be kept without needing much disk space. However, traditional tools require that another full backup be made in order to free the space used by previous changes.
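To make the file-based vs. image-based distinction concrete, here is a minimal sketch using traditional tools; the paths, device and dataset names are made up for illustration:

    # File-based: replicate the source tree 1:1 onto the backup target
    rsync -a --delete /home/ /mnt/backup/home/

    # Image-based with a traditional tool: unmount first to avoid an
    # inconsistent image, then copy the raw block device
    umount /dev/sdb1
    dd if=/dev/sdb1 of=/mnt/backup/sdb1.img bs=4M

    # Snapshot-capable filesystems avoid the unmount step: send a
    # consistent snapshot instead (ZFS shown as an example)
    zfs snapshot tank/home@today
    zfs send tank/home@today > /mnt/backup/home-today.zfs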
Modern tools mostly advance things on the incremental vs. full front by acting incremental forever without the negative impacts that such a scheme has when realized with traditional tools. Additionally, modern tools mostly rely on their own/custom archival format. While this may seem like a step back from tools that replicate the file structure, there are numerous potential advantages to be taken from this:
- Enclosing files in archives allows them and their metadata to be encrypted and made portable across file systems.
- Given that many backups will eventually be stored on online storage like Dropbox, Mega, Microsoft OneDrive or Google Drive, this portability across file systems is especially useful. Even when not storing backups online, portability ensures that backup data can be copied with simple operations like cp without damaging the contained metadata. Given that online stores are often not exactly trustworthy, encryption is also required.
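As a rough illustration of these two points (not taken from the article itself), consider Borg, one of the reviewed tools: the repository is encrypted, and since it is just a directory of opaque files it can be moved around with cp or a sync tool. The repository path and the rclone remote name are assumptions:

    # Create an encrypted repository and put a first backup into it
    borg init --encryption=repokey /mnt/backup/borg-repo
    borg create /mnt/backup/borg-repo::docs-2024-01-01 ~/documents

    # The repository is an ordinary directory of encrypted files, so it can be
    # copied without damaging the contained metadata, including to an
    # untrusted online store
    cp -r /mnt/backup/borg-repo /media/usb/borg-repo
    rclone sync /mnt/backup/borg-repo cloudremote:borg-repo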
Abstract
This article attempts to compare three modern backup tools with respect to their features and performance. The tools of interest are Borg, Bupstash and Kopia.
BorgTUI -- A simple TUI and CLI to automate your Borg backups :^)
Can someone please help me decide what the "best" backup software is?
- Restic (https://restic.net/)
- Borg backup (https://www.borgbackup.org/)
- Duplicati (https://www.duplicati.com/)
- Kopia (https://kopia.io/)
- Duplicacy (https://duplicacy.com/)
- Duplicity (https://duplicity.us/)
mekster 79 days ago
Do yourself a favor and use zfs as your primary backup. Even though it means you'll have to replace your filesystem, it's just that good.
It's faster than any other backup software, because as the filesystem itself it already knows what has changed since the last snapshot, whereas external backup tools always have to scan entire directory trees to find out what changed. It's battle-tested and reliable, with added benefits like transparent compression.
A bit of explanation of how much faster it can be than external tools. (I don't work for or promote the service mentioned in the article.)
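A rough sketch of what that looks like in practice (the dataset names and backup host are made up): the filesystem can report and send only what changed since the previous snapshot, with no directory scan.

    # Take snapshots as part of the backup schedule
    zfs snapshot tank/data@monday
    zfs snapshot tank/data@tuesday

    # The filesystem already knows what changed between snapshots ...
    zfs diff tank/data@monday tank/data@tuesday

    # ... and can send just those changes to the backup target
    zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs recv tank/backup/data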
Then you'll realize Borg is the one with the fewest data-corruption complaints on the internet, which makes it a good secondary backup.
This is easily checked by searching Google for "[app name] data corruption".
And see who else lists vulnerability and corruption bugs upfront the way Borg does; it shows the developers are forthcoming about these important issues.
https://borgbackup.readthedocs.io/en/stable/changes.html
The term "best" here apparently means reliable for backup, and also that the tool doesn't start choking on large data sets, eating huge amounts of memory and round-trip time.
Neither of them works against your favorite S3-compatible targets, but there are services these tools can target, or you can just roll your own dedicated $5 Linux backup instance to avoid crying in the future.
With those 2, I don't care what other tools exist anymore.
donmcronald 79 days ago
I use ZFS + Sanoid + Syncoid locally and Borg + Borgmatic + BorgBase for offsite.
WhrRTheBaboons 76 days ago
Seconding zfs
Linux-Fan 80 days ago
Bupstash (https://bupstash.io/) beats Borg and Kopia in my tests (see https://masysma.net/37/backup_tests_borg_bupstash_kopia.xhtml). It is a modern take very close to what Borg offers in terms of feature set, but with significantly better performance (in terms of resource use while running; the backups were slightly larger than Borg's in my tests).
dpbriggs 79 days ago
Personally I use borg with BorgTUI (https://github.com/dpbriggs/borgtui) to schedule backups and manage sources/repositories. I'm quite pleased with the simplicity of it compared to some of the other solutions.
Kopia’s development has accelerated in 2020, and the tool is quickly approaching 1.0. While a number of new features have shown up, this post will concentrate on the performance improvements made over the last few months. To do that, we will compare v0.4.0 (January 2020), v0.5.2 (March 2020), and v0.6.0-rc1 (July 2020). We will also compare it to restic, another popular open-source backup tool. All binaries were downloaded from GitHub. With the exception of the s2-standard compression scheme being enabled with kopia, the default options were used for all tools.
As can be seen in the above results, kopia’s performance has improved significantly over the last few releases. The time taken to back up 200 GiB of data has been reduced from ~840 seconds to ~200! For just a single process, this translates to an effective processing bandwidth of 1 GiB/second and an upload bandwidth utilization of 3.5 Gbps.
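For readers who want to reproduce this kind of comparison themselves, a minimal run might look like the sketch below. These are not the post's actual benchmark scripts; repository locations and the data path are assumptions:

    # kopia: create a local repository, then time a snapshot of the data set
    kopia repository create filesystem --path /mnt/bench/kopia-repo
    time kopia snapshot create /data

    # restic: initialize a repository, then time a backup of the same data
    restic init --repo /mnt/bench/restic-repo
    time restic backup --repo /mnt/bench/restic-repo /data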
For storing rarely used secrets that should not be kept on a networked computer, it is convenient to print them on paper. However, ordinary barcodes cannot store much more than 2000 octets of data, and in practice even such small amounts cannot be reliably read by widely used software (e.g. ZXing).
In this note I show a script for splitting small amounts of data across multiple barcodes and generating a printable document. Specifically, this script is limited to less than 7650 alphanumeric characters, such as from the Base-64 alphabet. It can be used for archiving Tarsnap keys, GPG keys, SSH keys, etc.
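The note's own script is not reproduced here; as a rough illustration of the general approach (the tools and chunk size are my assumptions, not the note's), the data can be split and rendered as one QR code per chunk with standard utilities:

    # Encode the secret and split it into chunks small enough for one code each
    base64 secret.key > secret.b64
    split -b 1500 -d secret.b64 chunk-

    # Render one QR code per chunk, then print the resulting images
    for f in chunk-*; do
        qrencode -l M -o "$f.png" < "$f"
    done

    # Recovery: scan the printed codes (e.g. with zbarimg) and concatenate
    # the decoded chunks in order before base64-decoding.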
On Sun, Apr 04, 2021 at 10:37:47AM -0700, jerry wrote:
Ideas? Right now, I'm experimenting with printed barcodes.
You might be interested in:
https://lab.whitequark.org/notes/2016-08-24/archiving-cryptographic-secrets-on-paper/
which was written specifically for tarsnap keys.
Cheers,
- Graham Percival
I use Tarsnap for my critical data. Case in point: I use it to back up my Bacula database dump. I use Bacula to back up my hosts. The database in question keeps track of what was backed up, from what host, the file size, checksum, where that backup is now, and many other items. Losing this data is annoying but not a disaster; it can be recreated from the backup volumes, but that is time consuming. As it is, the file is dumped daily and rsynced to multiple locations.
I also back up that database daily via tarsnap. I’ve been doing this since at least 2015-10-09.
The uncompressed dump of this PostgreSQL database is now about 117G.
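The post does not show the actual job, but a daily dump-and-archive step of this sort might look roughly like the following; the database name, paths and hosts are assumptions:

    # Dump the Bacula catalog database and archive today's copy with tarsnap
    pg_dump bacula | gzip > /var/backups/bacula.sql.gz
    tarsnap -c -f "bacula-$(date +%Y-%m-%d)" /var/backups/bacula.sql.gz

    # Keep additional plain copies elsewhere as well
    rsync -a /var/backups/bacula.sql.gz backuphost:/srv/dumps/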
I was interested in trying out a service like OneDrive or Dropbox, but one thing always held me back: the idea that at any moment, and for any reason, the company could lock me out of my files.
The problem
No one wants to have their data held hostage by a third party. How can you get the benefits of using cloud storage while also retaining ownership rights and having a level of assurance that your files will always be accessible?
The solution
Luckily, there’s a simple solution: Perform full backups of your cloud files in an environment that you control.
"Backup your data, you say?! What a novel idea!" /S
The setup
I use rclone to sync files from my cloud storage accounts to a VM running Alpine Linux. rclone works with over 40 cloud storage providers, has a very easy-to-use CLI, and works with modern authentication systems.
A cron job runs daily, pulling down any file changes into the backup.
I have the replication job set to exhaustively copy all files in the account to the local machine.
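A minimal version of that setup could look like the following; the remote name, local path and schedule are assumptions, not the author's exact configuration:

    # One-time setup: configure the cloud account interactively
    rclone config

    # Daily cron entry (e.g. in /etc/crontabs/root on Alpine): pull the whole
    # account down into a local directory and log the run
    0 3 * * * rclone sync cloudremote: /srv/backup/cloudremote --log-file /var/log/rclone-backup.log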