• 0 Posts
  • 48 Comments
Joined 1 year ago
Cake day: July 30th, 2023



    • I never said anything about EFI not supporting multi boot. I said that they had to be kept in lockstep during updates. I recognize the term “manual” might have been a bit of a misnomer there, since I included systems where the admin has to take action to enable replication. ESXi (my main hardware OS for now) doesn’t even have software RAID for single-server datastores (only vSAN). Windows and Linux both can do it, but it’s a non-default manual process of splicing the drives together with no apparent automatic replacement mechanism - full manual admin intervention. With a hardware RAID, you just have to plop the new disk in and it splices the drive back into the array automatically (if the drive matches).
    • Dell and HPe both have had RAM caching for reads and writes since at least 2011. That’s why the controllers have batteries :)
      • also, I said it only had to handle the boot disk. Plus you’re ignoring the fact that all modern filesystems will do page caching in the background regardless of the presence of hardware cache. That’s not unique to ZFS, Windows and Linux both do it.
    • mdadm and hardware RAID offer the same level of block consistency validation to my current understanding - you’d need filesystem-level checksumming no matter what, and as mdadm and hardware RAID are both filesystem agnostic, they will support the same filesystem-level features almost equally. (Synology implements BTRFS on top of mdadm - I saw a small note somewhere that they had their implementation request a block rebuild from mdadm if btrfs detected issues, but I have been unable to verify this claim, so I do not (yet) consider it part of my hardware vs md comparison.)

    Hardware RAID just works, and for many, that’s good enough. In more advanced systems, all it’s got to handle is a boot partition, and if you’re doing your job as a sysadmin there’s zero important data in there that can’t be easily rebuilt or restored.


  • I never said I didn’t use software RAID, I just wanted to add information about hardware RAID controllers. Maybe I’m blind, but I’ve never seen a good implementation of software RAID for the EFI partition or boot sector. During boot, most systems I’ve seen will try to always access one partition directly and a second in order, which is bypassing the concept of a RAID, so the two would need to be kept manually in sync during updates.
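
    To make the “kept in sync” part concrete, here is a minimal sketch of the kind of job an admin would have to run after every bootloader update when two EFI partitions are not behind a real RAID. It is only an illustration: the function names are made up for this example, it assumes both partitions are already mounted as ordinary directories, and it does not delete stale files on the second copy.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def mirror_tree(primary, secondary):
    """Copy every file that is missing or different from primary to secondary.

    Returns the relative paths that were copied. Files that exist only on
    the secondary side are left alone (an assumption of this sketch).
    """
    primary, secondary = Path(primary), Path(secondary)
    copied = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = secondary / rel
        if dst.is_file() and file_digest(src) == file_digest(dst):
            continue  # already identical, nothing to do
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

    You would call something like `mirror_tree("/boot/efi", "/boot/efi2")` (both mount points are hypothetical) from a post-update hook. The point is that something outside the firmware has to do this, which is exactly the lockstep problem.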

    Because of that, there’s one notable place where I won’t use software RAID - I always use hardware RAID for at minimum the boot disk, because Dell firmware natively understands everything about it from a detect/boot/replace perspective (or, in a good way, doesn’t see anything at all). All four of my primary servers either have a boot disk on a Startech RAID card similar to a Dell BOSS or have an array to boot off of directly on the PERC. It’s only enough space to store the core OS.

    Other than that, at home all my other physical devices are hypervisors (VMware ESXi for now until I can plot a migration), dedicated appliance devices (Synology DSM uses mdadm), or don’t have redundant disks (my firewall - backed up to git, and my NUC Proxmox box, both firewalls and the PVE are all running ZFS for features).

    Three of my four ESXi servers run vSAN, which is like Ceph and replaces RAID. Like Ceph and ZFS, it requires using an HBA or passthrough disks for full performance. The last one is my standalone server. Notably, ESXi does not support any software RAID natively that isn’t vSAN, so both of the standalone server’s arrays are hardware RAID.

    When it comes time to replace that Synology it’s going to be on TrueNAS


  • For recovering hardware RAID: most guaranteed success is going to be a compatible controller with a similar enough firmware version. You might be able to find software that can stitch images back together, but that’s a long shot and requires a ton of disk space (which you might not have if it’s your biggest server)

    I’ve used dozens of LSI-based RAID controllers in Dell servers (of both PERC and LSI name brand) for both work and homelab, and they usually recover the old array to the new controller pretty well, and also generally have a much lower failure rate than the drives themselves (I find myself replacing the cache battery more often than the controller itself)

    Only twice out of the handful of times have I gone to a RAID controller from a different generation:

    • First time was from a mobo-failed R815 (PERC H700), physically moving the disks to an R820 (PERC H710, might’ve been an H710P), and they were able to foreign import easily
    • Second time on homelab I went from an H710 mini mono to an H730P full size in the same chassis (don’t do that, it was a bad idea), but aside from iDRAC being very pissed off, the card ran for years with the same RAID-1 array imported.

    As others have pointed out, this is where backups come into play. If you have to replace the server with one from a different generation, you run the risk that the drives won’t import. At that point, you’d have to sanitize the super block of the array and re-initialize it as a new array, then restore from backup. Now, the array might be just fine and you never notice a difference (like my users that had to replace a failed R815 with an R820), but the outcome really goes to the extremes: it either works or it faults, with no in between.

    Standalone RAID controllers are usually pretty resilient and fail less often than disks, but they are very much NOT infallible, as you are correct to assess. The advantage of software systems like mdadm, ZFS, and Ceph is that they remove the precise hardware compatibility requirements, but by no means do they remove the software compatibility requirements - you’ll still have to do your research and make sure the new version is compatible with the old format, or make sure it’s the same version.

    All that said, I don’t trust embedded motherboard RAIDs to the same degree that I trust standalone controllers. A friend of mine about 8-10 years ago ran a RAID-0 on a laptop that got its super block borked when we tried to firmware update the SSDs - it stopped detecting the array at all. We did manage to recover data, but it needed multiple times the raw amount of storage to do so.

    • we made byte images of both disks in ddrescue to a server that had enough spare disk space
    • found a software package that could stitch together images with broken super blocks if we knew the order the disks were in (we did), which wrote a new byte image back to the server
    • copied the result again and turned it into a KVM VM to network attach and copy the data off (we could have loop mounted the disk to an SMB share and been done, but it was more fun and rewarding to boot the recovered OS afterwards as kind of a TAKE THAT LENOVO…we were younger)
    • took in total a bit over 3TB to recover the 2x500GB disks to a usable state - and took about a week of combined machine and human time to engineer and cook, during which my friend opted to rebuild his laptop clean after we had images captured - to one disk windows, one disk Linux, not RAID-0 this time :P
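
    For anyone curious what the “stitch” step amounts to for a RAID-0: it is basically interleaving fixed-size chunks from each disk image in the right order. The sketch below is not the package we used, just a rough illustration under some loud assumptions: you know the disk order and the chunk (stripe unit) size, every member is the same size, and the data starts at the same offset on each image. The function name and parameters are made up for this example.

```python
import io


def stitch_raid0(images, out, chunk_size=64 * 1024, data_offset=0):
    """Rebuild a RAID-0 volume by interleaving chunks from member images.

    images      -- open binary file objects, in the array's disk order
    out         -- writable binary file object for the rebuilt volume
    chunk_size  -- stripe unit size in bytes (assumed, must match the array)
    data_offset -- bytes to skip at the start of each image (assumed equal)
    """
    for img in images:
        img.seek(data_offset)
    while True:
        wrote_any = False
        for img in images:
            chunk = img.read(chunk_size)
            if chunk:
                out.write(chunk)
                wrote_any = True
        if not wrote_any:
            break  # every image is exhausted


def demo():
    """Tiny in-memory example with a 4-byte chunk size."""
    disk0 = io.BytesIO(b"AAAACCCC")
    disk1 = io.BytesIO(b"BBBBDDDD")
    out = io.BytesIO()
    stitch_raid0([disk0, disk1], out, chunk_size=4)
    return out.getvalue()
```

    This also shows why the recovery eats so much space: you hold the raw images of both disks, plus the stitched output, plus any working copy you make of the result.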




  • I mean… DX 9, 10, and 11 were all released prior to Nadella being CEO/chairman.

    But in software, it’s very commonplace for library versions not to be backwards compatible without recompiling the software. This isn’t the same thing as being able to open a word doc last saved on a floppy disk in 1997 on Word 365 2024 version, this is about loading executable code. Even core libraries in Linux (like OpenSSL and ncurses) respect this same schema, and more strongly than MS.

    Using OpenSSL as an example, RHEL 7 provides an interface to OpenSSL 1.0. But 1.1 is not available in the core OS, you’d have to install it separately. 1.1 was introduced to the core in RHEL 8, with a compatibility library in a separate package to support 1.0 packages that hadn’t been recompiled against 1.1 yet. In RHEL 9, the same was true of OpenSSL 3 - a compatibility library for 1.1, and 1.0 support fully dropped from core. So no matter which version you use, you still have to install the right library package. That library package will then also have to work on your version of libc - which is often reasonably wide, but it has its limits just the same.

    Edit because I forgot a sentence in the last paragraph - like DirectX, VC++, and OpenGL, you have to match the version of ncurses, OpenSSL, etc exactly to the major (and often the minor) version or else the executable won’t load up and will generate a linking error. Even if you did mangle the binary code to link it, you’d still end up with data corruption or crashes because the library versions are too different to operate.
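
    A rough way to picture the matching on Linux: a binary records the exact shared-library names (sonames) it was linked against, and the loader wants files with those names, not “something newer”. The sketch below only models that name check. The helper names are made up, and the sonames in the usage note are illustrative rather than taken from any particular distro.

```python
import re

# Matches names such as "libssl.so.1.1" or "libncurses.so.6".
SONAME_RE = re.compile(r"^(?P<name>lib[\w+-]+)\.so\.(?P<version>\d+(?:\.\d+)*)$")


def parse_soname(soname):
    """Split a soname into (library name, version tuple)."""
    m = SONAME_RE.match(soname)
    if m is None:
        raise ValueError(f"not a versioned soname: {soname!r}")
    return m["name"], tuple(int(p) for p in m["version"].split("."))


def explain_missing(needed, installed):
    """Report which needed sonames are absent.

    Returns a dict mapping each missing soname to the list of installed
    sonames of the same library with a different version, which is the
    typical "I have OpenSSL, why won't it load?" situation.
    """
    installed_parsed = [parse_soname(s) for s in installed]
    report = {}
    for soname in needed:
        if soname in installed:
            continue  # exact match, the loader is happy
        name, _ = parse_soname(soname)
        report[soname] = [
            s for s, (n, _) in zip(installed, installed_parsed) if n == name
        ]
    return report
```

    For example, `explain_missing(["libssl.so.1.1"], ["libssl.so.3"])` reports `libssl.so.1.1` as missing even though a newer libssl is sitting right there, which is the case the compatibility packages exist to cover.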



  • DirectX, OpenGL, Visual C++ Redist and many other support libraries in software programs typically require the same major version of the support libraries that they were shipped with.

    For DirectX, that major version is 9, 10, 11, 12. Any major library change has to be recompiled into the game by the original developer. (Or a very VERY dedicated modder with solid low level knowledge)

    Same goes for OpenGL, except I think they draw the line at the second number as well - 2.0, 3.0, 4.0, 4.1, 4.2, 4.3, 4.4.

    For VC++, these versions come in years - typically you’ll see 2008, 2010, 2013, and the last version 2015-2022 is special. Programs written in the 2013 version or lower only require the latest version of that year to run. For the 2015-2022 library, they didn’t change the major version spec so any program requiring 2015+ can (usually) just use the latest version installed.

    The libraries that do weird things to this rule are DXVK and Intel’s older DX9-on-12. These are translation shim libraries that allow the application to speak DX9 etc and translate it on the fly to the commands of a much more modern library - Vulkan in the case of DXVK or DX12 in Intel’s case.

    Edited to remove a reference to 9-on-12 that I think I had backwards.









  • Others have some good information here - all I’d like to add to the root is that Windows and Mac have a built-in DNS cache and it’s pretty straightforward to add a DNS cache to systemd distros (if it’s not already installed or in use) using systemd-resolved or dnsmasq if you really dislike systemd. Some distros enable this from install time.

    Systems that utilize a DNS cache will keep copies of DNS query results for a period of time, making the application-level name lookup speed essentially 0ms for a cached result. Cold results obviously incur the latency of the DNS server itself.
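
    As a toy model of what such a cache does (not how systemd-resolved or dnsmasq are actually implemented), the sketch below wraps any resolver function and remembers answers for a fixed time. The class name, the fixed TTL, and the injectable clock are assumptions for illustration; real caches honor the TTL that comes back with each DNS record.

```python
import time


class DnsCache:
    """Remember resolver answers for ttl_seconds before asking again."""

    def __init__(self, resolver, ttl_seconds=60.0, clock=time.monotonic):
        self._resolver = resolver      # callable: hostname -> address
        self._ttl = ttl_seconds        # fixed TTL, a simplification
        self._clock = clock            # injectable so it can be tested
        self._entries = {}             # hostname -> (address, expiry time)

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]            # warm hit: no trip to the DNS server
        address = self._resolver(name)  # cold or expired: pay server latency
        self._entries[name] = (address, now + self._ttl)
        return address
```

    Wrapping a real lookup could look like `DnsCache(socket.gethostbyname)` after an `import socket`; the first call for a name pays the server round trip and repeats within the TTL are answered from the dictionary.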