A university near me must be going through a hardware refresh, because they’ve recently been auctioning off a bunch of ~5-year-old desktops at extremely low prices. The only problem is that you can’t buy just one or two. All the auction lots are batches of 10-30 units.

It got me wondering if I could buy a bunch of machines and set them up as a distributed computing cluster, sort of a poor man’s version of the way modern supercomputers are built. A little research revealed that this is far from a new idea. The first ever really successful distributed computing cluster (called Beowulf) was built by a team at NASA in 1994 using off-the-shelf PCs instead of the expensive custom hardware being used by other supercomputing projects at the time. It was also a watershed moment for Linux, then only a few years old, which was used to run Beowulf.

Unfortunately, a cluster like this seems less practical for a homelab than I had hoped. I initially imagined that there would be some kind of abstraction layer allowing any application to run across all computers on the cluster in the same way that it might scale to consume as many threads and cores as are available on a CPU. After some more research I’ve concluded that this is not the case. The only programs that can really take advantage of distributed computing seem to be ones specifically designed for it. Most of these fall broadly into two categories: expensive enterprise software licensed to large companies, and bespoke programs written by academics for their own research.

So I’m curious what everyone else thinks about this. Have any of you built or administered a Beowulf cluster? Are there any useful applications that would make it worth building for the average user?

  • Kangie@lemmy.srcfiles.zip
    3 upvotes · 1 year ago

    Yes. I’m actually doing so right now at work, and run multiple Beowulf clusters for a research institution. You don’t need or want this.

    In a real cluster you would use software like Slurm or PBS to submit jobs to the cluster and have them execute on your compute nodes as resources are available to keep utilisation high.
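
    To make the “execute as resources are available” idea concrete, here is a toy sketch in Python. It is not how Slurm or PBS work internally (real schedulers also handle priorities, backfill, and per-job resource requests); it only assumes a first-come, first-served queue where each job occupies one whole node. The function name `simulate_queue` and the job lengths are made up for illustration.

```python
import heapq


def simulate_queue(job_durations, num_nodes):
    """Toy FIFO scheduler: each job takes one whole node for its duration.

    Returns (makespan, utilisation), where utilisation is busy node-time
    divided by the node-time available until the last job finishes.
    """
    if num_nodes < 1:
        raise ValueError("need at least one node")
    # Min-heap of the times at which each node next becomes free.
    free_at = [0.0] * num_nodes
    heapq.heapify(free_at)
    for duration in job_durations:
        start = heapq.heappop(free_at)  # earliest available node
        heapq.heappush(free_at, start + duration)
    makespan = max(free_at)
    busy = sum(job_durations)
    utilisation = busy / (makespan * num_nodes) if makespan else 0.0
    return makespan, utilisation


if __name__ == "__main__":
    jobs = [4, 3, 2, 2, 1]  # hypothetical job lengths in hours
    print(simulate_queue(jobs, num_nodes=2))
```

    With the five hypothetical jobs above on two nodes, both nodes stay busy until everything finishes at hour 6. One long job mixed in with a few short ones would leave a node idle, which is the kind of gap a real scheduler tries to fill.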

    It makes no sense for the home environment unless you’re trying to run some serious computations and if you have a need to do that for work or study then you probably have access to a real HPC.

    It might be interesting and fun, but not particularly useful. Maybe a fun HCI (hyperconverged infrastructure) setup would be more appropriate to enable you to scale VMs across hosts and get some redundancy.

    • plenipotentprotogod@lemmy.world (OP)
      0 upvotes · 1 year ago

      Out of curiosity, what software is normally being run on your clusters? Based on my reading, it seems like some companies run clusters for business purposes. E.g. an engineering company might use it for structural analysis of their designs, or a pharmaceutical company might simulate the interactions of new drugs. I assume in those cases they’ve bought a license for some kind of high-end software that’s been specifically written to run in a distributed environment. I also found references to some software libraries that are meant to support writing programs in this environment. I assume those are used more by academics who have a very specific question they want to answer (and may not have funding for commercial software) so they write their own code that’s hyper focused on their area of study.

      Is that basically how it works, or have I misunderstood?

      • Kangie@lemmy.srcfiles.zip
        1 upvote · edited · 1 year ago

        Overall you’re not too far off, but what you’ll tend to find is that it’s a lot of doing similar calculations over and over.

        For example, climate scientists may, for certain experiments, read a ton of data from storage for, say, different locations and dates/times across a bunch of jobs, but each job is doing basically the same thing. You might submit 100,000 permutations, or have an updated model that you want to run the existing dataset through.

        The data from each job is then output, and analysed (often with followup batch jobs).
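
        A minimal Python sketch of that “same job, many permutations” pattern is below. The locations, years, and the `run_model` placeholder are invented for illustration and have nothing to do with any real model. The only real detail is `SLURM_ARRAY_TASK_ID`, the environment variable Slurm sets for each task of an array job; everything else is an assumption.

```python
import itertools
import os

# Hypothetical sweep dimensions; a real experiment would define its own.
LOCATIONS = ["site_a", "site_b", "site_c"]
YEARS = [1990, 2000, 2010, 2020]


def params_for_task(task_id, locations=LOCATIONS, years=YEARS):
    """Map a job index to one (location, year) combination."""
    combos = list(itertools.product(locations, years))
    if not 0 <= task_id < len(combos):
        raise IndexError(f"task_id {task_id} outside 0..{len(combos) - 1}")
    return combos[task_id]


def run_model(location, year):
    """Placeholder for the real model run; returns a fake result record."""
    return {"location": location, "year": year, "status": "done"}


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID for array jobs; fall back to 0 otherwise.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(run_model(*params_for_task(task_id)))
```

        Every task runs the same script and only the index differs, so the sweep can grow to thousands of combinations without changing the code. The results written by each task are what the follow-up analysis jobs would read.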

        Edit: here’s an example of a model that I have some real-world experience building to run on one of my clusters: https://www.nrel.colostate.edu/projects/century/

        Swin have some decent, public docs. I think mine are pretty good, but they’re not public so…

        https://supercomputing.swin.edu.au/docs/2-ozstar/oz-partition.html

        There will typically be some interactive nodes in a cluster as well that enable users to log in and perform interactive tasks, like validating that the software will run or, more commonly, to submit jobs to the queue manager.

  • rufus@discuss.tchncs.de
    2 upvotes · edited · 1 year ago

    Kubernetes / K8s / K3s.

    If you’re trying to do compute / simulations on it, it depends on your workload… OpenMPI… ClusterKnoppix / LinuxPMI …

  • lemmyingly@lemm.ee
    1 upvote · 1 year ago

    A friend and I built one out of 6 machines years ago when we were at university. We were running MATLAB simulations that would take over a day to complete on i3/i5 CPUs. Fortunately MATLAB and the simulation add-on package had been programmed to parallelize jobs, which reduced the simulation time down to just a few hours. This was done in a Windows environment with dual-core HP machines with every RAM slot filled.
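
    As a rough sanity check on that kind of speedup, Amdahl’s law estimates how much faster a job gets when only part of it can run in parallel. The sketch below assumes all 12 cores (6 machines × 2 cores) were in use and simply tries a few parallel fractions; the fractions are guesses for illustration, not measurements of the MATLAB workload.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


if __name__ == "__main__":
    cores = 6 * 2  # six dual-core machines (assumes every core was used)
    for p in (0.50, 0.90, 0.95, 0.99):
        speedup = amdahl_speedup(p, cores)
        print(f"parallel fraction {p:.2f}: {speedup:.1f}x on {cores} cores")
```

    Going from over a day to a few hours is roughly an order-of-magnitude improvement, which under this model only works out if nearly all of the simulation time was parallelizable.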

    I can’t imagine homelab workloads benefitting from such a setup unless something like video/3D rendering can utilise it.

  • There are several options, although some may be defunct.

    Last time I looked into this, openMosix was the most interesting, affordable, general-purpose option. It turned several computers into one big virtual computer. I ran a very small, 3-node cluster for a time. The upside was that you could run almost anything on it - unlike most HPC solutions, it didn’t require bespoke languages, libraries, or targeted solutions. The downside was performance; it turns out that to really take advantage of HPC, you need to program for it. OpenMosix looks defunct now.

    OpenPMIx looks to have taken up the torch from OpenMosix. It looks active; I have no specific knowledge about it.

    tldp.org has some good required reading before you invest in this, in particular discussing the elephant in the room, networking latency. The short version is that, no matter how slow your computers, the bottleneck will still be the network. Unless you’re willing to invest a lot into fiber and expensive, fast switches, it’s probably not worth it.
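
    A back-of-the-envelope model shows why latency dominates. The Python sketch below assumes each step’s compute divides evenly across nodes, while every message costs one full network latency and the messages are sent one after another. The function name `step_time` and all the numbers are illustrative assumptions, not measurements of any real network.

```python
def step_time(compute_s, nodes, messages_per_step, latency_s):
    """Estimated wall time for one step: split compute plus message waits."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # A single node has nobody to talk to, so it pays no network cost.
    comm = messages_per_step * latency_s if nodes > 1 else 0.0
    return compute_s / nodes + comm


if __name__ == "__main__":
    # Assumed round figures: ~100 microseconds for commodity Ethernet,
    # ~2 microseconds for a low-latency interconnect.
    for label, latency in (("commodity", 100e-6), ("low-latency", 2e-6)):
        for nodes in (1, 4, 16):
            t = step_time(10.0, nodes, messages_per_step=20_000, latency_s=latency)
            print(f"{label:12s} {nodes:2d} nodes: {t:.3f} s per step")
```

    In this model the compute term shrinks as nodes are added but the communication term does not, so a chatty workload stops getting faster well before the machines run out of CPU.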

    Slurm crosses the line into modern cluster job management, like you might find in a cloud provider like AWS, which is the direction the non-supercomputer industry took when commodity MPI turned out to be not feasible. Warewulf is another version, sort of one foot in distributed container management and lightweight MPI. Both are pretty involved, more Beowulf than OpenMosix.

    tldr, it’s probably not worth it if you’re looking for a cheap Beowulf cluster, because such a thing doesn’t exist in any practical sense. Cost, and physics, get in the way. If you want to set up a data center, or some job farm like AWS or GCS, that’s another matter. But it’s a far cry from MPI.

  • Snot Flickerman@lemmy.blahaj.zone
    0 upvotes · English · 1 year ago

    The main issue I would see impeding this is power draw and heat. Unless you are rigged up to run this many machines (including appropriate UPSes) you may run into blowing fuses and needing almost as much power for air conditioning as you need to run the cluster.

    If you had like 50 SoCs, maybe, because the power draw and heat might be manageable. Something like a Raspberry Pi or Orange Pi.

    • plenipotentprotogod@lemmy.world (OP)
      1 upvote · 1 year ago

      I was looking at HP mini PCs. The ones that were for sale used 7th gen i5s with a 35W TDP. They’re sold with a 65W power brick so presumably the whole system would never draw more than that. I could run a 16 node cluster flat out on a little over a kW, which is within the rating of a single residential circuit breaker. I certainly wouldn’t want to keep it running all the time, but it’s not like I’d have to get my electric system upgraded if I wanted to set one up and run it for a couple of hours as an experiment.
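
      The arithmetic behind that estimate, as a small Python sketch. The 65 W per node comes from the power brick rating mentioned above; the 120 V / 15 A circuit and the 80% continuous-load derating are assumptions (a common North American setup and rule of thumb), so swap in your own figures.

```python
def cluster_power_w(nodes, watts_per_node):
    """Worst-case draw if every node pulls its power brick's full rating."""
    return nodes * watts_per_node


def circuit_capacity_w(volts, amps, continuous_factor=0.8):
    """Usable watts on a circuit, derated for continuous load."""
    return volts * amps * continuous_factor


if __name__ == "__main__":
    draw = cluster_power_w(nodes=16, watts_per_node=65)
    # Assumed 120 V / 15 A residential circuit; adjust for your region.
    limit = circuit_capacity_w(volts=120, amps=15)
    print(f"cluster: {draw} W, circuit: {limit:.0f} W, headroom: {limit - draw:.0f} W")
```

      Under those assumptions 16 nodes come to 1040 W against roughly 1440 W of usable capacity. That leaves some headroom, but a network switch and anything else already plugged into the same circuit would eat into it.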