What makes CrowdStrike so ubiquitous that their error created such a catastrophe?

pastermil@sh.itjust.works · 1 year ago

What makes CrowdStrike so ubiquitous that their error created such a catastrophe?

fjordbasa@lemmy.world · 1 year ago

It’s not so much that it’s ubiquitous as that the customers that DID use it were very large, and their going down was very noticeable.

Lem Jukes@lemm.ee · 1 year ago

https://youtu.be/4yDm6xNeYas?si=0VzBxIuPEHC4SMaa

This fireship video is a good, short explanation.

CaptainBasculin@lemmy.ml · 1 year ago

Basically, drivers can run code all the way up to ring 0, the highest privilege level code can run at. This means it runs with the same privileges as the kernel itself. The anti-malware solution CrowdStrike makes use of this access to determine what could be going wrong, and deploys solutions accordingly.

If code running at that level crashes, Windows will rightfully assume there’s something really fucked up going on, and give out a BSOD.
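To make the difference concrete, here is a toy model of that decision, not how Windows actually implements it. The `Component` class, `handle_crash` function, and the example names are all made up for illustration; the only point is that a fault in ring 0 cannot be contained to one process.

```python
from dataclasses import dataclass

USER_MODE = 3    # ring 3: ordinary applications
KERNEL_MODE = 0  # ring 0: the kernel and kernel-mode drivers


@dataclass
class Component:
    name: str
    ring: int


def handle_crash(component: Component) -> str:
    """Toy model of fault containment depending on privilege level."""
    if component.ring == KERNEL_MODE:
        # Kernel memory may already be corrupted, so the safe move is to stop.
        return f"BSOD: halt the whole system ({component.name} faulted in ring 0)"
    # A user-mode fault is contained to the one process.
    return f"terminate {component.name}; the rest of the system keeps running"


# Hypothetical component names, purely for illustration.
print(handle_crash(Component("some_app.exe", USER_MODE)))
print(handle_crash(Component("example_driver.sys", KERNEL_MODE)))
```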

Blizzard@lemmy.zip · 1 year ago

there’s something really fucked up going on

I would actually prefer this kind of error over the usual and equally uninformative “Oopsie! Something went wrong. We’re sorry :(”

Skull giver@popplesburger.hilciferous.nl · 1 year ago

The QR code Windows displays usually brings up a documentation page containing more information. Windows also displays a stop code next to the QR code (something like INACCESSIBLE_BOOT_DEVICE, MEMORY_MANAGEMENT, CRITICAL_PROCESS_DIED) and the failing driver’s name (if available).

If you want to dig into the details, you can run a program like WhoCrashed, or dig into the crash using windbg to analyse the crashdump file on the hard drive.

I hate the “something went wrong” popups individual applications show (though I admit I’ve written those myself to deal with errors that should never ever happen), but bluescreens are usually quite informative if you read beyond the indicator for regular people.

Windows used to dump memory locations of the failing driver and even opcodes, the same way Linux does, but that scared a lot of people because they had no hope of understanding any of it. With KASLR the memory addresses are useless anyway, and it’s not like modern drivers come with debug symbols to show the crashing method name, so Windows started hiding unnecessary details, which I think is a good thing.

kenkenken@sh.itjust.works · 1 year ago

It probably runs with OS-level privileges, which applications should not do. The second problem is monoculture. Running the same software from a single company on all machines is easy, but…

vext01@lemmy.sdf.org · 1 year ago

Is it a kernel module or what? Why did it BSOD the whole system?

lmaydev@lemmy.world · edit-2 1 year ago

It literally has to run at that level to do its job.

OfCourseNot@fedia.io · 1 year ago

‘He’s out of line but he’s right’. I mean, it’s a bit ironic to give this level of permission to a program that is itself malware-like in order to protect yourself from exactly that. We’re talking about hospitals, airports and airlines, government agencies… many critical systems, and the security of so much information relies on a (foreign for most of the world) private company.

CaptainBasculin@lemmy.ml · 1 year ago

Companies wouldn’t mind having OS-level code run on their PCs if it’s meant to help secure their computers. Malware infecting their computers could result in way more damage after all.

kenkenken@sh.itjust.works · 1 year ago

I’m not so sure which is worse. I wish we wouldn’t reimplement statist practices in computers, as that often doesn’t go well in our physical world, and would instead invest more resources into OS/network security, compartmentalization and privilege separation. But yeah, the reality is it’s easier to put a god-like “trusted” agent in a system. Well, the police need to have guns, read all private chats, place security cameras with face recognition everywhere… to do their jobs. Otherwise terrorist attacks or whatever could result in way more damage after all. The same story every time.

hikaru755@feddit.de · 1 year ago

Are you seriously equating security software running on business systems with state violence / surveillance on people? Those two things are not even remotely comparable, starting with business systems not being people that have rights

Microw@lemm.ee · 1 year ago

The equation by the user is bs.

But these companies do hold people’s data, and it’s a catch-22 situation: in order to protect that, they rely on an invasive system. Providers like Crowdstrike have high-level access to critical infrastructure and critical information. Is that a good thing? Maybe yes, maybe no.

kenkenken@sh.itjust.works · 1 year ago

BTW, if Windows had been an immutable OS the case would not have been so dire.

Chozo@fedia.io · 1 year ago

If my grandmother had wheels, she would have been a bike.

Nis@feddit.dk · 1 year ago

It’s a different recipe!

Skull giver@popplesburger.hilciferous.nl · 1 year ago

How so? System Restore already automatically reverts the OS to a previous state after blue screens during boot since at least Windows 8, and you could do it manually since at least Windows Vista.

The problem isn’t working around the problem (just rename or delete a single .sys file), it’s that this happened almost exclusively to massive companies with hundreds or thousands of computers. The fix itself takes maybe a minute, the problem is the massive amounts of work this requires to do across tens of thousands of computers.

Luckily, the quick solution seems to be “reboot the computer about 15 times so the automatic update that fixes the bug probably gets applied before the next crash”, but for systems where that doesn’t work, manual intervention is necessary.
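For the manual route (rename or delete a single .sys file), a rough sketch of what that looks like as a script is below. The directory and the `C-00000291*.sys` pattern are not from this thread; they are assumptions taken from the widely circulated workaround, so check them against the vendor's guidance before running anything. `quarantine_matching_files` is a made-up helper name. On an affected machine this would have to run from Safe Mode or a recovery environment, which is exactly why it doesn't scale.

```python
from pathlib import Path


def quarantine_matching_files(driver_dir: Path, pattern: str) -> list[Path]:
    """Rename every file in driver_dir matching pattern by appending '.bak'.

    Renaming instead of deleting keeps the file around for later analysis.
    Returns the new paths.
    """
    renamed = []
    # sorted() materialises the matches before any file is renamed.
    for path in sorted(driver_dir.glob(pattern)):
        target = path.with_name(path.name + ".bak")
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    # Assumed location and pattern (not from this thread); verify first.
    driver_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")
    if driver_dir.is_dir():
        for new_path in quarantine_matching_files(driver_dir, "C-00000291*.sys"):
            print(f"renamed to {new_path}")
    else:
        print(f"{driver_dir} not found; nothing to do")
```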

slazer2au@lemmy.world · 1 year ago

It kinda is top of its class in endpoint detection and response software. A lot of cyber security insurance policies will demand you have some kind of EDR to be covered, and seeing as Crowdstrike is one of the biggest names, they get a lot of buy-in from institutions and governments.

zorro@lemmy.world · 1 year ago

Or in other words, everyone else is complete shit.

slazer2au@lemmy.world · 1 year ago

No, it’s not a binary thing. There are other EDR products but they are the largest.

bushvin@lemmy.world · 1 year ago

What CrowdStrike is actually selling, is someone who actually looks at the system logs and who pushes a button when something pops up. Roughly.

There are better solutions on the market. Unfortunately CrowdStrike has the more aggressive sales team.

For those wondering, I’m referring to *nix based solutions like SELinux, AppArmor, iptables, nftables, cgroups, … But you need to monitor your logs if you want to take appropriate action.

Skull giver@popplesburger.hilciferous.nl · 1 year ago

The problem with SELinux/nftables/cgroups is that they don’t come with a centralised log aggregator, and they don’t do much blocking beyond the defaults for 99% of deployments. Also, SELinux is a massive pain to set up (even compared to AppArmor), and setting it up correctly is even worse.

CrowdStrike does a lot of what SELinux does but it’s easier to configure, works on every operating system, and comes with tools to roll out configuration across an organisation. There’s nothing close to that in the open source world. Even if you set up something yourself, you’ll need to continuously tweak your setup not to get in the way of employees and to prevent alert fatigue from all of the false positives.

I think a preconfigured solution like Security Onion combined with tons of group policy and Ansible can form an open source alternative, but that only monitors, whereas CrowdStrike also blocks. To block behaviour, you’ll need to write code for most platforms, and that’s just as likely to take down your org as an auto update from CrowdStrike.

bushvin@lemmy.world · 1 year ago

The problem with SELinux/nftables/cgroups is that they don’t come with a centralised log aggregator, and they don’t do much blocking beyond the defaults for 99% of deployments.

You must not have heard of (r)syslog.
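As a rough illustration of how little is needed to ship events to a central collector, here is a sketch using Python's standard `logging.handlers.SysLogHandler`. The function name, logger name, and the address in the usage comment are placeholders I made up; a real setup would more likely have rsyslog itself forward the audit log.

```python
import logging
import logging.handlers


def make_forwarding_logger(name: str, host: str, port: int = 514) -> logging.Logger:
    """Return a logger that ships records to a syslog collector over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler defaults to UDP when given a (host, port) tuple.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Usage (the collector address is a placeholder):
#   log = make_forwarding_logger("audit-forwarder", "192.0.2.10")
#   log.warning("avc: denied { read } for example_t")
```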

Also, SELinux is a massive pain to set up (even compared to AppArmor), and setting it up correctly is even worse.

I beg to differ, I find SELinux easy to set up. But your mileage may vary, depending on your experience.

CrowdStrike does a lot of what SELinux does but it’s easier to configure, works on every operating system, and comes with tools to roll out configuration across an organisation. There’s nothing close to that in the open source world. Even if you set up something yourself, you’ll need to continuously tweak your setup not to get in the way of employees and to prevent alert fatigue from all of the false positives.

Apparently, recent events show it doesn’t work on every OS… 😜

When talking about ease of use… Configuration is configuration. If you do not take the time to learn how to use your product, the product you know will always be better than the one you don’t. I’ve used Crowdstrike. I’ve battled them to get their kernel module signing certificate signed by Red Hat. I’ve battled them to make it possible to disable the auto update. So no, I am not impressed by the quality of their product. I’ll bet any day that a vanilla RHEL with the correct security related software and the latest updates outperforms and outclasses Crowdstrike.

I think a preconfigured solution like Security Onion combined with tons of group policy and Ansible can form an open source alternative, but that only monitors, whereas CrowdStrike also blocks. To block behaviour, you’ll need to write code for most platforms, and that’s just as likely to take down your org as an auto update from CrowdStrike.

I can’t speak of MS products, as I have not managed them for 20 years, but all of this is not needed on a decent Linux distro.

mosiacmango@lemm.ee · edit-2 1 year ago

No, but yes.

Crowdstrike was one of the first companies doing EDR, and have a first mover advantage they have held onto. Lots of other companies offer good solutions now, but crowdstrike is still considered the gold standard, and they have worked hard to become the “default” for their market segment.

Brkdncr@lemmy.world · 1 year ago

Crowdstrike marketed to c-suites better than the others.

RobotToaster@mander.xyz · 1 year ago

A lot of companies install it for compliance checkboxing.

gazby@lemmy.dbzer0.com · 1 year ago

Apart from fjordbasa’s caveat RE “ubiquity” above, this is probably the most succinct answer 😐

NutWrench@lemmy.ml · 1 year ago

When an operating system allows a single misbehaving program to take down the whole computer and leave it unbootable. I thought we left that behind with Windows 95.

Skull giver@popplesburger.hilciferous.nl · 1 year ago

I think this is part of the reason Apple is trying real hard to prevent people from loading third party drivers. While that means a lot of hardware simply won’t work on their machines, at least a bug can’t cause a kernel panic.

As long as third party software is allowed to be loaded into the kernel (drivers, HALs, filters) we’ll have bluescreens created by applications. You can go without third party drivers, you just won’t be able to game on your computer, or run any antivirus software that wasn’t made by your OS vendor, or use any USB peripheral more complicated than a keyboard, or use WiFi.

sparky@lemmy.federate.cc@lemmy.federate.cc · 1 year ago

Apple is introducing a lot of user space frameworks to replace much of the kext level functionality though.

Skull giver@popplesburger.hilciferous.nl · 1 year ago

They are, but many of them don’t provide the same abilities or functionality that the kernel level interfaces did. For example, their network filtering/firewall API had (has?) a design flaw that allowed Apple’s software to bypass any attempts to block traffic.

Windows does the same, and Linux is slowly moving towards running more stuff in user space as well, but there’s no way to run something like CrowdStrike without low level access, at least not without crippling its capabilities.

Catsrules@lemmy.ml · 1 year ago

That has been a thing forever. I doubt it will ever go away.

Skull giver@popplesburger.hilciferous.nl · 1 year ago

Operating systems are moving as much software out of the low level kernel space as they can. On Windows, the entire GPU driver can crash and the OS will just flash a black screen and recover. Your games and browser probably go down with the driver, but that important Word document you had open in the background will survive.

In this case, there’s no way to implement the features at hand anywhere but deep down at the kernel level. It’s like anticheat, except that instead of intercepting cheating software it’s intercepting all software that looks a bit suspicious. There are ways to protect against this (running applications in a virtual machine with a microkernel of their own, for instance) but in practice this won’t work for the type of user Windows mostly serves.

As long as software like CrowdStrike is necessary, we run the risk of this stuff crashing. However, the impact doesn’t need to be this high; the reason everything went to shit is that every company installed this one piece of software onto their critical machines, rather than diversifying and having two different vendors. They probably don’t want twice the management overhead and twice the price, but they could’ve gone with a competitor on half their systems and only have half their services crash.
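The arithmetic is trivial, but a quick sketch makes the blast radius explicit. The vendor names, host names, and the 100-host fleet are invented for illustration.

```python
def crashed_fraction(fleet: dict[str, str], faulty_vendor: str) -> float:
    """Fraction of hosts that go down when one vendor ships a bad update."""
    if not fleet:
        return 0.0
    down = sum(1 for vendor in fleet.values() if vendor == faulty_vendor)
    return down / len(fleet)


hosts = [f"host-{i:03d}" for i in range(100)]

# Everyone on one vendor versus an even split across two vendors.
single = {h: "vendor_a" for h in hosts}
split = {h: ("vendor_a" if i % 2 == 0 else "vendor_b") for i, h in enumerate(hosts)}

print(crashed_fraction(single, "vendor_a"))  # 1.0
print(crashed_fraction(split, "vendor_a"))   # 0.5
```

Of course this ignores which services sit on which half; if both halves are needed for a service to work, splitting vendors alone doesn’t save it.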

Bobby Turkalino@lemmy.yachts · 1 year ago

Drivers usually run in kernel space, where a crash can bring the whole system down. This is not exclusive to Windows

Riskable@programming.dev · 1 year ago

Yes but only in Windows land do you see jillions of (proprietary) drivers made by 3rd parties. Many of which self-update.

wewbull@feddit.uk · 1 year ago

This isn’t a driver. It’s anti-malware. Nobody on Linux puts such software in kernel space (as far as I’m aware). Root service? maybe, but that’s still a user-space process.

wizardbeard@lemmy.dbzer0.com · edit-2 1 year ago

It is a driver though: it runs at kernel level and intercepts system calls for logging, analysis, and potential blocking if malware-type patterns are detected in the system calls.
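Very roughly, the log / analyse / block loop looks like the sketch below. This is a user-space toy with invented names (`SyscallEvent`, `Interceptor`, the example rule), not CrowdStrike's actual design; the real thing hooks in at kernel level, which is the whole point of this thread.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SyscallEvent:
    process: str
    call: str    # e.g. "open", "write", "connect"
    target: str  # path or address the call acts on


# A rule returns True when the event looks malicious.
Rule = Callable[[SyscallEvent], bool]


@dataclass
class Interceptor:
    rules: list[Rule]
    log: list[SyscallEvent] = field(default_factory=list)

    def on_syscall(self, event: SyscallEvent) -> bool:
        """Log the event, then return True to allow it or False to block it."""
        self.log.append(event)
        return not any(rule(event) for rule in self.rules)


def writes_to_protected_dir(event: SyscallEvent) -> bool:
    # Hypothetical rule: flag writes under a protected directory.
    return event.call == "write" and event.target.startswith("/protected/")


edr = Interceptor(rules=[writes_to_protected_dir])
print(edr.on_syscall(SyscallEvent("editor", "open", "/home/user/notes.txt")))  # True
print(edr.on_syscall(SyscallEvent("dropper", "write", "/protected/boot.cfg")))  # False
```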

remotelove@lemmy.ca · 1 year ago

It’s one of the better EDR (Endpoint Detection and Response) tools on the market. For enterprises, they are able to suck down tons of system activities and provide alerting for security teams.

For detection, when I say “tons of data”, I mean it. Any background logs related to network activity, filesystem activity, command line info, service info, service actions and much more for every endpoint in an organization.

The response component can block execution of apps or completely isolate an endpoint if it is compromised, only allowing access by security staff.

The fact that Crowdstrike can (kind of) handle that much data and still run rule checks, while also providing SOC services, makes them a common choice for enterprises.

The problem is that EDR tools need to run at the kernel level (or at a very high permission level) to be able to read that type of data and also block it. This increases the risk of catastrophic problems if specific drivers are blocked by another kind of anti-malware service.

When you look at how EDR tools function, there is little difference between them and well written malware.

Crowdstrike recently became the choice for many companies that got fucked over by Broadcom buying VMware. VMware owned another tool, Carbon Black, which became subject to the fuckery of Broadcom, so more companies scrambled to Crowdstrike.

I hope that was enough of a summary.

WanderingVentra@lemm.ee · 1 year ago

What are SOC services?

PolarisFx@lemmy.dbzer0.com · 1 year ago

Security Operations Center

WanderingVentra@lemm.ee · 1 year ago

Thanks!

shalafi@lemmy.world · 1 year ago

Security and compliance. It’s a certification that you’re following best practices, IT and otherwise.

remotelove@lemmy.ca · 1 year ago

That is SOC2. In this context, it’s Security Operations Center.

WanderingVentra@lemm.ee · edit-2 1 year ago

Thanks!

wizardbeard@lemmy.dbzer0.com · 1 year ago

Don’t forget the Super Bowl ad and a ton of money put into marketing. It’s not surprising that it attracted the attention of executives looking for something to tick an audit checkbox.

Ben Hur Horse Race@lemm.ee · 1 year ago

it was not, go on

pr06lefs@lemmy.ml · 1 year ago

I assume “endpoint” here means a computer that is on the network?

Dran@lemmy.world · 1 year ago

Endpoint is any PC/laptop/sign/POS/etc. It’s a catchall term for anything that isn’t a server. It basically refers to any machine that might be logged into and used by a non-IT user.

floquant@lemmy.dbzer0.com · 1 year ago

A computer that is used by a user, aka “not a server”

pastermil@sh.itjust.works · 1 year ago

More than enough! Thanks :)

Skull giver@popplesburger.hilciferous.nl · 1 year ago

Huh, I didn’t catch Carbon Black getting all Broadcom’d to shit. That explains a lot.

polle@feddit.org · 1 year ago

Thanks!