I am currently out of town, and my server went down. All my services go through nginx, and they suddenly started giving error 502. My SSH won’t let me in. I had my sister reboot the server, and it still doesn’t work. I apologize for the lack of details, but that is all I know, and I can’t access logs. I’ve cleared cache, and used a VPN in case fail2ban got me. I recently got a tp link router, so it could be something with that, but it was working for a while. I will have her do another reboot, and if that doesn’t work I will have her power off and unplug the server in case it was hacked.
Edit: I have absolutely no clue why, but it works now. I literally did nothing. As far as I know, my sister hasn’t touched it today. It just started working. Computers, man…
Edit 2: Actually she said she did something. Not sure what, but it works now.
Some troubleshooting thoughts:
What do you mean when you say SSH is “down”:
- connection refused (fail2ban’s activity could result in a connection refused, but a VPN should have avoided that problem, as you said)
- connection timeout. This is probably a failure at the port forwarding level.
- connection succeeded but closed. This can happen for a few reasons, such as the system being in an early boot state. There’s usually a message in this case.
- connection succeeded but auth rejected. This can happen if your OS failed to boot but came up in a fallback state of some kind.
Knowing which one of these it is can give you a lot more information about what’s wrong:
System can’t get past initial boot = Maybe your NAS is unplugged? Maybe your home DNS cache is down?
Connection refused = either fail2ban or possibly your home IP has moved and you’re trying to connect to somebody else’s computer? (nginx is very popular after all, it’s not impossible somebody else at your ISP has it running). This can also be a port forwarding failure = something’s wrong with your router.
Connection succeeded + closed is similar to “can’t get past initial boot”
Auth rejected might give you a fallback option if you can figure out a default username/password, although you should hope that’s not the case because it means anyone else can also get in when your system is in fallback.
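If it helps, here is a rough sketch of telling the first three cases apart from the outside, using only Python's standard socket module. The function name and the labels it returns are made up for illustration. It can't detect "auth rejected", since that needs a real SSH client, and it assumes the server sends its identification line first, which SSH servers normally do.

```python
import socket

def classify_ssh(host, port=22, timeout=5.0):
    """Return a rough label for what happens when connecting to an SSH port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"        # something actively rejected the connection
    except socket.timeout:
        return "timeout"        # nothing answered at all
    except OSError:
        return "unreachable"    # routing or name resolution problem
    with sock:
        sock.settimeout(timeout)
        try:
            banner = sock.recv(256)   # a healthy server sends "SSH-2.0-..." first
        except socket.timeout:
            return "open, no banner"
        except OSError:
            return "closed"           # connection reset while waiting
    if not banner:
        return "closed"               # accepted, then closed without a word
    if banner.startswith(b"SSH-"):
        return "banner received"
    return "unexpected reply"
```

Running something like `classify_ssh("your.home.address")` (placeholder hostname) from a laptop or phone hotspot gives one of the labels above, which maps back onto the list.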
Very few of these things are actually fixable remotely, btw. I suggest having your sister unplug everything related to your setup, one device at a time. Internet router, raspberry pi, NAS, your VM host, etc. Make sure to give them a minute to cool down. Hardware, particularly cheap hardware, tends to fail when it gets hot, and this can take a while to happen, and, well, it’s been hot.
Here’s a few things with a high likelihood of failing when you’re away from home:
- heat, as previously mentioned.
- running out of disk space. Maybe you’re logging too much, throw some more disk in there and tune down the logging. This can definitely affect SSH, and definitely won’t be fixed by a reboot.
- OOM failures (or other resource leaks). This isn’t likely to affect your bare metal ssh, but it could. Some things leak memory, and this can lead to cascading process destruction by the OS. In this scenario you’d probably be able to connect to things in the first few minutes after a reboot, though.
- shitty cabling. Sometimes stuff just falls out of the socket, if it wasn’t plugged in perfectly to begin with. (Heat can also contribute to this one.)
- reliance on a cloud service that’s currently down. (This can include: you didn’t pay the bill.) Hopefully your OS boot doesn’t fail due to a cloud service, but I’ve definitely seen setups that could.
running out of disk space
This would be my first guess. Nothing shuts down arbitrary services quite like a full /var/log.
I’ve got a 1tb boot drive and it isn’t used for much, but stuff happens, so… idk.
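Once someone can get a shell on the box, a quick way to rule the disk in or out is something like the sketch below. It uses `shutil.disk_usage` from the standard library. The function name and the 90% threshold are arbitrary choices of mine, not anything from this thread.

```python
import shutil

def disk_report(path="/", warn_at=0.90):
    """Report how full the filesystem holding `path` is."""
    usage = shutil.disk_usage(path)          # named tuple: total, used, free (bytes)
    fraction = usage.used / usage.total
    return {
        "path": path,
        "used_fraction": round(fraction, 3),
        "warn": fraction >= warn_at,         # True once the threshold is crossed
    }

if __name__ == "__main__":
    # Checks the root filesystem; pass another path if logs live on their own mount.
    print(disk_report("/"))
```

If the log directory sits on a separate partition, it would need its own call, since a mostly empty boot drive says nothing about a small log partition.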
It says connection closed. There is no message beyond that. I think it is likely that it is failing to boot. I might video call my sister and have her try to boot it so I can see any errors.
Edit: Also, thanks very much for your response. It was very detailed and informative.
Connection closed means somebody is listening to the port and failing/not willing to reply.
Unless some network middleman is closing your connection (ssh should be on port > 1024 to be safe from ISP throttling), either your ssh server is severely strained (oom, disk full…) or your F2B is kicking in.
yw :)
If it’s working again all of a sudden I would lean towards f2b. I don’t know what your “timeout” is, but if f2b got tripped it would explain why you couldn’t get in yesterday but today it works (assuming your ban expires in 24hrs or so).
That sucks dude. Not much you can do about it remotely.
My sister is there, but I can’t do much diagnosis. It is weird that SSH would go down with it though, so I thought someone might have an idea.
502 means the proxy couldn’t get a valid response from the app behind it, so the app is broken or not running. For example, if it were Flask python, it might have crashed on startup instead of answering. If this is happening to many services or apps simultaneously, it is concerning. Turning it off sounds wise at this point.
Yeah, I would think docker is broken, but that wouldn’t explain the SSH, which is bare metal and doesn’t go through nginx.
I know it’s bad gateway. I just don’t know what caused it, or why it happened when SSH went down. Thanks, though.
If ssh is down, and your proxy can’t talk to that same machine, then…
Proxy is on the same machine though. I just use it for subdomains and rate limiting.
So then…
So then… maybe try being direct with your answer.
I’m being direct. If your host isn’t answering it is…down
But it isn’t. It sends me an nginx error. The nginx is on that server, so that server isn’t completely down.
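To make the point above a bit more concrete: a 502 is produced by the proxy, so seeing one means nginx is alive and whatever sits behind it is not answering properly. A small lookup like the sketch below captures that. The status meanings follow the standard HTTP definitions, while the hint wording and the function name are my own, and a status code alone can't prove which layer generated it.

```python
# Hints for the server-error statuses most relevant to a reverse proxy setup.
# The wording is illustrative, not taken from nginx or any other tool.
HINTS = {
    500: "the app answered, but hit an internal error",
    502: "the proxy is up, but got no valid response from the upstream app",
    503: "the server says it is temporarily unable to handle requests",
    504: "the proxy is up, but the upstream app did not answer in time",
}

def explain_status(status):
    """Map an HTTP status code to a rough guess about which layer is failing."""
    if status in HINTS:
        return HINTS[status]
    if 200 <= status < 400:
        return "the request went through"
    return "no specific hint for this status"

if __name__ == "__main__":
    print(explain_status(502))
```

So a 502 on every subdomain at once points at something shared by all the backends (the container runtime, the disk, the machine's resources) rather than at any single app.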
Does your router have an app or way of letting you remotely see if the server is even showing on your home network?? It could be a physical disconnect or Ethernet port failure, or NIC failure maybe? A reboot wouldn’t help if the issue was related to something like that.
Edit: Actually, re-read your post and thinking about this again, what I said wouldn’t make sense…
You could have some sort of corruption causing an error in the appdata, preventing it from running. Might be a RAM issue.
It has a network connection, I am able to get to the nginx error, the services themselves are down. What’s really weird is everything is down, even SSH.
I edited my original post right when you replied, my bad.
I dunno if you can do that much remotely, honestly. I kinda feel like something might have corrupted? What kinda system are you using? Any more details you can provide?
I don’t think it is a hardware issue. I have decent hardware that’s fairly new. I unfortunately can’t say much, though another commenter let me know the SSH failure message is relevant. I see connection closed, which means that it is probably failing to boot. I think an update or something may have broken it, though it is debian stable, so Idk. I’m going to try to call my sister and see if I can get a picture of an error message or something.
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:
Fewer Letters: More Letters
DNS: Domain Name Service/System
HTTP: Hypertext Transfer Protocol, the Web
IP: Internet Protocol
NAS: Network-Attached Storage
SSH: Secure Shell for remote terminal access
VPN: Virtual Private Network
nginx: Popular HTTP server
6 acronyms in this thread; the most compressed thread commented on today has 11 acronyms.
[Thread #866 for this sub, first seen 12th Jul 2024, 22:25]
if your sister’s by your server in-person, maybe you could guide them to graphically install something like Rustdesk (edit: graphical remote access, wayland isn’t well supported so make sure it’s running over Xorg), give you the access code & have them manually accept the connection so you can get back in.
You’ll be stuck streaming your terminal window and sending laggy keystrokes through whatever connection you have now (until you can get ssh running), but it’s better than nothing.
Are you by chance using something like Cloudflare? It may be possible that during the reboot your IP changed, so your “gateway” cannot reach your router on your old IP anymore?
In other words: it’s always the DNS?
It’s not the DNS. That was the first thing I checked. Also, I don’t use cloudflare.
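For anyone landing here with the same symptoms, the "did my IP move" check can be sketched in a few lines of standard-library Python. The function names are hypothetical, and the expected IP has to come from somewhere you trust, such as the router's status page.

```python
import socket

def resolved_addresses(hostname):
    """Return the sorted set of addresses the local resolver gives for a name."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the address is sockaddr[0].
    return sorted({info[4][0] for info in infos})

def dns_matches(hostname, expected_ip):
    """True if the name currently resolves to the IP you believe your home has."""
    return expected_ip in resolved_addresses(hostname)
```

If `dns_matches` comes back False for your domain, the record is stale and you may well be talking to somebody else's machine, which is the scenario raised earlier in the thread.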