cam_servers ([info]cam_servers) wrote,
@ 2005-06-30 21:33:00
an update!
some of you may not be aware, but yes - cammail is still down! It can't come up enough for me to get it back. All I need is a shell...

tomorrow, I'm going to walk WW through getting a livecd working enough to get me back on the box, even if it means I won't be on the box's native kernel. So hopefully, tomorrow, it will be back.

All from a power loss that lasted overnight, or at least several hours? So in this case, a UPS wouldn't have helped.



(33 comments) - (Post a new comment)


[info]halaku
2005-06-30 07:24 pm UTC (link)
Then what would have helped?

Or, to be more constructive, what steps are being taken to make sure that an overnight power loss doesn't totally hose our systems for days on end, ever again?

(Reply to this) (Thread)


[info]delwin
2005-06-30 08:05 pm UTC (link)
shutdown now

(Reply to this) (Parent)(Thread)


[info]dazed1
2005-07-01 04:49 am UTC (link)
yes, a working ups could have been monitored by the system, which could then have been set to automatically shut down the system if the power didn't come back on after X amount of time.
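A rough sketch of that "shut down after X amount of time" rule, for illustration only. It assumes some monitoring tool already reports whether the UPS is on battery; the function names and the grace period are hypothetical and not tied to any particular UPS software.

```python
from typing import Optional


def track_outage(on_battery: bool, on_battery_since: Optional[float], now: float) -> Optional[float]:
    """Remember when the current outage started, or None if mains power is present."""
    if not on_battery:
        return None  # power is back, so forget the outage
    # Keep the original start time if the outage is still ongoing.
    return on_battery_since if on_battery_since is not None else now


def should_shut_down(on_battery_since: Optional[float], now: float, grace_seconds: float) -> bool:
    """True once the UPS has been on battery for at least grace_seconds (the 'X' above)."""
    if on_battery_since is None:
        return False
    return (now - on_battery_since) >= grace_seconds
```

A polling loop would call `track_outage` on every UPS status reading and trigger a clean shutdown the first time `should_shut_down` returns true, while the battery still has charge left.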

(Reply to this) (Parent)(Thread)


[info]delwin
2005-07-01 05:13 am UTC (link)
I wasn't even thinking of doing it automatically - was thinking of walking over to the machine, logging in as root, and typing 'shutdown now' and pressing enter.

That's all it takes.

(Reply to this) (Parent)(Thread)


[info]dazed1
2005-07-01 05:40 am UTC (link)
well yeah, that too ;)

(Reply to this) (Parent)

nuclear power
[info]methuse
2005-06-30 08:08 pm UTC (link)
we're looking into installing a small nuclear generator in the WW offices to provide a constant stream of cool, constant power.

the only thing stopping us now is meltdowns.


oh wait...

we're talking about it, but without constant 24 hour a day network admin at the location, dunno if anything will be 100%.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]nyterose
2005-06-30 11:18 pm UTC (link)
There's something wrong with a system that won't recover from a single dirty shutdown. It doesn't matter how long the outage lasted; once the server went down, it obviously stayed down, and won't come back up.

Lights-out management boards are not significantly expensive, and would allow for remote management of a system even if it won't progress past BIOS. I would also suggest looking into the reason why a single outage corrupted the filesystem to the point that you can no longer boot the server.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-01 04:40 am UTC (link)
SINGLE?

you haven't been paying attention, have you. This is far, FAR from a single occurrence.

Secondly, a hundred "dirty shutdowns" are no different than a single one - it's a lottery. each one has no more chance than any other to be the jackpot, but if you buy enough tickets, you'll win.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]nyterose
2005-07-01 04:07 pm UTC (link)
On a clean, properly maintained system, random dirty shutdowns are no such thing. Instead, they are recoverable. It's irresponsible to blame chance for a broken system.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-01 04:45 pm UTC (link)
thanks, armchair. When you've got hundreds of open files, and constant abrupt power losses, there's a chance. Next time you're contracted out at $275/hr with no complaints from the customer, I'll let you tell me what's irresponsible. I have multiple high level unix certifications from various vendors. Like I said, I have less than an hour combined downtime in the past year on the nearly 100 boxes out there that are similar in scope and type to these two. These two take up more of my time than the rest of them combined, as well. Thanks for your input though.

Notice the email moving now? Gosh, what does that mean? It means I recovered (and it didn't take terribly long once I had a shell...most of the time was spent just checking filesystems and repairing them). I couldn't recover if the damn thing wouldn't boot on its own. Given lots of abrupt power losses, some will cause problems. It's a lottery. Did I say anywhere it wasn't recoverable? I just said it wouldn't *boot*. I'm 1000+ miles from the machine. I can't type a password to enter maint mode or whatever else it might be waiting on when it starts up.

A shell is all I need(ed). It recovered.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]nyterose
2005-07-01 08:52 pm UTC (link)
Hey, guess what I do for a living? That's right; same damn thing you do. And my systems don't crash because of a power outage, and when they do, I don't blame random chance. If I did that to one of my clients, I'd be fired, and rightfully so.

Several vendors make Lights-Out boards that are accessible from the internet; I know because I install, use and sell them myself quite regularly. Being half-way across the planet is no excuse either. There is no "lottery".

Congrats on getting the email moving again, however.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-01 09:08 pm UTC (link)
what part of "I have no control over the box" do you not understand? I cannot install anything. If I were there, a simple $5 cable would do everything I need. I'm not there. I cannot install anything into the box, nor do anything that requires any physical contact with the system.

I sincerely doubt you have any idea what you are talking about if you don't think that there's always a chance to corrupt data during an abrupt power loss. Nothing is random, but I don't control the power outages there either.

Again - I never said it couldn't be fixed, just that it wouldn't boot on its own anymore.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]kyrthira
2005-07-02 12:01 am UTC (link)
http://www.domsys.com/ -- Dominant Systems Corp.

http://www.domsys.com/about-dominant-systems.htm -- Check the fourth name down. I'll even cut and paste for you.
Robb Keefer, Senior Network Engineer
Robb has been with Dominant Systems since 2003. He has been installing, troubleshooting and maintaining servers, networks and PCs for over 6 years. He also holds a Confidential security clearance for previous work with the Department of Defense. His certifications include Microsoft MCSA with a specialization in Messaging, and Microsoft MCSE in Windows Server 2003. Robb supports Windows, Linux, Unix, and Cisco.

Oh yes. I think he does know what he's talking about.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-02 05:51 am UTC (link)
whoopdy-freakin-do. 6 years ago, I was already well into my career as a unix admin. "confidential," you say? Gosh, I'm jealous.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]nyterose
2005-07-02 07:38 am UTC (link)
That's a long time to pretend you know how to administer a running system. You must be very proud.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-02 04:45 pm UTC (link)
and you must be what, 16 to be this much of a witless prick? That's a long time after elementary to pretend you know how to read, since you have had such obvious problems with it here.

(Reply to this) (Parent)

Re: nuclear power
[info]dazed1
2005-07-01 09:09 pm UTC (link)
and no, you don't do the same thing I do. Not if you know as little as you apparently know.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]nyterose
2005-07-01 11:15 pm UTC (link)
You're right; I keep my servers running.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-02 05:47 am UTC (link)
you install cards from 1,000 miles away, and you keep a system running even without electricity. I admit, you're pretty damn amazing.

(Reply to this) (Parent)

Re: nuclear power
[info]dazed1
2005-07-01 09:11 pm UTC (link)
ok, one more. you used to install, use, and sell lights-out boards that could be installed from 1,000 miles away. That is an impressive lightsout board, I'll admit.

I, on the other hand, am not in sales. I am an Oracle DBA. I build beowulf clusters, HA clusters, and general big-iron boxes. No sales here.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]nyterose
2005-07-01 11:16 pm UTC (link)
Yes, yes I do. HP sells 'em, so does Sun. So does IBM. Maybe even Gateway, for all I know.

Which Fortune 500 companies do you work with? I need to know who not to give money to.

(Reply to this) (Parent)(Thread)

Re: nuclear power
[info]dazed1
2005-07-02 05:49 am UTC (link)
let's try this again.

so we have a couple very cheap boxes with athlon xp's. They're 1,000 miles from me.

Now...explain to me again how I install anything in them from here? How I hook up a simple null modem cable, which would solve all my problems?

(Reply to this) (Parent)

Re: nuclear power
[info]dazed1
2005-07-01 05:11 am UTC (link)
with electricity, I could easily do 5 9's :P The combined total downtime in the past year for the nearly 100 systems out there that I remotely administer is less than an hour if you don't count those two @%^!&#$%&^!*!^!$%^!#$%^ things. And somehow those 2 take considerably more administration time than the rest of them combined, as well. AND they're the only 2 boxes I don't get paid for working on. They are like my banes.
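For scale, a back-of-the-envelope check of those numbers. This is only a sketch: it assumes "5 9's" means 99.999% availability per machine-year, and it takes "less than an hour" and "nearly 100 systems" as round upper bounds of 60 minutes and 100 machines.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Downtime budget per machine per year for a given availability (0.99999 = five nines)."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def fleet_availability(total_downtime_minutes: float, machines: int) -> float:
    """Average availability when the downtime is summed across the whole fleet."""
    return 1.0 - total_downtime_minutes / (machines * MINUTES_PER_YEAR)


five_nines_budget = allowed_downtime_minutes(0.99999)  # roughly 5.26 minutes per machine-year
fleet = fleet_availability(60, 100)                    # assumed: 60 minutes total over 100 boxes
```

Under those assumptions the fleet average comes out above 99.999%, since an hour spread over 100 machines is well under the roughly five minutes per machine that five nines allows.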

(Reply to this) (Parent)


[info]ndemeter
2005-07-01 01:51 am UTC (link)
What's wrong with plugging a null modem cable from the UPS to the box and running PowerChute on it? That would ensure a graceful shutdown and good integrity of the FS. Once again, I am offering my help if you guys want it for this.

(Reply to this) (Thread)


[info]dazed1
2005-07-01 04:42 am UTC (link)
I tried to get that done remotely (had them buy the stuff, connect it, etc) but the ports never saw anything on the other end. Were I there in person, that would have been done a year ago.

(Reply to this) (Parent)(Thread)


[info]delwin
2005-07-01 05:15 am UTC (link)
would it be cost effective for WW to fly you to Atlanta for a weekend to do this? Maybe not now given things in your personal life, but you're not the only Linux admin in the Cam.

(before anyone asks, no, I can't do this efficiently - I'm a Windows Admin)

(Reply to this) (Parent)(Thread)


[info]dazed1
2005-07-01 05:42 am UTC (link)
I could walk anyone through it that had a phone that could reach the box :) I could walk them through it blind (ie, while i was driving, shopping, clubbing, or anything else not involving a computer in front of me) too. Problem is that I get home just barely before WW leaves for the day :/

(Reply to this) (Parent)(Thread)


[info]dazed1
2005-07-01 05:42 am UTC (link)
that is to say, it shouldn't be that difficult

(Reply to this) (Parent)


[info]delwin
2005-07-01 05:43 am UTC (link)
So no one can stay late one day to get this whole thing fixed?? Come on, the mail server's been down for over a week now. This is getting ridiculous.

(Reply to this) (Parent)(Thread)


[info]dazed1
2005-07-01 06:02 am UTC (link)
I think it went down monday, so it's not been over a week. It's been a *long time*, but...not over a week.

(Reply to this) (Parent)(Thread)


[info]delwin
2005-07-01 06:09 am UTC (link)
*sigh*

point conceded - Monday. OK, it's been down a Business Week :P

Still that's no excuse. I guess I'll shut up for a while before I start ranting even more. Things will get fixed eventually and I can only hope and pray that someone somewhere finally gets this system stable...

You know, like put it in a NOC where there's 24/7 tech support. You do realize that if you don't get it up today it's down until at least next tuesday? that'll be over a week.

(Reply to this) (Parent)


[info]angelsrespite
2005-07-01 08:16 am UTC (link)
Thanks for working so hard.

(Reply to this) (Thread)


[info]cam_servers
2005-07-01 02:05 pm UTC (link)
'tis for your kind comment, and Chris's restraint in screaming (since I know he wants to....heh), that I dedicate my evening to cammail's renewal.

(Reply to this) (Parent)

