My recent downtime experiences


AlexanderT
04-15-2004, 07:57 AM
Hello,

I've been with FH for almost a year now and can only stress my general satisfaction with the support team and also with the hosting services.

Unfortunately, in the past few months, I've been experiencing random downtimes (mainly database outages) for various reasons. Since then, I've also been in contact with the support team.

Recent developments:
03/31: I received a response from Haryono that the database server had been restarted, and that he was unsure of what caused the downtime. Then, on 04/07, when another database outage occurred, I received a response from Alex that "the server as it thought it was out of memory (yet there was plenty of memory available)", which caused the downtime. Again, no definite answer as to why the downtime occurred. To my surprise, he wasn't even aware of the previous outage (it seems he was not in contact with Haryono, so I forwarded him my ticket with Haryono; I have received no response since then).

Today, when I got to my office, I found various mails regarding server outages at FH again. I compiled a list below. Note that the outages occurred AFTER all scheduled maintenance had been completed.

Database trouble (connection failed / server shutdown in progress)
04/14/04 16:02 -0400 (EDT) to 04/14/04 16:27 -0400 (EDT)
04/15/04 22:47 -0400 (EDT) to ?
04/15/04 01:47 -0400 (EDT) to 04/15/04 02:57 -0400 (EDT)

Site down (www.mobileread.com: site not available, reported from siteuptime.com (3 different locations) and easymonitor.com)
04/15/04 02:17 -0000 to 04/15/04 02:47 -0000
04/15/04 05:00 -0000 to 04/15/04 06:57 -0000
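
The two lists above use different UTC offsets (-0400 for the database alerts, -0000 for the site monitors), so they are easier to compare once normalized to one time zone. A minimal Python sketch, assuming the timestamps are month/day/two-digit-year; the helper names `to_utc` and `overlaps` are made up for illustration:

```python
from datetime import datetime, timezone

# Assumed layout of the timestamps in the lists above, without the "(EDT)" suffix.
FMT = "%m/%d/%y %H:%M %z"

def to_utc(stamp):
    """Parse a timestamp with a numeric UTC offset and convert it to UTC."""
    return datetime.strptime(stamp, FMT).astimezone(timezone.utc)

def overlaps(a_start, a_end, b_start, b_end):
    """Return True if the two time windows share any instant."""
    return a_start < b_end and b_start < a_end

# One database outage and one site outage, copied from the lists above.
db_start = to_utc("04/15/04 01:47 -0400")
db_end = to_utc("04/15/04 02:57 -0400")
site_start = to_utc("04/15/04 05:00 -0000")
site_end = to_utc("04/15/04 06:57 -0000")

print(db_start.strftime("%H:%M UTC"), "to", db_end.strftime("%H:%M UTC"))
print("overlap:", overlaps(db_start, db_end, site_start, site_end))
```

With these inputs, the database outage that started at 01:47 EDT becomes 05:47 UTC, which falls inside the 05:00 to 06:57 UTC site outage window.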

Please inform me when you are planning to fix the various outages.
Looking back at the outages I reported in the recent past (including the one in February where some user used up all mysql resources, making the mysql database virtually inaccessible for over a week), I can only hope that you will be able to fix the problems as soon as possible.

I understand that it is not always easy to keep a shared hosting environment running smoothly, but since your services have always been superb in the past, I am absolutely confident that you will be able to return to the same quality standards again.

All the best,

Alexander

AlexanderT
04-15-2004, 08:54 AM
... it just happened again:


04/15/04 08:47 -0400 (EDT) to 04/15/04 08:50 -0400 (EDT)

MySQL error number: 1030 (Got error 12) -->
/usr/local/mysql-4.1/bin/perror 12
Error code 12: Cannot allocate memory
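
The number inside "Got error 12" is an operating system errno value, which is what `perror` decodes. For readers without the MySQL tools at hand, a small Python sketch does the same lookup with the standard library; the function name `describe_os_error` is hypothetical, and the message text depends on the platform:

```python
import errno
import os

def describe_os_error(code):
    """Return the symbolic name and the OS message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# 12 is the code reported in the MySQL error above.
name, message = describe_os_error(12)
print(name, "-", message)
```

On common platforms code 12 maps to ENOMEM, which matches the "Cannot allocate memory" text that `perror` printed.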

----------

Also, my site was down again from

04/15/04 13:01:03 -0000 to 04/15/04 13:28:00 -0000

Alex

FH-Dave
04-15-2004, 09:34 AM
Alexander,

Are your alerts related to database functioning? The web server was not down, as monitored by our internal and external (alertra, four different locations) monitoring (ping and HTTPB). But if you were having problems with mysql and your webpage was not loading, then your monitoring could detect an outage during this period.

The most common mysql problems are caused by:
- There were too many rejected connections from a particular web server, thus the database server rejected all the connections. This will usually affect everybody on the same web server.
- You have already maxed out the maximum allowable connections. This affects only you.

There is nothing we can do but restart the mysql server or flush hosts on the mysql server.
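
The two causes above usually surface as distinct MySQL error codes, so the code in an alert mail hints at which one applies. A rough Python sketch; pairing these codes with the causes described above is an assumption, not something FH confirmed, and the `explain` helper and its wording are illustrative only:

```python
# Illustrative lookup table. The error symbols are standard MySQL server errors;
# the "scope" and "remedy" texts are assumptions based on the description above.
MYSQL_ERRORS = {
    1129: {
        "symbol": "ER_HOST_IS_BLOCKED",
        "scope": "everyone on the blocked web server",
        "remedy": "FLUSH HOSTS (or mysqladmin flush-hosts)",
    },
    1203: {
        "symbol": "ER_TOO_MANY_USER_CONNECTIONS",
        "scope": "only the account that hit its connection limit",
        "remedy": "close idle connections or open fewer at once",
    },
    1040: {
        "symbol": "ER_CON_COUNT_ERROR",
        "scope": "all clients of the database server",
        "remedy": "raise max_connections or reduce the load",
    },
}

def explain(code):
    """Return a one-line summary for a MySQL error code, if the sketch knows it."""
    info = MYSQL_ERRORS.get(code)
    if info is None:
        return f"error {code}: not covered by this sketch"
    return (f"{info['symbol']} ({code}): affects {info['scope']}; "
            f"typical remedy: {info['remedy']}")

for code in (1129, 1203, 1030):
    print(explain(code))
```

Error 1030 is deliberately left out of the table: it wraps an operating system error rather than a connection limit, so neither cause above explains it.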

The problems you experienced early this morning, as reported by some other customers, were due to some slow queries, and the database server needed to be restarted. We are still looking into the reported memory allocation problem.

Out of curiosity, which problem in February caused MySQL to be unusable for a week? I am not aware of this one-week problem, either from our internal/external monitoring or from complaints/reports from other customers.

My admin will be working on the mysql server today to try tweaking mysql performance. There might be a brief interruption when mysql server is restarted.

AlexanderT
04-15-2004, 09:47 AM
> Are your alerts related to database functioning? The web server was not down, as monitored by our internal and external (alertra, four different locations) monitoring (ping and HTTPB).

The database alerts come from a vBulletin function that sends the admin an email as soon as a mysql error occurs. So when the database server was down or unreachable and users tried to access my site, I received tons of emails from the vBulletin system.

> But if you were having problems with mysql and your webpage was not loading, then your monitoring could detect an outage during this period.

The webpage outage (site not available) was reported to me by two different services: 1) siteuptime.com, which simultaneously monitors my site from three different locations (NY, SF, London/UK); 2) easymonitor.com. I don't think that the error 'site not available' occurs when the mysql server cannot be reached, as the site is still available and, when accessed, displays the mysql error.
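
The distinction drawn here, between a site that cannot be reached at all and a site that responds with a database error page, can be made explicit in a monitoring check. A minimal Python sketch; the marker text, the function name `classify_probe`, and the result labels are assumptions for illustration, not how siteuptime.com or easymonitor.com actually work:

```python
from typing import Optional

# Hypothetical text that the error page is assumed to contain.
DB_ERROR_MARKER = "mysql error"

def classify_probe(connected: bool, status: Optional[int], body: str) -> str:
    """Classify one HTTP probe result into a coarse outage category."""
    if not connected or status is None:
        # No HTTP response at all: the web server itself was unreachable.
        return "site not available"
    if DB_ERROR_MARKER in body.lower():
        # The web server answered, but the page reports a database failure.
        return "database error"
    if status >= 500:
        return "server error"
    return "ok"

print(classify_probe(False, None, ""))
print(classify_probe(True, 200, "MySQL error number: 1030"))
print(classify_probe(True, 200, "<html>forum index</html>"))
```

A monitor that only checks whether a connection succeeds would report the second case as "up", while one that inspects the page content would not.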


> The most common mysql problems are caused by:
> - There were too many rejected connections from a particular web server, thus the database server rejected all the connections. This will usually affect everybody on the same web server.
> - You have already maxed out the maximum allowable connections. This affects only you.

The mysql problems that I've experienced in the past months were not present before. Also, my site activity has not increased; on the contrary, it has decreased.


> The problems you experienced early this morning, as reported by some other customers, were due to some slow queries, and the database server needed to be restarted. We are still looking into the reported memory allocation problem.

Glad I am not alone. But please note that this morning was not the first occurrence of this problem.


> Out of curiosity, which problem in February caused MySQL to be unusable for a week? I am not aware of this one-week problem, either from our internal/external monitoring or from complaints/reports from other customers.

I had very long discussions with Alex at that time, trying to pinpoint the problem. I can send you copies of my tickets if you don't have access to them. Ticket # 9494 "Slow page generation". Someone heavily taxed the server, making it unreachable. It took FH some time to figure out the problem and to remove the user who caused the excess database queries from the shared hosting environment.

> My admin will be working on the mysql server today to try tweaking mysql performance. There might be a brief interruption when mysql server is restarted.

Thanks for trying to take care of the problem.

My best regards,

Alex

FH-Dave
04-15-2004, 10:33 PM
> The webpage outage (site not available) was reported to me by two different services: 1) siteuptime.com, which simultaneously monitors my site from three different locations (NY, SF, London/UK); 2) easymonitor.com. I don't think that the error 'site not available' occurs when the mysql server cannot be reached, as the site is still available and, when accessed, displays the mysql error.


There seems to be a disagreement between the monitoring tools you use and the ones we use. Neither our internal monitoring nor our external monitoring (alertra, four diverse geographical locations) detected any web server outages in the past 10 days.

Here is the list of outages on your web server (graviton, 66.150.201.81) since 04/01/2004, as reported by alertra:


Date        Outages  Uptime    Downtime  Other     Uptime%
04/01/2004  2        23:39:54  00:20:06  00:00:00  98.604%
04/02/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/03/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/04/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/05/2004  1        22:49:55  01:10:05  00:00:00  95.133%
04/06/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/07/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/08/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/09/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/10/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/11/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/12/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/13/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/14/2004  0        24:00:00  00:00:00  00:00:00  100.000%
04/15/2004  0        00:18:20  00:00:00  00:00:00  100.000%
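
The Uptime% column can be reproduced from the other columns. A short Python sketch, assuming the percentage is uptime divided by the sum of uptime, downtime, and "other" time (alertra's exact formula is not stated in this thread); the function names are hypothetical:

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def uptime_percent(uptime, downtime, other="00:00:00"):
    """Uptime as a percentage of all monitored time, rounded to 3 decimals."""
    up = hms_to_seconds(uptime)
    total = up + hms_to_seconds(downtime) + hms_to_seconds(other)
    if total == 0:
        raise ValueError("no monitored time")
    return round(100.0 * up / total, 3)

# Rows for 04/01, 04/05, and the partial day 04/15 from the table above.
print(uptime_percent("23:39:54", "00:20:06"))
print(uptime_percent("22:49:55", "01:10:05"))
print(uptime_percent("00:18:20", "00:00:00"))
```

Under that assumption the three rows come out at 98.604, 95.133, and 100.0, matching the table, including the partial day where only 18 minutes had been monitored.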


BTW, we set alertra to check for HTTP and FTP connections at 10-minute intervals, and we have our internal monitoring set to check at 1-minute intervals. Honestly, I am not sure why the reports differ. But if we were to have HTTP downtime of close to 2.5 hours this morning, I can assure you that there would be plenty of customers yelling at us already.
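
How much the check interval matters depends on the length of the outage. As a rough Python sketch, assuming probes fire at fixed multiples of the interval (the real schedules of alertra and the internal monitor are not known here) and using a hypothetical helper name:

```python
def probe_times_in_window(start_min, end_min, interval_min, offset_min=0):
    """Return probe times (minutes since midnight) that fall in [start, end)."""
    # First probe at or after the start of the window.
    first = start_min + (offset_min - start_min) % interval_min
    return list(range(first, end_min, interval_min))

# A 3-minute window (08:47 to 08:50) and a 30-minute window (02:17 to 02:47).
short_window = (8 * 60 + 47, 8 * 60 + 50)
long_window = (2 * 60 + 17, 2 * 60 + 47)

print(probe_times_in_window(*short_window, interval_min=10))
print(probe_times_in_window(*short_window, interval_min=1))
print(probe_times_in_window(*long_window, interval_min=10))
```

Under these assumptions a 10-minute schedule can miss a 3-minute outage entirely, while a 1-minute schedule cannot; an outage of 30 minutes is hit by several 10-minute probes, so the interval alone would not explain a missed outage of that length.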

But anyway, I may have gone off topic. The mysql server seems to be behaving quite well today. We will keep monitoring the mysql server. Also, within the next 2-3 weeks, we will prepare a new dedicated mysql server to start offloading the load from the current mysql server.

AlexanderT
04-16-2004, 01:15 AM
> There seems to be a disagreement between the monitoring tools you use and the ones we use. Neither our internal monitoring nor our external monitoring (alertra, four diverse geographical locations) detected any web server outages in the past 10 days.

I believe you, Dave. I think it would be best for future reference if you could also sign up with siteuptime.com (it's free). Thanks for showing me the outage logs.

> But anyway, I may have gone off topic. The mysql server seems to be behaving quite well today. We will keep monitoring the mysql server. Also, within the next 2-3 weeks, we will prepare a new dedicated mysql server to start offloading the load from the current mysql server.

That sounds great. The database server has been my real concern, actually. Looking forward to those changes.

I want to stress that despite the tone of my initial post, I am still more than happy with FH (if I wasn't, I could as well give up on finding the right host ;) ). It was only those recent database difficulties that gave me a bad headache.

My best regards,

Alex