From: Brian Behlendorf
Date: Fri, 8 Mar 1996 21:24:08 -0800 (PST)
Subject: (idm) bandwidth and capacity problems on hyperreal
Msg-Id: <Pine.BSI.3.91.960308205929.1526B-100000@taz.hyperreal.com>
Mbox: idm.9603.gz

This message is being posted to a couple of the larger mailing lists on hyperreal to which I am subscribed. If you are a list admin and want to post it to your list, feel free, but there's no need to start a lot of discussion about it in public; please send any comments to me.

Many of you have probably noticed problems on hyperreal recently. The memory upgrade helped with some of the serious server performance problems, but it did nothing for bandwidth, and it also did not fix a problem with majordomo and mailing list delivery: blank messages, duplicate messages, missing messages, etc. To compound matters, I was out of town for most of this week and did not see most of the problems while they were happening. So here is a synopsis of the current state of things and how they are being addressed.

Bandwidth

Right now, as most of you know, hyperreal shares two T1s to Sprintlink with Organic, HotWired, Wired, Suck, BiancaTroll, bigbook.com, apache.org, and a bunch of other sites. Until this last week the bandwidth picture wasn't pretty, but it was "alright": at daytime peaks the link to Sprintlink would be 75% loaded, and Sprintlink itself was often flaky if not hosed. But it was manageable.

Within the last week, though, two things have happened. First, www.levi.com, run by Organic, made it onto Netscape's "what's cool!" page, and with its massive server pushes it has been screaming to the tune of gigabytes per day over the connection. Second, bigbook.com, an Organic-related company, launched a yellow-pages service with a lot of press, and it has also been drawing a large number of hits. So the T1s to the building are now maxed out: for a while earlier today each T1 was carrying 1.49 Mbit/s, against a theoretical maximum of 1.54 Mbit/s per T1. Ugly!

The solution: Organic is getting its two T1s in about two weeks.
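The peak figures above can be sanity-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, not anything run on hyperreal: it assumes the nominal T1 line rate of 1.544 Mbit/s (quoted above as 1.54) and ignores framing overhead, so real usable throughput is slightly lower. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the T1 figures quoted in the message.
# Assumption: nominal T1 line rate of 1.544 Mbit/s; framing overhead ignored.

T1_LINE_RATE_MBPS = 1.544   # nominal T1 line rate, Mbit/s
OBSERVED_MBPS = 1.49        # per-T1 throughput reported during the peak


def utilization(observed_mbps: float, capacity_mbps: float) -> float:
    """Fraction of link capacity in use."""
    return observed_mbps / capacity_mbps


def max_gigabytes_per_day(link_mbps: float, links: int = 1) -> float:
    """Upper bound on data the links can carry in 24 hours, in GB (10**9 bytes)."""
    bits_per_day = link_mbps * 1e6 * links * 86_400  # seconds per day
    return bits_per_day / 8 / 1e9                    # bits -> bytes -> GB


if __name__ == "__main__":
    print(f"per-T1 utilization: {utilization(OBSERVED_MBPS, T1_LINE_RATE_MBPS):.1%}")
    print(f"two T1s, flat out: {max_gigabytes_per_day(T1_LINE_RATE_MBPS, links=2):.1f} GB/day")
```

At 1.49 Mbit/s each line is at roughly 96.5% of its nominal rate, and the two T1s together top out around 33 GB per day for every site in the building combined, so a single site moving gigabytes per day takes a large share.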
Organic's new T1s have been in the planning stages for several months, which is the typical lead time for getting new bandwidth. We are also getting two more T1s later, to a different provider, which will give us some redundancy. That will happen soon, but not immediately.

In the short term, the load from Levi's is being turned down, but there will still be significant lag on the system, so I recommend that people avoid even trying to read mail or do any other interactive work on hyperreal. Browsing the web site is alright, but trying to write an email in pine is just impossible. I know; I was trying to do that from the IETF meeting. :)

If the situation stays critical, I may take drastic but temporary measures, such as turning off the mail daemon on hyperreal in the middle of the day, or turning off immediate delivery for mailing lists and turning it back on at night for delayed delivery. Hell, I'm pushing for a T3. :)

Mailing list problems

Most of hyperreal's workload comes from the number of mail deliveries it does on a given day. On a slow day hyperreal delivers 20K messages; on a busy day it delivers well over 100K. This punishes the operating system at a pretty deep level, and the problem I've seen is that sendmail can't hand the message off to the program that needs to process it, generating the error message "Cannot fork()". It has been difficult to track down exactly why it can't fork. I think I finally nailed it down today (for the techies out there: CHILD_MAX in the BSDI kernel config; I think the "daemon" user is exceeding it), but I'm still not sure. I will be upgrading to 2.1 this weekend and finding more swap space, but this is an ongoing battle I thought I had solved when I bought more memory two weeks ago.
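"Cannot fork()" is what a program reports when the fork() system call fails. POSIX lists EAGAIN as the error returned when creating another process would exceed {CHILD_MAX}, the per-user limit on simultaneous processes, which fits the diagnosis above. The Python sketch below is not sendmail's code: `fork_with_retry` and `child_max` are hypothetical helpers showing one way a delivery agent could back off and retry on EAGAIN instead of failing outright, and how to read the limit through sysconf on a modern Unix. It is Unix-only.

```python
import errno
import os
import time


def child_max():
    """Return the per-user process limit (POSIX CHILD_MAX) via sysconf.

    Returns None where the platform does not expose SC_CHILD_MAX; a value of
    -1 means the system reports no fixed limit. Hypothetical helper.
    """
    if "SC_CHILD_MAX" not in os.sysconf_names:
        return None
    return os.sysconf("SC_CHILD_MAX")


def fork_with_retry(attempts=5, delay=0.5):
    """Call os.fork(), retrying with exponential backoff on EAGAIN.

    Returns the child's PID in the parent and 0 in the child, like os.fork().
    Any other error, or EAGAIN on the last attempt, is re-raised.
    Hypothetical helper, not sendmail's actual logic.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return os.fork()
        except OSError as exc:
            # EAGAIN: out of process slots, e.g. the user is at CHILD_MAX.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            time.sleep(delay * 2 ** attempt)  # give some children time to exit
    raise RuntimeError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    print("CHILD_MAX:", child_max())
    pid = fork_with_retry()
    if pid == 0:
        # Child: stand-in for the delivery program sendmail hands mail to.
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    print("child exit status:", os.WEXITSTATUS(status))
```

Retrying only helps if other processes under the same user exit in the meantime; if the limit is simply too low for the "daemon" user's peak load, raising it, as with the kernel config change described above, is the real fix.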
Anyway, if you sent mail to hyperreal more than 12 hours ago and it has not shown up on the list yet (and mail usually gets delivered pretty quickly for you), consider submitting it again; it may have been lost. Hopefully this kernel config change has fixed it, but we'll find out next week, since we usually don't hit this problem on weekends.

That's it.

Brian