Since there are so few images on HN, there is no reason to have more than a couple of connections per IP on port 80.
It will radically reduce your server load and there will be no blacklists/whitelists to maintain.
Unless dozens of people are behind the exact same IP (rather than an IP pool), it won't be a problem.
The worst case is that the initial page load in their browser takes about half a second longer. But it helps the server tremendously, especially since HN seems to use Apache.
HN serves with Connection: close, not keep-alive, so as soon as one request is done, the connection is freed for the next visitor on the same IP. This would just put them in single file in a fast-moving line instead of requiring dozens of connections to be served all at the same time.
Think of a grocery store with one super-fast express lane vs. no express lane and a dozen very slow cashiers with people with full carts ahead of you.
Don't knock connlimit until you try it. Again, it's not a ban, just backlogs the requests.
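For reference, the kind of rule I mean looks something like this (assuming iptables with the connlimit match; the exact limit is illustrative):

    # Drop new connection attempts (SYNs) to port 80 from any IP that
    # already has 2 established connections; the client's TCP stack just
    # retransmits the SYN, so the request is delayed, not refused.
    iptables -A INPUT -p tcp --syn --dport 80 -m connlimit --connlimit-above 2 -j DROP

Using DROP rather than REJECT is what produces the "backlog" behavior: the browser quietly retries instead of showing an error.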
Even if you add the proposed delay, a user behind one of these NATted networks could (unintentionally, I hope) cause a DoS by sending enough requests to make the queue unreasonably long, which, to everyone else behind the NAT, is just as bad as a server ban.
Server: Apache/2.2.22 (Ubuntu)
Actually, it looks like I was wrong.
Static objects are served from Amazon, while dynamic content comes from another server at theplanet.com.
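(If anyone wants to check for themselves, the response headers tell the story; something like this, assuming you have curl:)

    # GET the page but print only the response headers
    curl -s -D - -o /dev/null http://news.ycombinator.com/ | grep -i '^server'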
I would expect it to solve most of your performance problems for the foreseeable future (at the very least, by letting you scale horizontally and move the DB, frontends, and memcaches to separate boxes - plus ending memory leaks/etc by moving most of the data off the MzScheme heap).
The obvious downside is that it would use your (or someone at YC's) time. First to merge the changes I make to http://ycombinator.com/arc/arc3.tar into the production code, then to buy/setup some extra boxes and do the migration. We're probably talking, roughly, a day. It also has the unfortunate side effect of costing HN's src some of its pedagogical value, since it adds external dependencies and loses 'purity'.
Been looking for an excuse to learn Arc for a while now...
The site is very much hacked together, but works... In a lot of ways, this reflects the hacker ethos of getting something up and running quickly at low cost while still producing value.
A revamp might have a negative impact too, by attracting a wider, more mainstream audience that could dilute the purity of the community here.
Careful now :) It's not like there's anything stopping HN attracting a wider audience anyway; there's no restriction on who can register. Anyone can come and join in, which (in my opinion) is as it should be.
Any engineer with code running in production has made this mistake before.
My hope is that it will only take a day or so to deploy it, once it's ready.
I would agree that there is also little to no desire to make Hacker News "the news place" that supports thousands of posts a second and is extremely popular. In general, Hacker News is used by startups and people interested in startups (and the hope is that it stays that way). It is slowly growing to include more types of people: marketers, companies, bloggers who just want a lot of hits, and so on, and not many people want to purposely support that.
And there are no webapp architectures which do this.
(My hobby: posting "nothing like X exists" in a Hacker News thread. :)
It seems like with the rise of 'zero copy' approaches we could do even better: simply designate a memory region as unsafe, and transform it into a safe version depending on the context in which it is used. These transforms would want to add a little metadata pointing back to the original unsafe region in case the transformed region is ever used in a different execution context.

Alas, from the perspective of one program, the input to another always just looks like a string, which means our host program (and programmer) somehow needs to signal the appropriate transform on, say, concatenation. The only way I can think of around this requirement is to force implementors of contexts to tag their interfaces as contexts, and for callers to construct arguments to those functions such that constituents derived from unsafe regions are detectable. For example, a SQL context takes an array of string pointers, some of which point to 'unsafe' regions, and we concatenate the elements of the array to construct the context argument.
While the idea of "taint" is useful, it is only half the battle. The other half is accounting for the context.
Quote: "HNSearch was built by the team at ThriftDB to give back to the community and to test the capabilities of the ThriftDB flexible datastore with search built-in."
Interesting API all the same though.
edit: oh, official API is above. Disregard this one :-)
This is, quite possibly, the worst webapp I use on a regular basis.
The first and foremost reason for me to consult HN is that it is fast. I am in China, and I usually spend time on the web only on my phone, over a 3G connection.
HN's speed beats all other link aggregators, blogs, news sites, and even Google search, and, most interestingly, even fast Chinese sites.
I don't know why it is so fast (except when it is down, obviously); maybe it is the flat-file architecture, which would make sense. (Git is very fast too, right?)
And I think it is interesting that "make it fast" is a leitmotif that has been forgotten by so many people, Google first among them, but is still a reason for some of us (me, at least) to pick this site over others.
It speeds up browser startup dramatically. Especially when you leave lots of tabs open as your "to read" list.
Firefox has a better solution for this, but then again, I don't use Firefox.
I only loaded so many pages because I love HN. :)
The ban was lifted a few days later; I'm not sure whether that happened automatically or thanks to my (unanswered) email request.
It is triggered very quickly, and it seems to last forever (maybe 15 minutes would be better?).
I ask pg to kindly consider making it a bit more lenient.
I doubt HN is subject to deliberate/malicious attacks, etc...
I'm making an HN extension that preloads some data, such as the comments
and the links on the next page (still with reasonable delays between requests).
But at the moment it's impossible for it to function without risking getting the user banned.
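(The throttling itself is trivial; a sketch of the kind of delay I mean, with made-up numbers:)

    // Warm the browser cache one URL at a time, pausing between requests
    // so the extension never looks like a scraper. 5s is arbitrary.
    async function prefetchSequentially(urls: string[], delayMs = 5000) {
      for (const url of urls) {
        await fetch(url, { credentials: "include" });
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }

Even at that pace, a shared IP can trip the limiter, which is the problem.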
Pretty much any site with decent traffic is under constant attack, and the high profile of HN means it'll be under far more scrutiny than others.
sunstone1 10 hours ago [dead]
Well, I never had my IP banned, but I did have my account hellbanned after about a dozen posts, as you can see. Oh, actually no, you can't see, because it's banned. No, I never bothered to get another account; now I'm just a taker, not a giver.
Most of the time it's clear why a user was banned, but looking at sunstone's history I don't really see a reason. While the algorithm will never be perfect, it would be nice if there were a clearer recourse for misfires.
It occurs to me that I would like to interact with noprocrast in a different manner. Currently, I leave noprocrast disabled most of the time. I like to use longish minaway times (~day), but this makes me feel as if my first visit to HN will start the clock ticking, and I'd better be sure to get my HN fill before the timer runs out (yes, this is kind of ridiculous). So I only enable noprocrast (with a short maxvisit) upon realizing I'm stuck in a web loop.
The mechanism that I envision is either a button that immediately starts a one-shot noprocrast ban, or a page-count based maxvisit. The latter might be better since it could always be left enabled.