rdbms as backend

Marcin Krol mrkafk at gmail.com
Wed Jul 31 10:51:45 CEST 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Leif!

On 7/31/2013 02:45, Leif Pedersen wrote:
> Interesting. So you're saying that you need 30k rps to the DB, and
> the request rate to the front end of Varnish is much higher? If the
> Varnish front end is "only" getting 30k rps, then perhaps there's
> something you can do to improve the cacheability of objects.
> Otherwise, that's impressive even by my standards.

> You're sure your DB can keep up in a useful way with 30k rps?

That's not a single DB, that's *all the subsystems* for which we cache
the results: 30+ mysql and oracle instances (all replicated into an
additional set of mirrors for HA of course, with 3 DBAs babysitting all
this rubbish), plus several other specialized subsystems. If you
aggregate traffic for all those, it's about 30K rps, and more in peak
hours and under some circumstances.

That those cannot keep up is precisely why we're developing a caching
layer (actually we have one, but it's spaghetti C developed years ago,
basically unmaintainable by now). The load on those machines is high
at all times, as the queries are not very expensive but they're not
trivially cheap either.


> That's not
> the real bottleneck, is it? I've made that mistake myself, so I
> feel compelled to ask. :) If so, also quite impressive that your DB
> can keep up.
> 
> Have you tried implementing this sort of middleware in node.js?
> Sounds like you've used node.js a lot,

We haven't, that's the problem: we're a Python shop (with some C
skills available too). Other divisions have used node.js a lot (hence
that's the next solution under consideration as we can get some
assistance).

Still not good enough. Let's face it: a caching server is exactly what
C / C++ / any compiled, statically typed, high-performance systems
programming language is built for. In an ideal world I'd build this
thing in D (that's unavailable because of the obvious dilemma: the
"chicken" of lacking solutions and the "egg" of lacking human skills).


> but not for this particular problem if I
> understand correctly. Perhaps it would be worth the experiment.
> 
> Here's a great read of WSGI servers with surprising performance 
> differences. Tornado looks okay, but there is better. 
> http://nichol.as/benchmark-of-python-web-servers

Erm, I do not want to sound ungrateful but that's exactly one of the
pages I started with...

My colleagues investigated gevent. It fell by the wayside, mostly
because some of Python's C-based extensions leaked memory under so
much load.

I have investigated FAPWS3 with surprisingly good results: 3K rps, no
leaking (at least in my application..). It's little known but works
surprisingly well. I'm not sure if the credit goes to libev or to good
FAPWS3 coding, but there it is.

Still, not good enough. Etc etc. Memory leaks, crashes, lots of failed
requests, etc etc. Heck, I modified the hello world example from Cowboy
(an Erlang-based http framework) to do stuff like talking over the
memcached protocol (another thing on the work pile for this caching
server) and got 5K rps, but it's not like Erlang programmers are
available in numbers and we're not going to hire one for this project
alone. Oh well.
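
For illustration only, here is a rough Python sketch of the framing
used by the memcached text protocol for a single-key "set" and "get".
The function names are made up for this example, it does no network
I/O, and it is not the Cowboy code mentioned above.

```python
def build_set(key, value, flags=0, exptime=0):
    """Frame a memcached text-protocol 'set' command as bytes.

    `value` must already be bytes; its length goes into the header.
    """
    header = "set {} {} {} {}\r\n".format(key, flags, exptime, len(value))
    return header.encode("ascii") + value + b"\r\n"


def build_get(key):
    """Frame a single-key 'get' command."""
    return "get {}\r\n".format(key).encode("ascii")


def parse_get_response(data):
    """Parse a complete single-key 'get' reply.

    Returns the stored bytes, or None on a cache miss.
    """
    if data == b"END\r\n":
        return None  # miss: the server sends only the terminator
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    length = int(parts[3])  # byte count of the data block
    value = rest[:length]
    if rest[length:] != b"\r\nEND\r\n":
        raise ValueError("malformed or incomplete reply")
    return value
```

A server speaking this protocol would answer a successful "set" with
"STORED" followed by CRLF; handling of partial reads from a socket is
left out here.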

I thought: "what the heck, I'll give varnish a try" (caching an
http-interfaced subsystem backend, we have some of those apart from
the rdbmses). Result: 7K rps at 1K concurrent connections.

And it stays at approximately this rate as the number of concurrent
connections increases, basically up to 5K concurrent connections, with
a grand total of 4 failed requests (0 at lower rates).

Woohoo!

Other solutions were basically crushed like bugs under this number of
concurrent connections. That's why you should not trust those blog
pages with high benchmark results: it's all fine and dandy to get
those numbers with serial requests. But if the solution has to handle
a large number of concurrent connections at any moment, that's where
things turn sour. I had more than one Python solution with high serial
performance fail miserably under such circumstances.
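
To make the serial vs. concurrent distinction concrete, here is a
minimal Python sketch of a load driver that fires requests from many
threads at once and counts failures. It is not the tool used for the
numbers above; `run_load` and its parameters are names invented for
this example, and `do_request` is any callable that raises on failure.

```python
import threading
import time


def run_load(do_request, concurrency, requests_per_worker):
    """Call do_request() from `concurrency` threads simultaneously.

    Returns (succeeded, failed, requests_per_second).
    """
    lock = threading.Lock()
    counts = {"ok": 0, "failed": 0}
    # All workers plus the main thread meet here, so the workers
    # start hammering the target at the same moment.
    start_gate = threading.Barrier(concurrency + 1)

    def worker():
        ok = failed = 0
        start_gate.wait()
        for _ in range(requests_per_worker):
            try:
                do_request()
                ok += 1
            except Exception:
                failed += 1
        with lock:
            counts["ok"] += ok
            counts["failed"] += failed

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for t in threads:
        t.start()
    start_gate.wait()
    started = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started

    total = counts["ok"] + counts["failed"]
    rps = total / elapsed if elapsed > 0 else float("inf")
    return counts["ok"], counts["failed"], rps
```

Running it once with concurrency=1 and once with concurrency=1000
against the same target shows the difference between a serial and a
concurrent benchmark. For an HTTP target, `do_request` could wrap
`urllib.request.urlopen` on whatever URL is being tested. Python
threads are themselves a limiting factor at high concurrency, so
treat the rps figure from such a driver as a lower bound.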

But now I have a problem: how to plug mysql or oracle into varnish?
I've done some C coding but replacing the http-oriented backend
handling in varnish is a little ambitious for me.
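
The alternative to touching varnish's backend code is a thin HTTP
layer in front of the database, so that varnish only ever sees an
ordinary http backend. Below is a minimal WSGI sketch of that idea.
Everything in it is an assumption made for illustration: sqlite3
stands in for mysql/oracle, and the `items` table, the `id` query
parameter and the max-age values are invented.

```python
import json
import sqlite3
from urllib.parse import parse_qs

# Stand-in database: in-memory sqlite3, NOT the real mysql/oracle
# backends. check_same_thread=False lets a threaded server share it.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO items VALUES (?, ?)",
               [(1, "alpha"), (2, "beta")])
db.commit()


def app(environ, start_response):
    """Map GET /?id=N to a single-row lookup and return JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        item_id = int(params["id"][0])
    except (KeyError, ValueError):
        start_response("400 Bad Request",
                       [("Content-Type", "text/plain")])
        return [b"missing or invalid id\n"]

    row = db.execute("SELECT id, name FROM items WHERE id = ?",
                     (item_id,)).fetchone()
    if row is None:
        start_response("404 Not Found",
                       [("Content-Type", "text/plain"),
                        ("Cache-Control", "max-age=5")])
        return [b"not found\n"]

    body = json.dumps({"id": row[0], "name": row[1]}).encode("utf-8")
    # Cache-Control tells the cache in front how long the result
    # may be reused; 60 seconds is an arbitrary example value.
    start_response("200 OK",
                   [("Content-Type", "application/json"),
                    ("Content-Length", str(len(body))),
                    ("Cache-Control", "max-age=60")])
    return [body]

# To try it locally (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8080, app).serve_forever()
```

With something like this behind it, only cache misses reach the
Python layer, so its throughput matters far less than that of the
cache in front. Whether the miss rate is low enough for this to work
depends on how cacheable the queries are.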


> 
> It sounds like the middleware is really trivial, and compared to
> the entry bar for adapting Varnish, probably worth trying several
> WSGI servers and/or node.js if need be.
> 
> I am indeed intrigued by connecting directly to the DB, but you
> can probably see my skepticism between the lines here in bright
> orange. :)

If I understand you correctly, you would not connect to the DB
directly either?

We tried that in the past (on a small scale). It works for the moment,
and that's the problem: what if you need to, say, upgrade mysql? Your
entire client logic layer changes, in most/all of the infrastructure.
Tight coupling. Sucks.


> Seems like a much more flexible approach to use middleware, worth a
> bit of extra hardware...on the other hand, maybe not worth it if
> the cost difference really is 10x.

That's what we're trying to reduce: infrastructure, power, maintenance
costs.


Regards,
MK
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJR+NAhAAoJEFMgHzhQQ7hOPfYH/iWRxNhVICFZw2I7F/hIUHTv
fL+vk4a6mXLGdK4S8tjDqo9xPXBLpdBHUmQiPqVLZouwz/zy+E15l0zteMcj08Qg
LPrq8/m9E2smcFvPwKTTVpUtq0VmE+MoZqq289VbLxxoxN8v9mwzPy2C/iDBwMu9
hp939RCBTkATJ7XP+ilXvumKsMhRFVCfbdkpbQbSjNifEiDYplwGLV4FheuMOa9F
T4k3M0kFu3BOJmqIvkGrtV5n8ygRtOEn+aK+C/Kq8M2wUsumsLHVnfk+KFmoASev
sh3fRK7AV0lhXoxNM6TG994UqzFZ9zh3yZFTPwYED1ck4VYX8/D1xWuGjrYzN8s=
=VHHc
-----END PGP SIGNATURE-----



More information about the varnish-dev mailing list