Ticket #1239 (closed defect: invalid)

Opened 16 months ago

Last modified 16 months ago

Problem with cleaning up "gone" bans

Reported by: xani
Owned by: martin
Priority: normal
Milestone:
Component: varnishd
Version: 3.0.3
Severity: normal
Keywords:
Cc: admin@…

Description (last modified by tfheen)

Our configuration looks like this:

  • Varnish serves application content via ESI
  • the application server invalidates changed content
  • the vast majority of invalidations are done by purge, a small percentage by bans
  • on average we get < 1 ban/s
  • ~10 GB cache, about 3/4 used, ~2 million objects

After 3 days we had >170k bans in the "gone" state and only a few hundred active; essentially no bans were being removed from the list.

ban config is:

        if (req.http.X-ban-regex) {
                ban("obj.http.x-hash ~ ^" + req.http.host + req.http.X-ban-regex + "$");
                error 200 "Banned";
        } else if (req.http.X-ban-single) {
                ban("obj.http.x-hash == " + req.http.host + req.http.X-ban-single);
                error 200 "Banned";
        }

and x-hash is then set to the right value.
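
The ticket does not show where x-hash is set. A minimal sketch of one way to do it in Varnish 3.0 VCL, assuming the header is simply the Host header followed by the URL (this matches the shape of the ban expressions in this ticket, but the reporter's actual rule may differ):

```vcl
sub vcl_fetch {
    # Assumption: the ban key is host + URL; the real site may build it differently.
    # Storing it on beresp makes it available later as obj.http.x-hash.
    set beresp.http.x-hash = req.http.host + req.url;
}

sub vcl_deliver {
    # Keep the internal header out of responses sent to clients.
    unset resp.http.x-hash;
}
```

Because the value lives on the cached object, ban expressions can test obj.http.x-hash without referring to the request.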

Attachments

total_bans.png (24.3 KB), added by xani 16 months ago: Total bans
objects.png (21.7 KB), added by xani 16 months ago: Total objects
ban_s.png (28.5 KB), added by xani 16 months ago: Bans per sec
bans.png (27.7 KB), added by xani 16 months ago: After hardware upgrade, before and after tuning
bans_default.png (25.0 KB), added by xani 16 months ago: After hardware upgrade, before tuning
bans_tuned.png (33.6 KB), added by xani 16 months ago: After hardware upgrade, after tuning

Change History

Changed 16 months ago by xani

Total bans

Changed 16 months ago by xani

Total objects

Changed 16 months ago by xani

Bans per sec

comment:1 Changed 16 months ago by tfheen

  • Description modified

comment:2 Changed 16 months ago by martin

  • Owner set to martin

comment:3 follow-up: ↓ 4 Changed 16 months ago by martin

Hi,

Can you please confirm that you are using Varnish version 3.0.3? There have been some ban-related fixes in the latest release.

Also, could you send the output of the 'ban.list' Varnish CLI command? You can capture it with the following varnishadm command and attach the result to this ticket:

        $ varnishadm ban.list > banlist

In general, for bans to trickle out you will have to be able to use the ban lurker efficiently. This means never using req.* in your ban expressions. Tuning ban_lurker_sleep down might also increase the number of bans evicted.

Regards, Martin Blix Grydeland

comment:4 in reply to: ↑ 3 Changed 16 months ago by xani

Replying to martin:

Hi,

Can you please confirm that you are using Varnish version 3.0.3? There have been some ban-related fixes in the latest release.

        varnishd -V
        varnishd (varnish-3.0.3 revision 9e6a70f)
        Copyright (c) 2006 Verdens Gang AS
        Copyright (c) 2006-2011 Varnish Software AS

Also, could you send the output of the 'ban.list' Varnish CLI command? You can capture it with the following varnishadm command and attach the result to this ticket: $ varnishadm ban.list > banlist

In general we use a custom hash; the application invalidates all 'single' objects on change via PURGE and uses bans only when multiple objects must be invalidated at once:

1357559121.860319    10 	obj.http.x-hash ~ ^www.example.com/myFinishedContestsPart/(323958_[0-9]*|995532_[0-9]*|1100676_[0-9]*|822814_[0-9]*|940584_[0-9]*|822815_[0-9]*|1643792_[0-9]*|1633448_[0-9]*|1391735_[0-9]*|1464036_[0-9]*|782937_[0-9]*|1298164_[0-9]*|50406_[0-9]*|601168_[0-9]*|950033_[0-9]*|1219820_[0-9]*|729463_[0-9]*|256725_[0-9]*|19451_[0-9]*|1582436_[0-9]*|1993872_[0-9]*|144043_[0-9]*|1489478_[0-9]*|1298292_[0-9]*|123232_[0-9]*|2028142_[0-9]*|1424632_[0-9]*|1871383_[0-9]*|41347_[0-9]*|970515_[0-9]*|338983_[0-9]*|944131_[0-9]*|1924520_[0-9]*|2120811_[0-9]*|1849157_[0-9]*|1943777_[0-9]*|851181_[0-9]*|2140555_[0-9]*|18448_[0-9]*|1117478_[0-9]*|896202_[0-9]*|1574681_[0-9]*|1602679_[0-9]*|1637469_[0-9]*|934361_[0-9]*|1747970_[0-9]*|1492240_[0-9]*|1552184_[0-9]*|1706651_[0-9]*|835658_[0-9]*|1903308_[0-9]*|536479_[0-9]*|2134623_[0-9]*|717577_[0-9]*|158809_[0-9]*|642043_[0-9]*|999823_[0-9]*|642305_[0-9]*|85134_[0-9]*|773701_[0-9]*|1003921_[0-9]*|190744_[0-9]*|1183795_[0-9]*|939368_[0-9]*|193287_[0-9]*|468871_[0-9]*|58543_[0-9]*|41294_[0-9]*|1773893_[0-9]*|943076_[0-9]*|696958_[0-9]*|1991626_[0-9]*|703557_[0-9]*|16385_[0-9]*|907590_[0-9]*|1384116_[0-9]*|522346_[0-9]*|893326_[0-9]*|2059269_[0-9]*|732145_[0-9]*|1974262_[0-9]*|419819_[0-9]*|54137_[0-9]*|461531_[0-9]*|1189809_[0-9]*|1131251_[0-9]*|713310_[0-9]*|1258696_[0-9]*|707201_[0-9]*|2141370_[0-9]*|731227_[0-9]*|1448380_[0-9]*|1591377_[0-9]*|1415121_[0-9]*|1365477_[0-9]*|870309_[0-9]*|1520227_[0-9]*|594450_[0-9]*|293565_[0-9]*|55434_[0-9]*|1301944_[0-9]*|1391469_[0-9]*|1637212_[0-9]*|1670274_[0-9]*|1552941_[0-9]*|6734_[0-9]*|1895203_[0-9]*|65122_[0-9]*|1220870_[0-9]*|1586014_[0-9]*|1138652_[0-9]*|1622922_[0-9]*|16985_[0-9]*|1626225_[0-9]*|1282841_[0-9]*|692748_[0-9]*|834571_[0-9]*|871574_[0-9]*|1640875_[0-9]*|1878207_[0-9]*|1468225_[0-9]*|150075_[0-9]*|49950_[0-9]*|1216392_[0-9]*|1782624_[0-9]*|1871290_[0-9]*|1526197_[0-9]*|871123_[0-9]*|1405228_[0-9]*|1278682_[0-9]*|104212_[0-9]*|2042762_[0-9]*|833512_[0-9]*|1202955_[0-9]*|63278_[0-9]*|1884993_[0-9]*|1803277_[0-9]*|2033938_[0-9]*|1570247_[0-9]*|1910831_[0-9]*|1426844_[0-9]*|1386118_[0-9]*|1921870_[0-9]*|1092281_[0-9]*|1118977_[0-9]*|665477_[0-9]*|1290590_[0-9]*|162392_[0-9]*|827200_[0-9]*|2114_[0-9]*|1315011_[0-9]*|1312807_[0-9]*|1036033_[0-9]*|1812164_[0-9]*|236316_[0-9]*|1717238_[0-9]*|88257_[0-9]*|1714282_[0-9]*|1968966_[0-9]*|68881_[0-9]*|1469760_[0-9]*|1908724_[0-9]*|692105_[0-9]*|1527778_[0-9]*|2053489_[0-9]*|1069223_[0-9]*|1534927_[0-9]*|1370497_[0-9]*|1049822_[0-9]*|1353840_[0-9]*|1816028_[0-9]*|859566_[0-9]*|2085934_[0-9]*|962973_[0-9]*|1996184_[0-9]*|1266104_[0-9]*|199439_[0-9]*|1118601_[0-9]*|1581741_[0-9]*|1279431_[0-9]*|1055396_[0-9]*)$

or single article ID (about 99% of all bans):

1357559069.003583  1443 	obj.http.x-hash ~ ^www.example.com/newsCommentList/(91940.*)$

or

obj.http.x-hash ~ ^www.example.com/finishedContestsAjaxPart/.*$

In general, for bans to trickle out you will have to be able to use the ban lurker efficiently. This means never using req.* in your ban expressions. Tuning ban_lurker_sleep down might also increase the number of bans evicted.

We don't use any req.* bans except for an occasional manual one when a developer screws something up; all bans in VCL are based on obj.*

It seems to be related to CPU load. We tried tuning the lurker sleep down to 0.001 and it didn't help. Since then we have upgraded to more powerful machines: on the old 4-core machines load was around 80-100% at peaks, on the new 8-core machines it's about 30%. With an average of ~0.5 bans/sec:

  • with defaults it hovers around 4.6k gone (see bans_default.png: After hardware upgrade, before tuning)
  • after tuning the lurker sleep to 0.001 it's around 500 (see bans_tuned.png: After hardware upgrade, after tuning)
  • the sharp falloffs are not drops in the ban list but restarts, as we were tuning memory (see bans.png: After hardware upgrade, before and after tuning)
Last edited 16 months ago by xani

Changed 16 months ago by xani

After hardware upgrade, before and after tuning

Changed 16 months ago by xani

After hardware upgrade, before tuning

Changed 16 months ago by xani

After hardware upgrade, after tuning

comment:5 Changed 16 months ago by xani

It seems this behaviour is triggered by having any req.*-based ban on the ban list, even if it is the only one and all the others are obj.* bans. When I manually added one, bans stopped being removed, and as soon as that ban was removed by TTL the ban list got cleared.

comment:6 Changed 16 months ago by martin

  • Status changed from new to closed
  • Resolution set to invalid

Hi,

Ban list length starting to accumulate when you have req.* parts in your ban lists is very much the expected behavior. Because of the way ban lists are implemented, bans can only ever be removed from the tail of the list. The ban lurker's job is to work on the tail of the list in order to free the bans there. But the moment a req.* ban is at the tail, the ban lurker can't do anything with it (the ban lurker is not running in the context of a request, so it can't match anything on req.*). The ban list will then keep growing until that ban clears some other way (the objects it's linked to are actually requested, or their TTL elapses). The solution is to not use ban expressions with req.* in them.
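
To make the advice above concrete, here is a minimal Varnish 3.0 VCL sketch of a manual ban entry point that only ever issues obj.* bans. The BAN request method, the `invalidators` ACL and its address are hypothetical, and the sketch assumes every cached object carries the x-hash header (host followed by URL) described in the ticket:

```vcl
# Hypothetical ACL: clients that are allowed to issue bans.
acl invalidators {
    "127.0.0.1";
}

sub vcl_recv {
    # Hypothetical BAN method for manual invalidation.
    if (req.request == "BAN") {
        if (!client.ip ~ invalidators) {
            error 405 "Not allowed";
        }
        # A req.* expression such as
        #   ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url);
        # can only be tested while a request is being served, so the
        # ban lurker cannot retire it.
        # The obj.* expression below is tested against the x-hash header
        # stored with each object, so the lurker can evaluate it on its own.
        ban("obj.http.x-hash ~ ^" + req.http.host + req.url);
        error 200 "Banned";
    }
}
```

Note that the request URL is used as a regular expression here and is anchored only at the start, so a request for /foo would also ban objects whose key continues past /foo.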

I'll close this ticket as invalid.

Regards, Martin Blix Grydeland
