Problem with cgit cache

John Keeping john at keeping.me.uk
Mon Jan 19 20:50:12 CET 2015


On Mon, Jan 19, 2015 at 02:17:00PM -0500, Eclipse Webmaster (Denis Roy) wrote:
> We use cgit for about 800 Git repos. Lately we've noticed that the links 
> in the cache become polluted. We've noticed hits like this in the logs, 
> which come from Search Bots, which seem to match the garbage in the 
> cache links:
> 
> GET /c/set%7Cset%26set/org....
> 
> GET /c/%0aset%7cset%26set%0a/org....
> 
> (we serve cgit from /c/)
> 
> If I clear the cache entries, all is well until these bots come along 
> and pollute it again.  If I set cache-size=0 everything works well, 
> albeit much slower.
> 
> Is this a known bug in cgit?  For now I've added some Apache 
> RewriteRules so that these hits don't reach cgit, but it would be nice 
> if cgit could deal with these.
> 
> You can read more on our bug tracker, here:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=453438

Although you seem to have ruled it out, I think storing the cache on NFS
is likely to be problematic.

A quick search found some documentation [1], [2] on problems with
sendfile(2) and NFS.  You could try editing cgit.mk to comment out the
HAVE_LINUX_SENDFILE define, but I would recommend avoiding NFS for the
cache if possible.
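
To illustrate what that change amounts to: as far as I understand it, without the define the cached file is copied to the client with ordinary read/write calls rather than handed to the kernel via sendfile(2). The Python below is only a sketch of that fallback pattern, not cgit's actual code (which is C); the function name and chunk size are made up for the example.

```python
import os

def copy_with_read_write(src_fd, dst_fd, chunk_size=65536):
    """Copy everything from src_fd to dst_fd using plain read/write calls.

    Hypothetical helper: it stands in for the non-sendfile code path and
    returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:  # an empty read means end of file
            break
        view = memoryview(chunk)
        while len(view) > 0:  # os.write may accept fewer bytes than offered
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)
    return total
```

Because every byte goes through a userspace buffer, this avoids whatever sendfile(2) does with a file that lives on NFS, at the cost of extra copies.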

I have tried a quick test and wasn't able to reproduce your error, but I
will try to find some time to investigate further and see if there is a
problem with certain requests.
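
For reference, this is what the first path segment of the two logged requests decodes to. It is a small Python sketch using only the standard library; the rest of each path is elided in your report, so it is left out here, and the helper name is invented for the example.

```python
from urllib.parse import unquote

# First path segment after /c/ from the two logged requests.
# The remainder of each path is truncated in the report and is not guessed.
SEGMENTS = ["set%7Cset%26set", "%0aset%7cset%26set%0a"]

def decode_segment(segment):
    """Return the percent-decoded form of one URL path segment."""
    return unquote(segment)

for segment in SEGMENTS:
    # repr() makes the embedded newlines in the second request visible
    print(repr(decode_segment(segment)))
```

The first decodes to `set|set&set` and the second to the same string wrapped in newline characters, so the bots are sending shell metacharacters (and line breaks) where a repository name is expected.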

[1] http://www.proftpd.org/docs/howto/Sendfile.html
[2] http://httpd.apache.org/docs/2.2/misc/perf-tuning.html

