RFE: .so filters

Thu Jan 9 23:58:02 CET 2014

On Thu, Jan 09, 2014 at 10:34:26PM +0100, Jason A. Donenfeld wrote:
> I'm thinking about this filtering situation w.r.t. gravatar and
> potentially running multiple filters on one page. Something I've been
> considering is implementing a simple dlopen() mechanism for filters,
> if the filter filename starts with "soname:" or "lib:" or similar, so
> as to avoid the fork()ing and exec()ing we currently have, for high
> frequency filters. The idea is that first use of the filter would be
> dlopen()'d, but wouldn't be dlclose()'d until the end of the
> processing. This way the same function could be used over and over
> again without significant penalty.
> 
> In my first thinking of this, the method of action would be the same
> as the current system -- "int filter_run(int argc, char *argv[])" is
> dlopen()'d, executed, and it reads and writes to the dup2()'d file
> descriptor. Unfortunately, the piping in this introduces a cost that
> I'd rather avoid. In the case of gravatar (or more generally, email
> author filters), we'd be better off with a "char *filter_run(int argc,
> char *argv[])", that can just return the string that the html
> functions will then print. This, however, breaks the current filtering
> paradigm, and might not be ideal for filters that enjoy a stream of
> data (such as source code filters). This distinction more or less
> points toward coming up with a library API of sorts, but I really
> really really don't want to add a full fledged plugin system. So this
> has me leaning toward the simpler first idea.
> 
> But I'm undecided at the moment. Comments and suggestions are most
> welcome.

That interface doesn't really match the way the current filters work.
Currently when we open a filter we replace cgit's stdout with a pipe
into the filter process, so none of the existing cgit code will work
with this interface.  We could swap out write with a function pointer
into the filter, but I don't think we guarantee that all of the data is
written in one go, which makes life harder for filter writers (although
for simple cases like author info we probably could guarantee to write
it all at once).
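
To illustrate the partial-write concern, here is a minimal sketch of
what a filter-side write hook might have to do if the data arrived in
arbitrary pieces.  The names filter_write_fn, chunk_buf and
chunk_buf_write are hypothetical and are not part of cgit; this only
shows the buffering a filter writer would need, not a proposed API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook type: the html layer would call this instead of write(). */
typedef void (*filter_write_fn)(void *state, const char *buf, size_t len);

/* Growable buffer that collects chunks until the caller decides it is done. */
struct chunk_buf {
	char *data;
	size_t len;
	size_t cap;
};

/* Append one chunk; the input may arrive in arbitrarily small pieces. */
static void chunk_buf_write(void *state, const char *buf, size_t len)
{
	struct chunk_buf *cb = state;

	assert(cb != NULL);
	if (cb->len + len + 1 > cb->cap) {
		size_t cap = cb->cap ? cb->cap : 64;
		char *p;

		while (cap < cb->len + len + 1)
			cap *= 2;
		p = realloc(cb->data, cap);
		if (!p)
			abort();
		cb->data = p;
		cb->cap = cap;
	}
	memcpy(cb->data + cb->len, buf, len);
	cb->len += len;
	cb->data[cb->len] = '\0';	/* keep the buffer usable as a C string */
}
```

Note that the hook alone cannot tell when the data is complete; something
else has to signal that, which is the extra burden on filter writers.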

If we allow filters to act incrementally, then we can just leave the
filter running and swap it in or out when required.  That would require
a single dup2 to make it work the same way that the filters currently
work.  Interestingly, there is an "htmlfd" variable in html.c but it is
never changed from STDOUT_FILENO; I wonder whether that can be used, or
whether there are other places (possibly in libgit.a code) that just
use stdout, in which case we should remove that variable.  But there is
the problem of terminating the response; Lukas' suggestion of using NUL
for that may be the best, since it's not that hard to printf '\0' in
shell.
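
As a rough sketch of the incremental scheme, assuming a long-running
filter reachable through a pair of file descriptors: stdout is swapped
onto the filter's input with a single dup2, each request and response
is terminated by a NUL byte, and stdout is restored afterwards.  The
helper names (write_all, send_record, read_record, swap_in, swap_out)
are made up for illustration and do not exist in cgit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR.  Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Send one record: the payload followed by a single NUL terminator. */
static int send_record(int fd, const char *payload)
{
	/* strlen() + 1 includes the string's own trailing NUL byte. */
	return write_all(fd, payload, strlen(payload) + 1);
}

/*
 * Read one NUL-terminated record into out (capacity cap).  Reads a byte
 * at a time so nothing past the terminator is consumed.  Returns the
 * payload length, or -1 on error, EOF, or overflow.
 */
static ssize_t read_record(int fd, char *out, size_t cap)
{
	size_t len = 0;

	assert(cap > 0);
	while (len < cap) {
		ssize_t n = read(fd, out + len, 1);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		if (out[len] == '\0')
			return (ssize_t)len;
		len++;
	}
	return -1;
}

/* Point stdout at fd; returns a duplicate of the old stdout, or -1. */
static int swap_in(int fd)
{
	int saved;

	fflush(stdout);
	saved = dup(STDOUT_FILENO);
	if (saved < 0)
		return -1;
	if (dup2(fd, STDOUT_FILENO) < 0) {
		close(saved);
		return -1;
	}
	return saved;
}

/* Restore the stdout saved by swap_in() and release the duplicate. */
static int swap_out(int saved)
{
	fflush(stdout);
	if (dup2(saved, STDOUT_FILENO) < 0)
		return -1;
	return close(saved);
}
```

On the shell side, a filter would end each response with printf '\0',
as noted above.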

OTOH, in the particular case of author details the input is more
clearly defined than for the items for which we currently provide
filters, so maybe it could use a different interface.

One final point (although I don't think you're suggesting this) is that
we shouldn't require shared objects; I think scripts using stdin+stdout
are a much simpler interface and provide a much lower barrier to entry,
not least because the range of languages that can be used to implement
the filters is so much greater.