Supporting Namespaces in cgit

Richard Maw richard.maw at gmail.com
Sat Jun 25 17:46:26 CEST 2016


Hi all.

I thought I'd give an update,
since I managed to find time to make an attempt at this.

On Tue, May 10, 2016 at 02:21:36PM +0100, John Keeping wrote:
> On Mon, May 09, 2016 at 10:54:44PM +0100, Daniel Silverstone wrote:
> > On Mon, May 09, 2016 at 22:31:37 +0100, John Keeping wrote:
> > > Implementation-wise, it looks like using a namespace should just be a
> > > matter of setting GIT_NAMESPACE in the environment near the top of
> > > cgit.c::prepare_repo_cmd().
> > 
> > This is certainly the basic starting point.
> > 
> > > Discovering namespaces is more interesting, since we can't know what
> > > exactly is a namespace.  For example, if we have:
> > > 
> > > 	refs/namespaces/foo/bar/baz
> > > 
> > > is the namespace "foo" or "foo/bar"?  Maybe checking for "heads" and
> > > "tags" subdirectories is enough, but I'm not familiar enough with
> > > namespaces to know if those will definitely exist, and obviously users
> > > can create or delete any directories anywhere in the hierarchy.

Having looked at it in more detail,
I'd check for the presence of a HEAD symbolic ref,
if we were going to try it.

> > I'd not attempt to discover namespaces.  I think if you're given a namespace to
> > use in the repo stanza you use it, otherwise current behaviour prevails.
> > 
> > > Also, any attempt to discover namespaces during automated repository
> > > discovery (i.e. cgitrc's "scan-tree") is likely to be quite expensive
> > > with reading packed-refs and the whole loose refs tree.  However, it
> > > sounds like Gitano probably generates an explicit repository list, in
> > > which case a "repo.namespace" config key should be usable.
> > 
> > Yes, that's the intended behaviour.  I wouldn't expect cgit to be able to
> > invent namespace understanding out of nothing.

We have since discussed the idea of potentially having
a global namespace option,
so you could have a CGit instance that always displays the "docs" namespace
if you wanted to use it as a web-viewer for documentation served from git
without exposing the code.

> > > If we can indeed ignore any attempt to discover namespaces and just use
> > > "repo.namespace", is it enough to add that config value to
> > > "struct cgit_repo" and then pass it to setenv() in prepare_repo_cmd()?
> > 
> > This is a necessary start, but it is not sufficient.  Elsewhere in the codebase
> > changes will need to be made to use namespace aware ref iteration among other
> > things.  In addition, if we wish to support agefile per-namespace then we need
> > a repo.agefile option which can override the global option.  There may be more
> > but right now I don't have them to mind because I've not fully scoured the
> > codebase.

I think we can get away without a repo.agefile option
if we instead deterministically mangle the agefile path with the namespace,
which has the advantage that it cannot be broken by misconfiguration.
That said, since a global agefile option already exists,
it might still be appropriate to add repo.agefile as a configuration option.
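
A rough sketch of what such mangling could look like follows.
The helper name, the ".ns-" suffix and the choice of percent-encoding
are all assumptions made for illustration; none of them exist in cgit.
Percent-encoding is used rather than replacing "/" with another character
so that "foo/bar" and "foo_bar" cannot map to the same file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of cgit): derive a per-namespace agefile
 * path by appending ".ns-" and a percent-encoded copy of the namespace.
 * With no namespace the agefile path is copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in buf
 * (the contents of buf are then unspecified).
 */
static int namespaced_agefile(char *buf, size_t size,
			      const char *agefile, const char *ns)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t pos;

	assert(buf && agefile);
	pos = strlen(agefile);
	if (pos >= size)
		return -1;
	memcpy(buf, agefile, pos + 1);
	if (!ns || !*ns)
		return 0;

	if (pos + 4 >= size)	/* room for ".ns-" plus a NUL */
		return -1;
	memcpy(buf + pos, ".ns-", 4);
	pos += 4;

	for (; *ns; ns++) {
		unsigned char c = (unsigned char)*ns;

		if (isalnum(c) || c == '-' || c == '_') {
			if (pos + 1 >= size)
				return -1;
			buf[pos++] = (char)c;
		} else {
			/* Everything else, including '/' and '%', is encoded. */
			if (pos + 3 >= size)
				return -1;
			buf[pos++] = '%';
			buf[pos++] = hex[c >> 4];
			buf[pos++] = hex[c & 15];
		}
	}
	buf[pos] = '\0';
	return 0;
}
```

With the agefile "info/web/last-modified" and the namespace "foo/bar",
this yields "info/web/last-modified.ns-foo%2Fbar".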

> Ah, right.  I thought git.git's infrastructure might take care of
> namespaces automatically, but only git-upload-pack and git-receive-pack
> actually make use of namespaces so we'll have to do it ourselves.
> 
> Apart from enumeration, which should be fairly mechanical with
> strip_namespace(), we'll need to prefix user-provided values with the
> namespace.  I think the three relevant parameters (in
> cgit.c::querystring_cb()) are "h", "id" and "id2"; currently we allow
> each of those to contain either a named ref or a raw SHA-1, although we
> generate only named refs for "h" and only SHA-1s for "id" and "id2".
> And in fact ui-blob.c enforces that "id" contains a valid SHA-1.
> 
> So a simple implementation would just prefix "h" with
> get_git_namespace() and call it done, but that risks information leakage
> via "id" which is treated equivalently in most places (although as
> gitnamespaces(7) points out anyone with write access to the repository
> can already read whatever they want and in fact CGit imposes no access
> checks if you give it a SHA-1, but at least that's slightly more obscure
> than a ref name).

Yeah, namespaces aren't useful for security,
but we think they would be useful for preventing accidental leakage
or confusion.

Anyone can read the contents of the admin ref of a Gitano repository
that they have read access to,
but they may be prevented from pushing to it,
and its presence may just cause confusion for users who don't need to use it.

It's more about keeping the refs tidy than keeping their contents secure.

> One approach to that would be to switch all the sites using "id" or
> "id2" to get_sha1_hex() but I'm sure we have people generating URLs
> using those parameters and relying on at least "id2" taking a ref rather
> than a raw SHA-1.  I suspect it is simpler to replace calls to
> get_sha1() with cgit_get_sha1() and apply the namespace prefix there if
> the value is not a raw SHA-1.

get_sha1() can also handle some of the weirder "refs",
which aren't a sha1 and don't start with a ref or branch name:
@{n} for reflog entries, :/ for searching for a commit in any ref,
or even the weird output of `git describe`.

Partially parsing the ref to find out which bit is the ref name
sounds likely to be fragile when new forms are added,
so I'm tempted to have `cgit_get_sha1()` default to calling `get_sha1()`,
but, when a namespace is set, strictly support only sha1s and simple refs.

Until and unless git starts being able to handle namespaces in `get_sha1()`,
I think this is the best we can hope for,
and I'll be adding a big fat comment to revisit it when git updates.
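
The strict mode could be sketched roughly as below.
This is a standalone illustration rather than cgit code:
classify_rev() and qualify_rev() are hypothetical names,
the definition of a "simple ref" is an assumption,
and so is treating a short name as a branch under refs/heads/.
The prefix is passed in and assumed to have the expanded
"refs/namespaces/<name>/" form; a real version would then hand
the resulting name to get_sha1().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum rev_kind { REV_SHA1, REV_SIMPLE_REF, REV_UNSUPPORTED };

/*
 * Hypothetical classifier: a value is either exactly 40 hex digits,
 * a "simple ref" (assumed here to mean only letters, digits and "._/-",
 * with no "..", no "//", and no leading "-" or leading/trailing "/"),
 * or unsupported.  Forms such as "@{1}", ":/text" or "HEAD~2" are
 * rejected because they contain characters outside that set.
 */
static enum rev_kind classify_rev(const char *rev)
{
	static const char hexdigits[] = "0123456789abcdefABCDEF";
	static const char refchars[] =
		"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"0123456789._/-";
	size_t len;

	assert(rev);
	len = strlen(rev);
	if (len == 40 && strspn(rev, hexdigits) == 40)
		return REV_SHA1;
	if (len == 0 || rev[0] == '-' || rev[0] == '/' || rev[len - 1] == '/')
		return REV_UNSUPPORTED;
	if (strspn(rev, refchars) != len || strstr(rev, "..") || strstr(rev, "//"))
		return REV_UNSUPPORTED;
	return REV_SIMPLE_REF;
}

/*
 * Write the name to look up into out: a sha1 is copied unchanged,
 * "HEAD" and "refs/..." get the namespace prefix, and any other simple
 * name is assumed to be a branch.  Returns 0 on success, -1 if the
 * value is unsupported or does not fit.
 */
static int qualify_rev(const char *ns_prefix, const char *rev,
		       char *out, size_t size)
{
	int n;

	assert(ns_prefix && out);
	switch (classify_rev(rev)) {
	case REV_SHA1:
		n = snprintf(out, size, "%s", rev);
		break;
	case REV_SIMPLE_REF:
		if (!strcmp(rev, "HEAD") || !strncmp(rev, "refs/", 5))
			n = snprintf(out, size, "%s%s", ns_prefix, rev);
		else
			n = snprintf(out, size, "%srefs/heads/%s", ns_prefix, rev);
		break;
	default:
		return -1;
	}
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Note that `git describe` output such as "v1.0-3-gabcdef0" passes this
purely syntactic check; it would simply fail to resolve as a ref name later.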

> > If you think it's worth our while implementing a proof-of-concept patch series
> > then we'll give it a go.  I'm quite excited about being able to do this because
> > it'll open up so many interesting options for me when Gitano can ACLs which are
> > namespace aware :-)

A lot of the changes are straightforward,
just changing something to call a namespace-aware function,
or prepending the path.

The dumb-http endpoint looks rather easy to convert,
but it may be convenient to resurrect the patch for adding smart-transport
(https://lists.zx2c4.com/pipermail/cgit/2014-December/002311.html).
Smart-http would be easier to use if CGit set the namespace
before calling http-backend,
rather than requiring you to configure your web server to set the variable,
and there would be fewer places for it to get out of sync.
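
Setting the variable from CGit could be as small as the sketch below.
GIT_NAMESPACE is the environment variable described in gitnamespaces(7);
the function name is hypothetical, and where the namespace string comes
from (for example a per-repository setting) is left open.
setenv() and unsetenv() are POSIX calls.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: export the repository's namespace so that a
 * subsequently executed git http-backend sees it.  A NULL or empty
 * namespace clears the variable so nothing leaks in from the
 * web server's environment.  Returns 0 on success, -1 on failure.
 */
static int export_git_namespace(const char *ns)
{
	if (ns && *ns)
		return setenv("GIT_NAMESPACE", ns, 1);
	return unsetenv("GIT_NAMESPACE");
}
```

It would need to be called before the exec of http-backend,
so that the child process inherits the value.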

We intend that Gitano will generate a cgitrc snippet for the repositories,
so Gitano would be able to cope without CGit providing smart-http,
but it may be a big hurdle to users if there isn't another implementation.


We should also be able to support displaying git notes in namespaces,
though it's a bit of a pig,
since we need to disable the default note search paths
and add one just for the default location in the namespace,
and since those paths are interpreted as globs,
for reliability I think I need to escape the path.
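
The escaping step might look something like the following sketch.
The function name is hypothetical, and it assumes that the glob matcher
treats a backslash as an escape for the metacharacters "*", "?" and "[".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy in to out, putting a backslash in front of
 * each glob metacharacter (and of backslash itself) so that the path is
 * matched literally.  Returns 0 on success, -1 if out is too small.
 */
static int glob_escape(const char *in, char *out, size_t size)
{
	size_t pos = 0;

	assert(in && out);
	for (; *in; in++) {
		if (strchr("*?[\\", *in)) {
			if (pos + 1 >= size)
				return -1;
			out[pos++] = '\\';
		}
		if (pos + 1 >= size)
			return -1;
		out[pos++] = *in;
	}
	if (pos >= size)
		return -1;
	out[pos] = '\0';
	return 0;
}
```

A path without metacharacters, such as
"refs/namespaces/docs/refs/notes/commits", comes back unchanged.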

This does raise the interesting question of whether
we should include the notes of any sub-namespaces of the repository
in the search path for commits in the current namespace,
since all those commits in the sub-namespace are reachable,
but if they also exist in your current namespace you might get unexpected notes.


I will be pushing my progress to https://git.gitano.org.uk/cgit.git/log/?h=richardmaw/namespaces
and will try to post updates here as I go.
Comments are welcome.

