[RFC] F-Droid source mirror refactoring


This topic contains 22 replies, has 3 voices, and was last updated by  pfalcon 1 year, 2 months ago.

Viewing 15 posts - 1 through 15 (of 23 total)
    #1384

    pfalcon
    Member

    The F-Droid repository and build script provide a source tarball for each package. This tarball provides the exact source code used to build a package (e.g. with all F-Droid patches applied) and is very good for open-source license compliance.

    Unfortunately, formal license compliance is probably the only thing that tarball is good at. It’s not so good for development, for example, because anyone doing development should be upstream-aware, i.e. preferably work with pristine upstream and treat any patches in an explicit, layered manner, not as a flattened tree.

    There are, however, more uses for source mirroring than that. Having mirrored source helps to cut the time and traffic needed to rebuild a package – no need to re-fetch it, just unpack and maybe update. It also helps with intermittent upstream repository downtime. Finally, it helps with an upstream repository disappearing altogether – we have the upstream content as is, and can re-establish it easily.

    So, what I’m calling for here is to establish source mirror use as a part of the build process, not as a kind of output. I have had a large part of that implemented in my fork for about a year now, though some aspects weren’t completely finished, and the whole matter needs to be discussed first. My patches are regularly broken by upstream changes, so I would finally like to bring them into mainline if possible.

    So, here’re 2 points to consider:

    1. How such new source mirroring fits with the existing output source tarball. I just added the source mirror and kept the output source, but for the official repository that’s apparently a waste of disk space and traffic. So it would make sense to get rid of the output tarball and keep only the pristine source tarball: having those and the F-Droid source code/metadata, anyone can rebuild any package.

    2. Subversion is a weird animal in the world of distributed VCS: it uses too many ad hoc features which don’t fit well with mirroring (for example, in-place version updates which require networked repository access). My solution was to get rid of such special handling, treat each package version branch as a separate repository instead, and allow the repo_type/repo options to be specified at build level (in addition to package level).
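A minimal sketch of what the build-level override in point 2 could look like. The dictionary layout and the `resolve_repo` helper are assumptions made up for illustration (only the `repo_type`/`repo` option names come from the post), and the URLs are placeholders:

```python
def resolve_repo(package, build):
    """Return (repo_type, repo) for one build.

    Build-level values win; package-level values are the fallback.
    Both arguments are plain dicts in this sketch (hypothetical layout).
    """
    repo_type = build.get("repo_type", package["repo_type"])
    repo = build.get("repo", package["repo"])
    return repo_type, repo


# Package-level defaults plus two builds, one of which points at its own
# version branch as if it were a separate repository.
package = {"repo_type": "svn", "repo": "https://example.org/svn/app/trunk"}
builds = [
    {"version": "1.0", "repo": "https://example.org/svn/app/branches/1.0"},
    {"version": "1.1"},  # no override: uses the package-level repo
]
```

With this shape, a build that needs nothing special simply omits both keys.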

    #1389

    CiaranG
    Member

    Lots to say, but a few initial thoughts while I’m here…

    a) I think it’s important to have a tarball that exactly matches the built output, no matter what else exists. Someone shouldn’t have to run loads of scripts and patches against a repo to be able to get it. So I think that should always exist.
    b) I already got rid of the special handling (tags/*) for SVN, a long while ago.
    c) For svn, I now always use git-svn instead. (That applies to newly added apps, not old ones yet – I’m slowly converting them). It’s far more efficient.
    d) The build system already effectively ‘mirrors’ all repositories locally, since that’s what the locally cloned repos are, complete mirrors of the remote. The only exception to this would be svn, but see c).

    So, I *think* I’m struggling to see what’s missing from the current setup. Going back over the reasons….
    * “for development” – we don’t do development – for any kind of change other than a tiny patch to make a build work, surely the original repository is the place for that, or failing that, fork it (away from f-droid)
    * “cut out time and traffic” – see d)
    * “intermittent repository downtime” – see d), I think, if I understood that right
    * “upstream repository disappearing” – see d)

    Just thinking out loud, with my initial thoughts. I’ll probably come back to this again…

    #1392

    pfalcon
    Member

    a) I think it’s important to have a tarball that exactly matches the built output, no matter what else exists. Someone shouldn’t have to run loads of scripts and patches against a repo to be able to get it. So I think that should always exist.

    Yeah, I figured that. That’s one of the reasons I didn’t submit my patches earlier – I imagined it would be a bit hard to argue why I propose adding another source tarball. Well, the best argument I have is that it’s how other projects do it – Debian, Gentoo, OpenEmbedded, etc. – whenever possible, they all provide pristine source and patches separately, not a flattened-out patched tree. So, if selecting one of the two, there seems to be a best practice here. But if you would be OK with both, I can hardly argue – it’s you who hosts the server and downloads, so if it’s OK for you to potentially serve twice as much source traffic, then, well, it’s OK. But the point is exactly to provide a pristine source snapshot as part of the F-Droid repository, so that we have 100% build reproducibility (at the level of the entire F-Droid repo, not of individual packages) regardless of any upstream repo downtime.

    c) For svn, I now always use git-svn instead. (That applies to newly added apps, not old ones yet – I’m slowly converting them). It’s far more efficient.

    Wow, that’s cool. I just came back to say that I had the same idea; I didn’t know it was already implemented (I haven’t had a look at f-droid for a while ;-( ).

    d) The build system already effectively ‘mirrors’ all repositories locally, since that’s what the locally cloned repos are, complete mirrors of the remote.

    Well, maybe there have been more changes since I looked at build.py, but that’s not how it was before. It used to be that you checked out the code and then immediately started to patch it, etc. Yes, you did reverts, etc. when needed, but it essentially means that there was no clean upstream tree around most of the time, only a patched one. The other keyword is “locally” – yes, you have it locally on your machine, but nobody else has it – if I start a build and some repo doesn’t fetch, I cannot reproduce the complete F-Droid build. And the talk is, well, about being able to rebuild the F-Droid repository from scratch.

    But you’re right that it’s a fairly small change – we just need to cache the repository into a tarball immediately after checkout, and on the next checkout use the tarball instead of a complete checkout, if the cache tarball exists. Then we could do more tricks, like downloading the mirror tarball from f-droid.org if fetching a repo from upstream fails.
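The “more tricks” part could be sketched roughly like this: try upstream first, then fall back to a mirror. This is only an illustration, not code from build.py; `fetch_with_fallback` and the fetch callables are hypothetical names, and each callable is assumed to populate the destination directory or raise an exception.

```python
def fetch_with_fallback(dest, sources):
    """Try each (name, fetch) pair in order; return the name that worked.

    `fetch` is any callable taking the destination path. It is assumed
    to raise an exception on failure (e.g. upstream is down).
    """
    failures = []
    for name, fetch in sources:
        try:
            fetch(dest)
            return name
        except Exception as exc:  # any failure means "try the next source"
            failures.append(f"{name}: {exc}")
    # Nothing worked: report every attempt so the cause is visible.
    raise RuntimeError("all sources failed: " + "; ".join(failures))
```

A caller would pass something like `[("upstream", clone_from_upstream), ("mirror", download_mirror_tarball)]`, where both functions are whatever the build script actually uses for those steps.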

    #1393

    pfalcon
    Member

    *”for development” – we don’t do development – for any kind of change other than a tiny patch to make a build work, surely the original repository is the place for that, or failing that, fork it (away from f-droid)

    That’s too bad. And well, making a patch is also development. The fact that there are only tiny patches so far is because there’s no good infrastructure to produce more elaborate patches – like, if you want to rebuild a package from scratch to test a new version of a patch, that used to require a networked re-fetch, which takes too long.

    And there’s clearly a need to patch more, not less, if we want to add all the software you excluded based on “spying on users” or “uses non-free service” (and I’d like to add them ;-) ).

    And there’s no other place to do it but F-Droid – you know, you cannot fork 100 projects, that’s not scalable or maintainable. Having 100 patches in F-Droid is much more maintainable (yes, it’s also a bit of work, but nothing’s easy).

    #1394

    pfalcon
    Member

    Well, I looked into forward-porting my previous patch onto current master, and this commit rather complicates it: https://gitorious.org/f-droid/fdroidserver/commit/d00595e872b7ffe78967b18549641860e4ec642b . How is it a simplification if, instead of the very understandable and well-defined operations clone(), reset(), and pull(), a single monstrous gotorevision() method was introduced? Sigh. That’s a common problem with the fdroid source code – overly long and heavy functions are used everywhere, and it takes a while (a lot of prints!) to understand what they do, and then to understand any changes made to them.

    My previous implementation added source tarball caching to a single refreshlocal() method in the vcs superclass. Now, with gotorevision(), it’s almost impossible to do it cleanly – the same code needs to be added to each and every subclass, inside an already heavy method.

    #1408

    CiaranG
    Member

    It’s a simplification in two ways:
    1. That vcs class is not supposed to be a generic version control system abstraction – it only does exactly what fdroid needs, which is always to simply go to a specific revision. So where previously there was an interface with lots of different methods that *needed* to be called in exactly the right order, there is now one that does what’s necessary. So the interface is simpler.
    2. Internally, particularly with the addition of git-svn, all those vcs back-ends behave very differently. Trying to pretend they were the same made the code internally more complicated. It’s much simpler when they have a specific single task to do, and can do it in a way appropriate to the target vcs.

    I know what you mean about long/heavy methods, but these are not – they all make at most 4 subprocess calls, and that’s it. They are all (as well as being more simple) much smaller than they used to be.

    Regarding the patching, I’ll start a separate topic about that, because I’ve been meaning to write about it for a while.

    #1423

    pfalcon
    Member

    1. That vcs class is not supposed to be a generic version control system abstraction – it only does exactly what fdroid needs, which is always to simply go to a specific revision. So where previously there was an interface with lots of different methods that *needed* to be called in exactly the right order, there is now one that does what’s necessary. So the interface is simpler.

    Well, that sounds like an argument that machine code is simpler than a high-level language. Of course it is – from the CPU’s point of view. But it’s a hard mess for a human. We should just accept that programs are a means for humans to communicate, and computers are just a third party here – we made them smart enough to “understand” programs and run them, but that’s it. So the criteria for “simpler” should also include “easier to modify, easier to extend, allowing many more things to be done, less time to understand”. And of two interfaces, the simpler one is the one which captures the domain model in a more direct manner, without additional mappings and assumptions. So, if we know that clone, checkout-revision, update and reset are the elementary operations on a repository, then that is a better interface than one which tries to pretend that clone-if-not-reset-update-if-needed-reset is the atomic operation of a VCS.

    Internally, particularly with the addition of git-svn, all those vcs back-ends behave very differently.

    They should still have those 4 elementary operations; their presence is more or less the definition of what a VCS is.

    Trying to pretend they were the same made the code internally more complicated.

    You probably mean that it would require a few more function calls and maybe data reformatting. Well, maybe that is the case, but it’s still not exactly what “complicated” means. The Gang of Four said that they drew inspiration for their Design Patterns from architecture, so, using that metaphor: it surely takes more materials and focused effort to build a palace than slums, but when you then try to fix, reproduce or upgrade it, it turns out which one is actually “complex”.

    It’s much simpler when they have a specific single task to do, and can do it in a way appropriate to the target vcs.

    But it turns out that the task became more complicated than it was and really should have been. So now I need to deal with those tasks individually and patch all 4 of them with similar changes again and again, whereas previously I needed to change one high-level function and source archiving automagically worked for any SCM.

    I know what you mean about long/heavy methods, but these are not

    Yes, gotorevision() is not too long; the point is that it’s longer than necessary. The older refreshlocal() was just perfect.

    But there are a lot of too-long functions/methods. Not even functions, single statements! For example, build.py has a “for” with 250 lines, whoa! (And that’s after refactoring a great deal of code into prepare_source(), thanks for that.) Try to understand that == spend lots of time. Try to patch that == make it even more horrible (hello vagrant support), keep it locally for a while == get horrible conflicts on the next update. Some would say it’s Python’s trouble, but over a long time I learned that it’s actually its boon – it teaches very vividly how important it is to use functional abstraction instead of sheets and sheets of flat code.

    Regarding the patching, I’ll start a separate topic about that, because I’ve been meaning to write about it for a while.

    Thanks, looking forward to it.

    • This reply was modified 1 year, 3 months ago by pfalcon. Reason: typos
    #1427

    hansemil
    Member

    Why not refactor the current gotorevision() to use your re-introduced versions of clone(), reset(), pull(), etc?

    For me it makes sense to have a single method if that is all that is needed right now. If your feature needs a more fine-grained API, just extend it. The bigger functions (like gotorevision()) should be easily implemented using the more fine-grained functions, given that it is possible to keep a logical abstraction of all the different VCSs on that level. (Probably easier said than done.)

    #1428

    hansemil
    Member

    Also: I agree with CiaranG that we should publish tarballs that exactly match the built output. One shouldn’t have to run loads of scripts and patches to be able to get the source used for building.

    Also^2:
    There is a good reason that, for instance, Debian has more complex patch management. (One distribution among many, with specific requirements on how an application should behave when it comes to installation/scripts/etc.)
    F-Droid on the other hand does not require anything special, other than being free, open source and non-malicious. This decision should lie with the app developers. If there is anything “wrong” with an otherwise free and open source app, just submit a patch, fork it or convince the app developer(s) to change it. I don’t think it should lie with F-Droid to “fix” apps.

    #1429

    pfalcon
    Member

    Why not refactor the current gotorevision() to use your re-introduced versions of clone(), reset(), pull(), etc?

    Well, the problem here is obvious: it means that I would revert CiaranG’s latest changes to the previous state. And that’s of course not the most productive approach, because later someone else can flip my code back, and we’ll just be flip-flopping the code back and forth instead of making progress. So first I’d like to discuss the approach and make sure that we are on the same page regarding feature set and architecture.

    #1430

    pfalcon
    Member

    Also: I agree with CiaranG that we should publish tarballs that exactly match the built output. One shouldn’t have to run loads of scripts and patches to be able to get the source used for building.

    OK, let’s talk it over. Suppose you have the exact source used for building. What is it useful for? Well, you can rebuild it and get exactly the same binary as in the repository. But you already had that. You can’t do much more than that; in particular, it is a bad base for further development – you really should first understand what the upstream code was, what changes were applied by F-Droid, etc.

    Suppose, again, that you have the exact source used for building. So what’s next? How do you build it? It should be obvious, shouldn’t it? No, it is not – you search and somehow find some more or less good instructions from someone Google, you download one package, then another, unpack/install them, run some stupid GUI tool, which also downloads something, you try to build, it fails, you go back to the stupid GUI tool, dig for some obscure checkboxes, etc., etc. Summary: it’s not possible to “easily rebuild” “exact source” anyway. Unless of course you already have experience with all that stuff. But then, trust me, if you have figured that out, you will also figure out the F-Droid build system. And you will be able to hack on it. And that’s exactly in the best interest of F-Droid.

    #1431

    pfalcon
    Member

    There is a good reason that, for instance, Debian has more complex patch management. (One distribution among many, with specific requirements on how an application should behave when it comes to installation/scripts/etc.)
    F-Droid on the other hand does not require anything special

    But of course! We’re special! Let’s throw away decades of Linux distro building experience; how could they give us a clue?

    Trust me, many, many years ago all distros had the same conversation that we are having now. We know the outcome – all decent distros ship their source in a structured manner, separating upstream code, distro patches, metadata, etc.

    If there is anything “wrong” with an otherwise free and open source app, just submit a patch, fork it or convince the app developer(s) to change it.

    Good plan. So, please submit patches, fork projects, and convince developers ;-) . I’ve been doing that for many years, and my aim is still unfulfilled – to have a decent Linux distro for any handheld device. So now I’m going to put my itch first – patch and have decently working software now, and wait in the background for upstream to include or ignore those patches (*1). So I’m very interested in having an official F-Droid patching/in-distro forking policy settled.

    *1 I’m not saying that I’m going to succeed or have big progress in that, just saying that’s the direction I’m interested in working on: massive distro-level patching (essentially, forking hundreds of projects, except that it would be mini-forking).

    #1432

    pfalcon
    Member

    OK, so let’s count again the pros of introducing “pristine upstream code” tarballs, or rather, the cons of patching/reverting a locally stored repo tree again and again.

    First is that it will finally allow to achieve 100% reproducible builds, regardless of whether any upstreams are down (of course, if such snapshots tarballs w)

    The second con was more of an intuitive one; I didn’t even mention it, so as not to add subjectivity. OK, now I have evidence on my hands and can share it. So, the second con is that, intuitively, doing incremental patching and reverts over and over again is not a reliable technique; it’s a typical anti-pattern of accumulating incremental error. The repo state is just bound to diverge from the pristine source sooner or later. It may be due to different reasons – for example, those highly obscure switches of “git clean -dffx” may have bugs (different in different versions of git), or any number of other scenarios.

    Here’s one I got today:

    I hacked on http://code.google.com/p/android-shuffle/ , and it has project.properties in .hgignore. But if the build recipe has target=, build.py causes that file to be generated (by calling the “android” tool). However, on the next build attempt, build.py doesn’t clean up that file – it uses the following command for that:

    hg status -u | xargs rm -rf

    but “hg status -u” doesn’t see files in .hgignore at all. So, if you now change the recipe to stop generating project.properties, the file will still stay in your local tree => your local repo has diverged from both upstream and the state described by your recipe.
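For illustration, a hedged sketch of a cleanup that also catches ignored files. It relies on `hg status` accepting `--ignored` alongside `--unknown` (plus `--no-status` and `--print0` for machine-readable output); the helper names are made up for this example and none of it is taken from build.py. `root` is assumed to be the repository root.

```python
import os
import shutil
import subprocess


def parse_status0(output):
    """Split NUL-separated `hg status --no-status --print0` output into paths."""
    return [p for p in os.fsdecode(output).split("\0") if p]


def remove_paths(root, paths):
    """Delete each path (relative to root); missing paths are skipped."""
    for rel in paths:
        full = os.path.join(root, rel)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        elif os.path.lexists(full):
            os.remove(full)


def clean_hg_tree(root):
    """Remove unknown AND ignored files from the Mercurial checkout at root."""
    result = subprocess.run(
        ["hg", "status", "--unknown", "--ignored", "--no-status", "--print0"],
        cwd=root, check=True, capture_output=True,
    )
    remove_paths(root, parse_status0(result.stdout))
```

Even so, this only narrows the gap for one VCS; it does not remove the general problem of cleaning a tree in place.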

    And well, in the current build.py code, there may be many more such corner cases. So it’s just common sense that it’s better not to spoil than to clean up. And the only reliable way to have clean code is to preserve it immediately after the checkout/update, and when starting a new build, throw away any residue left from the previous build and start from that preserved code.

    I.e. the algorithm for a build should be:

    1. Remove the old build directory, if it exists.
    2. If an upstream mirror tarball exists, uncompress it and pull into it.
    3. Otherwise, clone.
    4. If anything new was fetched, recompress the tarball.
    5. Proceed with patching and compilation.

    That’s what’s *simpler*.
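Steps 1 to 4 above can be sketched roughly as follows. This is a sketch under assumptions, not the patch from my fork: `prepare_build_dir` is a made-up name, and `clone` and `pull` are callables supplied by the caller, where `clone(path)` creates the checkout and `pull(path)` returns True if anything new was fetched.

```python
import os
import shutil
import tarfile


def prepare_build_dir(build_dir, mirror_tarball, clone, pull):
    """Recreate build_dir from the pristine mirror tarball (or a fresh clone).

    Returns True if the mirror tarball was (re)written.
    """
    # 1. Remove the old build directory, with whatever residue it holds.
    if os.path.exists(build_dir):
        shutil.rmtree(build_dir)

    if os.path.exists(mirror_tarball):
        # 2. Unpack the pristine snapshot and update it from upstream.
        os.makedirs(build_dir)
        with tarfile.open(mirror_tarball) as tar:
            tar.extractall(build_dir)
        changed = pull(build_dir)
    else:
        # 3. No snapshot yet: do a full clone.
        clone(build_dir)
        changed = True

    if changed:
        # 4. Re-create the snapshot before anything gets patched.
        #    Write to a temporary name first so a crash cannot leave
        #    a half-written tarball behind.
        tmp = mirror_tarball + ".tmp"
        with tarfile.open(tmp, "w:gz") as tar:
            for name in sorted(os.listdir(build_dir)):
                tar.add(os.path.join(build_dir, name), arcname=name)
        os.replace(tmp, mirror_tarball)
    return changed
```

Step 5 (patching and compilation) then runs in `build_dir`, and nothing it leaves behind survives into the next build.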

    #1443

    hansemil
    Member
    Why not refactor the current gotorevision() to use your re-introduced versions of clone(), reset(), pull(), etc?

    Well, the problem here is obvious: it means that I would revert CiaranG’s latest changes to the previous state. And that’s of course not the most productive approach, because later someone else can flip my code back, and we’ll just be flip-flopping the code back and forth instead of making progress. So first I’d like to discuss the approach and make sure that we are on the same page regarding feature set and architecture.

    Ok, maybe I didn’t make myself clear enough.

    What I was trying to suggest was:
    1.) Re-introduce the low-level functions you need (clone(), reset(), pull() etc)
    2.) Refactor the inner workings of gotorevision() to use the methods introduced in 1.)
    3.) Profit
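A rough sketch of what steps 1.) and 2.) might look like, assuming a simplified base class. The method names clone(), pull(), reset() and gotorevision() come from this thread; everything else (the `VCS` and `RecordingVCS` classes, the constructor arguments) is invented for illustration and is not the actual fdroidserver code.

```python
import os
from abc import ABC, abstractmethod


class VCS(ABC):
    """Hypothetical base class: small primitives plus one composite method."""

    def __init__(self, remote, local):
        self.remote = remote
        self.local = local

    @abstractmethod
    def clone(self):
        """Create the local checkout from the remote."""

    @abstractmethod
    def pull(self):
        """Update an existing local checkout from the remote."""

    @abstractmethod
    def reset(self, rev):
        """Put the working tree at exactly the given revision."""

    def gotorevision(self, rev):
        # The single entry point callers use, written once in terms of
        # the primitives, so a cross-cutting feature (such as tarball
        # caching) would only need to be hooked in here.
        if not os.path.exists(self.local):
            self.clone()
        else:
            self.pull()
        self.reset(rev)


class RecordingVCS(VCS):
    """Toy backend that only records which primitives were called."""

    def __init__(self, remote, local):
        super().__init__(remote, local)
        self.calls = []

    def clone(self):
        os.makedirs(self.local)
        self.calls.append("clone")

    def pull(self):
        self.calls.append("pull")

    def reset(self, rev):
        self.calls.append("reset " + rev)
```

A backend such as git-svn that cannot follow this sequence could still override gotorevision() wholesale, which keeps the single-method interface intact for callers.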

    #1444

    hansemil
    Member
    There is a good reason that, for instance, Debian has more complex patch management. (One distribution among many, with specific requirements on how an application should behave when it comes to installation/scripts/etc.)
    F-Droid on the other hand does not require anything special

    But of course! We’re special! Let’s throw away decades of Linux distro building experience; how could they give us a clue?

    :-)

    Not special, just different. I think the main point is whether you think that we will have to patch the majority (or at least a significant share) of the apps in the repository. I hope not.

    And I don’t look at F-Droid as a distro. Replicant or CyanogenMod may be distros, and impose certain ways of doing things on the apps that they include in their “distro”. Hopefully we can avoid all that by staying “mint”. That’s what I hope anyway…
