Minimize index download size

From F-Droid

This page is out of date; follow progress on GitLab:


Problem

Updating F-Droid requires downloading the index, which keeps getting larger.

One of the best things about F-Droid is that apps are constantly being added to its index. Of course, this means that the index keeps growing. At the time of writing, it is about 420 kB compressed (https://f-droid.org/repo/index.jar) and growing steadily. Ideas on how to reduce the burden on mobile data connections are discussed below.

Possible Solutions

Remove/archive grossly out of date information

This is now implemented

This concept was proposed by Ciaran in a comment on gitorious:

"There is a much simpler way to speed up the download and processing, which is to simply remove obsolete apks from the main repo, and put them in an ancillary repo which is switched off by default. These make up the bulk of the index size and processing time. What do I mean by “obsolete apks”? Example, you have an app with 50 versions, and apart from the recent two or three, they were all released over 6 months ago and are completely irrelevant to the vast majority of users."

This could probably be done in such a way that people who want the large index, with every past version, can download it, while the majority of people get the small index. One way to do this is to have two index files: the default at http://f-droid.org/repo has the smaller index without 50 old versions of each app, and advanced users who point at http://f-droid.org/archive get the whole shebang (implemented per Ciaran's suggestions below). To reduce repetition of code on the server, the smaller index could be derived from the larger one each time it is modified.

Further comment from Ciaran: What I actually had in mind was slightly different, the main repo would be smaller, by losing the old versions, and an additional archive repo would contain those old versions. That way the client can come with both repos 'installed', but with the archive one turned off by default. If you want it, you just turn it on. The packages from the two repos would just get auto-merged anyway, so the end result if you turned the archive one on would be exactly what you see now. Apart from this advantage of simplicity and ease of use, you also don't have any repetition of the data, which saves time and space, and also avoids the problem of what is supposed to happen in the repo/full scenario described above when you point at *both* of those repos.
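The split Ciaran describes can be illustrated with a minimal sketch. It assumes each app is represented only by a list of integer version codes and that the newest three versions stay in the main repo; the function names and the `keep` parameter are hypothetical and not taken from the F-Droid server code.

```python
def split_repo(apps, keep=3):
    """Split each app's versions into a main set and an archive set.

    `apps` maps a package id to a list of integer version codes.
    The `keep` newest versions stay in the main repo; the rest are archived.
    """
    main, archive = {}, {}
    for app_id, version_codes in apps.items():
        newest_first = sorted(version_codes, reverse=True)
        main[app_id] = newest_first[:keep]
        if newest_first[keep:]:
            archive[app_id] = newest_first[keep:]
    return main, archive


def merge_repos(main, archive):
    """Combine both repos, as a client with the archive enabled would."""
    merged = {}
    for repo in (main, archive):
        for app_id, version_codes in repo.items():
            merged.setdefault(app_id, []).extend(version_codes)
    return {app_id: sorted(codes, reverse=True) for app_id, codes in merged.items()}
```

Because no version appears in both repos, merging them reproduces the original full list without any duplicated data.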

Remove redundant information

This is now implemented

There seems to be both an HTML description (<description>) and a plain text description (<desc>) in the index (https://f-droid.org/repo/index.xml). If both are really required, perhaps something like Markdown could be adopted. Alternatively, you could just have HTML, and process it to show it in plain text (e.g. strip out tags, and replace <li> with * or something). Finally, perhaps just a plain text version could work.

To summarise the solutions above:

  • Only plain text description
  • Only HTML description - and process it to plain text where required
  • Markdown or something else which is nice in plain text, but also can be translated to HTML
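The second option (HTML only, converted to plain text where required) could look roughly like the following sketch. It uses Python's standard `html.parser` module; the class and function names are made up for illustration, and the set of tags that trigger a line break is an assumption.

```python
from html.parser import HTMLParser


class PlainTextRenderer(HTMLParser):
    """Collects text content, turning <li> into '* ' bullets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.parts.append("\n* ")
        elif tag in ("p", "br", "ul", "ol"):
            # Assumed set of block-level tags that start a new line.
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain(html):
    """Strip tags from an HTML description and drop empty lines."""
    renderer = PlainTextRenderer()
    renderer.feed(html)
    renderer.close()
    lines = [line.strip() for line in "".join(renderer.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

For example, `html_to_plain("<p>A notes app.</p><ul><li>Fast</li><li>Free &amp; open</li></ul>")` yields three lines: the sentence, then `* Fast` and `* Free & open`.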

Comment from Ciaran: the plain text is only there to support very old F-Droid clients, and I'm pretty sure we could just remove it.

Incremental updates

Currently, each update asks for all info, even if that info has already been received in previous updates.

What would be cool is if you could say: "Give me all updates since last time I updated (Feb 2nd)". This would require there to be a new index file each time the server updates the index, and the client would need to be able to identify which index files are available to download. There are problems with this, such as the number of requests that will need to be made to the server (one to ask what index files are available, then one for each incremental index file), and it is only worthwhile if the larger number of requests is better than the one big request we currently have.

A more interesting solution may be to have:

  • The one index to rule them all, but also,
  • One index per month

Then, the first time a user updates, they get the one index to rule them all. Every subsequent update only gets the current month's index file (or, if they haven't updated for several months, the index files for each month between the last update and the current update).

The server, when releasing a new app/version, will update the one index to rule them all, and the current month's index. Better yet, it could update just the one index to rule them all, and have an XSLT transformation which generates monthly indexes when requested, based on the content of the one index to rule them all. However, signing the XSLT output then becomes an issue.
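As a rough sketch of the client side of this scheme, the function below works out which monthly indexes to request. It assumes the month of the last update must be fetched again (it may have changed since), and the file naming in `index_name` is purely hypothetical.

```python
from datetime import date


def monthly_indexes_to_fetch(last_update, today):
    """Return the (year, month) pairs whose indexes a client needs.

    Covers every month from the one containing `last_update` up to and
    including the current month. Returns None when there is no previous
    update, meaning the full index should be fetched instead.
    """
    if last_update is None:
        return None
    months = []
    year, month = last_update.year, last_update.month
    while (year, month) <= (today.year, today.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def index_name(year, month):
    # Hypothetical naming scheme, not an existing F-Droid path.
    return f"index-{year:04d}-{month:02d}.xml"
```

A client that last updated on 20 November 2013 and updates again on 2 February 2014 would request four monthly files, which is the trade-off against a single large request.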

To summarise:

  • Each server release creates a new index file, and updates a master file which knows what index files are available
  • Each server release still just updates one index to rule them all, but dynamically creates monthly indexes when requested

Comment from Daniel: I think we could implement this, leaving /index.xml for backwards compatibility. Not critically important since we don't have that many apps, and if we remove the duplicate description we can bring down the current size a bit. But if we plan on distributing more than a thousand apps, this should be done.

Index for SDK range and CPU arch

The older devices struggle the most with processing repo updates. Therefore we could make 5 indexes for the key minsdk milestones: 4, 8, 10, 14, and 18. Devices running 1.6 (android-4) would get an index which won't include apks with minsdk > 4; devices running 4.2 would get the full complement, but they are usually faster anyway. Of course we would still need to include some minsdk (and maxsdk) info in the index, but if the minsdk happens to equal one of those milestones it could be left out.
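A minimal sketch of how the server might build the per-milestone indexes follows. The apk records are plain dictionaries with a `minsdk` key, which is an assumption for illustration; how a device picks its milestone is left open, as in the text above.

```python
MILESTONES = [4, 8, 10, 14, 18]


def build_sdk_indexes(apks, milestones=MILESTONES):
    """Map each milestone to the apks whose minsdk does not exceed it."""
    return {
        milestone: [apk for apk in apks if apk["minsdk"] <= milestone]
        for milestone in milestones
    }
```

The index for milestone 4 is the smallest, and the index for milestone 18 contains every apk with a minsdk of 18 or lower.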

We could do the same for ABIs, i.e. offering one index for ARMv7 CPUs and another for x86. As gradle becomes more prevalent, we can expect more projects to split apks, as it makes that easy; with this division we wouldn't need to be anxious about including so many apks, and it would take the load of compatibility processing off the device.

With 4 ABIs and 5 SDK ranges, that makes 20 indexes. If we ever get some translation for descriptions it would make sense to apply this type of division too, since we would probably only be talking about a few.
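The 20 combinations can be enumerated as in the sketch below. The four ABI names are an assumption (the text only gives the count), and the filter treats an apk with no listed ABIs as pure Java that fits every variant. It ignores the fact that some devices can run apks built for more than one ABI.

```python
from itertools import product

SDK_MILESTONES = [4, 8, 10, 14, 18]
# Assumed ABI list; the text only says there are four.
ABIS = ["armeabi", "armeabi-v7a", "x86", "mips"]


def index_variants():
    """Every (sdk milestone, abi) pair that would get its own index."""
    return list(product(SDK_MILESTONES, ABIS))


def apks_for_variant(apks, sdk, abi):
    """Select apks for one variant; an empty 'abis' list fits every ABI."""
    return [
        apk for apk in apks
        if apk["minsdk"] <= sdk and (not apk["abis"] or abi in apk["abis"])
    ]
```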

Another benefit of this is that the index may show some compatible versions, which wouldn't be the case if we were using one index and the app had raised the minsdk before the last three versions.

Download more information on-demand

The client already downloads icons in the background, but there are still other types of data which could be excluded from the index and fetched on demand. Application-related URLs, Flattr and Bitcoin details are the most suitable candidates. See also issue 394.
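One way to picture this is to split each app entry into the part that stays in the index and the part fetched later. The field names below are hypothetical stand-ins for the categories mentioned above, not a confirmed list of index fields.

```python
# Hypothetical field names standing in for URLs, Flattr and Bitcoin data.
ON_DEMAND_FIELDS = {"web", "source", "tracker", "flattr", "bitcoin"}


def split_entry(app):
    """Return (index_part, on_demand_part) for one app entry."""
    index_part = {k: v for k, v in app.items() if k not in ON_DEMAND_FIELDS}
    on_demand_part = {k: v for k, v in app.items() if k in ON_DEMAND_FIELDS}
    return index_part, on_demand_part
```

Only the first part would need to be downloaded on every update; the second could be requested when a user opens the app's details.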