Improved Search

From F-Droid
Jump to: navigation, search

Filter main lists

Currently, F-Droid launches a new SearchResults activity when the user enters a search term. For various reasons, it would be ideal to instead filter the main lists.

Not 100% sure yet, but it would potentially be nice to be able to filter every list individually too. That is, if you are viewing the "Installed" tab and you start searching, it will filter that list.

Relevant links

Issue 323 - Change search to use action bar filter, rather than new Activity

Highlight matching text

Right now, the description or package name of an app may match the search query, but those fields are not shown in the search results. The search results should highlight the matching snippet in the UI. If it is in the app name or the summary, that info is already shown, and it should be highlighted. If it is the description or package ID, then that info should be shown (or a snippet of it) and then matching terms highlighted.

Rank results based on relevance

Currently the search does returns anything where the name, package id, summary or description matches the query string. It then sorts the results alphabetically.

Possible improvements include implementing tf/idf and the levenshtein distance.

Potential ranking algorithms

TF/IDF

TF/IDF is a relatively straightforward algorithm which results in quite good rankings of bodies of text against a query. The general gist is that words which appear in many documents are not given much weight, and words which appear many times in a single document give that document a higher weight.

Initially, the term frequencies and the inverse document frequencies would have to be pre-calculated after an index update (could be done in the background). After that, the TF/IDF for each document can be calculated in the database query which does the search, which is handy.

Due to the nature of how TF/IDF works, it will not be helpful when matching against the package name, only the title, summary and description.

Levenshtein Distance

The [levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) measures how close a search query is to a string. It is measured as the number of characters that need to be added/removed/changed to go from the search query to the resulting string.

Unlike the TF/IDF, the Levenshtein distance will differ for each will likely have to be calculated in Java. Also unlike TF/IDF, it is only useful for ranking, not for actual searching. Thus, once the results of the search query have been returned from the database, then they can be sorted according to the Levenshtein distance to the query. Given the nature of this algorithm, it probably only makes sense to perform it on the app name and nothing else.

Matching package ID

If the query matches the package name (e.g. org.fdroid.fdroid) exactly, then that should be given a perfect score and shown at the top of the list of results.

Relevant issues

Issue 336 – Improve the search ranking