Automatic pagination in the REST API is inconsistent when the result set is changing
When using "limit=N" in calls to the REST API, for example for jobs, each response contains a "next" URL. That URL returns the matches for the same query that immediately follow the data in this response, with the same size limit as the current query. This provides simple pagination for clients.
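As a rough illustration of how a client might consume this, here is a minimal sketch of a loop that follows "next" until it is absent. Only the "next" field comes from the description above; the "results" field name, the example URLs, and the injected `fetch` callable (a stand-in for the real HTTP call) are assumptions made for the example.

```python
from typing import Callable, Iterator, Optional


def iter_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item by following each response's "next" URL until it is absent."""
    url: Optional[str] = first_url
    while url:
        response = fetch(url)
        # "results" is a hypothetical field name for the page contents.
        yield from response.get("results", [])
        url = response.get("next")


# Stand-in for the HTTP layer: canned responses keyed by (hypothetical) URL.
fake_pages = {
    "/jobs?limit=2": {"results": [{"id": 1}, {"id": 2}], "next": "/jobs?page=2"},
    "/jobs?page=2": {"results": [{"id": 3}], "next": None},
}

collected = [job["id"] for job in iter_pages("/jobs?limit=2", fake_pages.__getitem__)]
```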
However, there is no actual link between the "next" URL and the current query, so the query is re-run when visiting the "next" URL. That means the result set that we're sampling from may have changed.
For example, a jobs query with a limit returns a result that includes a "next" URL.
Suppose the actual set of jobs is something like:
|1||Running||In first result|
|2||Running||In first result|
|3||Running||Next URL points here|
Then the initial query returns jobs 1 and 2. Suppose job 1 finishes right after that response. The state is now:
|4||Running||Next URL points here|
When I visit the "next" URL I only get job 4; job 3 has been lost. The client has no way to detect that this has occurred, which makes it very difficult to collect accurate data when using paging.
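The scenario can be reproduced with a small in-memory simulation. This is only a sketch: it assumes the "next" URL behaves like a plain offset into the re-run query, and the `running_page` helper and the `jobs` dictionary (including job 4 being present from the start) are made up for the illustration.

```python
def running_page(jobs, offset, limit):
    """Return one page of running job ids, re-running the filter on every call."""
    running = sorted(job_id for job_id, state in jobs.items() if state == "Running")
    return running[offset:offset + limit]


jobs = {1: "Running", 2: "Running", 3: "Running", 4: "Running"}

first = running_page(jobs, offset=0, limit=2)   # initial query
jobs[1] = "Finished"                            # job 1 finishes between requests
second = running_page(jobs, offset=2, limit=2)  # "next" re-runs the query at offset 2

seen = first + second
# Jobs that are still running but never appeared on any page.
missed = [j for j, state in jobs.items() if state == "Running" and j not in seen]
```

Here `first` contains jobs 1 and 2, `second` contains only job 4, and `missed` contains job 3, which matches the loss described above.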
It seems like there are two obvious ways to fix this:
- Make the returned URL (much) smarter. This is really hard in general, and requires analysis of the query string. A simple offset will never be reliable, but for some sort orders it might be possible to return a valid next pointer. For example, if the results are sorted by id, then returning the next id could work.
- Use an opaque cursor instead and persist it for some length of time. Now the URL is meaningless and the paging of the result set is tracked on the server side.
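Both options can be sketched in a few lines against the same in-memory job set. This is an illustration under stated assumptions, not a proposal for the actual implementation: `page_after_id` and `CursorStore` are hypothetical names, results are assumed to be sorted by id, and cursor expiry is left out for brevity.

```python
import secrets


def page_after_id(jobs, after_id, limit):
    """Keyset paging: return running ids greater than after_id, plus the next pointer."""
    running = sorted(j for j, s in jobs.items() if s == "Running" and j > after_id)
    page = running[:limit]
    next_after = page[-1] if len(running) > limit else None
    return page, next_after


class CursorStore:
    """Server-side cursors: snapshot the result set once, then page through the snapshot."""

    def __init__(self):
        self._snapshots = {}

    def open(self, jobs):
        # The token is opaque to the client; it only keys the stored snapshot.
        token = secrets.token_urlsafe(16)
        self._snapshots[token] = sorted(j for j, s in jobs.items() if s == "Running")
        return token

    def fetch(self, token, limit):
        remaining = self._snapshots[token]
        page, rest = remaining[:limit], remaining[limit:]
        if rest:
            self._snapshots[token] = rest
        else:
            del self._snapshots[token]  # cursor exhausted
        return page


# Option 1: the next pointer is the last id seen, so a finished job cannot shift the page.
jobs = {1: "Running", 2: "Running", 3: "Running", 4: "Running"}
keyset_first, next_after = page_after_id(jobs, after_id=0, limit=2)
jobs[1] = "Finished"
keyset_second, _ = page_after_id(jobs, after_id=next_after, limit=2)

# Option 2: the result set is fixed when the cursor is opened.
jobs = {1: "Running", 2: "Running", 3: "Running", 4: "Running"}
store = CursorStore()
token = store.open(jobs)
cursor_first = store.fetch(token, limit=2)
jobs[1] = "Finished"
cursor_second = store.fetch(token, limit=2)
```

In both cases the second page contains jobs 3 and 4. The trade-off is that the keyset version only works for sort orders with a stable key, while the cursor version reports the state as of the snapshot and needs server-side storage with some expiry policy.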