Hello,
we are developing a library to show last arrived messages of all
threads in a folder with more than 300k messages.
As per the RFC 5256, the IMAP thread command has only the option
to specify the grouping criteria (REFERENCES vs ORDEREDSUBJECT).
So we implemented an algorithm which gets the full UID SORT
ordered by date in reverse order then gets all thread relations
and then post process all outputs to obtain the final result.
Once we have both output we loop through the first uids set (which
is ordered) and for every item of it we loop through the other
uids set with all the thread relations. If the uid of the first
set is contained in one of the elements of the thread messages, we
pick it up and push it in the resulting array.
This is a pseudo-code that shows what we are doing:
array_of_latest_messages_in_thread = []
array_of_sorted_uids = [n,n,n,n,...] // UID SORT (REVERSE DATE)
UTF-8 ALL
array_of_thread_uids = [n,[n,n],n,[n,n,n],[n,n],n,n,...] // UID
THREAD (REFERENCES) UTF-8 ALL
foreach(array_of_sorted_uids as s_uid){
foreach(array_of_thread_uids as t_uids){
if(t_uids contains s_uid){ // or a function to loop
recursively t_uids to search if a leaf is equal to s_uid
array_of_latest_messages_in_thread.push(s_uid)
break
}
}
}
We have also made some little optimizations in the above code like
for example skipping from the outer loop the uids already
processed in the inner loop that have not being selected. Anyway,
the loop is very expensive in term of computation and we are
wondering if there is a better approach to this issue directly
using IMAP instead of post processing the output of the two
commands at application level.
Thanks.
-- Alessio Cecchi Postmaster @ http://www.qboxmail.it https://www.linkedin.com/in/alessice