On Wed, 15 Sep 2010 02:50:19 +0000, ClassCastException wrote:
> So I go to sort a directory of roughly 4000 photos by date taken. The
> column headings change appropriately right away, showing a down arrow in
> the date taken column, but the visible files remain sorted by name and
> instead of them re-sorting immediately a progress bar starts crawling at
> the top.
>
> Here's the buggy behavior: when the progress bar got past about 80% it
> began to slow down rapidly. By the time it was into the box with the red
> X it was barely moving; less than 1/20 the speed it had had for most of
> the time.
>
> The sort completed eventually, but it took a lot longer than it should
> have.
>
> What happened? Getting all the date takens out of the exif blocks in the
> files should have been somewhat slow, but not that slow and linear in
> the number of files. Actually sorting on that data should have been n
> log n but also lightning fast compared to getting the data (sorting 4000
> pointers by a few integers a few indirections away from the pointers
> should take milliseconds on modern hardware with a decent C or C++
> quicksort implementation).
>
> The only thing I can see causing it to make rapid progress on the task
> at first but then slow down is if it used an algorithm that was
> quadratic and back-loaded most of the work.
>
> The obvious inference is that the idiots at Microsoft used bubble sort
> and specifically implemented it to swap 0, 1 if needed; then 0,1 and
> then 1,2 if needed; then 0,1 and 1,2 and 2,3 if needed; etc.; and the
> progress meter ticks for each iteration of the outermost loop. Then the
> speed of the meter would drop inverse-linearly over time. That doesn't
> *quite* match what I observed, where the speed seemed to drop fastest at
> the end instead of at the start, but it would cause both a very slow
> sort and a slowing progress meter, so it beats any other hypothesis that
> seems reasonable at this point...
I just had a really horrible thought. Maybe they didn't preload all the
date-takens. Maybe it goes to the disk EVERY TIME THE SORT DOES A
COMPARE. So the sort is slowed even more, and worse, the later iterations
go over nearly the whole directory, killing disk cache performance and
causing the particularly precipitous speed drop near the end (when the
total size of the jpeg file headers read in one iteration crosses over
the cache size).
ARGH! A first-year comp sci graduate could easily have done better.
|