VSYNC in Firefox 50.0.2 under Windows is broken
To replicate these bugs, you must run Firefox 50.0.2 (2016/11/30) or Firefox Nightly - on Windows.
November 21, 2016
1. vsync in Firefox is broken
Firefox incorrectly computes VSYNC times: The internal
"vsync" time that Firefox computes (see the Firefox algorithm in next section)
is passed to the page as the requestAnimationFrame callback time argument. Visit vsynctester.com,
click on the gear icon, check the "Use rAF time arg as frame time" option,
and look at the blue line ("frame VSYNC offset"). Now resize the Firefox browser
window. The result is that Firefox no longer computes the "vsync" time
correctly (the blue vsync line remains 'flat' in a browser computing vsync correctly).
Run this same test using
Chrome, and Chrome passes the test. This is what I see in Firefox vs Chrome:
Firefox computes "vsync" time incorrectly
Chrome computes "vsync" time correctly
Firefox does NOT actually wake up on VSYNC:
Inter-frame times under Firefox superficially look very good. However, how fast does Firefox
actually 'wake up' after true vsync (to then call the requestAnimationFrame()
callback)? This delay is the 'frame vsync offset', and it should be very fast/small.
But under Windows, Firefox is not only 'waking up' several milliseconds past true vsync, but under
a load on the system, Firefox is waking up WAY past vsync. For proof, just
watch the video below (blue line is frame vsync offset). In fact, Chrome, which is running
at the exact same time, remains true vsync aligned.
VIDEO: Firefox under load wakes up WAY past vsync
UPDATE: I walked into a Best Buy and replicated this issue
on every computer that I tested (8+ varied systems from low end to high end).
But note that this issue is only for Firefox under Windows.
CAUSE: Firefox uses DwmFlush() to synchronize to 'vsync' on Windows -- but DwmFlush()
returns milliseconds AFTER vsync (after the OS composites), not on vsync. This is
very easy to see with the test described under Bug #2 below.
2. The Firefox vsync algorithm
The Firefox vsync algorithm: Mozilla publishes Firefox source code
which contains the following very buggy algorithm for synchronizing to VSYNC:

Firefox 50.0.2 vsync algorithm (Windows; very buggy)

          // BUG: mVsyncRate is a rounded (wrong) rate
    1709: mVsyncRate = (dwmGetCompositionTimingInfo "rateRefresh")

    xxxx: GetVBlankTime():
    1782:   now = Now()
    xxxx:   vsync = (dwmGetCompositionTimingInfo "qpcVBlank")
    1804:   if IsWin10OrLater && vsync >= now
    1818:     vsync = vsync-mVsyncRate
    1824:   if vsync >= now
    1825:     vsync = now               // BUG: 'now' is not vsync time
    1831:   if now-vsync > 4ms
    1832:     vsync = now               // BUG: 'now' is not vsync time
    1835:   return vsync

    1843: vsync = Now()
    xxxx: mPrevVsync = (init)
    1848: loop forever
    1870:   dwmFlush()                  // BUG: waking up late causes algorithm failure
    1877:   now = Now()                 // BUG: 'now' is not vsync time
    1890:   vsync = mPrevVsync + mVsyncRate   // BUG: time drift
    1891:   if vsync > now
    1893:     vsync = GetVBlankTime()
    1896:   if vsync <= mPrevVsync
    1897:     vsync = Now()             // BUG: 'now' is not vsync time
    1900:   if now-vsync > 2ms
    1903:     vsync = GetVBlankTime()
    1906:   mPrevVsync = vsync

The major changes that made this Firefox vsync algorithm:
- 2015-02-16: Part 2: Create a vsync source on windows - 38.0a1
- 2015-02-16: Part 3: Create a vsync thread loop with dwmflush - 38.0a1
- 2015-08-30: Use the previous vsync timestamp on windows 10 - 41.0
- 2016-07-27: Report a constant FAKED vsync interval diff on windows - 50.0a1
- 2016-08-26: Correct for negative vsync timestamps on windows - 50.0a2
View the latest gfxWindowsPlatform.cpp code
BUG #1 -- Firefox incorrectly computes VSYNC times - due to a major design flaw: There is a major design flaw
in the Firefox code that likely explains most (all?) of the 'time jumps' described
by the programmer -- because when Firefox is calculating the incorrect vsync times (see major
spikes in graph right),
Windows DWM vsync times are completely stable!
Replicate: Run Firefox against vsynctester.com; select the
"rAF time arg as frame time" option; turn off all graph lines except
for the inter-frame line; and set the graph scale to 10 (milliseconds). Run Chrome
at the same time with the same settings. You are now plotting the web browser internal 'vsync'
computed time. In Firefox, this line will be perfectly flat with
periodic minor spikes (caused by the bugs described below).
Now resize the Firefox application window and notice that Firefox is no
longer vsync aligned. Mozilla blames Windows.
Independent DWM test tool:
To eliminate Windows as the cause, run the
DWM timing testing tool at the same time as the test above.
Run the "extended DwmGetCompositionTimingInfo" test from the 'Tests'
menu, which runs for 3600 frames, monitoring Windows "qpcVBlank" time
and buzzes when it detects any inter-frame variation of more than 0.5 milliseconds.
If there is a major spike in the Firefox graph (as seen in graph above right)
but you do NOT hear a buzz from the test tool, you have just confirmed
a problem in Firefox.
To test that the tool is working, run the test, and press the 'PrintScreen' button
on your keyboard, which on most Windows systems, causes a skipped frame -- and a buzz
in the tool. Also, the test only runs for one minute (and then beeps to remind you).
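The kind of check the test tool performs can be approximated in a few lines. The Python sketch below is not the tool's actual logic: the function name find_interval_spikes is hypothetical, the timestamps are assumed to be successive "qpcVBlank" readings converted to microseconds, and "variation" is assumed to mean deviation from the median inter-frame interval.

```python
from statistics import median

def find_interval_spikes(vblank_us, tolerance_us=500):
    """Return (frame_index, interval) pairs whose interval deviates from the
    median inter-frame interval by more than tolerance_us (0.5 ms by default)."""
    intervals = [b - a for a, b in zip(vblank_us, vblank_us[1:])]
    if not intervals:
        return []
    typical = median(intervals)  # assumed baseline: the median interval
    return [(i + 1, dt) for i, dt in enumerate(intervals)
            if abs(dt - typical) > tolerance_us]

# Hypothetical sample: steady 16.7 ms frames with one skipped frame.
samples = [0, 16_700, 33_400, 66_800, 83_500]
print(find_interval_spikes(samples))  # [(3, 33400)]
```

An empty result over the same period in which the Firefox graph shows a spike is the situation described above: the OS timestamps were stable and the spike came from the browser.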
Conflated times: The programmer apparently conflated the
DwmFlush wake up time with vsync time. DwmFlush wake up time is NOT vsync time.
And the Firefox code that compares (all over the place) flush
wake up times ("now" in Firefox code) with a computed "vsync" time is making a
meaningless comparison with no impact on the location of true vsync.
To clearly see this in the Firefox vsync algorithm above, add a debug spin-wait of exactly
four milliseconds after DwmFlush. This intentional delay has NO impact on
actual vsync times. And yet, this delay causes Firefox to compute an incorrect vsync time.
Proof via a short code walk-through: Assume that Firefox is running normally,
and vsync is set to truevsync. On the next frame, DwmFlush
wakes up normally, Firefox advances vsync by mVsyncRate (line 1890)
and everything works (an inter-frame time of +0 milliseconds).
But on the next frame,
assume DwmFlush wakes up 5ms after truevsync.
Firefox advances vsync by mVsyncRate, and because now is more than 2ms past
that value, the check on line 1900 is triggered and calls GetVBlankTime().
Inside GetVBlankTime(), now is more than 4ms past vsync, so line 1831 triggers and
vsync is set to now, or truevsync+5ms
(an inter-frame time of +5 milliseconds).
On the next frame, DwmFlush wakes up more normally, Firefox advances vsync by
mVsyncRate (line 1890), but that time is now ahead of now
(prior vsync was set to +5 past truevsync) and causes line 1891 to trigger,
which sets vsync back to truevsync (an inter-frame time of -5 milliseconds).
Then, assume the next frame is more normal
(an inter-frame time of +0 milliseconds).
The end result is Firefox vsync inter-frame times of +0, +5, -5, +0 -- which IS
the spike seen in the graph above -- and all the while, true vsync times from
Windows are 100% stable.
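The walk-through can be replayed with a small Python model of the listing. This is a sketch under stated assumptions, not Firefox's code: true vsync falls on exact multiples of an assumed 16,667 microsecond period, mVsyncRate equals that true period (so Bug #3 and Bug #4 are taken out of the picture), "qpcVBlank" is modelled as the most recent true vsync at or before now, no time passes between the Now() calls within one frame, and the Windows 10 branch never fires. Only the DwmFlush wake-up latency varies.

```python
PERIOD_US = 16_667  # assumed true refresh period, also used as mVsyncRate

def qpc_vblank(now_us):
    """Model of qpcVBlank: the most recent true vsync at or before now."""
    return (now_us // PERIOD_US) * PERIOD_US

def get_vblank_time(now_us):
    """Sketch of listing lines 1782-1835 (Windows 10 branch omitted)."""
    vsync = qpc_vblank(now_us)
    if vsync >= now_us:            # line 1824
        vsync = now_us
    if now_us - vsync > 4_000:     # line 1831
        vsync = now_us
    return vsync

def simulate(wake_latencies_us):
    """Return each frame's inter-frame time minus the true period, in microseconds."""
    prev_vsync = 0
    deviations = []
    for frame, latency in enumerate(wake_latencies_us, start=1):
        now = frame * PERIOD_US + latency   # DwmFlush returns `latency` after true vsync
        vsync = prev_vsync + PERIOD_US      # line 1890
        if vsync > now:                     # line 1891
            vsync = get_vblank_time(now)
        if vsync <= prev_vsync:             # line 1896
            vsync = now
        if now - vsync > 2_000:             # line 1900
            vsync = get_vblank_time(now)
        deviations.append(vsync - prev_vsync - PERIOD_US)
        prev_vsync = vsync                  # line 1906
    return deviations

# Wake-ups 1 ms, 5 ms, 1 ms, 1 ms after true vsync:
print(simulate([1_000, 5_000, 1_000, 1_000]))  # [0, 5000, -5000, 0]
```

In this model the true vsync times never move; the single late wake-up alone produces the +5/-5 millisecond pair.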
Firefox must start over: true vsync times
are true vsync times (and obtained from the OS and never computed) and how late Firefox
wakes up (or not) after true vsync plays NO ROLE WHATSOEVER in true vsync time values.
Throw the code away and start over.
BUG #2 -- Firefox does NOT actually wake up on VSYNC - because it uses DwmFlush() to synchronize to vsync:
Reviewing the Firefox source code, it becomes clear that Firefox is internally using
DwmFlush() in a loop (line 1870)
as a means of synchronizing to VSYNC under Windows. But that is a huge problem because
DwmFlush() wakes up after VSYNC (not on VSYNC),
sometimes way after VSYNC.
Proof: Go to vsynctester.com, let the test run for one minute, click
on the gear icon, check 'locked', uncheck 'Use rAF time arg as frame time',
and notice that the blue vsync offset line is several milliseconds past
true vsync.
Then, run notepad on top of Firefox and resize the notepad application.
Watch the video at the top of this page. Or, add a load
on the system another way.
What is happening is that on a system with NO load, DwmFlush() wakes
up 0.5-2 ms past VSYNC. Under mild system load DwmFlush() wakes
up 2-4 ms past VSYNC. And under a heavier system load, DwmFlush()
wakes up 4+ ms past VSYNC. This behavior makes DwmFlush() useless
as a means to synchronize to VSYNC.
BUG #3 -- using an inaccurate rounded Hz value from Windows:
Firefox in line 1705 is using "rateRefresh", a rounded
and very inaccurate rate, from Windows.
Proof: On a computer with a true display Hz around 59.803 Hz (rate 16.722 ms),
Firefox uses a rounded 60 Hz and computes (line 1709 above) a mVsyncRate of
1000/60 milliseconds (rate 16.667 ms). That is the wrong rate!
There is a much higher precision "qpcRefreshPeriod" that could have been used.
And if for some reason, Mozilla did not want to use qpcRefreshPeriod --
an incredibly accurate rate could have been computed by just monitoring
vsync times, over time.
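As an illustration of computing the rate by monitoring vsync times, here is a minimal Python sketch. It is one possible approach, not the method used by any browser or by vsynctester.com: the function name estimate_period is hypothetical, and the input is assumed to be a list of observed vsync timestamps in milliseconds that may contain jitter and the occasional skipped frame.

```python
from statistics import median

def estimate_period(vsync_times_ms):
    """Estimate the refresh period (ms) from observed vsync timestamps."""
    if len(vsync_times_ms) < 2:
        raise ValueError("need at least two vsync timestamps")
    intervals = [b - a for a, b in zip(vsync_times_ms, vsync_times_ms[1:])]
    rough = median(intervals)  # robust first guess, unaffected by a rare skipped frame
    # Count how many refresh periods each interval spans (2 for a skipped frame).
    frames = sum(max(1, round(dt / rough)) for dt in intervals)
    # Averaging over the whole span shrinks per-sample jitter as samples accumulate.
    return (vsync_times_ms[-1] - vsync_times_ms[0]) / frames

# Hypothetical 59.803 Hz display: +/-0.05 ms of jitter and one skipped frame.
samples = [k * 16.722 + (0.05 if k % 2 else -0.05) for k in range(601) if k != 4]
period = estimate_period(samples)
print(f"{period:.3f} ms per frame, versus {1000 / 60:.3f} ms for a rounded 60 Hz")
```

The longer the observation window, the smaller the influence of jitter in the first and last samples on the estimate.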
BUG #4 -- computing (FAKING) future vsync times is the cause of time drift:
Firefox manually computes a sequence of
VSYNC times (line 1890). But without a 100% precise "mVsyncRate",
that will cause "vsync" to drift too far (one way or the other) due to accumulated
errors. The true vsync time comes from Windows, not from adding a constant to a variable.
Proof: Consider a display with a Hz of 60.107 Hz. The true rate is (about) 16.637 ms, but
Firefox will actually use a rate of (about) 16.667 ms, which every frame introduces
a time drift into the FUTURE:
    vsync = mPrevVsync + mVsyncRate + 0.030;
which appears to be a very inconsequential error -- until you realize that the error is
being introduced every single frame -- so it adds up very quickly. After
just 34 frames (slightly over 1/2 second), Firefox's "vsync" computation
is already one full millisecond into the future,
ahead of true VSYNC time (an entire frame is just 16.637 ms), which explains why
the 'if' check on line 1891 is incorrectly triggering so frequently.
Or, consider a display with a Hz of 59.803 Hz. The true rate is (about) 16.722 ms, but
Firefox will actually use a rate of (about) 16.667 ms, which every frame introduces
a time drift into the PAST, which again adds up very quickly (see also
'a lack of proper testing' below for a chart):
    vsync = mPrevVsync + mVsyncRate - 0.055;
BUG #5 -- A fundamental misunderstanding of how vsync times under Windows work:
The Firefox programmer fundamentally does not understand what the "qpcVBlank"
vsync time from Windows means -- because the programmer is making numerous
comparisons of this time to "now", which is a meaningless and incorrect
comparison. The programmer assumed that "qpcVBlank" is *the* vsync time.
But in reality, "qpcVBlank" is *a* vsync time sample (past, present, or
even future, and possibly even more than one frame away from 'now'), which explains
why comparing it to a current 'now' is meaningless.
The way to effortlessly deal with this is presented under 'a solution' below.
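The drift figures quoted under Bug #4 are easy to check. The short Python sketch below only redoes the arithmetic from the text; the function names are hypothetical, and the assumed rate defaults to the rounded 60 Hz that Firefox is described as using.

```python
import math

def drift_per_frame_ms(true_hz, assumed_hz=60.0):
    """Per-frame error (ms) from stepping by 1000/assumed_hz on a true_hz display.
    Positive means the computed vsync runs ahead of (into the future of) true vsync."""
    return 1000.0 / assumed_hz - 1000.0 / true_hz

def frames_until_drift(true_hz, limit_ms=1.0, assumed_hz=60.0):
    """Frames needed for the accumulated error to reach limit_ms (None if no drift)."""
    drift = abs(drift_per_frame_ms(true_hz, assumed_hz))
    return math.ceil(limit_ms / drift) if drift else None

print(round(drift_per_frame_ms(60.107), 3))   # 0.03  (drifts into the future)
print(frames_until_drift(60.107))             # 34
print(round(drift_per_frame_ms(59.803), 3))   # -0.055 (drifts into the past)
```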
BUG #6 -- preemption bugs: The Firefox vsync code has preemption bugs.
One example is a section of code that computes a time difference between
two different time bases; the result will be incorrect if the OS preempts the thread in
the middle of the time calculation.
Proof: The OS preempting Firefox between "Now()" and "QueryPerformanceCounter()" in
GetVBlankTime will cause "usAdjust" to be computed incorrectly, making it appear
that the "vsync" time from the OS went backwards in time (when in reality, it did not)!
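One common way to guard a two-clock reading against preemption is to bracket the read of one clock between two reads of the other, and to retry when the bracket is too wide. The Python sketch below illustrates that general idea only; it is not Firefox code, and paired_sample together with its thresholds is a hypothetical helper.

```python
import time

def paired_sample(clock_a, clock_b, max_gap_ns=50_000, max_tries=16):
    """Read clock_b bracketed by two reads of clock_a.

    Returns (a_mid, b, gap): the midpoint of the two clock_a reads, the
    clock_b read, and the bracket width. A wide bracket suggests the thread
    was preempted between the reads, so the sample is retried; the tightest
    bracket seen is returned if no attempt fits within max_gap_ns.
    """
    best = None
    for _ in range(max_tries):
        a1 = clock_a()
        b = clock_b()
        a2 = clock_a()
        gap = a2 - a1
        if best is None or gap < best[2]:
            best = (a1 + gap // 2, b, gap)
        if gap <= max_gap_ns:
            break
    return best

# Usage with two real clocks (both report integer nanoseconds):
a_mid, b, gap = paired_sample(time.perf_counter_ns, time.monotonic_ns)
offset_ns = b - a_mid  # offset between the two time bases, uncertain by about gap / 2
```

The returned bracket width gives the caller an explicit bound on how wrong the computed offset can be, instead of silently accepting a sample that straddled a preemption.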
BUG #7 -- wrong precision data types:
vsynctester.com detects that Firefox (at a precise 60Hz vsync)
is actually using a Hz of 59.999996 -- which is entirely due to Firefox
code using a low precision (single precision float)
floating point math type instead of a higher precision double type.
Proof: Find a computer with an actual display hertz of 60.0000XX. Then run Firefox
against vsynctester.com and turn on the "Use rAF time arg as frame time"
option, and notice that the Hz Firefox uses is 59.999996.
And more... And there are even more issues, but after this many bugs, what
is the point? Just start over.
Incorrectly blaming Windows:
The Firefox code is full of comments blaming Windows for all sorts of problems and bugs,
but in reality, most (if not all) bugs and problems were caused by incorrect assumptions
by the programmer, and bad code.
- does not (properly) handle DWM skipping frames
- does not handle DWM collapsing frames
- does not handle >1 frame DWM differential (yes, that happens)
- MOZ_ASSERT on a variable, thinking it was signed, but it was unsigned
- incorrect C "sizeof" usage
A lack of proper testing by Mozilla: For nearly a month, Firefox Nightly
(from 2016/07/27 to 2016/08/26) would internally compute a "vsync" value
that went further and further back into the past --
but only for a display where the true Hz was below 60.000Hz (see Bug #4 above
for details). This
points out that Mozilla is not properly testing vsync code changes.
The purple line going back in time indefinitely (and then disappearing from the chart)
is impossible to overlook, but Mozilla did:
The programmer refuses to believe it: I filed a bug report with Mozilla and directly
contacted the programmer,
who wrote this code, all to no avail. The programmer STILL (incorrectly)
blames Windows. So Firefox oscillates, creating a (wrong) 'staircase' VSYNC
timebase, and Firefox injects spikes/oscillations, all because a programmer
stubbornly refuses to believe that his code is the problem.
A lesson for all programmers -- understand the true cause of a bug before fixing the bug:
The programmer wrote buggy code, and
then when things went south, believing that his code must be bug free, the
programmer wrote lots of other code to fix the 'Windows' bug (and then blamed
Windows in code comments), when in reality the bug was in
his code. What a mess of code.
A solution: Firefox must:
- Wake up on VSYNC: "DwmFlush" does not work to synchronize to vsync (it can wake up far too late).
  Use D3DKMTWaitForVerticalBlankEvent() instead
  (details, with code).
  Worst case, use timers (like Chrome), which achieves much better results than "DwmFlush",
  especially when a system is under load.
- Properly compute the true vsync time:
  Throw away all the Firefox code that deals with computing and comparing vsync times
  and replace with this single line of code that implements a very simple rule:
  For any given D3DKMTWaitForVerticalBlankEvent() wakeup time "now", compute the vsync time for that wakeup
  time as follows:
      vsync = now - (((now-vblank+vperiod/8)%vperiod+vperiod)%vperiod-vperiod/8);
  where both vblank and vperiod are derived and computed
  by analyzing all prior vsync wake up times.
  Here is one technique and
  here is another.
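To see what this one-line rule does, it can be transcribed into Python. The sketch below assumes integer timestamps in microseconds; the function name snap_to_vsync is hypothetical, and the sample values for vblank and vperiod are made up for illustration. The rule snaps a wake-up time back to the vsync that precedes it, except that a wake-up landing within the last eighth of a period before a vsync is snapped forward to that vsync.

```python
def snap_to_vsync(now, vblank, vperiod):
    """Map a wake-up time to its vsync time.

    vblank is any known vsync time (past or future) and vperiod the refresh
    period; all values are integers in the same unit (microseconds here).
    """
    eighth = vperiod // 8
    # The double modulo mirrors the original C-style line, where % can return
    # a negative value; in Python the second pass is harmless.
    return now - (((now - vblank + eighth) % vperiod + vperiod) % vperiod - eighth)

VBLANK = 1_000_000   # hypothetical reference vsync sample
PERIOD = 16_667      # hypothetical refresh period

# Woke up 2 ms late: snaps back to the vsync that just happened.
print(snap_to_vsync(VBLANK + 3 * PERIOD + 2_000, VBLANK, PERIOD) - VBLANK)   # 50001
# Woke up 1 ms early: snaps forward to the imminent vsync.
print(snap_to_vsync(VBLANK + 3 * PERIOD - 1_000, VBLANK, PERIOD) - VBLANK)   # 50001
```

Because vblank is only a phase reference, it makes no difference whether the sample lies before or after now, which is the point made under Bug #5.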
Someone found excessive jitter in Firefox, reported it to Mozilla as a bug
and pointed to Chrome (which has real vsync times with microsecond micro-jitter)
as an exemplar. And the solution
Mozilla came up with was to FAKE THE VSYNC TIMES
being sent to the rAF callback, rather than fixing the true cause of
the jitter seen in Firefox! And then Firefox messed up faking
the vsync times. Stunning.
Firefox tool hides ALL micro-jitter
±50 microsecond micro-jitter visible at vsynctester.com
Also, the tool in the Firefox bug report effectively HIDES all
micro-jitter (showing only major jitter) because the tool
uses the wrong chart scale. Seen above right is Chrome, with
micro-jitter, but the green inter-frame line is flat.
Below right is a tool with a different scale that shows the
same inter-frame times, but now micro-jitter is visible.
But if Mozilla really wants to eliminate even microsecond micro-jitter
from Firefox (a very bad idea), it is very easy (one way).
But before they do, Mozilla must show a real world animation that is actually negatively
impacted by a rAF callback time argument (vsync times) with
±50 microsecond micro-jitter.