VSYNC in Firefox 50.0.2 under Windows is broken
To replicate these bugs, you must run Firefox 50.0.2 (2016/11/30) or Firefox Nightly - on Windows.
November 21, 2016

  1. vsync in Firefox is broken
Firefox incorrectly computes VSYNC times: The internal "vsync" time that Firefox computes (see the Firefox algorithm in the next section) is visible in JavaScript via the requestAnimationFrame callback time argument. Visit vsynctester.com, click on the gear icon, check the "Use rAF time arg as frame time" option, and look at the blue line ("frame VSYNC offset"). In a browser that computes vsync correctly, this blue line remains flat. Now resize the Firefox browser window: the line is no longer flat, because Firefox no longer computes the "vsync" time correctly. Run the same test in Chrome, and Chrome passes. This is what I see in Firefox vs Chrome:

Firefox computes "vsync" time incorrectly

Chrome computes "vsync" time correctly

VIDEO: Firefox under load wakes up WAY past vsync
Firefox does NOT actually wake up on VSYNC: Inter-frame times under Firefox superficially look very good. However, how fast does Firefox actually 'wake up' after true vsync (to then call the requestAnimationFrame() callback)? This delay is the 'frame vsync offset', and it should be very fast/small. But under Windows, Firefox is not only 'waking up' several milliseconds past true vsync, but under a load on the system, Firefox is waking up WAY past vsync. For proof, just watch the video to the right (blue line is frame vsync offset). In fact, Chrome, which is running at the exact same time, remains true vsync aligned.
UPDATE: I walked into a Best Buy and replicated this issue on every computer that I tested (8+ varied systems from low end to high end). But note that this issue is only for Firefox under Windows.

  2. The Firefox vsync algorithm
The Firefox vsync algorithm: Mozilla publishes Firefox source code which contains the following very buggy algorithm for synchronizing to VSYNC under Windows:
  Firefox 50.0.2 vsync algorithm (Windows; very buggy)  
      // BUG: mVsyncRate is a rounded (wrong) rate
1709: mVsyncRate = (dwmGetCompositionTimingInfo "rateRefresh")

1779: GetVBlankTime
1782:   now = Now()
xxxx:   vsync = (dwmGetCompositionTimingInfo "qpcVBlank")
1804:   if IsWin10OrLater && vsync >= now
1818:     vsync = vsync-mVsyncRate
1824:   if vsync >= now
1825:     vsync = now        // BUG: 'now' is not vsync time
1831:   if now-vsync > 4ms
1832:     vsync = now        // BUG: 'now' is not vsync time
1835:   return vsync

1838: VBlankLoop
1843:   vsync = Now()
xxxx:   mPrevVsync = (init)
1848:   loop forever
1857:     NotifyVsync(vsync)
1870:     dwmFlush()         // BUG: waking up late causes algorithm failure
1877:     now = Now()        // BUG: 'now' is not vsync time
1890:     vsync = mPrevVsync + mVsyncRate  // BUG: time drift
1891:     if vsync > now
1893:       vsync = GetVBlankTime()
1896:     if vsync <= mPrevVsync
1897:       vsync = Now()    // BUG: 'now' is not vsync time
1900:     if now-vsync > 2ms
1903:       vsync = GetVBlankTime()
1906:     mPrevVsync = vsync
The major changes that produced this Firefox vsync algorithm:
2015-02-16: code - Part 2: Create a vsync source on windows - 38.0a1 - bug
2015-02-16: code - Part 3: Create a vsync thread loop with dwmflush - 38.0a1 - bug
2015-08-30: code - Use the previous vsync timestamp on windows 10 - 41.0 - bug
2016-07-27: code - Report a constant FAKED vsync interval diff on windows - 50.0a1 - bug
2016-08-26: code - Correct for negative vsync timestamps on windows - 50.0a2 - bug

View the latest gfxWindowsPlatform.cpp code
  3. Firefox vsync bugs

BUG #1 -- Firefox incorrectly computes VSYNC times - due to a major design flaw: There is a major design flaw in the Firefox code that likely explains most (all?) of the 'time jumps' described by the programmer -- because when Firefox is calculating the incorrect vsync times (see major spikes in graph right), Windows DWM vsync times are completely stable!

Replicate: Run Firefox against vsynctester.com; select the "rAF time arg as frame time" option; turn off all graph lines except for the inter-frame line; and set the graph scale to 10 (milliseconds). Run Chrome at the same time with the same settings. You are now plotting the web browser's internal computed 'vsync' time. In Firefox, this line will be perfectly flat with periodic minor spikes (caused by the bugs described below). Now resize the Firefox application window and notice that Firefox is no longer vsync aligned. Mozilla blames Windows.

Independent DWM test tool: To eliminate Windows as the cause, run the dwm-vsync-tests.exe DWM timing testing tool at the same time as the test above. Run the "extended DwmGetCompositionTimingInfo" test from the 'Tests' menu, which runs for 3600 frames, monitoring Windows "qpcVBlank" time and buzzes when it detects any inter-frame variation of more than 0.5 milliseconds. If there is a major spike in the Firefox graph (as seen in graph above right) but you do NOT hear a buzz from the test tool, you have just confirmed a problem in Firefox.
To test that the tool is working, run the test, and press the 'PrintScreen' button on your keyboard, which on most Windows systems, causes a skipped frame -- and a buzz in the tool. Also, the test only runs for one minute (and then beeps to remind you).
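The kind of check the tool performs can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function name, the microsecond units, and the reading of "inter-frame variation" as deviation from a nominal period are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not taken from dwm-vsync-tests.exe): counts the
// intervals between successive qpcVBlank samples that deviate from the
// nominal period by more than toleranceUs. All times are in microseconds.
int countVsyncGlitches(const std::vector<int64_t>& vblankUs,
                       int64_t periodUs, int64_t toleranceUs) {
    assert(periodUs > 0 && toleranceUs >= 0);
    int glitches = 0;
    for (std::size_t i = 1; i < vblankUs.size(); ++i) {
        const int64_t delta = vblankUs[i] - vblankUs[i - 1];
        if (std::abs(delta - periodUs) > toleranceUs) {
            ++glitches;  // this is where a real tool would buzz
        }
    }
    return glitches;
}
```

With a tolerance of 500 (0.5 ms), a steady stream of samples produces no glitches, while a single skipped frame (one interval of two periods) produces exactly one.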
Conflated times: The programmer apparently conflated the DwmFlush wake up time with vsync time. DwmFlush wake up time is NOT vsync time. And the Firefox code that compares (all over the place) flush wake up times ("now" in Firefox code) with a computed "vsync" time is making a meaningless comparison that has no bearing on the location of true vsync.
To clearly see this in the Firefox vsync algorithm above, add a debug spin-wait of exactly four milliseconds after DwmFlush. This intentional delay has NO impact on actual vsync times. And yet, this delay causes Firefox to compute an incorrect vsync time.
Proof via a short code walk-through: Assume that Firefox is running normally, and vsync is set to truevsync. On the next frame, DwmFlush wakes up normally, Firefox advances vsync by mVsyncRate (line 1890) and everything works (an inter-frame time of +0 milliseconds). But on the next frame, assume DwmFlush wakes up 5ms after truevsync. Firefox advances vsync by mVsyncRate, and the following code in Firefox is triggered (line 1900):
1900:     if now-vsync > 2ms
1903:       vsync = GetVBlankTime()
and then in GetVBlankTime(), line 1831 triggers, and vsync is set to now, or truevsync+5ms (an inter-frame time of +5 milliseconds):
1831:   if now-vsync > 4ms
1832:     vsync = now
On the next frame, DwmFlush wakes up more normally and Firefox advances vsync by mVsyncRate (line 1890), but that time is now ahead of now (the prior vsync was set to 5ms past truevsync). This causes line 1891 to trigger, which sets vsync back to truevsync (an inter-frame time of -5 milliseconds):
1891:     if vsync > now
1893:       vsync = GetVBlankTime()
Then, assume the next frame is more normal (an inter-frame time of +0 milliseconds).

The end result is Firefox vsync inter-frame times of +0, +5, -5, +0 -- which IS the spike seen in the graph above -- and all the while, true vsync times from Windows are 100% stable.
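The walk-through above can be replayed with a small simulation of the pseudocode. This is a simplified sketch under stated assumptions, not Firefox's actual code: qpcVBlank is modelled as the most recent true vsync, the Windows 10 branch is ignored, mVsyncRate is set to the exact true period (to isolate this bug from the rate bugs), and Now() inside GetVBlankTime is taken to equal the wake-up time. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int64_t kUsPerMs = 1000;  // all times below are in microseconds

// Simplified GetVBlankTime (lines 1779-1835, without the Windows 10 branch).
int64_t simGetVBlankTime(int64_t now, int64_t qpcVBlank) {
    int64_t vsync = qpcVBlank;
    if (vsync >= now) vsync = now;               // line 1824
    if (now - vsync > 4 * kUsPerMs) vsync = now; // line 1831
    return vsync;
}

// Replays VBlankLoop for a list of DwmFlush wake-up delays (each assumed to be
// shorter than one period). Returns, per frame, how far the computed
// inter-frame time is from the true period.
std::vector<int64_t> simInterFrameErrorUs(const std::vector<int64_t>& wakeDelayUs,
                                          int64_t periodUs) {
    assert(periodUs > 0);
    std::vector<int64_t> errors;
    int64_t prevVsync = 0;  // assume the loop starts aligned with true vsync
    for (std::size_t i = 0; i < wakeDelayUs.size(); ++i) {
        const int64_t trueVsync = static_cast<int64_t>(i + 1) * periodUs;
        const int64_t now = trueVsync + wakeDelayUs[i];  // DwmFlush returns
        int64_t vsync = prevVsync + periodUs;            // line 1890
        if (vsync > now) vsync = simGetVBlankTime(now, trueVsync);   // line 1891
        if (vsync <= prevVsync) vsync = now;                         // line 1896
        if (now - vsync > 2 * kUsPerMs)
            vsync = simGetVBlankTime(now, trueVsync);                // line 1900
        errors.push_back(vsync - prevVsync - periodUs);
        prevVsync = vsync;
    }
    return errors;
}
```

Feeding in wake-up delays of 1 ms, 5 ms, 1 ms, 1 ms gives inter-frame errors of 0, +5000, -5000, 0 microseconds, even though the simulated true vsync never moves.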

Firefox must start over: true vsync times are true vsync times (and obtained from the OS and never computed) and how late Firefox wakes up (or not) after true vsync plays NO ROLE WHATSOEVER in true vsync time values. Throw the code away and start over.

BUG #2 -- Firefox does NOT actually wake up on VSYNC - because it uses DwmFlush() to synchronize to vsync: Looking at Firefox source code, it becomes clear that Firefox is internally using DwmFlush() in a loop in line 1870, as a means of synchronizing to VSYNC under Windows. But that is a huge problem because DwmFlush() wakes up after VSYNC (not on VSYNC), sometimes way after VSYNC.
Proof: Go to vsynctester.com, let the test run for one minute, click on the gear icon, check 'locked', uncheck 'Use rAF time arg as frame time', and notice that the blue vsync offset line is several milliseconds past true vsync. This is what I see:
Then, run notepad on top of Firefox and resize the notepad application. Watch the video at the top of this page. Or, add a load on the system another way.

What is happening is that on a system with NO load, DwmFlush() wakes up 0.5-2 ms past VSYNC. Under mild system load DwmFlush() wakes up 2-4 ms past VSYNC. And under a heavier system load, DwmFlush() wakes up 4+ ms past VSYNC. This behavior makes DwmFlush() useless as a means to synchronize to VSYNC (source).
BUG #3 -- using an inaccurate rounded Hz value from Windows: Firefox in line 1705 is using "rateRefresh", a rounded and very inaccurate rate, from Windows:
Proof: On a computer with a true display Hz around 59.803 Hz (rate 16.722 ms), Firefox uses a rounded 60 Hz and computes (line 1707 above) a mVsyncRate of 1000/60 milliseconds (rate 16.667 ms). That is the wrong rate!
There is a much higher precision "qpcRefreshPeriod" that could have been used. And if for some reason, Mozilla did not want to use qpcRefreshPeriod -- an incredibly accurate rate could have been computed by just monitoring vsync times, over time.
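One way to compute a rate "by just monitoring vsync times" is to divide the total observed time span by the number of frame intervals it covers. The sketch below is only an illustration: the function name is hypothetical, times are assumed to be in microseconds, and a rough nominal period is assumed to be available so that skipped frames can be counted as multiple intervals.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: estimates the refresh period from observed vsync
// timestamps. Each gap is rounded to a whole number of nominal periods, so a
// skipped frame counts as two intervals rather than distorting the average.
double estimatePeriodUs(const std::vector<int64_t>& vsyncUs,
                        double nominalPeriodUs) {
    assert(vsyncUs.size() >= 2 && nominalPeriodUs > 0);
    long long intervals = 0;
    for (std::size_t i = 1; i < vsyncUs.size(); ++i) {
        const double gap = static_cast<double>(vsyncUs[i] - vsyncUs[i - 1]);
        intervals += std::llround(gap / nominalPeriodUs);
    }
    assert(intervals > 0);
    const double span = static_cast<double>(vsyncUs.back() - vsyncUs.front());
    return span / static_cast<double>(intervals);
}
```

The longer the monitoring window, the less any single noisy sample matters, because the error in the first and last timestamps is spread over every interval in between.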
BUG #4 -- computing (FAKING) future vsync times is the cause of time drift: Firefox manually computes a sequence of VSYNC times (line 1890). But without a 100% precise "mVsyncRate", that will cause "vsync" to drift too far (one way or the other) due to accumulated errors. The true vsync time comes from Windows, not from adding a constant to a variable:
Proof: Consider a display with a Hz of 60.107 Hz. The true rate is (about) 16.637 ms, but Firefox will actually use a rate of (about) 16.667 ms, which on every frame introduces a time drift into the FUTURE:
vsync = mPrevVsync + mVsyncRate + 0.030;
which appears to be a very inconsequential error -- until you realize that the error is being introduced every single frame -- so it adds up very quickly. After just 34 frames (slightly over 1/2 second), Firefox's "vsync" computation is already one full millisecond into the future ahead of true VSYNC time (an entire frame is just 16.637 ms), which explains why the 'if' check on line 1891 is incorrectly triggering so frequently.

Or, consider a display with a Hz of 59.803 Hz. The true rate is (about) 16.722 ms, but Firefox will actually use a rate of (about) 16.667 ms, which on every frame introduces a time drift into the PAST, and that again adds up very quickly (see also 'a lack of proper testing' below for a chart).
vsync = mPrevVsync + mVsyncRate - 0.055;
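The drift arithmetic in the two examples above can be checked with a few lines of code. This is a worked calculation only; the function names are hypothetical and the refresh rates are the ones quoted in the text.

```cpp
#include <cassert>
#include <cmath>

// Per-frame drift in milliseconds when assumedHz is used in place of trueHz.
// Positive means the computed vsync runs into the future, negative into the past.
double driftPerFrameMs(double trueHz, double assumedHz) {
    assert(trueHz > 0 && assumedHz > 0);
    return 1000.0 / assumedHz - 1000.0 / trueHz;
}

// Number of frames needed for the accumulated drift to reach limitMs.
int framesUntilDriftReaches(double trueHz, double assumedHz, double limitMs) {
    const double perFrame = std::fabs(driftPerFrameMs(trueHz, assumedHz));
    assert(perFrame > 0 && limitMs > 0);
    return static_cast<int>(std::ceil(limitMs / perFrame));
}
```

For a 60.107 Hz display treated as 60 Hz, the drift is about +0.03 ms per frame and reaches one full millisecond after 34 frames. For a 59.803 Hz display the drift is about -0.05 ms per frame, so the same one millisecond error accumulates even faster.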
BUG #5 -- A fundamental misunderstanding of how vsync times under Windows work: The Firefox programmer fundamentally does not understand what the "qpcVBlank" vsync time from Windows means -- because the programmer is making numerous comparisons of this time to "now", which is a meaningless and incorrect comparison. The programmer assumed that "qpcVBlank" is *the* vsync time. But in reality, "qpcVBlank" is *a* vsync time sample (past, present, or even future, and possibly even more than one frame away from 'now'), which explains why comparing it to a current 'now' is meaningless. The way to effortlessly deal with this is presented under 'a solution' below.

BUG #6 -- preemption bugs: The Firefox vsync code has preemption bugs. As one example, a section of code computes a time difference between two different time bases, and the result will be incorrect if the OS preempts the thread in the middle of the calculation.
Proof: The OS preempting Firefox between "Now()" and "QueryPerformanceCounter()" in GetVBlankTime will cause "usAdjust" to be computed incorrectly, making it appear that the "vsync" time from the OS went backwards in time (when in reality, it did not)!
BUG #7 -- wrong precision data types: vsynctester.com detects that Firefox (at a precise 60Hz vsync) is actually using a Hz of 59.999996 -- which is entirely due to Firefox code using a low precision float floating point math type instead of a higher precision double type.
Proof: Find a computer with an actual display hertz of 60.0000XX. Then run Firefox against vsynctester.com and turn on the "Use rAF time arg as frame time" option, and notice that the Hz Firefox uses is 59.999996.
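The value 59.999996 is itself a hint that single precision is involved. The sketch below does not reproduce Firefox's actual computation (which is not shown here); it only demonstrates that the largest float below 60 is 59.999996..., so a result that lands one float rounding step under 60 prints as exactly that number. The function name is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Returns the largest single-precision value strictly below `value`,
// widened to double so that it can be inspected without further rounding.
double largestFloatBelow(float value) {
    return std::nextafter(value, -std::numeric_limits<float>::infinity());
}
```

Near 60, adjacent float values are about 3.8 microhertz apart, whereas a double resolves far finer steps, which is why the same calculation in double would not show this error.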
And more... And there are even more issues, but after this many bugs, what is the point? Just start over.
  • does not (properly) handle DWM skipping frames
  • does not handle DWM collapsing frames
  • does not handle >1 frame DWM differential (yes, that happens)
  • MOZ_ASSERT on a variable, thinking it was signed, but it was unsigned
  • incorrect C "sizeof" usage
  • ...
Incorrectly blaming Windows: The Firefox code is full of comments blaming Windows for all sorts of problems and bugs, but in reality, most (if not all) bugs and problems were caused by incorrect assumptions by the programmer, and bad code.

A lack of proper testing by Mozilla: For nearly a month, Firefox Nightly (from 2016/07/27 to 2016/08/26) would internally compute a "vsync" value that went further and further back into the past -- but only for a display where the true Hz was below 60.000Hz (see Bug #4 above for details). This shows that Mozilla is not properly testing vsync code changes. The purple line going back in time indefinitely (and then disappearing from the chart) is impossible to overlook, but Mozilla did:
The programmer refuses to believe it: I filed a bug report with Mozilla and directly contacted the programmer who wrote this code, all to no avail. The programmer STILL (incorrectly) blames Windows. So Firefox oscillates, creating a (wrong) 'staircase' VSYNC timebase, and Firefox injects spikes/oscillations, all because a programmer stubbornly refuses to believe that his code is the problem.

  4. A Solution
A solution: Firefox must:
  1. Wake up on VSYNC: "DwmFlush" does not work to synchronize to vsync (it can wake up far too late). Use D3DKMTWaitForVerticalBlankEvent() instead (details, with code). Worst case, use timers (like Chrome), which achieves much better results than "DwmFlush", especially when a system is under load.

  2. Properly compute the true vsync time: Throw away all the Firefox code that deals with computing and comparing vsync times and replace it with this single line of code that implements a very simple rule: For any given wakeup time "now", compute the true vsync time for that wakeup time as follows:
    vsync = now - (((now-vblank+vperiod/8)%vperiod+vperiod)%vperiod-vperiod/8);
    where vblank is any past, present, or future D3DKMTWaitForVerticalBlankEvent() wake up time (but using the latest time is best), and vperiod is computed from monitoring wakeup times.
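The one-line rule above can be wrapped in a function and exercised with a few values. This is a minimal sketch: the function name and the choice of integer microsecond ticks are assumptions, while the arithmetic is the formula as given. The double modulo keeps the phase non-negative even when vblank lies in the future, and the vperiod/8 term lets a wake-up that arrives slightly before a vsync snap forward to it.

```cpp
#include <cassert>
#include <cstdint>

// Snaps a wake-up time `now` onto the vsync grid defined by any known vsync
// sample `vblank` (past, present, or future) and the period `vperiod`.
// All three values must be in the same integer time unit.
int64_t snapToVsync(int64_t now, int64_t vblank, int64_t vperiod) {
    assert(vperiod > 0);
    const int64_t slack = vperiod / 8;  // tolerated early wake-up
    const int64_t phase =
        ((now - vblank + slack) % vperiod + vperiod) % vperiod;
    return now - (phase - slack);
}
```

For example, with a period of 16000 and a vsync sample at 0, a wake-up at 5000 maps to vsync 0, a wake-up at 15000 (1000 early) maps to vsync 16000, and a wake-up at 5000 still maps to 0 when the only known sample is a future vsync at 32000.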
A lesson for all programmers -- understand the true cause of a bug before fixing the bug: The programmer wrote buggy code, and then when things went south, believing that his code must be bug free, the programmer wrote lots of other code to fix the 'Windows' bug (blaming Windows in code comments), when in reality the bug was in his own code. What a mess of code.


Firefox tool hides ALL micro-jitter


±50 microsecond micro-jitter visible at vsynctester.com
Epilogue: Someone found excessive jitter in Firefox, reported it to Mozilla as a bug (bug 1278408), and pointed to Chrome (which has real vsync times with microsecond micro-jitter) as an exemplar. And the 'solution' Mozilla came up with was to FAKE THE VSYNC TIMES being sent to the rAF callback, rather than fixing the true cause of the jitter seen in Firefox! And then Firefox messed up faking the vsync times. Stunning.

Also, the tool in the Firefox bug report effectively HIDES all micro-jitter (showing only major jitter) because the tool uses the wrong chart scale. Seen above right is Chrome, with micro-jitter, but the green inter-frame line is flat. Below right is a tool with a different scale that shows the same inter-frame times, but now micro-jitter is visible.

But if Mozilla really wants to eliminate even millisecond micro-jitter from Firefox (a very bad idea), it is very easy (one way). But before they do, Mozilla must show a real world animation that is actually negatively impacted by a rAF callback time argument (vsync times) with ±50 microsecond micro-jitter.