Makes SSTimer actually recover properly when it needs to. This is a follow-up for #60846 (3da51f515d) with code I added in my port of that PR to bee.
There were 3 main problems, and each was uncovered after fixing the previous:
/datum/controller/master/New() was using faulty logic to find existing subsystems. It was adding Sound Loops twice and not adding Timer at all (Sound Loops being a subtype of Timer).
/datum/timedevent stores a ref to the timer subsystem in var/datum/controller/subsystem/timer/timer_subsystem for performance. It wasn't being updated to the new Timer subsystem, ultimately resulting in it runtiming as an invalid timer.
The buckets need to be reset during recovery. The TTR (timeToRun) and other bucket-related state gets out of whack because SStimer wasn't firing for however long recovery took. Luckily, reset_buckets() can already handle all of this.
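A rough sketch of what the second and third fixes amount to (names like timer_id_dict are assumptions here; the actual diff may look different):

```dm
/datum/controller/subsystem/timer/Recover()
	// Problem 2: re-point every surviving timed event at the new subsystem, otherwise its
	// cached timer_subsystem ref still targets the dead instance and it runtimes as invalid.
	for(var/id in timer_id_dict)
		var/datum/timedevent/timer = timer_id_dict[id]
		timer.timer_subsystem = src
	// Problem 3: the bucket state is stale after however long recovery took, so rebuild it.
	reset_buckets()
```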
Fixes #56292
Why It's Good For The Game
Increases the chances of a smooth experience
Changelog
cl Semoro
fix: timers not removing from second queue on init
/cl
Previously, it was possible for events to enter the short (bucket) queue even when the timer was offset by more than BUCKET_LEN.
The subsystem is now forced to schedule events into the second queue when the timer is processing slower than world time advances, allowing the timer to keep up.
This PR provides a better definition of TIMER_MAX, so that timed events whose timeToRun is more than one window of buckets away are no longer scheduled into the bucket queue and are properly passed into the second queue instead.
Ports ss220-space/Paradise#578
Should be merged with/after #64138
Detailed explanation
The timer subsystem mainly uses two concepts: buckets and the second queue.
Buckets are a fixed-length list of linked lists, where each "bucket" contains the timers scheduled to run on the same tick.
The second queue is a simple sorted list containing timers that are scheduled too far in the future to fit into the buckets.
To process the buckets, the timer uses two variables named head_offset and practical_offset:
head_offset is the point in time that the first bucket corresponds to,
while practical_offset is the current position within the bucket list.
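As a rough mental model of that state (a sketch using the names from this writeup, not a copy of the real file):

```dm
var/list/bucket_list = new /list(BUCKET_LEN)  // bucket_list[i] is the head of a linked list of /datum/timedevent
var/list/second_queue = list()                // sorted list of timers scheduled too far out for the bucket window
var/head_offset = 0                           // the world.time that bucket_list[1] corresponds to
var/practical_offset = 1                      // 1-based index of the bucket currently being processed
```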
There are two expressions responsible for determining where a timed event ends up scheduled:
TIMER_MAX and BUCKET_POS.
TIMER_MAX is the maximum timeToRun a timed event can have and still be scheduled into the buckets rather than the second queue,
while BUCKET_POS determines which bucket a timed event goes into, relative to the current head_offset.
Let's look at BUCKET_POS first
BUCKET_POS(timer) = (((round((timer.timeToRun - SStimer.head_offset) / world.tick_lag)+1) % BUCKET_LEN)||BUCKET_LEN)
Let's imagine our tick_lag is set to 0.5; that gives us BUCKET_LEN = (10 / 0.5) * 60 = 1200.
With a head_offset of 100, any timed event with timeToRun = 100 + 600N gets a bucket_pos of 1 (600 deciseconds being one full bucket window: 1200 buckets * 0.5).
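Plugging those numbers into the macro by hand confirms it (a quick standalone check, not part of the PR):

```dm
/proc/debug_bucket_pos_example()
	var/tick_lag = 0.5
	var/bucket_len = (10 / tick_lag) * 60                        // 1200
	var/head_offset = 100
	for(var/time_to_run in list(100, 700, 1300))                 // 100 + 600N
		var/pos = ((round((time_to_run - head_offset) / tick_lag) + 1) % bucket_len) || bucket_len
		world.log << "timeToRun [time_to_run] -> bucket [pos]"   // bucket 1 every time
```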
Now let's look at the current implementation of TIMER_MAX
TIMER_MAX = (world.time + TICKS2DS(min(BUCKET_LEN-(SStimer.practical_offset-DS2TICKS(world.time - SStimer.head_offset))-1, BUCKET_LEN-1)))
Let's say our world.time = 100 and practical_offset = 1 for now
So TIMER_MAX = 100 + min(1200 - (1 - (100 - 100)/0.5) - 1, 1200 - 1) * 0.5 = 100 + 1198 * 0.5 = 699
As you can see, in that example we're fine: no events can be scheduled into buckets past the window boundary.
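The same arithmetic as a tiny helper, for anyone who wants to poke at it (just a sanity check of the numbers above, not real subsystem code):

```dm
/proc/debug_timer_max(world_time, head_offset = 100, practical_offset = 1, tick_lag = 0.5)
	var/bucket_len = (10 / tick_lag) * 60                         // 1200
	var/ticks_since_head = (world_time - head_offset) / tick_lag  // DS2TICKS(world.time - head_offset)
	return world_time + min(bucket_len - (practical_offset - ticks_since_head) - 1, bucket_len - 1) * tick_lag

// debug_timer_max(100) == 699 -- safely inside the window, which only covers times up to 699.5
```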
But let's now imagine a situation: some high priority subsystem lagged and caused the timer not to fire for a bit
Now our world.time = 200 and practical_offset = 1 still
So now our TIMER_MAX would be calculated as follows:
TIMER_MAX = 200 + min(Q, 1199) * 0.5
Where Q = 1200 - 1 - (1 - (200 - 100) / 0.5) = 1200 - 1 - 1 + (200 - 100) / 0.5 = 1398
Which is bigger than 1199, so we will choose 1199 instead
TIMER_MAX = 200 + 599.5 = 799.5
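Reusing the hypothetical helper from the healthy-case check:

```dm
/proc/debug_lagged_timer_max()
	// Same helper, but world.time has moved 100 deciseconds ahead of head_offset.
	world.log << "[debug_timer_max(200)]"   // 799.5 -- past the end of the window, which only covers up to 699.5
```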
Let's now schedule a repeating timed event with timeToRun = world.time + 500 = 700.
It will be scheduled into the buckets, since 700 < TIMER_MAX (799.5).
BUCKET_POS will be ((700 - 100) / 0.5 + 1) % 1200 = 1
Let's run the timer subsystem
During the execution of that timer, we will try to reschedule it for the next fire at timeToRun = world.time + 500
Which ends up adding it to the same bucket we are currently processing, locking the subsystem in a loop until it is suspended.
On the next tick we try to continue and reschedule at timeToRun = world.time + 0.5 + 500,
which ends up in bucket 2, continually blocking the timer from processing normally.
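The fix boils down to capping the boundary so it can never run past the bucket window. As a sketch of the idea (not necessarily the exact define this PR ships), the bound can be written purely in terms of head_offset and practical_offset:

```dm
// Sketch: the furthest timeToRun that can still safely live in the buckets is just before the
// bucket at practical_offset comes around again; anything at or past this belongs in the second queue.
#define TIMER_MAX_SKETCH (SStimer.head_offset + TICKS2DS(BUCKET_LEN + SStimer.practical_offset - 1))
```

Notably this no longer depends on world.time, so the boundary cannot drift forward when the timer falls behind; with the numbers above it evaluates to 100 + TICKS2DS(1200) = 700, and the repeating event from the example is routed into the second queue instead of into the bucket currently being processed.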
Why It's Good For The Game
Increases the chances of a smooth experience
Changelog
cl Semoro
fix: Avoid the timer scheduling far-future events into the short queue
/cl
Ports Semoro's fix (ss220-space/Paradise#511) for potential SStimer bucket corruption which caused an infinite loop.
The essence of the fix: previously, timers that already had a built linked list could get into the second queue, which could leave the subsystem in an incorrect state. The fix works very simply, it just resets that state back to the original, correct one.
BUT THERE IS STILL A BUG IN THE CODE RELATED TO THE INFINITE LOOP!
For some reason SStimer on our server recently started breaking at the beginning of the round. I found that the code for the waterfall drip effect was causing the issue, and that setting its frequency to 0 (and sometimes calling reset_buckets()) can be used to reproduce the bug. This PR tries to fix it.
There is an outstanding bug with airlocks causing SStimer to break sometimes.
cl
fix: fixed potential bucket corruption in timer reset_buckets
/cl
Brings _HELPERS/_lists.dm up to the latest standards by:
- Adding proper documentation and fixing the existing documentation
- Giving vars proper names
- Converting procs to snake_case as per the standard (many files that use those procs are affected)
* The Failsafe can now recover from a deleted MC
It's also more reliable and can handle a situation where its main loop runtimes and the MC is stuck
* Reset defcon level correctly
Oops left that in from debugging the levels
* Correctly recover SSasset
* Only decrease defcon if MC creation failed
Also adds some sort of sleep between the emergency loops
* Makes the last two emergency actions manual procs
Since they are kinda unstable, it's probably best if only admins call these manually
You can also now debug Master/New()
While there will most likely never be a situation where the MC is just gone, it's still good to know that the game can recover from one.
For example, maybe someone messed up an SDQL query, or maybe someone deleted the MC to create a new one, hoping the Failsafe would do it for them.
* Converts looping sounds from a list of play locations to just the one
* Updates all uses of looping sounds to match the new arg
* Adds an area based sound manager that hooks into looping sounds to drive the actual audio. I'll be using this to redo how weather effects handle sound
* Some structural stuff to make everything else smoother
Timers now properly return the time left for client based timers
Weather sends global signals when it starts/stops
Looping sounds now use their timerid var for all their sound related timers, not just the main loop
* This is the painful part
Adds an area sound manager component; it handles the logic of moving into new areas potentially creating new sound loops. We do some extra work to prevent stacking sound loops.
Adds an ash storm listener element that adds a tailored area sound manager to clients on the lavaland z level.
It's removed on logout.
Adds the ash_storm_sounds assoc list; a reference to it is passed into area sound managers, and it's modified in a manner that doesn't break the reference held by ash_storm (this is the part I hate).
* Hooks ash storm listener into cliented mobs and possessed objects
* Documents the odd ref stuff, adds an ignore start var to looping sounds, fixes some errors and lint issues
* Applies kyler's review
banging
Co-authored-by: Kylerace <kylerlumpkin1@gmail.com>
* Cleans up some var names, reduces the amount of looping we do in some areas
* Makes the code compile, redoes the movement listener to be more general
* fuck
* We don't need to detach on del if we're just removing signals on detach
* Should? work
* if(direct) memes
Co-authored-by: Kylerace <kylerlumpkin1@gmail.com>
Unobvious problem spot
#define BUCKET_POS(timer) (((round((timer.timeToRun - SStimer.head_offset) / world.tick_lag)+1) % BUCKET_LEN)||BUCKET_LEN)
With tick_lag equal to 0.1, 0.25 or 0.5, the rounding of the division behaves normally. But at other values the result may be shifted up or down due to the specifics of floating-point storage and processing: 0.1, 0.25 and 0.5 have a clean mantissa, unlike other values, which lead to results such as 245.0000004 when divided.
PS: tick_lag is rounded to the first two decimal places.
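For illustration only (the exact values depend on BYOND's single-precision float handling), this is the kind of drift being described:

```dm
/proc/debug_bucket_pos_drift()
	var/tick_lag = 0.4                        // not exactly representable in binary floating point
	var/delta = 98                            // 245 ticks worth of deciseconds at tick_lag 0.4
	world.log << "[delta / tick_lag]"         // can come out slightly above or below 245
	world.log << "[round(delta / tick_lag)]"  // one-argument round() floors, so 244.9999... becomes offset 244
```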
Fixes
Rewrote the circular doubly linked list to a regular doubly linked list, because it could cause timers to loop infinitely.
Fixed re-creation of a bucket if the timer does not have a callback.
Added the RST debug stat indicator to SStimer's MC stat entry.
Added an optional ability to dump all timers to the log on a crash.
Fixed subsystem logic when a bucket position is misplaced due to division and rounding inaccuracy. The system now captures such rounding errors and restores the correct timer position.
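The misplacement guard is roughly the following shape (a hypothetical helper, not the ported code verbatim):

```dm
/proc/sane_bucket_pos(datum/timedevent/timer)
	// Round to the *nearest* tick instead of flooring, so float noise like 244.9999996
	// or 245.0000004 cannot shift the timer into a neighbouring bucket.
	var/ticks = round((timer.timeToRun - SStimer.head_offset) / world.tick_lag, 1)
	return ((ticks + 1) % BUCKET_LEN) || BUCKET_LEN
```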
References
[RU] SS220 Paradise port process from TGstation and fixes:
ss220-space/Paradise#5, ss220-space/Paradise#10, ss220-space/Paradise#26, ss220-space/Paradise#32, ss220-space/Paradise#37
Contributors
@semoro: fixes
@Bizzonium: port
Changelog
cl Semoro and azizonkg
fix: Ported fixes of SStimer subsystem from RU SS220 Paradise
config: Added a new config var: flag/log_timers_on_bucket_reset
/cl
* sstimer no longer delays maintenance tasks if it's going over its tick.
This was leading to bugs if certain state operations happened while a task was delayed. Furthermore, if the timer subsystem was overloaded, the invoked-timers list would bloat because it never got cleared out, which made every timer invocation take longer as it had to add to an ever-growing list.
* Update timer.dm
* Fix error when a bucket has only one timer.
* simplify timer loop logic & improve timer debug string
It used to try to batch up linked-list modifications, and every issue we have ever had has been related to this. Now it just directly pulls the head of the linked list off using bucketEject(); rather than detecting when it reaches the end of the linked-list queue, it simply knows because the bucket is empty (see the sketch after this list).
All bucketCount logic has been moved into bucketEject() and bucketJoin(), which should also keep that count more correct.
* Update timer.dm
* Update timer.dm
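The simplified loop referenced above looks roughly like this (a sketch; the callBack/spent handling and MC_TICK_CHECK placement are assumptions based on the usual subsystem pattern):

```dm
/datum/controller/subsystem/timer/proc/process_current_bucket_sketch()
	// Instead of batching linked-list edits and detecting "end of queue", pop heads until the bucket is empty.
	var/datum/timedevent/timer = bucket_list[practical_offset]
	while(timer)
		timer.bucketEject()        // unlinks the head; bucketEject()/bucketJoin() own all bucketCount updates
		if(timer.callBack && !timer.spent)
			timer.spent = world.time
			timer.callBack.InvokeAsync()
		if(MC_TICK_CHECK)
			return
		timer = bucket_list[practical_offset]
```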
BINARY_INSERT used to only take typepaths like/this. Now it expects them to be /like/this, to be more consistent with the rest of the code.
Adds documentation to COMPTYPE.
Adds a test for BINARY_INSERT.
A lot of issues happen when the tick overruns due to less important and more expensive subsystems. If the timer does not run, or breaks, a lot of other stuff breaks.
Makes use of the do while(FALSE) trick to give the macro its own scope, and makes it possible for the inserted value to be different from the compared value, so #48747 can use it.
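For anyone unfamiliar with the trick, here is a minimal illustration of the pattern (a demo macro, not the real BINARY_INSERT, which performs an actual binary search):

```dm
// The do { ... } while(FALSE) wrapper runs the body exactly once, but gives it its own scope:
// helper vars declared inside cannot collide with vars at the call site, and the whole macro
// still behaves like a single statement when followed by a semicolon.
#define DEMO_COUNTED_ADD(value, list_to_edit) \
	do { \
		var/list/_target = list_to_edit; \
		_target += value; \
		world.log << "list now has [length(_target)] entries"; \
	} while(FALSE)
```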
Before, we only warned if the wait was 1 or higher, to solve issues with objects setting timers on themselves while getting qdeleted, a common enough use case to support (even if it is annoying), but then I remembered we could check for qdestroying directly.
Also fixes it so such `Destroy()`-time timers actually run consistently; before, they would only work if called after the `..()` in `Destroy()`.
Because SStimer tracks timers internally in terms of which BYOND tick they are supposed to run on, float wait values are now rounded *up* to the next BYOND tick, rather than having BYOND round the resulting list index *down* at access time.
0-wait timers are now rounded *up* to `world.tick_lag` to avoid incompatibilities from editing the current tick's bucket while it is being processed.
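In other words, the wait is snapped to tick boundaries at insertion time, roughly like this (a sketch of the described behaviour, not the exact code in the PR):

```dm
/proc/sanitize_timer_wait(wait)
	// Round the wait *up* to a whole number of ticks so the bucket index is exact,
	// and enforce a minimum of one tick so we never touch the bucket currently being processed.
	var/rounded_wait = -round(-wait / world.tick_lag) * world.tick_lag  // ceiling to a multiple of tick_lag
	return max(rounded_wait, world.tick_lag)
```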