NodeCanvas Forums › General Discussion › BT Performance issues with 100-200 agents
Tagged: performance behaviour tree
We’ve been working with NodeCanvas for about 6 months now, and we’ve recently been focused on improving the runtime performance of our game. After a lot of work, the one major issue that remains is the CPU cost of our enemies.
Our game typically has many active enemies moving around, attacking, etc. It is common to have at least 50 enemies alive at once, and heavier areas can reach 150-200 enemies at a time. As such, it’s not surprising that enemies are the most expensive thing in our game. However, NodeCanvas itself appears to account for a larger share of that cost than we would expect.
The performance issues we have are not related to instantiation/activation, but to the Tick() of the behaviour tree itself. (Note that we tick manually from code in each Update().)
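For reference, our manual ticking looks roughly like this (a simplified sketch; we assume the owner exposes a manual update call, and the exact method name may differ in your NodeCanvas version):

using NodeCanvas.BehaviourTrees;
using UnityEngine;

// Simplified sketch of our per-enemy component. The BehaviourTreeOwner's update
// mode is set to Manual, and we drive the tick ourselves every frame.
public class BTEnemy : MonoBehaviour
{
    private BehaviourTreeOwner owner;

    void Awake()
    {
        owner = GetComponent<BehaviourTreeOwner>();
    }

    void Update()
    {
        // Practically all of the per-enemy cost we are profiling is inside this call.
        owner.UpdateBehaviour();
    }
}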
In the attachments below you can see the following information:
1. A (not deep) profile sample of a build, showing BTEnemy.Update(), which basically only calls Tick()
2. Another (not deep) profile sample, but this time with custom Profiler.BeginSample() calls added to most nodes we’ve created (excl. those that only do an enum/bool check and then exit), showing that the cost is not in our code (see the sketch after this list for roughly how the samples were added)
3. A screenshot showing our behaviour tree, I’ve marked the paths it takes (red crosses are ALWAYS exited in the scenario I used to take the profile samples)
4. A screenshot of 150 enemies surrounding the player (at which point I profiled)
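Regarding point 2, the per-node samples were added roughly like this (a trimmed-down sketch of one of our custom action tasks; the class name and logic are illustrative):

using NodeCanvas.Framework;
using UnityEngine.Profiling;

// Trimmed-down sketch of one of our custom action tasks, with a profiler sample
// wrapped around the actual work it does.
public class MoveToTargetAction : ActionTask
{
    protected override void OnExecute()
    {
        Profiler.BeginSample("BT/" + GetType().Name);

        // ...the actual movement logic lives here...

        Profiler.EndSample();
        EndAction(true);
    }
}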
We’ve been profiling a lot on both higher-end and lower-end machines, and the relative performance is similar on both. Using a release build also does not significantly change the runtime performance of our game. The profile samples I took are from a high-end machine, but on lower-end machines I’ve seen BTEnemy take 4-5 ms; we’re also targeting 60 FPS on those platforms, so that is significant.
There was another thread which also discussed performance, but it seems discussion has stalled there and I didn’t want to hog the thread: NodeCanvas | Visual Behaviour Trees and State Machines for Unity (paradoxnotion.com)
I’m hoping you’re able to help us out!
Hello,
I’m just a user of this forum, but this is the 2nd post I know of that complains about BT performance. It leaves me a bit unsure about whether to keep using NodeCanvas’ behaviour trees if the problem isn’t found.
So I also want to contribute some ideas to help figure out this problem.
I think you could try opening up BT’s source code and adding ProfilerMarkers deep into the functions called in BT’s Update.
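For example, something along these lines, using Unity’s ProfilerMarker API around the suspected hot spots (the marker name and the wrapped method are just placeholders):

using Unity.Profiling;

public class InstrumentedTraversal
{
    // Created once; Begin/End (or Auto) is very cheap at runtime, so markers
    // can be placed deep inside the update path without skewing the numbers much.
    static readonly ProfilerMarker traversalMarker = new ProfilerMarker("NC.BT.Traversal");

    public void Tick()
    {
        using (traversalMarker.Auto())
        {
            // ...the existing BT update/traversal code would go here...
        }
    }
}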
Hello there and thank you for the information.
I want to clarify that the biggest performance cost of a BT is usually what it actually does in its Actions and Conditions, rather than the BT Tick (tree traversal) on its own. Of course, a frequently used Dynamic Selector can also increase that cost, since a Dynamic Selector re-evaluates all nodes (left-to-current); if those nodes (the conditions attached to them) are performance heavy, so will the overall tree be.
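As an illustration, if a condition used under a Dynamic Selector does something expensive, caching its result for a short interval keeps that constant re-evaluation cheap (a hypothetical example; the property names and the check itself are only placeholders):

using NodeCanvas.Framework;
using UnityEngine;

// Hypothetical condition: the expensive check is throttled so that being
// re-evaluated every frame by a Dynamic Selector stays cheap.
public class TargetInRangeCondition : ConditionTask
{
    public float range = 10f;
    public float recheckInterval = 0.2f;

    private float nextCheckTime;
    private bool cachedResult;

    protected override bool OnCheck()
    {
        if (Time.time >= nextCheckTime)
        {
            nextCheckTime = Time.time + recheckInterval;
            // Placeholder for the actual expensive check (queries, searches, etc.);
            // Vector3.zero stands in for a real target position.
            cachedResult = (agent.transform.position - Vector3.zero).sqrMagnitude <= range * range;
        }
        return cachedResult;
    }
}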
Can you please post a screenshot of a deep profile so that we can better see what is taking up the 2.2 ms as well as the 110.7KB allocated?
If there is anything we see on the BT traversal side of things, I will be more than glad to optimize it if it is not already.
Thank you!
Join us on Discord: https://discord.gg/97q2Rjh
Hi, thanks for the responses, time to go “deeper”!
First things first, regarding GC-Alloc: this was mostly caused by inefficiently building the name passed to Profiler.BeginSample(). You can see this in the original screenshots as well, with the no-debug screenshot only showing 1.1KB as opposed to 100+KB. To remove any doubt, I’ve cleaned that up. The 1.1KB that remains comes from 2x 0.6KB in IGraphAssignableExtensions.TryStartSubGraph(), as you can see in the first attachment.
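Concretely, the cleanup was just building the sample name once instead of every tick, roughly like this (a sketch of one of our tasks):

using NodeCanvas.Framework;
using UnityEngine.Profiling;

// Sketch: same profiler wrapping as before, but the sample name is now
// built once in OnInit() instead of being concatenated every tick.
public class ExampleAction : ActionTask
{
    private string sampleName;

    protected override string OnInit()
    {
        sampleName = "BT/" + GetType().Name;
        return null; // null means initialization succeeded
    }

    protected override void OnExecute()
    {
        Profiler.BeginSample(sampleName);
        // ...task work...
        Profiler.EndSample();
        EndAction(true);
    }
}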
It is a bit difficult to get good information when deep profiling a graph-based tool like NodeCanvas, because the profiler does a lot of nesting, but also because deep profiling adds a lot of overhead to some operations but not so much to others (IIRC).
Resetting nodes are an example of this; I’ve added 2 attachments regarding them. The resets occur a total of 9 times with varying costs, and none of it seems related to what I’ve done inside OnReset() callbacks or anything, but rather to the resetting itself. (The 2 screenshots were not taken in the same frame.)
Finally, another screenshot shows a bit more of the custom nodes I’ve made actually doing work. So yes, there is definitely some cost to the operations I am doing, but just like in the original post, that cost only amounts to a small portion of the total.
I’m afraid adding profiler markers will only further obfuscate the data, unless there is a better way of doing this?
Bump, hoping you can help me out @Gavalakis !
Hello again and sorry for the late reply.
Thank you for the extra information. I can confirm that IGraphAssignable TryStartSubGraph and TryStopSubGraph could be optimized. I will need some time to investigate what optimizations I could add to those.
Thank you!
Join us on Discord: https://discord.gg/97q2Rjh
Hi Gavalakis, I’m curious if you have made any progress, and I’d also like to say that I’m more than happy to test any potential performance improvements using my scenario.
Hey! Unfortunately, I haven’t worked on IGraphAssignable TryStartSubGraph and TryStopSubGraph yet.
Join us on Discord: https://discord.gg/97q2Rjh
Hi Gavalakis, I’m having a very similar problem to timheijde’s:
I’m working on a strategy game in which up to 400 units should be alive, moving, attacking, etc. I have performance issues at 150-200 entities, so I decided to delve into the profiler. Here’s the test I did with 400 entities, keeping in mind the significant overhead added by the deep profiler:
In file 1 you can see the simpler Profiler picture. Inside my BT tasks there are two expensive operations (Physics.OverlapSphere and distance calculations). The rest are simple checks and assignments, so it initially seemed odd that OverlapSphere, despite being the most expensive operation, only accounted for ~20% of the processing time:
In file 2 you can see a deep profiler screenshot. I understand here that Node.Reset() is just handling the subgraph. If so, would it help to delete the subgraph and copy the tasks into all the BTs that implement them instead?
Here you can see Node.Reset() in depth:
However, leaving aside the Node.Reset() issue, I also have trouble with Node.Execute(). Of the 35 ms that it takes, I believe only about half (~16 ms) is attributable to the code within the Conditions and the Actions themselves.
So I’m wondering, is there anything I can do from NodeCanvas’ perspective to improve performance? I already have an alternative to OverlapSphere in mind (roughly sketched below), but I’m afraid it won’t be enough if I can’t find a way to trim the BT’s handling too.
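For reference, this is roughly the direction I have in mind for the OverlapSphere and distance cost (a sketch; the buffer size, radius and layer mask are placeholders):

using UnityEngine;

public class ProximityScanner : MonoBehaviour
{
    [SerializeField] private float radius = 8f;   // placeholder radius
    [SerializeField] private LayerMask unitMask;  // placeholder layer mask

    // Reused buffer so the physics query does not allocate a new array per call.
    private readonly Collider[] hits = new Collider[32];

    public int ScanNearbyUnits()
    {
        return Physics.OverlapSphereNonAlloc(transform.position, radius, hits, unitMask);
    }

    public bool IsWithinRange(Vector3 target, float range)
    {
        // Compare squared distances to skip the square root in Vector3.Distance.
        return (target - transform.position).sqrMagnitude <= range * range;
    }
}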
Thank you so much in advance
I couldn’t add a 5th file to my previous message; here’s the most complex BT that I’m implementing, for context:
Hello and sorry for the late reply!
It will certainly help if you place the contents of the subgraph in the main graph, since the performance hit comes from resetting the subgraph (which matters especially if that happens very frequently). Speaking of resetting the subgraph, it seems that the hit comes from the graph removing itself from the running graphs list. I will optimize this in the next version update (probably by removing the list altogether, since it is not that critical to keep a reference to all running graphs anyway).
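To illustrate why that removal shows up with hundreds of running graphs (a generic sketch only, not the actual internal code):

using System.Collections.Generic;

// Generic illustration (not the actual NodeCanvas internals): with hundreds of
// running graphs, List.Remove scans the list on every stop/reset, while a
// HashSet removes in roughly constant time.
public class RunningGraphsIllustration
{
    private readonly List<object> runningList = new List<object>();
    private readonly HashSet<object> runningSet = new HashSet<object>();

    public void OnGraphStopped(object graph)
    {
        runningList.Remove(graph); // O(n): walks the list to find the entry
        runningSet.Remove(graph);  // O(1) on average
    }
}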
Regarding the last profiler image, please note that the profiler tree is expanded at the first Node execution, so the 35 ms corresponds to the whole tree starting from the root Selector for all active trees; it therefore includes all actions and conditions (as well as the subtree, which seems to be executing at that point in time in the profiler). With that said, you brought to my attention some things that can and will be optimized for the next version (starting with the subtree reset :), as well as a few other things).
Let me know,
Thank you!
Join us on Discord: https://discord.gg/97q2Rjh
Hi, have you worked on IGraphAssignableExtensions yet? They create huge spikes in my case, where I use a behaviour tree and an FSM together, and I can’t put them all in one graph ;-(.
Hello there,
The performance hit of ‘CheckInstance’ should only occur once, when the sub-graph is executed for the first time (which is also when it initializes). Can you please confirm that? You can also enable “Pre-Initialize SubGraphs” in the inspector of the FSMOwner or BehaviourTreeOwner respectively. That will pre-initialize the subgraphs present in the root graph of that FSMOwner or BehaviourTreeOwner.
Let me know.
Join us on Discord: https://discord.gg/97q2Rjh