The VRAM factor is a well-known issue, and each user should do the math on the scene sizes they usually work with before moving to GPU rendering.
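Just to illustrate what I mean by "doing the math", here is a rough back-of-the-envelope sketch in Python. The byte counts per vertex/texel and the overhead figure are illustrative assumptions, not numbers from any particular renderer:

```python
# Rough, illustrative VRAM estimate for a scene. All per-element sizes are
# assumptions for the sake of the example, not values from a specific renderer.

def estimate_vram_gb(num_triangles, textures, bytes_per_vertex=48, overhead_gb=1.0):
    """Very rough scene footprint estimate in GB.

    textures: list of (width, height, bytes_per_texel) tuples
    bytes_per_vertex: position + normal + UVs + tangents, uncompressed guess
    overhead_gb: framebuffers, acceleration structures, driver overhead (a guess)
    """
    geometry_bytes = num_triangles * 3 * bytes_per_vertex
    texture_bytes = sum(w * h * bpp for (w, h, bpp) in textures)
    return (geometry_bytes + texture_bytes) / 1024**3 + overhead_gb

# Example: 20 million triangles and fifty 4K textures at 4 bytes per texel
print(estimate_vram_gb(20_000_000, [(4096, 4096, 4)] * 50))  # roughly 7 GB
```

Even a crude estimate like this makes it obvious whether a typical scene has any chance of fitting on an 8 GB or 11 GB card.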
If the VRAM is enough, though, GPU rendering speed usually scales close to linearly when adding more GPUs, with the exception of some renderers of course (I think Blender Cycles on GPU is one of them).
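To show what "close to linear" means in practice, here is a tiny scaling model; the per-GPU efficiency factor is an assumed illustrative value, not a measurement from any renderer:

```python
# Idealized multi-GPU scaling estimate. The efficiency value is an assumption
# for illustration; real scaling depends on the renderer and the scene.

def estimated_render_time(single_gpu_minutes, num_gpus, efficiency=0.95):
    """Naive model: each added GPU contributes `efficiency` of a full GPU."""
    effective_gpus = 1 + (num_gpus - 1) * efficiency
    return single_gpu_minutes / effective_gpus

for n in (1, 2, 4):
    print(n, "GPU(s):", round(estimated_render_time(60, n), 1), "min")
# 1 GPU(s): 60.0 min, 2 GPU(s): 30.8 min, 4 GPU(s): 15.6 min
```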
As I’ve already mentioned, Redshift is one of the GPU renderers that can make use of the system’s RAM if the VRAM is not enough. I don’t know whether this comes with latency penalties, to be honest. Maybe some Redshift users here in the forums could shed some light on this issue. It has been announced that Blender 2.8 will have this feature too (falling back to system RAM when the VRAM runs out). As a Blender user, I’m looking forward to this with great interest.
Viewport performance is usually considered to be "tightly" dependent on GPU performance, but my limited experience shows that this is not entirely true. Especially with C4D, the OpenGL test of the well-known Cinebench benchmark shows that FPS also depends on core count and overall CPU performance. I’ve swapped various CPUs in my personal rig over the last 4-5 years and had the opportunity to test them with the same GPUs and see the differences. Yes, upgrading the GPU increased FPS, but overclocking the CPU did the same. Recently, I discovered that higher core counts had a positive impact on FPS too, which was something I couldn’t have imagined before trying it myself. This review shows the same behaviour: https://www.pugetsystems.com/labs/articles/AutoDesk-3ds-Max-2017-GeForce-GPU-Performance-816/
I wish someone (maybe Srek, who works for Maxon) could give us more information and feedback on these interesting phenomena.