The Viewport Integration Step, which is really what you are using to run your simulation (not the Render Step), was not accurate enough at only 1/8 frame, the equivalent of 8 subframe samples. It was discovered early on that more substep calculations were needed when there were really fast-moving particles or a high number of collisions. Then, when glue was introduced later in beta, it turned out you needed a great deal more to ensure proper binding/breaking.
Truly, the parameter name Subframe is not accurate either: you are getting x number of subframe samples per Integration Step. So in reality, if you were patient enough, you could get 1600 samples per frame! That is ten samples per tick at 30 fps (160 ticks per frame).
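To make that arithmetic concrete, here is a rough sketch in Python. It relies on 3ds Max measuring time at 4800 ticks per second, and it assumes the Integration Step can be set as short as one tick with 10 subframe samples per step. The function name `samples_per_frame` is just something I made up for illustration, it is not part of Particle Flow or MAXScript.

```python
# 3ds Max measures time in ticks: 4800 ticks per second.
TICKS_PER_SECOND = 4800


def samples_per_frame(fps, step_ticks, subframe_samples):
    """Total samples per frame (hypothetical helper, for illustration only).

    fps              -- scene frame rate
    step_ticks       -- length of one Integration Step, in ticks
    subframe_samples -- subframe samples taken per Integration Step
    """
    ticks_per_frame = TICKS_PER_SECOND / fps        # 160 at 30 fps
    steps_per_frame = ticks_per_frame / step_ticks  # Integration Steps per frame
    return steps_per_frame * subframe_samples


# Integration Step of 1/8 frame (20 ticks at 30 fps), one sample per step
print(samples_per_frame(30, 20, 1))   # 8.0

# Integration Step of 1 tick, 10 samples per step (assumed maximums)
print(samples_per_frame(30, 1, 10))   # 1600.0
```

At other frame rates the ticks per frame change (200 at 24 fps, for example), so the ceiling moves accordingly.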
Excluding the Shape is mostly useful if you plan to exchange a low level of detail (LOD) particle with a high-LOD particle. It is certainly the largest portion of the cache file: something like the transform matrix plus a vector location for every vertex of the shape. I am not totally sure off the top of my head which other data is associated with Shape. The rest are just a few vectors (position, velocity, rotation, etc.), IIRC.
Scrubbing a cache has always been a bit off. If you turn on Particle View -> Options menu -> Track Update -> Update Progress, you will notice which operators/tests are being evaluated during playback. Try turning those operators off, except for Display of course, or your particles will not be visible.
In a Cache Disk scenario, only the Render, any Excluded Channel operators/tests, and Display should be evaluating. That is of course assuming your Cache Disk is in the global/Source node. Local/Event nodes can have their own Cache Disk too, in which case everything downstream from the Cache Disk event will get cached. Evaluation exclusion should then start from the Cache Disk event downstream.