My application starts slowly, and I want to preload it to avoid that problem. Should I be worried?
Well, in short, there are lots of concerns. Preloading things you may or may not need is a great way to waste a ton of memory and generally make the system less usable overall.
I’m often told that the answer to a performance problem is to simply preload the slow stuff… unfortunately that doesn’t work as a general solution if everyone does it. It’s classic “improve the benchmark” thinking.
When developing for Windows you have to think about all kinds of scenarios, such as the case where there are several hundred users trying to share a server, each with their own user session. Your application might also need to run in very memory-constrained environments like a small tablet or some such; you do not want to be loading extra stuff in those situations.
The way to make a system responsive is to KEEP IT SIMPLE. If you don’t do that, then it won’t matter that you’ve preloaded it: when the user actually gets around to starting the thing in a real-world situation, you will find that it has already been swapped out to try to reclaim some of the memory that was consumed by preloading it. So you will pay for all the page faults to bring it back, which is probably as slow as starting the thing in the first place. In short, you will have accomplished nothing other than using a bunch of memory you didn’t really need.
Preloading in a general-purpose environment is pretty much a terrible practice. Instead, pay for what you need when you need it and keep your needs modest. To see how badly early loading scales up, you only have to look at the tray at the bottom right of your screen, full of software that was so sure it was vitally important to you that it insisted on loading at boot time.
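To make "pay for what you need when you need it" concrete, here is a minimal sketch of deferring an expensive load until first use. Everything in it is made up for illustration: the `SpellChecker` class, its `dictionary` property, and the tiny word set standing in for a slow load are all hypothetical, and Python is used only because it is compact.

```python
import functools


class SpellChecker:
    """Hypothetical feature whose dictionary is expensive to load."""

    load_count = 0  # how many times the expensive load has actually run

    @functools.cached_property
    def dictionary(self):
        # Stand-in for slow work (reading a large file, parsing it, etc.).
        SpellChecker.load_count += 1
        return frozenset({"preload", "memory", "cache"})

    def is_known(self, word):
        # The first call triggers the load; later calls reuse the cached set.
        return word.lower() in self.dictionary


checker = SpellChecker()           # cheap: nothing has been loaded yet
print(SpellChecker.load_count)     # still 0 at this point
print(checker.is_known("Memory"))  # the load happens here, on first use
print(SpellChecker.load_count)     # now 1
```

A user who never touches the feature never pays for it, and a user who does pays exactly once.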
Adding fuel to this already bonfire-sized problem is this simple truth: any application preloading itself competes with the system trying to do the very same thing. Windows has long included powerful features to detect the things you actually use and get them into the disk cache before you actually use them, whether they are code or data. Forcing your code and data to be loaded is just as likely to create more work evicting the unnecessary bits from memory to make room for something immediately necessary, whereas doing nothing would have resulted in ready-to-go bits if the application is commonly used with no effort on your part.
Bottom line, preloading is often a cop-out. Better to un-bloat.
Huge words of caution: you can bury yourself in this kind of stuff forever and for my money it is rarely the way to go. It’s helpful to know where you stand on CPI (cycles per instruction), for instance, but it’s much more typical to get results by observing that you (e.g.) have a ton of cache misses and therefore should use less memory. Using less memory is always a good thing.
You could do meaningful analysis for a very long time without resorting to micro-architectural phenomena simply by studying where your CPU goes.
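As a rough sketch of what "studying where your CPU goes" can look like in practice, the snippet below profiles a toy workload with Python's standard `cProfile` and `pstats` modules. The functions `tokenize`, `count_words`, `workload`, and `profile_report` are hypothetical stand-ins, not anything from a real application.

```python
import cProfile
import io
import pstats


def tokenize(text):
    return text.split()


def count_words(text):
    counts = {}
    for word in tokenize(text):
        counts[word] = counts.get(word, 0) + 1
    return counts


def workload():
    # Hypothetical stand-in for the real work your application does.
    text = "use less memory and jump around less " * 20_000
    return count_words(text)


def profile_report(func, limit=5):
    """Run func under the profiler and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


print(profile_report(workload))
```

A report like this already tells you which functions dominate, long before you need to think about anything micro-architectural.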
It is not only the case that (e.g.) ARM does things differently than (e.g.) x86 products, it is also the case that every x86 processor family you have ever heard of does it differently than every other one you have ever heard of. But that turns out to be not that important for the most part, because the chief observations like “we branch too much” are true universally, just as “we use too much memory” is basically universally true.
The stock observations that you should:
1. Use less memory
2. Use fewer pointers and denser data structures
3. Not jump around so much
are essentially universally true. The question really comes down to what you can get away with on any given processor because its systems will save the day for you. But even that is a bit of a lie, because the next question is “what else could you be doing and your program would still run well?” The fact is there is always other stuff going on, and if you minimize your use of CPU resources generally you will be a better citizen overall.
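To illustrate points 1 and 2, here is a small sketch comparing a pointer-heavy layout (one heap object per value, linked together) with a dense one (a single contiguous buffer). The `Node` class, the helper functions, and the element count `N` are all hypothetical. The byte counts come from `sys.getsizeof` and are specific to the Python runtime, so treat them as indicative only; the same contrast applies to a linked list versus a plain array in native code.

```python
import sys
from array import array

N = 100_000  # arbitrary element count for the illustration


class Node:
    """Pointer-heavy layout: one heap object per value plus a link."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_linked(n):
    # Build back to front so the list reads 0.0, 1.0, 2.0, ...
    head = None
    for i in reversed(range(n)):
        head = Node(float(i), head)
    return head


def sum_linked(head):
    # Every step chases a pointer to a separately allocated object.
    total = 0.0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total


def linked_bytes(head):
    # Approximate footprint: each node plus the float object it refers to.
    total = 0
    node = head
    while node is not None:
        total += sys.getsizeof(node) + sys.getsizeof(node.value)
        node = node.next
    return total


linked = build_linked(N)
dense = array("d", (float(i) for i in range(N)))  # one contiguous buffer

print("linked bytes:", linked_bytes(linked))
print("dense bytes: ", sys.getsizeof(dense))
print("same sum:    ", sum_linked(linked) == sum(dense))
```

Both layouts hold the same values and produce the same answer; the dense one simply needs far less memory to do it, which is the whole point.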
In short, the top-level metrics (CPU, Disk, Memory, Network) will get you very far indeed without resorting to mispredicts and the like. If you want to use the tools effectively, with broad results, I strongly recommend that you target the most important metrics, like L2 cache misses, and reduce them. That’s always good. Pay much less attention to the specific wall-clock consequence in lab scenarios and instead focus on reducing your overall consumption.
And naturally this advice must be tempered with a focus on your customers’ actual problems, and forgive me for being only approximately correct in 400 words or less.