I have been using this approach to do systematic analysis of performance regressions for several years now. I came up with it while looking at some tricky problems in Internet Explorer about three years ago, and it’s served me well since then. The idea is a pretty simple one, but it gives surprisingly good results in many cases.
I’ll be giving examples that talk about CPU as the metric but of course the same procedure works for any metric for which you can compute inclusive costs.
Nomenclature: Inclusive cost (e.g. time) is the cost in a method and everything it calls. Exclusive cost (e.g. time) is the cost from only the method itself, not counting anything it calls. Both are interesting but this technique really relies on inclusive cost.
Now the usual situation: You have some test that used to take say 50ms and now it takes 55ms. That’s a 10% growth. You want to know where to start looking and you’re fortunate enough to have a summary of costs from before and after. But there could be thousands of symbols and the costs could be spread all over the place. Also some symbols might have been renamed or other such inconvenient things. You could try staring at the traces in call-tree outlining but that gets very tedious especially if the call stacks are 50 levels deep or so. It’s when things get big and messy that having an analysis you can automate is helpful. So here’s how I do it.
First I consider only symbols that appear in both traces. That’s not everything, but it’s a lot, and it is typically enough to give you a solid hint. For each symbol I know the inclusive cost in the base case and the test case, and from this I can easily compute the delta, which tells me how much it grew. Now the magic. Since I know how much the overall scenario regressed (10% in this example), I can compute how much any particular symbol should have gotten slower if we take as our null hypothesis that “bah, it’s all just evenly slower because it sucks to be me.” So a symbol that had a previous cost of 10 in my example should have grown by 10%, a delta of 1. We compute the ratio of the actual delta to this expected delta, call that the “overweight percentage,” and then sort on it. And then stuff starts popping out like magic.
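Here is the same thing written as formulas. The symbols are my own shorthand and do not come from the original reports. T is the inclusive cost of the whole scenario (for example, main), and I(s) is the inclusive cost of symbol s:

```latex
g = \frac{T_{\text{after}} - T_{\text{before}}}{T_{\text{before}}},
\qquad
E(s) = g \, I_{\text{before}}(s),
\qquad
\text{overweight}(s)
  = \frac{I_{\text{after}}(s) - I_{\text{before}}(s)}{E(s)} \times 100\%
  = \frac{\bigl(I_{\text{after}}(s) - I_{\text{before}}(s)\bigr) / I_{\text{before}}(s)}{g} \times 100\%
```

The last form says the same thing another way: a symbol's overweight is its own growth rate divided by the overall growth rate. With g = 10% and a previous cost of 10, the expected delta E(s) is 1, as in the text.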
I’ll have more examples shortly, but let’s do a very easy one so you can see what’s going on. Suppose main calls f and g and does nothing else. Each takes 50ms, for a total of 100ms. Now suppose f gets slower, to 60ms. The total is now 110, or 10% worse. How is this algorithm going to help? Well, let’s look at the overweights. Of course main goes from 100 to 110, or 10%. It is the whole scenario, so the expected growth is 10 and the actual growth is 10. Overweight 100%. Nothing to see there. Now let’s look at g: it was 50 and stayed at 50, but it was “supposed” to go to 55. Overweight 0/5, or 0%. And finally, our big winner, f: it went from 50 to 60, a gain of 10. At 10% growth it should have gained 5. Overweight 10/5, or 200%. It’s very clear where the problem is! But it actually gets even better. Suppose that f had two children, x and y. Each used to take 25ms, but now x has slowed down to 35ms. With no gain attributable to y, the overweight for y will be 0%, just like g. But if we look at x, we find that it went from 25 to 35, a gain of 10, when it was supposed to grow by merely 2.5, so its overweight is 10/2.5, or 400%. At this point the pattern should be clear:
The overweight number keeps going up as you get closer to the root of the subtree which is the source of the problem. Everything below that will tend to have the same overweight. For instance if the problem is that x is being called one more time by f you’d find that x and all its children have the same overweight number.
This brings us to the second part of the technique. You want to pick a symbol that has a big overweight but is also responsible for a largeish fraction of the regression. So we compute its growth and divide by the total regression cost to get the responsibility percentage. This is important because sometimes you get leaf functions that had 2 samples and grew to 3 just because of sampling error. Those could look like enormous overweights, so you have to concentrate on methods that have a reasonable responsibility percentage and also a big overweight.
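Here is a minimal C++17 sketch of the two numbers, overweight and responsibility. It assumes you already have a map from symbol name to inclusive cost for each trace. This is not the author's report generator. The names CostMap, Finding and AnalyzeOverweights are made up for illustration. The code applies the definitions above to symbols present in both traces and sorts the results by overweight.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Inclusive cost per symbol for one trace (an assumed representation).
using CostMap = std::map<std::string, double>;

struct Finding {
    std::string symbol;
    double delta;             // actual growth in inclusive cost
    double overweightPct;     // actual delta / expected delta, in percent
    double responsibilityPct; // actual delta / overall delta, in percent
};

// Computes overweight and responsibility for every symbol present in both
// traces, most overweight first. totalBefore/totalAfter are the inclusive
// costs of the whole scenario (e.g. main).
std::vector<Finding> AnalyzeOverweights(const CostMap& before, const CostMap& after,
                                        double totalBefore, double totalAfter) {
    assert(totalBefore > 0 && totalAfter != totalBefore); // need a nonzero overall delta
    const double overallDelta = totalAfter - totalBefore;
    const double growth = overallDelta / totalBefore; // e.g. 0.10 for a 10% regression
    std::vector<Finding> findings;
    for (const auto& [symbol, costBefore] : before) {
        const auto it = after.find(symbol);
        if (it == after.end() || costBefore <= 0) continue; // only symbols in both traces
        const double delta = it->second - costBefore;
        const double expected = growth * costBefore; // null hypothesis: uniform slowdown
        findings.push_back({symbol, delta, 100.0 * delta / expected,
                            100.0 * delta / overallDelta});
    }
    std::sort(findings.begin(), findings.end(), [](const Finding& a, const Finding& b) {
        return a.overweightPct > b.overweightPct;
    });
    return findings;
}
```

Fed the worked example above (main 100 to 110, f 50 to 60, x 25 to 35, g and y unchanged), this should rank x at 400%, f at 200%, main at 100%, and g and y at 0%. Each of the top three is 100% responsible. To apply the sampling-noise guard from the text, you would also drop findings whose responsibilityPct is small before reading the list.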
Below are some examples as well as the sample program I used to create them and some inline analysis.
Example 1, baseline
The sample program uses a simulated set of call-stacks and costs for its input. Each line represents a call chain and the time in that call chain. So for instance the first line means 5 units in main. The second line means 5 units in f when called by main. Together those would make 10 units of inclusive cost for main and 5 for f. The next line is 5 units in j when called by f when called by main. Main's total goes up to 15 inclusive, f goes to 10, and j begins at 5. This particular example is designed to spread some load all over the tree so that I can illustrate variations from it.
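The rule for turning call-chain lines into the two cost columns can be sketched as follows. The original input file isn't reproduced here, so I'm assuming a line format such as "main/f/j 5": frames separated by '/', root first, then the cost. ComputeCosts and Costs are hypothetical names. Exclusive cost goes to the last frame only. Inclusive cost goes once to each distinct frame on the chain.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Costs {
    std::map<std::string, double> inclusive;
    std::map<std::string, double> exclusive;
};

// Parses lines of the assumed form "main/f/j 5": a call chain (root first,
// frames separated by '/') followed by the cost spent in the last frame.
Costs ComputeCosts(const std::vector<std::string>& lines) {
    Costs costs;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string chain;
        double cost = 0;
        if (!(in >> chain >> cost)) continue; // skip malformed lines
        std::vector<std::string> frames;
        std::istringstream chainStream(chain);
        for (std::string frame; std::getline(chainStream, frame, '/');) frames.push_back(frame);
        if (frames.empty()) continue;
        // Exclusive cost goes only to the frame that was actually executing.
        costs.exclusive[frames.back()] += cost;
        // Inclusive cost goes to every distinct frame on the chain; the set keeps
        // a recursive symbol from being counted twice for the same line.
        const std::set<std::string> distinct(frames.begin(), frames.end());
        for (const std::string& symbol : distinct) costs.inclusive[symbol] += cost;
    }
    return costs;
}
```

With the three lines described in the paragraph above, this gives main 15, f 10 and j 5 inclusive, and 5 exclusive for each.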
Example 2, in which k costs more when called by f
This one line is changed. Other appearances of k are not affected, just the one place.
Example 3, in which x always costs a little more
All the lines that end in x became 6 instead of 5. Like this:
Example 4, in which f calls j more so that subtree gains cost
All the lines under f/j got one bigger like so:
And finally example 5, in which x gets faster but k gets a lot slower
All the x lines get a little better:
But the k line got worse in two places
Let's see how we do with automated analysis of those things:
Summary of Inclusive times for example 2, in which k costs more when called by f
This gives us the baseline of 90 units for main and you can see how all the "5" costs spread throughout the tree.
You can see that k has gone up a bit here but not much. A straight diff would show you that. However, there's more to see. Let's look at the first overweight report.
Overweight Report
Before: example 1, baseline
After: example 2, in which k costs more when called by f
Before Time: 90
After Time: 95
Overall Delta: 5
Summary of Inclusive times for example 3, in which x always costs a little more
OK, the report clearly shows that k is overweight and so is f. That gives us a real clue that the problem is k when called by f. It's also k's exclusive cost that is the problem, because all its normal children have 0% overweight. Note that there is a clear difference between methods with otherwise equal deltas.
In our second example, again you could see this somewhat because x is bigger, but it doesn't really pop here. And many methods seem to have been affected. A straight diff wouldn't tell you nearly as much.
Before: example 1, baseline
After: example 3, in which x always costs a little more
Before Time: 90
After Time: 95
Overall Delta: 5
Well, now things are leaping right off the page. We can see that x was the best source of the regression and also that l and k are being implicated. And f and k are bearing equal cost. We can also see that some branches are underweight: the j path is affected more than the k path because of the distribution of calls.
Summary of Inclusive times for example 4, in which f calls j more so that subtree gains cost
Symbol  Inclusive Cost  Exclusive Cost
main    94              5
f       49              5
j       44              11
g       40              0
k       30              10
x       26              26
y       16              16
z       16              16
l       10              5
Again, a straight analysis with so few symbols does reveal the problem; however, it's much clearer below...
Overweight Report
Before: example 1, baseline
After: example 4, in which f calls j more so that subtree gains cost
Before Time: 90
After Time: 94
Overall Delta: 4
Summary of Inclusive times for example 5, in which x gets faster but k gets a lot slower
The j method is the worst offender; y and z are getting the same impact due to extra calls from j, and j apparently comes from f.
Now we have some soup. It is worse but things are a bit confused. What's going on?
Before: example 1, baseline
After: example 5, in which x gets faster but k gets a lot slower
Before Time: 90
After Time: 105
Overall Delta: 15
Now again things are a lot clearer. Those negative overweights are showing gains where there should be losses: x is helping. And k jumps to the top with a big 360%. It's also 120% responsible for this mess, meaning that not only did it cause the regression, it also wiped out gains elsewhere.
In practice negatives are fairly common because sometimes costs simply move from one place to another. Sometimes that happens for perfectly normal reasons. In IE, for example, a layout could be caused by a paint timer rather than by an explicit request from script, but we still get one layout, so the cost just moved a bit. The overweights would show nothing new in the layout space but a big shift between timer events and script cost.
In practice this approach has been very good at finding problems in deep call stacks. It even works pretty well if some of the symbols have been renamed, because usually you'll find some symbol just above or below the renamed symbol to use as your starting point for investigation.
Finally you can actually use this technique recursively. Once you find an interesting symbol ("the pivot") that has a big overweight, you regenerate the inclusive costs but ignore any stacks in which the pivot appears. Search for new interesting symbols in what's left the same way and repeat.
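The recursive step can be sketched as a filter applied before recomputing inclusive costs. This sketch assumes the traces are available as raw stacks, each a call chain plus its cost. Stack and InclusiveWithoutPivot are hypothetical names. Any stack that mentions the pivot anywhere is dropped, and the rest are summed as usual.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One sampled call chain (root first) and the cost attributed to it.
struct Stack {
    std::vector<std::string> frames;
    double cost;
};

// Recomputes inclusive costs after discarding every stack in which the pivot
// appears, so the next round of overweight analysis can look for secondary
// causes in what is left.
std::map<std::string, double> InclusiveWithoutPivot(const std::vector<Stack>& stacks,
                                                    const std::string& pivot) {
    std::map<std::string, double> inclusive;
    for (const Stack& s : stacks) {
        if (std::find(s.frames.begin(), s.frames.end(), pivot) != s.frames.end())
            continue; // this stack involves the pivot, so ignore it entirely
        // Count each distinct symbol once per stack, even under recursion.
        std::vector<std::string> distinct = s.frames;
        std::sort(distinct.begin(), distinct.end());
        distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());
        for (const std::string& symbol : distinct) inclusive[symbol] += s.cost;
    }
    return inclusive;
}
```

You would run this over both the baseline and the test trace with the same pivot, then rerun the overweight analysis on the two results.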
The code that generated these reports is here.
As an afterthought I ran an experiment where I did the "recursion" on the last test case. Here are the results:
Note k is gone.
Summary of Inclusive times for example 6, in which x gets faster and k is removed
Symbol  Inclusive Cost  Exclusive Cost
main    57              5
j       38              10
f       33              5
g       19              0
x       12              12
y       10              10
z       10              10
l       9               5
Overweight Report
Before: example 6, baseline with k removed
After: example 6, in which x gets faster and k is removed
Before Time: 60
After Time: 57
Overall Delta: -3
Overweight analysis leaves no doubt that x is responsible for the gains.
Recap
I’m only up to 1980, and that’s pretty amazing considering what’s happened in the story so far. The Altair 8800 made its big splash in the January edition of Popular Electronics (which naturally came out in December). Now it’s 1980, only a half-decade later, and we’ve gone from that barely-there device to things even my 2014 eyes would recognize as actual personal computers. Visicalc is available on the Apple II and is changing the way people think about their data; it would soon find its way to the CBM family of machines. A not-especially-wealthy person could reasonably afford to buy a computer, a good letter-quality printer, and all the storage they could stand (in the form of lotsa floppy disks), and even have several choices at different price points that would meet those criteria. Database programs were starting to be a thing, but I’d have to say that until DBASE came along, things were pretty much hit-or-miss and there was no clear tool of choice.
That amount of progress is just astonishing. One thing that hadn’t changed much was processor speed. During most of this time machines ran at about 1 MHz, and that would remain the case for some time yet. Another was the availability of general-purpose hi-res graphics. Now, it’s not that we didn’t call things hi-res, but the Apple II’s notion of hi-res graphics, for instance, sported a whopping 280x192 pixels with a not-fully-general color display system (which you can read about elsewhere if you really like). Not-fully-general would be pretty typical for some time, probably until the CGA graphics of the PC, which was still a good 18 months away.
Notes On Graphics
I think there’s a pretty simple reason why this was still hard. If you consider a typical good-quality screen at the time, you get about 40 characters by 25. Not too much later 80-column displays become available (wow, is that roomy! No more program wrapping!), but I think 40 columns is more fair at this time. OK, 40 columns, typically 8 pixels by 8 pixels in a cell. So 320x200. That’s nearly 64k bits, or about 8k bytes to store all those pixels. Well, for starters 8k is a lot of RAM in 1980, but almost as important, we have to read the RAM 60 times a second, and that gives us about a 480 kB/s bandwidth problem – challenging on a memory bus of the day, which runs at 1 MHz, and that memory has to be dual-ported. And that was assuming 1 bit per pixel. To get even 4 colors total (CGA quality) you need 2 bits per pixel and 16k – that was pretty much not happening.
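The arithmetic in that paragraph is easy to check. This small C++ sketch multiplies it out. The constant and function names are my own, and the 60 Hz refresh and 8x8 cells are the assumptions stated above.

```cpp
#include <cassert>

// A 40x25 text screen of 8x8 cells, refreshed 60 times a second.
constexpr int kColumns = 40, kRows = 25, kCellPixels = 8;
constexpr int kWidth = kColumns * kCellPixels; // 320 pixels
constexpr int kHeight = kRows * kCellPixels;   // 200 pixels

// Bytes needed to hold one full frame at the given color depth.
constexpr long FrameBytes(int bitsPerPixel) {
    return static_cast<long>(kWidth) * kHeight * bitsPerPixel / 8;
}

// Bytes the video circuitry must read per second to keep the screen refreshed.
constexpr long RefreshBytesPerSecond(int bitsPerPixel, int framesPerSecond = 60) {
    return FrameBytes(bitsPerPixel) * framesPerSecond;
}

static_assert(FrameBytes(1) == 8000, "1 bpp: 64,000 bits, about 8k");
static_assert(RefreshBytesPerSecond(1) == 480000, "about 480 kB/s of reads");
static_assert(FrameBytes(2) == 16000, "2 bpp (4 colors): about 16k");
```

For scale, assume a 1 MHz bus that does one byte-wide access per cycle. It would top out around 1,000,000 bytes a second, so refreshing even the 1-bit screen would take roughly half of it.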
On the other hand, downsizing was about to happen in a big way. Err, a small way. If the PET was austere, the VIC20 was downright Spartan. At 5k of memory, with only 3.5k available for BASIC, you couldn’t fit much code in the thing. But with crazy double-height character mapping you could tile every pixel: at 22x23 you had 506 characters, requiring exactly that many bytes of storage, plus a side table of colors giving the meaning of the 1 and 0 bits in the main table. Of course there were not quite enough characters to cover the screen like that, but 160 by 160 was possible with 200 double-height characters.
VIC20 was wildly popular at $300, not just because of its built-in capabilities but because its expansion port accepted a variety of cartridges which could then bring additional capabilities. For starters, even a simple game cartridge would have the game in question in ROM, so you didn’t need to use any of that precious 3.5k to store the program. But you could get a whopping 16k cartridge for the thing and have VAST amounts of memory for BASIC programming. The keyboard was totally reasonable (the whole thing looked like a thick-ish keyboard), and there was a custom low-end serial port that was kind of like IEEE488 over serial. And plenty slow. But it worked and it was cheap. Notably, VIC20 was the first computer I know of with a modem for less than $100. Bucketfuls of those sold as well, and Compuserve, The Source, and others suddenly had a huge influx of users. Over a million modems sold.
One of the great things about having a device this inexpensive was that it could be used in a variety of custom applications straight out of the box instead of deploying custom hardware. Around this time at work we were experimenting with a small breadboard we called the Universal Interface, which was basically a 6502 and some standard IO parts for it (a 6522ish thing), plus room for ROM, a breadboard for whatever else we needed, and an IEEE488 port that could be repurposed if needed. We’d load it up and it would serve for converting whatever inputs to whatever outputs, usually IEEE488 so a PET could read it. But when space wasn’t a factor, or when video was desired, you could actually deploy a VIC pretty economically, and people did.
Speaking of small computers, though, the VIC20 may seem itty bitty to your eyes, but it’s downright luxurious compared to the champion of minimalist design – the Sinclair ZX80. This baby featured a swell Z80 processor and sold for $99 (today I can get a very nice tablet for the same price). It had 1k of memory and very limited video hardware, leaving the heavy lifting to the CPU. At 32x24 characters, if you displayed everything you’d only have 384 bytes to work with. Yowsa. The ZX81 would let your code run while displaying video, but very slowly, since it could only dedicate cycles to your program during the vertical blank. The ZX80 didn’t even do that, so real-time graphics weren’t really an option. But wow, what a slick trick!
So at this point in history, we have the VIC, PET, Apple, TRS80-CoCo which I barely mentioned, ZX80, and maybe some other lesser known ones. We’re 18 months away from the PC and 4 years away from the Mac. By volume VIC20 will dominate, defining home computing for 2 years for many, whilst other offerings are actually pretty much universally superior to an unexpanded VIC. For comparison, something north of 2000 Altair 8800s were sold. A half decade later VIC20 was selling 9000 units a day for about the same price as the 8800 kit.
Even William Shatner got in on the deal…
Use Symbol Filtering to get symbols you care about from your server instead of getting the kitchen sink
One of the most annoying things about working with performance traces is that they include information about everything in the system. That's also one of the great things.
However, most of the time, for most problems, there are very few things you are specifically looking for. In my case, for instance, I'm looking at IE problems and I already know all the IE dlls. If the dude happens to have 82 other programs running and idle, I do not need to download their symbols. There is no easy way to specify the symbols I care about, so I wrote this local symbol proxy. Beware, it's slightly cheesy, but it's super helpful because you neither have to download nor cache symbols you don't care about. I haven't tested it extensively but it works for me. If you find bugs, maybe you can stick it on GitHub somewhere and start maintaining it for reals. I feel like I've done my part :)
The source code is here.
As usual it comes with no warranty and no rights are implied, etc. It's just a quick sample I threw together.
To use it, create your own symfilter.txt file with one pattern on each line. It expects to find that file in the current directory when you start it. Then run the thing. Any requests that do not contain one of your strings will get a 404 return. Everything else will be forwarded to the indicated server for resolution.
If your server path used to be
change it to
The bit before the | is used to find the actual server; the request will be changed to http://someserver/somepath/...whatever
You could imagine having several configuration files listening on several ports but I leave that as an exercise to the reader. This thing saves me a ton of time by not fetching symbols I don't need.
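The filtering rule described above is simple enough to sketch. This is a minimal version of just the decision, not the proxy's source. The function names are hypothetical, and the HTTP listening and forwarding parts are left out. Matching here is case-sensitive substring search. I don't know whether the original proxy ignored case. The request paths in the test are only illustrative.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one substring pattern per line; blank lines are skipped.
std::vector<std::string> LoadPatterns(std::istream& in) {
    std::vector<std::string> patterns;
    for (std::string line; std::getline(in, line);) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF files
        if (!line.empty()) patterns.push_back(line);
    }
    return patterns;
}

// Loads patterns from symfilter.txt in the current directory. If the file
// cannot be opened the list is empty, so every request would be refused.
std::vector<std::string> LoadPatternsFromFile(const std::string& path = "symfilter.txt") {
    std::ifstream file(path);
    return LoadPatterns(file);
}

// True when the request contains at least one pattern and should be forwarded
// to the real symbol server; false means the proxy answers 404 itself.
bool ShouldForward(const std::string& requestPath, const std::vector<std::string>& patterns) {
    for (const std::string& pattern : patterns)
        if (requestPath.find(pattern) != std::string::npos) return true;
    return false;
}
```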
I wrote a message much like the below yesterday and I kept cracking up once I had started. And nobody around me had the relevant context to understand why I thought it was so funny. I’ve removed all the specifics and left the core. I know some of you will get it :)
We have three branches, C pushes to B, and B pushes to A. B only accumulates changes from C, nothing else happens in B. We have a performance regression observed in all the branches and we’re wondering where it came from. Who to blame? Should be easy right?
Branch C was good on the 15th and bad on the 25th. It pulled no payload from B during that time so clearly I cannot choose B.
Branch B was good on the 16th and bad on the 24th, but C did not push to B during that time so clearly I cannot choose C.
Branch A was good on the 16th and bad on the 23rd but the 16th already had the most recent push from B, so I cannot choose B.
The main version number comes from A and it didn’t change when B went bad so clearly I cannot choose A.
Have you made your decision?
Not remotely :)
There were some requests for an example of my unit testing strategy, so I made up this fragment and included some things that would make your testing annoying.
This is the initial fragment. Note that it uses annoying global methods that complicate testing as well as global state and system calls that have challenging failure conditions.
#include <windows.h>
HANDLE hMutex = NULL;
void DoWhatever(HWND hwnd)
{
    if (hMutex == NULL)
        hMutex = ::CreateMutex(NULL, FALSE, L"TestSharedMutex");
    if (hMutex == NULL)
        return; // could not create the mutex
    DWORD dwWaitResult = WaitForSingleObject(hMutex, 1000);
    BOOL fRelease = FALSE;
    switch (dwWaitResult) {
    case WAIT_OBJECT_0: {
        LPCWSTR result = L"Some complicated result";
        ::MessageBox(hwnd, result, L"Report", MB_OK);
        fRelease = TRUE;
        break; }
    case WAIT_ABANDONED:
        ::MessageBox(hwnd, L"MutexAquired via Abandon", L"Report", MB_OK);
        fRelease = TRUE;
        break;
    case WAIT_FAILED:
        ::MessageBox(hwnd, L"Mutex became invalid", L"Report", MB_OK);
        fRelease = FALSE;
        break;
    case WAIT_TIMEOUT:
        ::MessageBox(hwnd, L"Mutex acquisition timeout", L"Report", MB_OK);
        fRelease = FALSE;
        break;
    }
    if (fRelease)
        ::ReleaseMutex(hMutex);
}
Now here is basically the same code after the transform I described in my last posting. I've added a template parameter to deal with the globals and I've even made it so that the system type HWND can be changed to something simple so you don't need windows.h
template <class T, class _HWND> void DoWhateverHelper(_HWND hwnd)
{
    if (T::hMutex == NULL)
        T::hMutex = T::CreateMutex(NULL, FALSE, L"TestSharedMutex");
    if (T::hMutex == NULL)
        return; // could not create the mutex
    DWORD dwWaitResult = T::WaitForSingleObject(T::hMutex, 1000);
    BOOL fRelease = FALSE;
    switch (dwWaitResult) {
    case WAIT_OBJECT_0: {
        LPCWSTR result = L"Some complicated result";
        T::MessageBox(hwnd, result, L"Report", MB_OK);
        fRelease = TRUE;
        break; }
    case WAIT_ABANDONED:
        T::MessageBox(hwnd, L"MutexAquired via Abandon", L"Report", MB_OK);
        fRelease = TRUE;
        break;
    case WAIT_FAILED:
        T::MessageBox(hwnd, L"Mutex became invalid", L"Report", MB_OK);
        fRelease = FALSE;
        break;
    case WAIT_TIMEOUT:
        T::MessageBox(hwnd, L"Mutex acquisition timeout", L"Report", MB_OK);
        fRelease = FALSE;
        break;
    }
    if (fRelease)
        T::ReleaseMutex(T::hMutex);
}
Now we make a binding struct that makes the template do what the original code always did.
struct NativeBinding // name chosen for illustration
{
    static HANDLE CreateMutex(LPSECURITY_ATTRIBUTES pv, BOOL fOwn, LPCWSTR args)
        { return ::CreateMutex(pv, fOwn, args); }
    static void ReleaseMutex(HANDLE handle)
        { ::ReleaseMutex(handle); }
    static void MessageBox(HWND hwnd, LPCWSTR msg, LPCWSTR caption, UINT type)
        { ::MessageBox(hwnd, msg, caption, type); }
    static DWORD WaitForSingleObject(HANDLE handle, DWORD timeout)
        { return ::WaitForSingleObject(handle, timeout); }
    static HANDLE hMutex;
};
HANDLE NativeBinding::hMutex = NULL;
This code now does exactly the same as the original.
void DoWhatever(HWND hwnd)
    { DoWhateverHelper<NativeBinding, HWND>(hwnd); }
And now I include this very cheesy mock version of the binding, which shows where you could put your test hooks. Note that the OS types HWND and HANDLE are no longer present; they are now just int, so this code is OS neutral. LPSECURITY_ATTRIBUTES could have been abstracted as well, but I left it in because I'm lazy. This mock could have as many validation hooks as you like.
struct MockBinding // name chosen for illustration
{
    static int CreateMutex(LPSECURITY_ATTRIBUTES pv, BOOL fOwn, LPCWSTR args)
    {
        // validate args
        return 1; // any non-zero value stands in for a valid handle
    }
    static void ReleaseMutex(int handle)
    {
        // validate that the handle is correct
        // validate that we should be releasing it in this test case
    }
    static void MessageBox(int hwnd, LPCWSTR msg, LPCWSTR caption, UINT type)
        { /* note the message and validate its correctness */ }
    static DWORD WaitForSingleObject(int handle, DWORD timeout)
        { return WAIT_TIMEOUT; /* return whatever case you want to test */ }
    static int hMutex;
};
int MockBinding::hMutex = 0;
In your test code you include calls that look like this to run your tests. You could easily put this into whatever unit test framework you have.
void DoWhateverMock(int hwnd)
    { DoWhateverHelper<MockBinding, int>(hwnd); }
And that's it.
It wouldn't have been much different if we had used an abstract class instead of a template to do the job. That can be easier/better, especially if the additional virtual call isn't going to cost you much.
We've boiled away as many types as we wanted to and we kept the heart of the algorithm so the unit testing is still valid.
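For comparison, here is a minimal sketch of the abstract-class route. The interface is simplified to just the operations DoWhatever needs rather than mirroring the Win32 calls one for one, as the testability recipe that follows recommends. Everything here is hypothetical. IWhateverOps, WaitResult, MockOps and the Report method stand in for the real mutex and MessageBox calls, so the code needs no windows.h at all.

```cpp
#include <cassert>
#include <string>
#include <vector>

// OS-neutral outcomes of a timed wait. These are hypothetical stand-ins for
// WAIT_OBJECT_0, WAIT_ABANDONED, WAIT_FAILED and WAIT_TIMEOUT.
enum class WaitResult { Acquired, Abandoned, Failed, TimedOut };

// The capability: everything DoWhatever needs from the outside world.
class IWhateverOps {
public:
    virtual ~IWhateverOps() = default;
    virtual bool EnsureMutex() = 0; // create the mutex on first use
    virtual WaitResult WaitForMutex(unsigned timeoutMs) = 0;
    virtual void ReleaseMutex() = 0;
    virtual void Report(const std::wstring& message) = 0; // replaces MessageBox
};

// The algorithm itself, now free of ambient authority.
void DoWhatever(IWhateverOps& ops) {
    if (!ops.EnsureMutex()) return;
    bool release = false;
    switch (ops.WaitForMutex(1000)) {
    case WaitResult::Acquired:
        ops.Report(L"Some complicated result");
        release = true;
        break;
    case WaitResult::Abandoned:
        ops.Report(L"Mutex acquired via abandon");
        release = true;
        break;
    case WaitResult::Failed:
        ops.Report(L"Mutex became invalid");
        break;
    case WaitResult::TimedOut:
        ops.Report(L"Mutex acquisition timeout");
        break;
    }
    if (release) ops.ReleaseMutex();
}

// A mock that scripts the wait outcome and records what DoWhatever did.
class MockOps : public IWhateverOps {
public:
    WaitResult next = WaitResult::Acquired; // outcome the next wait returns
    bool mutexCreated = true;               // set false to simulate creation failure
    int releases = 0;
    std::vector<std::wstring> messages;

    bool EnsureMutex() override { return mutexCreated; }
    WaitResult WaitForMutex(unsigned) override { return next; }
    void ReleaseMutex() override { ++releases; }
    void Report(const std::wstring& message) override { messages.push_back(message); }
};
```

A test then sets next to each WaitResult value in turn and checks the recorded messages and release count. That reaches every branch of DoWhatever without a real mutex. The price is one virtual call per operation, which is usually negligible next to a kernel wait.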
[I added this example in a later post]
There are lots of pieces of code that are embedded in places that make them very hard to test. Sometimes these bits are essential to the correct operation of your program and could have complex state machines, timeout conditions, error modes, and who knows what else. Unfortunately, they are used in some subtle context such as a complex UI, an asynchronous callback, or other complex system. This makes it very hard to test them, because you might have to induce the appropriate failures in system objects to do so. As a consequence these systems are often not very well tested, and if you bring up the lack of testing you are not likely to get a positive response.
It doesn’t have to be this way.
I offer below a simple recipe to allow any code, however complex, however awkwardly inserted into a larger system, to be tested for algorithmic correctness with unit tests.
Take all the code that you want to test and pull it out of the system in which it is being used so that it is in separate source files. You can build these into a .lib (C/C++) or a .dll (C#/VB/etc.); it doesn’t matter which. Do this in the simplest way possible and just replace the occurrences of the code in the original context with simple function calls to essentially the same code. This is just an “extract function” refactor, which is always possible.
In the new library code, remove all uses of ambient authority and replace them with a capability that does exactly the same thing. More specifically, every place you see a call to the operating system replace it with a call to a method on an abstract class that takes the necessary parameters. If the calls always happen in some fixed patterns you can simplify the interface so that instead of being fully general like the OS it just does the patterns you need with the arguments you need. Simplifying is actually better and will make the next steps easier.
If you don’t want to add virtual function calls you can do the exact same thing with a generic or a template class using the capability as a template parameter.
If it makes sense to do so you can use more than one abstract class or template to group related things together.
Use the existing code to create one implementation of the abstract class that just does the same calls as before.
This step is also a mechanical process, and the code should be working just as well as it ever did when you’re done. And since most systems use only very few OS features in any testable chunk, the abstract class should stay relatively small.
Take the implementation of the abstract class and pull it out of the new library and back into the original code base. Now the new library has no dependencies left. Everything it needs from the outside world is provided to it on a silver platter and it now knows nothing of its context. Again everything should still work.
Create a unit test that drives the new library by providing a mock version of the abstract class. You can now fake any OS condition, timeouts, synchronization, file system, network, anything. Even a system that uses complicated semaphores and/or internal state can be driven to all the hard-to-reach error conditions with relative ease. You should be able to reach every basic block of the code under test with unit tests.
In future, you can actually repeat these steps using the same “authority free” library merging in as many components as is reasonable so you don’t get a proliferation of testable libraries.
Use your code in the complex environment with confidence! Enjoy all the extra free time you will have now that you’re more productive and don’t have bizarre bugs to chase in production.
I could spend a long time writing about programming the PET and its various entry points, and I’m likely going to spend disproportionate time on the CBM family of computers because that’s what I know, but I think it’s important to look at other aspects of microcomputers as well and so my sojourn into 6502 assembly language will have to be cut short. And anyway there’s room for programming examples elsewhere.
To make a decent microcomputer you need to solve certain supplemental problems… so this is the Peripherals edition of this mini-history.
Now here I’m really sad that I can’t talk about Apple II storage systems. But I can give you a taste of what was possible/normal in 1979. Tapes. Tapes my son, lots of tapes. Short tapes, long tapes, paper tapes, magnetic tapes, and don’t forget masking tape – more on that later.
Many computers (like the KIM) could be connected to a standard cassette player of some kind, the simplest situation just gave you some kind of connector that would provide input and output RCA jacks and you bring your own cassette player.
Paper tape was also used in some cases. There, feeding in the paper tape would effectively provide the equivalent of keystrokes on some TTY that was connected via, say, RS232 (and I say that loosely, because usually it was just a couple of pins that behaved sorta like RS232 if you crossed your eyes enough). Likewise, paper tape creation could be nothing more than a recording of printed output that was scientifically created so as to also be valid input! If that sounds familiar, it’s because the same trick was used to provide full-screen editing on PET computers: program listings were in the same format as the input, so you could just cursor up there, edit them some, and press enter again.
OK, but let’s be more specific. The PET’s tape drive could give you about 75 bytes/sec, it was really double that but programs were stored twice(!), for safety(!!), which meant that you could fit a program as big as all the available memory in a 32k PET in about 10 minutes of tape. Naturally that meant that additional tape would just create fast forward nightmares so smaller tapes (and plenty of them) became somewhat popular. I must have had a few dozen for my favorite programs. Also backups were good because it got cold in Toronto and magnetic tape was not always as robust as you might like. Plus you could rewind one with a pencil and it wouldn’t take so long, always a plus.
But the real magic of the PET’s tape was that the motor was computer controlled. So if you got a big tape with lots of programs on it, it often came with an “index” program at the front. That program would let you choose from a menu of options. Once you had made a selection, it would instruct you to hit the fast forward button (which would do nothing) and strike a key on the PET. Hitting the key would then engage the fast forward for just the right amount of time to get you to where the desired program was stored on the tape, and the motor would stop! Amazing! What a time saver!
The timelines for other manufacturers are astonishingly similar. It seems everyone decided to get into the game in 1977, and things developed very much in parallel in all the ecosystems. Apple and Radio Shack had highly harmonious schedules.
But what about disk drives, surely they were a happening thing? And indeed they were. On the Commodore side there were smart peripherals like the 2040 and 4040 dual floppy drives. They pretty much had to be that way, because there was so little memory to work with that if you had to sacrifice even a few kilobytes to a DOS you’d be hurting. But what smarts! Here’s what you do when you insert a new floppy:
open 1,8,15: print #1, "I0"
or you could get one free command in there by doing
And then use print for new commands. To load a program by name simply do this:
and then you can run it same as always.
But how do you see what’s on your disk? Well that’s easy, the drive can return the directory in the form of a program, which you can then list
And there you have all your contents. Of course this just wiped your memory so I hope you saved what you had…
Well, OK, it was a total breakthrough from tape, but it was hardly easy to use, and the directory thing was not really very acceptable. But fortunately it was possible to extend the BASIC interpreter… sort of. By happenstance, or maybe because it was slightly faster, the PET used a tiny bit of self-modifying code to read the next byte of input and interpret it. You could hack that code and make it do something other than just read the next byte. And so were born language extensions like the DOS helper. Now you had the power to do this:
To initialize drive zero, and,
To print the directory without actually loading it! Amazing!
Could be used instead of the usual load syntax.
From a specs perspective these 300 RPM babies apparently could do about 40 KB/s transfer internally but that slowed down when you considered the normal track-to-track seeking and the transfer over IEEE488 or else the funky serial IEEE488 of the 1541. I think if you got 8KB/s on parallel you’d be pretty happy. Each disk stored 170k!
Tapes soon gave way to floppies… and don’t forget to cover the notch with masking tape if you don’t want to accidentally destroy something important. It was so easy to get the parameters backwards in the backup/duplicate command.
This meant “duplicate drive 1 from drive 0,” but it was best remembered as “Destroy 1 using 0.”
Suffice to say there has been a lot of innovation since that time.
It certainly wasn’t the case that you could get cheap high-quality output from a microcomputer in 1977 but you could get something. In the CBM world the 2022 and 2023 were usable from even the oldest PET computers and gave you good solid dot matrix quality output. By which I mean very loud and suitable for making output in triplicate.
Letter quality printers were much more expensive and typically not in anything like an interface that was “native” to the PET. I think other ecosystems had it better. But it didn’t matter: the PET user port plus some software and an adapter cable could be made Centronics compatible, or with a different cable you could fake RS232 on it. That was enough to open the door to many other printer types. Some were better than others. We had this one teletype I’ll never forget that had the temerity to mark its print speeds S/M/F for slow, medium, and fast – with fast being 300 baud. Generously, it was more like very slow, slow, and medium – or if you ask me excruciatingly slow, very slow, and slow. But this was pretty typical.
If you wanted high quality output you could get a daisywheel printer, or better yet, get an interface that let you connect a daisywheel typewriter. That’ll save you some bucks… but ribbons are not cheap.
They still get you on the ink.
With these kinds of devices you could reasonably produce “letter-quality” output. But the journey there was a microcosm of what was normal. Consider the serial protocol: 7 or 8 bits? Parity or no? Odd or even? Baud rate? You could spend a half hour guessing before you saw anything at all. But no worries, the same software could talk to a TRS-80 Votrax synthesizer and make it speak like you’re in WarGames.
Now I call these things printers but you should understand they are not anything like what you see today. The 2023 for instance could not even advance the page without moving the head all the way from side to side. Dot matrix printers came out with new features like “bi-directional” meaning they could print going left to right and then right to left so they weren’t wasting time on the return trip. Or “logic seeking” meaning that the printer head didn’t travel the whole length of the printed line but instead could advance from where it was to where it needed to be on the next line forwards or backwards. A laser printer it ain’t.
Double-density dot matrix for “near-letter-quality” gave you a pretty polished look. 132 character wide beds were great for nice wide program listings but options were definitely more limited if you were not willing to roll your own interface box.
Still, with a good printer you could do your high school homework in a word processor, and print it in brown ink on beige paper with all your mistakes corrected on screen before you ever wrote a single character.
So much for my Brother Electric. Thanks anyway mom.
I started writing this several years ago, never finished it... stumbled across it just now and I thought maybe if I post this I'd be motivated to write more.
This is of course just my perspective and it's probably wrong in places, but it is my perspective. So there you go. Lots of fun memories. Hope you enjoy.
[You can also tweet me @ricomariani, cheers.]
A Personal History of Microcomputing
I can’t possibly cover this topic in anything like a detailed way and so you may ask what I’m doing trying to write down this little paltry bit of history. Well for one thing I’m not going to be able to remember it forever and inasmuch as this is my history I’d like to record it before I forget. But that’s not usually a good enough reason for me to do anything so I should add that another reason, perhaps the main reason, is that I’m so very, very tired of hearing made-up histories that forget so much of what happened, even fairly simple things, and that attribute important bits of progress to all the wrong people.
So while I can’t write about everything I can write about some things, some things that I saw and even experienced myself. And I hope that some of those things are interesting to others, and that they remember, too.
The first things I remember
I’m writing this on 11/1/2012 and I’m about to try to go back in time to the first relevant memory I have of this industry. I’m fairly sure it was 1975, 5th grade for me, and I picked up a book from our school library that was called “Automatic Data Processing” or maybe it was “Automated Data Processing”. I checked that book out of the library at least 3 times. I never read much of it. I’m not sure what it was doing in a grade school library. I remember one cool thing about it was that it had a little decoder chart for the unusual symbols written at the bottom of personal checks. I know that I tried to read it cover to cover but I didn’t have much success. I guess I shouldn’t be too astonished, I was only 10 years old at the time.
The reason I bring this up is that in many ways this was what computer science was like at the time. It wasn’t exactly brand new but it was perhaps the province of very large businesses and governments and there wasn’t very much that was personal about it, except for those magnetic ink markings on personal checks.
I did not know then that at about the same time in 1975 a small company called Microsoft was being founded. I did not know that Intel had produced some very interesting silicon chips that would herald the first microcomputer. I don’t think anyone I knew had a Pong game. I had a Brother personal electric typewriter which was a pretty cool thing to have and that was the closest to word processing that I had ever experienced. I didn’t use anything like white-out on account of I couldn’t afford it.
I was a lot more concerned about the fact that Canada was going to adopt the metric system than I was about any of these things. The computer technology on Star Trek (which I saw in reruns) and The Six Million Dollar Man seemed equally reasonable to me. I wasn’t old enough to think Erin Moran of Happy Days (Joannie) was really cute but I soon would be. That’s 1975.
People Start Experiencing Computers
If you ever saw an original MITS Altair 8800 you would be really not impressed. I mean so seriously not impressed that even McKayla Maroney could not adequately capture this level of unimpressedness (but I know someone who could :)). If I had to pick, in 1975, between goofing around with an Altair and playing around with a hand-wound copper armature suspended on a couple of nails to make a motor, the motor would win every time. And I think more important than that, only a few thousand people actually experienced the Altair. The Altair did not take North America or the world by storm. In fact you could live your life just fine and not be aware of its existence at all and that is certainly where I was.
However, there were lots of things starting to become normal and even common that were foreshadowing the personal computer.
I think I first noticed it happening in watches. You remember the kind? Well, there were two kinds really. The first kind was the type where you had to push a button and an LED display would then tell you what time it was. This was important because of course you couldn’t have the LED display on all the time as it would run down the battery far too quickly. Which meant glancing at your watch was right out -- you needed that button. I think there was a commercial where a fellow was fencing and trying to see what time it was and it wasn’t going so good because he had to push the button. I’m not sure why you would need to know what time it was while fencing but it did make the point dramatically that you had to push a button.
The other type of watch was LCD, and I suppose the fact that you can still get LCD watches today and not so much LED (but they are making a comeback as flashlights) speaks volumes. These devices had rudimentary features that allowed them to do their jobs. They were not in any way generally programmable, at least not by end-users. But groundwork was being laid. You could do an entire volume on just wearable computers.
I only knew one person with an LED watch, but I knew a lot of people that had assorted LED games. You might want to stop and think about that. We’re talking about a game here where the primary display is a few 8-segment LED clusters, the same as you found on calculators and such. These games were very ambitious indeed, claiming to be things like football simulations. An Xbox 360 these are not and Madden Football was many, many years away. But somehow the up-and-down dodge-the-barely-moving-bad-guys football experience, punctuated by crisp, piezo-electric-crystal-powered beeps, was pretty impressive. As was the number of batteries you had to sacrifice to the thing. Now to be sure I never took one apart and I wouldn’t have known what a piezo-electric speaker was at the time anyway but I’d bet you a nickel those games were powered by a simple microprocessor and some ROM. They made more of a cultural dent than the Altair. And they were more accessible than, say, Pong, which was present but hardly ubiquitous itself.
I’ve completely glossed over calculators at this point. And maybe rightly so; even a four-function-doorstop of a calculator with no “memory” function was sufficiently expensive that you were unlikely to encounter them.
And as much as I was down on the Altair, within a few years another Intel-based computing system would become much more popular – Space Invaders, which for many of us was the first “non-pong-like game” we ever experienced or were inspired by.
In summary, I think it’s fair to say that at this point, the late seventies, you could still be excused if you had never touched anything like a microcomputer. But that was likely to change soon.
My First Computers
I was taking a math enrichment program in junior high school and though our junior high didn’t have a computer, there was this HP minicomputer that made the rounds. I’m not sure what it was exactly but I’ve looked at some pictures and specifications and I’m pretty convinced that it was an HP Model 9830A with the thermal printer and card-reader options. We mostly fed it cards even though it had a console. The thing was at our school for an entire two weeks, and we spent the week before it arrived learning flowcharting and BASIC.
I was totally hooked. I stayed late at school every day the thing was there and then, for the whole rest of the year, kept writing programs on paper pads that I couldn’t even run anywhere. So in the 9th grade I made a major right turn career-wise and signed up for computer science classes in high school, which I otherwise likely would not have done.
As I started 10th grade in the fall of 1979, I landed in a classroom that had three, count’em, Commodore PET microcomputers and one Radio-Shack TRS80. I’m not sure why the “Trash-80” was unpopular; it’s actually a formidable device in its own right, but for reasons lost to history the PETs were what everyone really used. I knew this was going to be a cool class when I walked in because I’d seen a PET on “The Price Is Right” so it had to be awesome. I still remember my first day looking at one of those things: the teacher had it up front and was reviewing some basic commands and I was hypnotized by the flashing cursor.
I worked on those computers at great length so I can tell you something about them and maybe something about the software eco-system they had. The “PET 2001” was powered by a 6502 processor and had 8k of RAM (with famously 7167 bytes free for you at startup), and 8k of ROM for BASIC and I/O support. Plus another 1k of RAM for video memory. The IO system was not especially complicated; like most of the era it was just memory mapped IO and it was enough to read in a keyboard, and talk to the built-in cassette tape drive. There was support for IEEE488 in the initial version but it didn’t work due to bugs, except for this one printer which included built-in workarounds for the bugs. IEEE488 “actually worked” in the 2001N series.
However, even on that original 8k PET you could do some pretty cool things. The ROM included built-in BASIC and so there were a variety of games and it was not that hard to make your own, and we did. I was perennially working on a version of Space Invaders and then Asteroids. It was always mostly working. There were dozens of race track style games, some even first person. There was a cool monthly digital magazine, “CURSOR”, that had something new and slick every issue. I remember hacking on the model rail simulator with special zeal. There were decent chess programs, and even more less-decent chess programs available if you were willing to type them in yourself.
But what about practicality? Even that computer, such as it was, could do word processing for you. By the time I started using them, WordPro3 was already available. I think they even had features that allowed you to print while you were still making edits! Amazing! You could insert new lines where you pleased and even copy text from one place to another without requiring you to travel forward in time to the Macintosh era. In fact every microcomputer worth mentioning, with anything like a general purpose display, could do these basic functions. They certainly were not peculiar to the PET.
If you wanted high quality sound, why, naturally you would attach a breadboard with about a dozen suitably sized resistors and an OP-AMP to your parallel port and then you could mix 4 sources and enjoy high quality 8-bit digital to analog sound playback. Your experience is limited only by the quality of your resistors! Naturally your playback program included 6 bit precision wave tables for sine waves that you could sample/mix to get your four voices because none were built in. For bonus points add an FM modulator to your circuit and you could listen to it on your FM-radio at the frequency of your choice instead of attaching a speaker. Of course stereo wasn’t possible on account of there weren’t enough output pins on the parallel port for 16 bits.
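To make the bit-budget arithmetic concrete: four voices of 6-bit samples top out at 4 × 63 = 252, which still fits in the 8 bits the DAC accepts, so you can mix by plain addition with no clipping. The Python sketch below is a modern illustration under stated assumptions (a 64-entry table, integer phase steps per sample), not the original PET playback routine; the names `SINE_6BIT`, `mix_sample`, and `render` are hypothetical.

```python
import math

# Assumption: a 64-entry sine table with 6-bit samples (0..63).
# Illustrative sketch only, not the original PET program.
TABLE_SIZE = 64
SINE_6BIT = [
    round(31.5 + 31.5 * math.sin(2 * math.pi * i / TABLE_SIZE))
    for i in range(TABLE_SIZE)
]


def mix_sample(phases):
    """Sum one 6-bit table lookup per voice.

    With at most four voices the sum is at most 4 * 63 = 252,
    so it always fits in an 8-bit DAC value (0..255).
    """
    return sum(SINE_6BIT[p % TABLE_SIZE] for p in phases)


def render(steps, n_samples):
    """Mix voices whose pitch is set by an integer phase step per sample."""
    phases = [0] * len(steps)
    out = []
    for _ in range(n_samples):
        out.append(mix_sample(phases))  # one byte written to the DAC
        phases = [(p + s) % TABLE_SIZE for p, s in zip(phases, steps)]
    return out


samples = render([1, 2, 3, 5], 128)
print(max(samples) <= 255)
```

A bigger step means a higher pitch: at a sample rate of f_s, a step of k walks the 64-entry table k·f_s/64 times per second.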
Of course if you wanted to just hear some variable pitch “beeping” and make music like that, that was easier. You could just crank up the shift rate on the output port designed to be part of a UART (the CB2) and vary the shift rate according to the music. The preferred way to hear that was to attach an alligator clip to the port with electrical tape on the bottom teeth so as to not short it out (because the signal was on top and connectors were far too expensive) and then connect that to a suitable speaker. This technique was popular in games because it didn’t tie up the processor shifting out waveforms.
My electronics teacher had an even simpler 6502 computer system that became the first thing I ever brought home. The KIM-1 came with an impressive array of books on the 6502 architecture, which I was especially excited about because I wanted to learn to program the PET in Machine Language (the capitals were evident when we said it) and of course they had the same microprocessor. But the really cool thing was Jim Butterfield’s “The First Book of KIM” which was simply outstanding in terms of having cool little programs that did something and taught you something.
The KIM had a display that consisted of six 7-segment LEDs. That’s it. Enough to show the address and contents of a single byte of memory in hexadecimal. On that display you could play a little pong type game, Hunt the Wumpus, navigate a star field, simulate a lunar landing, and more… if you were willing to enter the programs with the little calculator tablet. With only 1k of memory you could be on a first-name basis with every byte of your program and indeed you pretty much had to be. But then, that was the point. And the KIM’s exposed guts encouraged even more hardware hacking than the PET did, so we soon had interesting keyboards attached and more.
Still, I don’t recall ever doing anything especially practical on the device, it was a lot of elaborate playing around. It was an excellent training area and I suppose that’s what it was designed for more than anything else so I shouldn’t be surprised.
The 6502 training was useful and soon I was squeezing programs into the spare cassette buffer of the PET like a champ. Hybrid BASIC and assembly language programs were pretty common, whereas full assembly language programs often had nothing more than an enigmatic listing
The hybrids often had little SYS 826 and friends sprinkled in there. So while a working knowledge of machine language helped you to understand a few more snippets of PETTREK, really the more interesting thing you could do with it is learn a lot more about how your computer worked.
Remember the PET had only 8k of ROM, which was actually a lot compared to its cousins, but still not so much that you couldn’t disassemble every last byte and then start pretending to be the CPU starting from the reset vector. From there it wasn’t too long until you had figured out that JSR $FFD2 was used to write a character, and even why that worked. Those ROMs were full of great techniques…
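The “pretend to be the CPU” exercise starts from two facts about the 6502: on reset it loads the program counter from the little-endian word at $FFFC/$FFFD, and JSR absolute is opcode $20 followed by the target address, low byte first. The Python sketch below decodes just that much from a toy 64K memory image. The address it plants as the reset target ($FD00) is made up for illustration and is not meant to match the PET's actual ROM layout; `read_word` and `decode_at` are hypothetical helper names.

```python
RESET_VECTOR = 0xFFFC  # 6502: reset vector is at $FFFC (low) / $FFFD (high)
JSR_ABS = 0x20         # opcode for JSR absolute


def read_word(mem, addr):
    """6502 words are little-endian: low byte first, then high byte."""
    return mem[addr] | (mem[addr + 1] << 8)


def decode_at(mem, pc):
    """Decode one instruction, recognizing only JSR absolute in this sketch."""
    if mem[pc] == JSR_ABS:
        return f"JSR ${read_word(mem, pc + 1):04X}"
    return f".BYTE ${mem[pc]:02X}"  # anything else is shown as raw data


# Hypothetical memory image: reset points at $FD00, which holds JSR $FFD2.
mem = bytearray(0x10000)
mem[RESET_VECTOR] = 0x00
mem[RESET_VECTOR + 1] = 0xFD
mem[0xFD00:0xFD03] = bytes([JSR_ABS, 0xD2, 0xFF])

pc = read_word(mem, RESET_VECTOR)
print(f"${pc:04X}: {decode_at(mem, pc)}")  # prints "$FD00: JSR $FFD2"
```

A real disassembler needs the full opcode table and the length of each addressing mode, but the little-endian fetch is the part that trips people up first.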
I get asked for recommendations a lot. Most of the time I have little to no data when asked to perform this sort of divination. But as it turns out I have this ready-to-go universal advice that works for me, so I'm able to give the same recommendation all the time even with no data! Handy, huh?
Here it is:
Load as little as you can. Run as little as you can. Use as little memory as you can. Take the fewest dependencies you can. Create the simplest UI that you can. And measure it often. Don’t just measure your one success metric, measure all the consumption and make it as small as you can. Consider all the important targets, including small devices and large servers. Nothing else will do, nor is anything else necessary.
When you understand your costs you will be making solid choices.
Never use non-specific data you think you remember to justify anything.
Last, and most important of all, never take the advice of some smart-ass performance expert like me when you could get a good solid measurement instead. :)
I hesitate to bring this up but there’s no sense hiding it. For the last year I’ve been the development lead for the Internet Explorer Performance Team. We’ve done some really cool things I’m super proud of. Like this recent gem.
However, things are usually not so rosy in my world. There are lots of changes on any given day and many of them can affect things adversely. So it’s super important to be watchful. And it’s also important to be watchful in a smart way. But even if you do all that, things still can get through the cracks. This post is about one such incident. But first a little background.
The IE performance team, amazingly, measures IE performance. We do it in all of the development branches of IE (there are many) and in the master branch. Additionally, our central performance team does measurements in literally every Windows development branch, but they only run a subset of the IE benchmarks everywhere.
Most *new* IE performance problems, “regressions”, originate in the branches where IE developers are working -- that is only natural. However, even though upwards of 99% of changes not related to IE have no impact on IE benchmarks, there is still the 1% or so that do. It doesn’t make sense to burden every other team with also checking every IE benchmark when 99% of the time they are unaffected, so my team is also responsible for finding cases where we have inadvertently been affected by something in Windows generally, and getting to the bottom of it.
We have a large suite of benchmarks to catch these things and on balance they do a very good job. But they are inherently imperfect. You just can’t cover everything in a modest set of benchmarks because there aren’t enough hours in the day to run every test you might want in every configuration. So sometimes things get through. And also you can’t really count on your benchmarks to give you a sense of what customers are really seeing (even if those customers are still just Microsoft employees trying out the latest build). The diversity of configurations you can test in your lab is nothing like the diversity in the wild. To say nothing of the diversity of usage scenarios. So while suites help a lot, you need to supplement them with telemetry.
Why is this so important? Well, now consider this recent example. A fairly modest mistake was made in semaphore handling that was largely unrelated to the usual things the IE engine has to do like formatting, layout etc. And in fact none of the usual benchmarks detected anything abnormal. So when it came time to commit our code to the master build we were happy to do so. However shortly thereafter, we started getting reports of badness.
Unfortunately, even in an engineer-rich environment like Microsoft you get a lot of reports that are not very actionable. Very early on two people reported that recent builds of IE were much slower and very near unusable. However neither of these people had any data that could be used to track the problem and in both cases whatever it was went away by the next day. This does not mean that they were not experiencing a real problem, but this is the kind of report that leaves you with little to do but cry about it. Not especially helpful for improving things, though possibly therapeutic. A few more days went on and others started making similar reports. Most of the bug reports were just as un-actionable as the first – I won’t say unhelpful because at least we knew something very bad had happened. Thankfully though, now some very tenacious people were gathering traces for us and some of our automation was starting to notice the problem and so automated reports with more logging were starting to arrive. Things were looking up.
Shortly after the first traces arrived we had a positive diagnosis. The problem was fairly simple and had even been specifically logged previously, and even fixed, but not correctly. Those of us looking at this data and hearing the various reports also had the job of making sure that the reports had the same root cause – we couldn’t rule out the possibility that there were 2, or 3, or more different things people were experiencing that were resulting in this behavior.
Again lucky for us, this particular bug was not insidious. While it was regrettable it was comparatively easy to fix and soon a fix was available internally. But the whole experience made me think about how we might have caught the problem sooner. And also how we could have determined that it really was only the one problem more affirmatively, with less manual examination of every trace we could get our hands on.
Here the telemetry really shines, and I wish we had looked at it more closely before we committed our changes because I suspect even with a small N there would have been enough data to show that something was wrong.
The following chart shows the distribution of navigation times in the wild (uncontrolled activity) for our internal users before the bug was introduced (in blue) and after the bug (in orange). In this view the badness just leaps off the page – there is a giant spike at the 10s mark. And indeed the nature of this bug is that it causes delays of exactly 5s during download of images, due to hitting a 5s timeout. Because the spike sits at 10s rather than 5s, the data shows clearly that we never see just one such delay: affected navigations hit the timeout twice. It also shows that, disregarding this one (bad) problem, the new/orange version of IE was actually superior to the previous/blue version.
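One rough way to automate the “leaps off the page” check is to bucket navigation times on a log scale and flag any bucket whose share of navigations grew by more than some threshold between builds; a fixed-cost problem like a repeated 5s timeout lands in a single bucket. The Python sketch below does that. The bucket width (10 per decade), the 5% threshold, and the sample data are all hypothetical, times must be positive, and this is not a description of the IE team's actual telemetry tooling.

```python
import math
from collections import Counter


def bucket(seconds, per_decade=10):
    """Log-scale bucket index: per_decade buckets per power of ten."""
    return math.floor(math.log10(seconds) * per_decade)


def shares(samples, per_decade=10):
    """Fraction of samples falling in each bucket."""
    counts = Counter(bucket(s, per_decade) for s in samples)
    return {b: c / len(samples) for b, c in counts.items()}


def spikes(before, after, threshold=0.05, per_decade=10):
    """Lower edges (in seconds) of buckets whose share grew by > threshold."""
    old, new = shares(before, per_decade), shares(after, per_decade)
    return [
        10 ** (b / per_decade)
        for b in sorted(new)
        if new[b] - old.get(b, 0.0) > threshold
    ]


# Made-up data: most navigations take ~0.5s; after the bug, 20% hit two 5s timeouts.
before = [0.5] * 100
after = [0.5] * 80 + [10.2] * 20
print(spikes(before, after))  # prints [10.0]
```

Comparing shares rather than raw counts keeps the check meaningful when the two builds have very different numbers of users reporting.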
What’s more, the telemetry was able to provide us with two other interesting facts – all builds are reporting a spike around 125ms or so – why? And why did the old build have a spike around 10s also? You can be sure we’ll be looking into that soon.
Lesson for me: look at this telemetry sooner, it’s proving to be a useful tool in judging build quality.
Good lesson for everyone.
*Thanks to Amy Hartwig for providing me with the charts on a silver platter.
A few weeks ago I wrote some career advice; those few points are largely distilled from talks I’ve given here at Microsoft over the years, and those are in turn distilled from the various mentoring sessions I’ve given over the course of my career.
There are two memes in particular that I like to impart to every mentee on their very first session with me, both of which I acquired rather than invented. I find them to be very useful for starting discussions. If you read the previous article you’ll readily see how they are infused into the short list of points. Both are very simple as you’ll see below.
I learned the first one from a close friend who learned it from Prof. Seviora of U of Waterloo. Prof. Seviora had (has?) a habit of injecting some of his real world experience into his classes at the end of a lecture if there was time and this is one of those little tidbits.
“VIP” stands for Visibility, Image, and Performance. So, if you like, the rule is Success = VIP. I have yet to figure out if it’s S = V*I*P or S = V*(I+P) or something else entirely but it doesn’t seem to matter in application. I explain it something like this:
I have never met anyone who thought they could be a great success whilst performing poorly. So pretty much everybody gets the P. Likewise, most people who think about their career at all discover that visibility is important. This basically leaves one letter left to discuss and that’s the “I” – Image.
Thankfully for me Image does not refer to the clothing you wear (though dress for success isn’t without its place) it’s more like your “brand.” When people think about you, what do they think of? Reliability? Integrity? Productivity? Determination? Motivation? Leadership? Do they think you work on hard and important problems? That’s the crux of it. The problems may be management problems, technical problems, recruiting problems, or any other domain that is valuable.
If people know who you are (V), know that you work on hard and important problems (I) and that you do it well (P), it’s pretty darn hard to be a failure.
Cultivating your brand is super helpful for a number of reasons, not the least of which is that when your brand is well known then many people are able to represent you and your interests when you are not there. Some of my favorite work stories involve people correctly channeling me at important meetings which I could not attend. Sometimes several at the same time. This is not bad for your career :)
I learned this one from a dear friend (Tara Prakriya) who learned it from a mentor during her time at Merck. I explain this one pretty much the way it was taught to me with a few extra bits.
The essence of it is that you do not want your career to look like a tower (at this point I draw a long skinny rectangle) – you want it to look more like a pyramid (at this point I draw an equilateral triangle). Why? Well, consider that tower looking career – you’ve advanced along nice and fast but do you think your management really wants to put another metaphorical brick on the tippy top? Looks like the thing might fall over. And getting bricks down near the bottom is now a lot harder because “you’re too important.” This is not a good place to be.
On the other hand that triangle takes a lot longer to build up but it’s nice and sturdy and provides a great base of knowledge upon which to build future success.
Now is your career going to look like a perfect triangle? I doubt it. You’re more likely to have something rougher with maybe a main peak and a side peak and some bumps here and there but that’s ok. All of those gaps and mini-peaks represent growth opportunities. People will see those and think “here’s a chance to flesh out that area” and they will give you those jobs with confidence. Even going wider is easier because management knows you enjoy the breadth play as well.
In this model when you have choices to make you consider what “brick” the opportunity will allow you to acquire and where it will land. Do you want a brick on top? Or do you want to fill in a gap? Or start something new?
But wait… I started with a pyramid and I drew a triangle. At this point I fix the picture to show the 3rd dimension.
A triangle falls right over, you need a nice solid base. That third dimension, the other faces of the pyramid if you will, is everything else you bring to your life. Success with your family, your church, your theater, whatever it is that makes you a well-rounded person. Maybe you have so many faces your pyramid is actually a cone :)
It is the overall combination of life experience you bring every day to your job and your other endeavors that will allow you to succeed. You may think that working exclusively on the career face is the path to success but that’s an illusion – all these things build on each other. That’s why it’s so important to not compromise – you need to be working on all the things that are important to you in some unified plan to achieve the best result for yourself. And you need to bring all your assets to bear in all your endeavors.
When you find that balance, then you’ll be the best you can be.
And this inevitably leads me to start quoting Kung Fu Panda, at which point all seriousness is lost, but hopefully, retention is enhanced.