In 1997, I first dumped the ROM from a DEC VT320 video terminal and decided to try to decode it. The processor in these terminals is a member of Intel’s MCS-51 family of microcontrollers, in this case a Siemens SAB8031A, with a 64 KiB ROM.
I wrote a disassembler and simulator (both long since lost) and started trying to decode the ROM, having never used an 8051 before. I had the disassembler decode only the parts that I needed, following the call path so as not to get confused by what I assumed would be large chunks of data in there. I still have a printout from February 1998 of the annotated disassembly as it stood at the time: tiny lettering at 112 lines per page, running to 67 pages. I gave up soon after coaxing the simulated terminal through the power-on self-test. I didn’t know how to stimulate it into doing anything interesting, and I suspected that there were subtleties of 8051 instruction execution that I had got wrong.
Nearly 20 years later, I’ve come across that printout and decided that, in the wake of my annotated disassembly of the VT100 video terminal, I’d give my favourite terminal another go.
The VT100 uses an 8080, and for that project I quite easily found a disassembler and 8080 simulator core written in C that I could reuse to get the terminal’s code running. Finding 8051 tools has proven slightly more frustrating.
I started with the disassembler, which was simple enough. Jeffery L. Post’s D52 disassembler (now maintained by Eric Smith) has a control file which lets you mark sections as being code or data, so you don’t spend time looking at incorrectly interpreted chunks. It has an analyse function to produce an initial control file. In practice, this tends to follow control flow and opts out of disassembling sections that it can’t see a clear call to. However, its inability to decode jump tables means that it significantly under-disassembles if you use this option. I don’t regret using it, although it took many days of effort wading through the disassembly, marking ‘binary’ sections in the control file back as ‘code.’
The other difficulty with interpreting the disassembly and finding code sections has nothing to do with D52. The terminal software uses a lot of jump tables, and DEC decided to place all the 16-bit addresses in these in little-endian format, despite the 8051 being a big-endian processor. Obviously marking these tables as data words with the ‘dw’ format wasn’t going to work, so I decided that the next tool I was going to need, an assembler, would have to be a macro assembler.
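To make the byte-order problem concrete, here is a minimal Python sketch of reading such a table both ways. The four table bytes are invented for illustration and are not taken from the VT320 ROM; the function names are likewise hypothetical.

```python
import struct

def read_table_little_endian(rom: bytes, offset: int, entries: int) -> list[int]:
    """Decode 16-bit addresses stored low byte first (how DEC laid out its tables)."""
    return list(struct.unpack_from(f"<{entries}H", rom, offset))

def read_table_big_endian(rom: bytes, offset: int, entries: int) -> list[int]:
    """Decode the same bytes high byte first (how a plain 'dw' would be read back)."""
    return list(struct.unpack_from(f">{entries}H", rom, offset))

# Invented example: two table entries, low byte first.
table = bytes([0x34, 0x12, 0xCD, 0xAB])

little = read_table_little_endian(table, 0, 2)
big = read_table_big_endian(table, 0, 2)

print([f"{addr:04X}" for addr in little])  # ['1234', 'ABCD']
print([f"{addr:04X}" for addr in big])     # ['3412', 'CDAB']
```

Read high byte first, every entry points somewhere unrelated, which is why a disassembler or a plain ‘dw’ directive gets these tables wrong.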
I asked on a mailing list for recommendations for FOSS macro assemblers and simulators for the 8051. The following assemblers were mentioned:
- Ken Stauffer’s AS31
- W. W. Heinz’s ASEM-51
- Alfred Arnold’s Macroassembler AS (got a lot of votes)
- San Bergmans’ SB-Assembler 3
AS31 doesn’t have a macro facility, which rules it out for large projects like this one, but I tried it anyway. As it was written in 1990, the dialect of C is so old that gcc threw many warnings and I feared that it wouldn’t produce an executable. It uses non-standard directives for data (‘.byte’ instead of ‘db’) and I couldn’t get the assembly to finish even after pre-processing these, because the parser has an ambiguity between binary and hexadecimal constants. In theory it will accept hexadecimal constants suffixed by ‘h’, with the usual caveat that hex numbers starting with ‘a’ to ‘f’ should be prefixed by ‘0’. However, AS31 also accepts binary literals prefixed by ‘0b’, so it choked on my hexadecimal literal ‘0bch’. It seems to be very strict on single quotes versus double quotes too. At that point I was tired of pre-processing my source code and tried the next tool.
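The clash between the ‘0b’ prefix and the ‘h’ suffix can be sketched in a few lines of Python. This is not AS31’s parser, which I haven’t reproduced here; `naive_classify` and `classify_literal` are hypothetical names, and the first only illustrates how a prefix-first check could trip over ‘0bch’, while the second shows one way of avoiding it by testing the suffix first.

```python
import re

def naive_classify(token: str) -> tuple[str, int]:
    """Prefix-first check: anything starting with '0b' is treated as binary."""
    t = token.lower()
    if t.startswith("0b"):
        return ("binary", int(t[2:], 2))  # int('ch', 2) raises ValueError
    if t.endswith("h"):
        return ("hex", int(t[:-1], 16))
    return ("decimal", int(t))

def classify_literal(token: str) -> tuple[str, int]:
    """Suffix-first check: a trailing 'h' wins, so '0bch' is hexadecimal."""
    t = token.lower()
    if re.fullmatch(r"[0-9][0-9a-f]*h", t):
        return ("hex", int(t[:-1], 16))
    if re.fullmatch(r"0b[01]+", t):
        return ("binary", int(t[2:], 2))
    if re.fullmatch(r"[0-9]+", t):
        return ("decimal", int(t))
    raise ValueError(f"not a numeric literal: {token!r}")

try:
    naive_classify("0bch")
except ValueError:
    print("naive check rejects 0bch")

print(classify_literal("0bch"))   # ('hex', 188)
print(classify_literal("0b101"))  # ('binary', 5)
```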
I would normally have ruled out ASEM-51 as it has no source available. It is free, however, and cross-platform, so I downloaded the Linux binary and pointed it at my source. There was just one issue, in that I’d labelled the start of the code as ‘reset’, when that is a built-in symbol. After deleting that label, it assembled all 40 KiB of source to an Intel HEX file. Running that through hex2bin.py gave me an image identical to the original ROM.
Ten minutes later, I had read the manual sufficiently far to create the macro I needed to get the jump tables in the correct format:
dwl macro address
db low(address),high(address)
endm
And I stopped there: ASEM-51 works for me. I will try Macroassembler AS at some point, but for now I’m cleaning and annotating the source, safe in the knowledge that I can rebuild the binary after every change, as protection against fat-fingering anything.
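The rebuild check boils down to turning the Intel HEX output back into a flat image and comparing it with the ROM dump. The following Python sketch is not hex2bin.py itself. It handles only data (type 00) and end-of-file (type 01) records, and it assumes that unwritten bytes should be padded with 0xFF; the function name and the sample records are made up for illustration.

```python
def ihex_to_bytes(text: str, size: int, fill: int = 0xFF) -> bytes:
    """Convert Intel HEX text to a flat image of `size` bytes."""
    image = bytearray([fill]) * size
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"bad record: {line!r}")
        record = bytes.fromhex(line[1:])
        # All bytes of a record, checksum included, sum to zero modulo 256.
        if sum(record) & 0xFF != 0:
            raise ValueError(f"checksum mismatch: {line!r}")
        count = record[0]
        address = (record[1] << 8) | record[2]
        rtype = record[3]
        if len(record) != count + 5:
            raise ValueError(f"length mismatch: {line!r}")
        if rtype == 0x00:                      # data record
            if address + count > size:
                raise ValueError(f"record outside image: {line!r}")
            image[address:address + count] = record[4:4 + count]
        elif rtype == 0x01:                    # end-of-file record
            break
        else:
            raise ValueError(f"unsupported record type {rtype:02X}")
    return bytes(image)

# Made-up sample: three bytes (an LJMP to 0100h) at address 0, then end of file.
sample = ":03000000020100FA\n:00000001FF\n"
image = ihex_to_bytes(sample, size=8)
print(image.hex())  # 020100ffffffffff
```

Comparing the result against the bytes of the original dump (a plain `==` on two `bytes` objects) is then enough to catch a fat-fingered edit.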
Simulators are rather more thin on the ground. These names came up:
gSim51 was an almost immediate bust. When you’ve navigated the ad-laden hell of SourceForge (why is it even still a thing?) and downloaded the code, it won’t build. There are functions missing. The source has crossed-out code in comments as ideas were changed on the fly. This is some college assignment that has been dumped on the web. Avoid.
emu8051 is simple and rather cute. It doesn’t have full debugging facilities such as breakpoints, but it presents an ncurses-based text interface that shows the current line, registers, flags and one of the areas of memory that you can change at will. You can set the PC and then single-step or run the code. I tried setting the PC and stepping through some of the routines that I’d identified where aspects puzzled me, and it helped out.
However, it wasn’t long before it crashed out with a segfault while I was jumping around the code. I then ran it over another routine that I was having trouble decoding, and the results were not at all what I expected. This routine used some BCD arithmetic and the numbers in registers didn’t match what I’d calculated by hand. I took a look at the source and realised that the ‘DA A’ instruction was implemented incorrectly, and it was being fed incorrect flags because the ‘ADD’ instruction was also wrong. Uh oh. A look at the open issues and pull requests on GitHub suggests that this isn’t being actively used, and I don’t want to spend my time chasing down segfaults, so I’ll look elsewhere.
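For reference, this is roughly what the two instructions are supposed to do, sketched in Python from the documented MCS-51 behaviour. It is not emu8051’s code, the function names are my own, and the overflow flag that ‘ADD’ also sets is left out because ‘DA A’ doesn’t use it.

```python
def add(a: int, operand: int) -> tuple[int, bool, bool]:
    """ADD A,operand: returns (result, CY, AC)."""
    total = a + operand
    cy = total > 0xFF                             # carry out of bit 7
    ac = (a & 0x0F) + (operand & 0x0F) > 0x0F     # carry out of bit 3
    return total & 0xFF, cy, ac

def da(a: int, cy: bool, ac: bool) -> tuple[int, bool]:
    """DA A: adjust A after adding two packed-BCD bytes; returns (result, CY)."""
    # Low nibble: add 6 if it is not a valid BCD digit or if AC was set.
    if (a & 0x0F) > 9 or ac:
        a += 0x06
        if a > 0xFF:
            cy = True
            a &= 0xFF
    # High nibble: add 60h if it exceeds 9 or if CY is set by now.
    if (a >> 4) > 9 or cy:
        a += 0x60
        if a > 0xFF:
            cy = True
            a &= 0xFF
    # DA A can set CY but never clears it.
    return a, cy

def bcd_add(x: int, y: int) -> tuple[int, bool]:
    """Add two packed-BCD bytes the way 'ADD' followed by 'DA A' would."""
    result, cy, ac = add(x, y)
    return da(result, cy, ac)

print([hex(v) if not isinstance(v, bool) else v for v in bcd_add(0x49, 0x38)])  # ['0x87', False]
print([hex(v) if not isinstance(v, bool) else v for v in bcd_add(0x56, 0x67)])  # ['0x23', True]
```

The second case is 56 + 67 = 123: the result byte holds 23 and the carry flag supplies the hundreds digit, which only works if ‘ADD’ reports the auxiliary carry correctly in the first place.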
Where else? At this point, I’m seriously considering writing a small 8051 core myself, because I have the schematics for the VT320 and I know exactly how the ports, DUART, EEPROM, video interface, timers and interrupts are wired up.
I downloaded EdSim51 but I haven’t found a way to run it on Fedora 43 without it complaining that I don’t have the X11 DISPLAY environment variable set. Setting it does no good, possibly because of the shift to Wayland. At this point it’s a toss-up as to whether I hate Java or Wayland more.