Virtual 6510 - Technical Overview

Introduction:

    Ever wanted to upgrade your C64's speed as often as PC technology changes? Well, this project aims to provide just that: a software/hardware combination that gives 8-bit Commodore machines a "virtual" CPU upgrade. The real CPU (a 6510 or 6502 chip) is not replaced by new electronics, but by a software emulation of it running on a modern PC. The PC provides accelerated performance because of its modern design and higher clock speed, whilst the real Commodore hardware provides compatibility, as it is still used for sound, video and device I/O.

    To make all this possible, a hardware adaptor and cable are needed in addition to the emulation software.

    Work on this project has now stopped due to a general lack of interest from the C64 online community. The information below may be of interest to anyone wanting a review of the challenges involved.

Overview of Concept:

    The PC runs the V6510 software, which emulates the functionality of the Commodore's CPU. Whilst the emulation software does the CPU processing, it has to access the real Commodore hardware for anything to appear on the screen or, for example, for keystrokes to be received from the keyboard. To do this, the software directs most write operations through to the real Commodore hardware. Sometimes the PC also needs to read from the real Commodore; these reads are only needed when a value changes on the C64 hardware that the PC cannot otherwise detect (for example, a key is pressed or the joystick moves).

    Information is exchanged between the PC and the Commodore via a cable and hardware adaptor. Other Commodore accelerators work on essentially the same principle, but where they use a real 65C02 or 65816 processor, V6510 runs a software model, or "virtual" processor, on your PC, with its own copy of the memory map. With the appropriate hardware adaptor, the same software could be adapted to accelerate all 8-bit CBM computer types.

    The benefit of using PC software is that speed and emulation capability can be developed independently of the availability of faster 65xxx processors. Simply put, as PC technology is upgraded, so is the speed of the "virtual" CPU upgrade for your Commodore. The challenge, as will become clear, is maximising the speed of the virtual processor whilst handling all the I/O needed to keep the real hardware in sync with the software emulation.

How does Virtual6510 work?

    The heart of V6510 is the software engine which emulates the 6510 CPU: it fetches 6510 instructions and executes them. Most operations are carried out entirely within the engine, which holds its own copy of the Commodore's memory map and ROMs. Depending upon configuration settings, operations which write to memory or I/O are written to both the virtual model and the real hardware.

    The operation of writing to real hardware needs a high degree of optimisation so as not to slow down the software engine. If the hardware interface could complete a write in 1µs (one C64 clock cycle), the overall process would be faster than a real C64 and as fast as the SuperCPU. Unfortunately, without a 32-bit synchronised I/O card in the PC, this I/O speed is not possible.

    V6510 proposes to use two priority levels when writing to the hardware. Firstly, all memory writes are cached in the virtual map, and a byte-by-byte scheme is proposed for copying the data to the real hardware, skipping locations that have not changed. This ensures that memory is eventually synchronised, at a rate that depends upon the configured optimisation mode. Secondly, all writes to Commodore I/O chips (such as the VIC-II and SID) take priority over these memory sync bytes.

    Like I/O writes, I/O reads have the highest priority. This means that a read from an I/O location whose value can change independently of the CPU is fetched from the hardware in preference to memory sync operations. Some I/O locations do not change on their own, so data from these is read from the virtual map (e.g. a sprite position register never changes under hardware control; it only changes when software writes to it).

    The actual hardware read and write operations are synchronised to the C64's readiness for data (thus taking the VIC-II's stolen cycles into account). Data, however, is sent economically using tokens, so most operations need only one or two PC I/O calls. Between token requests, the PC runs at full speed to process code. When the C64 receives a token, it executes one of its micro-code routines, typically consisting of 3-4 instructions. Because micro-code is used rather than DMA, each C64 read or write operation takes longer than it would on either a real C64 or a SuperCPU; the efficiency of the micro-code minimises this delay. With appropriate optimisation modes (modes similar to the SuperCPU's as well as "smart" modes), I/O delays are minimised and memory synchronisation is fast enough to keep most graphics aliasing effects to a minimum. C64 memory is only read back into the virtual map after a disk LOAD operation.

    There is one other situation where the C64's own CPU is relied upon, and that is to run timing-critical 1MHz disk I/O code. JiffyDOS routines in particular are timing-critical to within a couple of microseconds. Certain kernal calls must therefore be executed inside the real machine rather than simulated on the virtual engine. This is similar to what a SuperCPU does, as it also has to slow down (it has a patched kernal) before doing disk I/O.

Proposed hardware adaptor:

    The block diagram shows the proposed functionality of the hardware adaptor. The implementation would use several EPROMs and a latch, with one of the EPROMs configured to act as a low-speed Programmable Logic Array (PLA). The adaptor is expected to be most neatly built as a cartridge, potentially minimising the effort of building the project by recycling an existing CBM cartridge design. Similar adaptors could be built for Commodore machines other than the C64.

System Issues which had to be Overcome:

    This project requires a circuit to adapt and synchronise the signals from the PC to the C64 and vice versa. The circuit is more complex than a simple X1541 cable, but much simpler to build than a SuperCPU (especially if a specific CBM cartridge is used as the donor!)

    The need for the circuit comes from the fact that I/O on the PC bus can be severely limited in both speed and bus width. This limitation means that a PC (even a very fast one) cannot directly read from and write to the Commodore's bus. Additionally, no standard I/O port available on all PCs has sufficient control lines.

    The LPT port used for the X-cables is a robust connection that is standard on most PCs. Unfortunately, the LPT connection is also effectively just an 8-bit data bus with 4 control lines and 5 status lines.

    In SPP or bidirectional modes the maximum speed the LPT port can reach is approximately 1MHz, but on most machines it is much slower. This limitation is largely independent of processor speed and exists for historical reasons. The LPT port could be operated in an enhanced mode such as EPP, but this only provides a marginal increase in data throughput (max. 2.4MB/s) whilst requiring a significant increase in hardware complexity between the PC and C64.

    On the other hand, developing a PC card to handle the data exchange in hardware is complex and costly, and users are unlikely to want to build their own. Additionally, the I/O limitation described above for the LPT port also applies to other conventional cards on the PC bus, so only 32-bit buses such as VLB or PCI would improve speeds. With a 32-bit bus it may be possible to access the Commodore hardware by Direct Memory Access (DMA) and also to minimise the software overhead of accessing the card, but these 32-bit buses need special I/O chips to comply with the bus standard, further increasing cost and complexity. Also, only the C64/C128 has DMA capability, so the project would then be restricted to just those two machines.

    Clearly, 8-bit PC I/O is the limitation, and intelligence rather than power is necessary if a budget solution is to be found.

Synchronous vs Asynchronous Emulation

    Certain elements of the simulation are synchronous whilst others are not.

    Whilst exchanges between the PC and C64 are synchronised (to the Commodore's readiness), the software CPU emulation is not. The use of an asynchronous CPU model improves processing speed (or virtual MHz).

    Consider a JMP instruction: all that is needed is the operation PC = new_value. On the other hand, an instruction such as LDA #$00 needs A = 0, Z = 1, N = 0 (the zero and negative flags are derived from the loaded value). The JMP instruction is therefore emulated faster by the virtual engine than the LDA, the opposite of what happens on the real CPU.

    Wherever possible, the virtual engine would assume the state of some inputs and continue. This is particularly beneficial in polling loops, where sampling the real input faster than was possible on the original hardware brings no advantage (e.g. when polling for a joystick button press, reading the port at 50MHz is no better than reading it at 1MHz).

Advanced Options for the Future

    The emulation software could be continually developed to make use of more features available on modern PC hardware. For example, a 16MB REU is not easily obtainable in real hardware, but in software such a device can be configured within the RAM of a typical modern PC. Likewise, serial modems and other expansions could be emulated.

Feasibility Testing and PC Survey

    Several PC configurations have already been assessed, and more data is required, especially for faster PC types. The data gathered so far suggests that even though 486-class machines can achieve 5MHz virtual operation, the need for memory synchronisation and I/O access is likely to eliminate much of this power and result in no significant acceleration.

    High-end PC processors in the 500-600MHz range can achieve upwards of 50MHz on simple instructions, but more typically achieve upwards of 15MHz with the benchmarking code once allowances are made for memory synchronisation and I/O access.

    The project is now entering the public feasibility phase. If you would like to take part in the survey, send an email to v6510@lycos.com for a link to the software test suite, which checks the emulation speed achievable on your machine. It would be appreciated if you could email back the result file with some feedback and comments so I can assess whether this project is worth pursuing in the future.