Properly simulate older machines?

kozaki · 2016-08-15 16:24:04

Hi. How do you set up your VM to emulate old boxes, sticking reasonably close to their features (that is, their capabilities)?

Today's boxes are a far cry from circa-2004 CPUs (non-SSE2/non-PAE, therefore no Flash/Windows/Chrome/friends).

Hearing from people with a more thorough practice of virtualization would probably do more good than harm. No specialist here (reading ten times more material wouldn't hurt). Not sure whether the set-up I use (below) fits « well enough » for real-life productivity scenarios, FOSS OS [1] comparison (and yeah, tweaking).

Trying to get VM capabilities ⋜ (equal to or a bit less than) the target's capabilities; present checklist:

- Computing capacity (e.g. Athlon XP 2800+-based) with DDR400 and HDD
  I know that QEMU/KVM tries to restrict the vCPU to the chosen "-cpu model", but it's unclear to me whether it succeeds (e.g. with "-cpu pentium3" the cache should be 250 KB, yet the guest OS reads 2 MB)
- I/O capacity (big point here)
  - The IDE bus maxes out at 133 MB/s theoretically; how much does a standard SATA spinning HDD add in terms of efficiency, taking the 8 MB cache and the virtual image overhead into account?
  - The bus/memory subsystem is a whole other question: e.g. how does the VM take advantage of the host's high-speed DDR3/integrated controller?
- 2D performance on par with some 32-to-256 MB AGP GPU

I believe getting rid of SSE2 ranks high on the list. From https://lists.nongnu.org/archive/html/q … 00053.html:

In KVM and XEN modes, almost every CPU instruction is executed directly by the physical host CPU. Only parts of memory management, and a few system administration instructions and (on some CPUs) virtual real mode 16 bit (DOS) modes involve the emulation software.

So mathematical SSE2 operations should run at full hardware speed.
Cache preload SSE/MMX operations may have different performance than on the host.

Current host set-up: a 3.3 GHz Core i3 [2]

cpulimit -l 60 \
qemu-system-i386 -cpu pentium3,enforce \
 -m 1G -soundhw ac97 -vga std -display gtk -enable-kvm -hda hdd.img

cpulimit tries to emulate the 2 GHz clock by restricting the sub-process to 60% of one of the host's 3.3 GHz CPU threads.
The second line asks for an SSE-only CPU; also, the guest image sits on a 7200 rpm SATA spinning HDD.
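
The 60% figure is simply the clock ratio, rounded down so that the guest stays at or below the target (2000 MHz / 3300 MHz ≈ 0.606). A minimal sketch of that arithmetic follows; the function name `cpulimit_percent` is made up for illustration, and a clock ratio says nothing about how much more work a modern core does per cycle.

```shell
#!/bin/sh
# Hypothetical helper: percentage to pass to `cpulimit -l` so that one host
# thread delivers at most the target clock. Integer division rounds down.
cpulimit_percent() {
    target_mhz=$1
    host_mhz=$2
    echo $(( target_mhz * 100 / host_mhz ))
}

cpulimit_percent 2000 3300   # 2 GHz target on the 3.3 GHz host, prints 60
```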

Which according to /proc/cpuinfo, inxi (and google chrome [3] ) gives

  $ inxi -f 
  CPU:       Single core Pentium III (Katmai) (-UP-) cache: 2048 KB clocked at 3292.518 MHz     
            CPU Flags: apic cmov cx8 de fpu fxsr hypervisor mca mce mmx msr mtrr pae pge pse pse36
            sep sse tsc vme x2apic
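
To repeat this check across several guests without eyeballing the flag list, something like the sketch below could be run inside each guest. The helper name `has_cpu_flag` is hypothetical; it only reads the "flags" line of /proc/cpuinfo (or of a file given as second argument), so it reports what the guest is told, not what the physical CPU can actually execute.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first "flags" line of a cpuinfo-style
# file lists the given flag as a whole word (so "sse" does not match "sse2").
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qx -- "$flag"
}

# Report a few flags that matter for the circa-2004 target.
for f in sse sse2 pae; do
    if has_cpu_flag "$f"; then echo "$f: advertised"; else echo "$f: hidden"; fi
done
```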

[1] Presently Linux-based; I can't wait to put a *BSD to the test.
[2] Host has VT-x, no VT-d capabilities.
[3] Video on the screenshot reads: « Make an old PC faster with Google Chrome »...

So what are your ideas or set-ups?

EDIT: links, more specific title and (hopefully) grammar.

Last edited by kozaki (2017-07-08 13:28:23)

Lone_Wolf · 2016-08-17 00:22:12

Kozaki,

could you explain why you want to do that?

I'm not sure VM techniques are suitable for this; gaming console emulators seem to come closer to simulating real hardware.

kozaki · 2016-08-17 23:21:57

Lone_wolf, sure:

A few neighbors and I have started a « digital DIY » workshop at the local DIY house. We now have enough hardware components, tested and repaired, to 1. set up some crafted OSS systems on them; 2. put them on sale at prices that'd put da gangsta out of business.

I know about the « get the old hardware and test on it » advice; hey, I'm the one testing the hardware, then the OSes and apps, on the 1997-2008 (and newer to come) boxes we're refurbishing.
Actually it's the only way to get the old machines and systems seen, and possibly tested, by colleagues and passers-by.

Now this takes time, the DIY house has limited opening hours, and I unfortunately have no room for more hardware at home. Hence the great interest in having some of the testing done in a virtual environment.

E.g. testing the behavior of a few apps and libraries in an SSE-only, low-memory environment goes way faster in a VM than on the old boxes themselves. The same goes for comparing user scenarios across ten given OSes.

Last edited by kozaki (2017-07-08 13:38:09)

pypi · 2016-08-20 05:17:53

A quick google suggests that IO throttling is possible, although I haven't tried it (yet - I have an old box to experiment with):

https://www.google.co.nz/search?q=limit … e&ie=UTF-8

You seem to have figured out how to throttle the CPU usage; however, just throttling the QEMU process to 60% will not be enough, as a modern CPU is many times quicker than an old CPU at an equivalent clock speed. You might want to run a CPU-intensive benchmark on both to determine how far you should tune the throttle.
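
One way to turn such a benchmark into a throttle value, as a rough sketch: time the same CPU-bound job on the host and on the old box, then use the ratio of the two run times as the cpulimit percentage. The function name `throttle_from_times` and the timings below are made up for illustration, and a single benchmark will only approximate the gap for that kind of workload.

```shell
#!/bin/sh
# Hypothetical helper: given how long the same benchmark takes on the host and
# on the old box (seconds), print a cpulimit percentage, never below 1.
throttle_from_times() {
    awk -v host="$1" -v old="$2" 'BEGIN {
        pct = int(host * 100 / old)
        if (pct < 1) pct = 1
        print pct
    }'
}

throttle_from_times 12 95   # made-up timings: 12 s on the host, 95 s on the old box
```

The result would then replace the 60 in `cpulimit -l 60`.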

kozaki · 2016-08-20 11:00:54

@pypi looks interesting!
Wouldn't bootchart be a quick and dirty way of auditing / comparing general I/O capabilities?

As for the CPU simulation, I know: I need to test how far the architecture/flags simulation plus throttling goes with respect to the original CPU/FSB capabilities.

pypi · 2016-10-17 21:24:32

I'm unsure - I know that bootup on modern systems is I/O bound, but I'm not sure about older systems. I'm using (as a rough estimate) a combination of boot time and application startup times, since those seem to be most directly affected by I/O limits. I've just tested I/O throttling, and it seems to work pretty well - but it made less of a difference than I was expecting. I can simulate the 3MB/s disk access speed, and by reducing the IOPS below 100 I can slow some things down to an "about right" level. However, the latency is still basically nonexistent - I'm not sure how much of a difference that makes, but older machines always seem to have more "lag".
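
For reference, a sketch of what such a throttled disk could look like on the QEMU command line, assuming the throttling is done with QEMU's own `throttling.bps-total` and `throttling.iops-total` drive options (the post does not say which mechanism was used). The helper name `throttled_drive` and the image name are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build a -drive argument that caps total throughput
# (given in MiB/s) and total IOPS using QEMU's throttling.* drive options.
throttled_drive() {
    image=$1
    mib_per_sec=$2
    iops=$3
    bps=$(( mib_per_sec * 1024 * 1024 ))
    echo "file=$image,if=ide,throttling.bps-total=$bps,throttling.iops-total=$iops"
}

# Print the command rather than run it, so it can be checked first.
echo qemu-system-i386 -enable-kvm -cpu pentium3 -m 1G \
    -drive "$(throttled_drive hdd.img 3 100)"
```

These limits cap throughput and request rate only; they do not add per-request latency, which matches the missing "lag" described above.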

I know that the old machine I'm looking at uses PIO instead of DMA for disk accesses, which might make a difference. Another thing to consider...

CPU throttling in combination with reducing the I/O speed seemed to be more effective in terms of making things more realistically slow than either just one or the other - what do you think?

mich41 · 2016-10-18 08:08:40

kozaki wrote:

As for the CPU simulation, I know: I need to test how far the architecture/flags simulation plus throttling goes with respect to the original CPU/FSB capabilities.

Just about as far as you have already seen: you can change the name and disable some instruction set extensions. You can also reduce frequency with CPUFreq. Still, the CPU will execute many more instructions per cycle than a P3 would (SSE1 code will easily run 2-4 times faster, for example), won't disable its caches and won't slow down its memory controller.
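
A minimal sketch of the CPUFreq part, assuming the host exposes the usual cpufreq sysfs files; `scaling_max_freq` is expressed in kHz. The function name `cap_max_freq` is made up, and the frequency actually applied depends on what the hardware and the cpufreq driver support.

```shell
#!/bin/sh
# Hypothetical helper: cap the maximum frequency of every CPU through the
# cpufreq sysfs files. The sysfs root is a parameter so the function can be
# tried on a scratch directory first.
cap_max_freq() {
    max_mhz=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -e "$f" ] || continue
        echo $(( max_mhz * 1000 )) > "$f"
    done
}

# Needs root on a real host, e.g.:  cap_max_freq 2000
```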

You can get a bit farther by downclocking everything, increasing memory timings to max and so on. But it still won't be the real thing.

For disk, it should be possible to limit I/O throughput with cgroups. But I don't think there is a way to simulate longer seek times. However, you can easily get an actual old disk and connect it through a SATA-IDE or USB3-IDE bridge (USB2 may be a bit slow).
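
As a sketch of the cgroup route, assuming a host with cgroup v2 and the io controller enabled (on cgroup v1 hosts the equivalent knobs are the `blkio.throttle.*` files): limits are written to the group's `io.max` file as one line per block device. The helper name `io_max_line`, the device number 8:0 and the group name `oldbox` are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: format a cgroup v2 io.max line that caps read and write
# bandwidth (bytes/s) and IOPS for one block device, identified by MAJ:MIN.
io_max_line() {
    dev=$1      # e.g. 8:0 for /dev/sda (see `lsblk -o NAME,MAJ:MIN`)
    bps=$2
    iops=$3
    echo "$dev rbps=$bps wbps=$bps riops=$iops wiops=$iops"
}

# As root, with the QEMU process already placed in a cgroup named "oldbox"
# (an assumed name):
#   io_max_line 8:0 3145728 100 > /sys/fs/cgroup/oldbox/io.max
io_max_line 8:0 3145728 100
```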

And when it comes to disk caching, the internal disk cache is nothing. You have a few gigs of RAM which the host OS uses to cache files, including the VM disk image. You should probably use cache=none or cache=directsync.
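
A sketch of how the cache mode could be set, assuming the `-hda hdd.img` shorthand from the first post is replaced by an explicit `-drive`. The helper name `uncached_drive` is made up, and it deliberately accepts only the two modes mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: build a -drive argument with an explicit cache mode,
# accepting only the two modes suggested above that bypass the host page cache.
uncached_drive() {
    image=$1
    mode=${2:-none}
    case $mode in
        none|directsync) echo "file=$image,if=ide,cache=$mode" ;;
        *) echo "unsupported cache mode: $mode" >&2; return 1 ;;
    esac
}

# Print the command rather than run it, so it can be checked first.
echo qemu-system-i386 -enable-kvm -cpu pentium3 -m 1G \
    -drive "$(uncached_drive hdd.img none)"
```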

Last edited by mich41 (2016-10-18 08:46:16)

kozaki · 2017-07-08 14:25:25

Thank you guys. The project is moving along. I've been back on the software part for a month, after months spent improving our infrastructure, testing lots of hardware components and, above all, adjusting the workshop workflow since (wo)man power tripled ':)

1. CPU emulation
I've been misled by the names of QEMU/libvirt options, e.g. « sse2=off » (or feature « disable » in libvirt). What it does is *hide/mask* the instruction set from the guest OS, but it's still there! Only applications that explicitly check for e.g. the 'sse2' CPU flag will refuse to run; all other software will run and use it.
Example: if we start the guest with `qemu -cpu pentium2` or `-cpu athlon,sse2=off`, Firefox 54 (requires SSE2) and mpv (requires SSE) run without issue; Chromium, on the other hand, only starts its UI and can't render any page. Actually [1],

"-cpu pentium3" (which claims to not support SSE2 in cpuid) ... does not guarantee that there are really no SSE2 instructions encountered. That can only be reliably verified with full software CPU emulation or with a real non-SSE2 CPU.)

mich41 wrote:

You can also reduce frequency with CPUFreq. Still, the CPU will execute many more instructions per cycle than a P3 would (SSE1 code will easily run 2-4 times faster, for example), won't disable its caches and won't slow down its memory controller.

cpulimit only limits the share of CPU time the QEMU process gets, not the instructions per cycle; qemu's `-cpu pentium3 -smp [cpus=]n[,cores=n][,threads=n]` limits the number of CPU threads the guest can access. Up to last week I thought QEMU *was* also limiting access to the physical CPU cache, as the guest output suggests:

(host) ~$ qemu-system-i386 -cpu pentium,check (...)
(guest) ~$ inxi -f                                                                         
  CPU:       Single core Pentium MMX (-UP-) cache: 0 KB speed: 3292 MHz (max)      
             CPU Flags: apic cx8 de fpu hypervisor mce mmx msr pse tsc vme x2apic

But since the host's CPU instructions remain usable while masked (when « disabled »), I can no longer trust the guest's output. QEMU has added an l3-cache property to -cpu, but I found no way to set the first two levels [2].
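
To compare what different `-cpu` models make the guest report, the cache hierarchy can also be read from sysfs inside the guest, as in the sketch below. The helper name `list_caches` is made up. Like inxi, it only shows what the guest kernel has been told; it does not prove the physical cache is restricted.

```shell
#!/bin/sh
# Hypothetical helper: print level, type and size of each cache that the
# kernel reports for cpu0. The sysfs directory is a parameter for testing.
list_caches() {
    cache_dir=${1:-/sys/devices/system/cpu/cpu0/cache}
    for idx in "$cache_dir"/index[0-9]*; do
        [ -r "$idx/size" ] || continue
        echo "L$(cat "$idx/level") $(cat "$idx/type") $(cat "$idx/size")"
    done
}

list_caches
```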

2. I/O stack emulation
@pypi it's great you can simulate the 3MB/s disk access speed, and reduce the iops below 100.

pypi wrote:

However, the latency is still basically nonexistent - I'm not sure how much of a difference that makes, but older machines always seem to have more "lag".
I know that the old machine I'm looking at uses PIO instead of DMA for disk accesses, which might make a difference. Another thing to consider...

mich41 wrote:

For disk, it should be possible to limit I/O throughput with cgroups. But I don't think there is a way to simulate longer seek times. However, you can easily get an actual old disk and connect it through a SATA-IDE or USB3-IDE bridge (USB2 may be a bit slow).
And when it comes to disk caching, the internal disk cache is nothing. You have a few gigs of RAM which the host OS uses to cache files, including the VM disk image. You should probably use cache=none or cache=directsync.

qemu's 'cache=none' makes a noticeable difference in performance; from qemu man page:

none : The host page cache is avoided entirely. This will attempt to do disk IO directly to the guest's memory.
directsync : The host page cache is avoided while only sending write notifications to the guest when the data has been flushed to the disk

What do you mean by « internal disk cache is nothing »?
Using old physical disks for the VM is a nifty idea; I like it and will do some VM / physical machine comparisons next week to see if we can get a better emulation of the oldies.

[1]: https://lists.nongnu.org/archive/html/q … 00053.html
[2]: http://techqa.info/sysadmin/question/29 … -kvm-guest

kozaki · 2017-07-24 09:54:09

A guy from Linaro gave feedback on the QEMU mailing list. He says it might be an emulator bug if it runs applications that require CPU instructions absent from the chosen CPU model. He explains further:

(As you have discovered, you can't completely disable
the SSE instructions when using KVM -- this is because
the host CPU hardware does not support trapping to
the hypervisor (or otherwise faulting) on those
instructions.)

Maybe I should file a bug? His further advice (reviewing QEMU's code extensively, and static analysis of the code, scanning for any part that requires those CPU instructions) is beyond my skills at the moment.

EDIT: Forgot to say that launching the QEMU VM without KVM still allows the applications requiring SSE2 to run (with some patience).

Last edited by kozaki (2017-07-24 09:56:46)

Arch Linux
