dd routines interface

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
dd routines interface
by on (#13737)
I just want to know if there are better methods to implement a ddraw interface for the emu. I have a secondary buffer in on-board graphics card memory (of course); keeping it in PC memory is a pain performance-wise.
I'm using 32 bits per pixel and I have a function like: DDPutPixel(DWORD x, DWORD y, DWORD Color)

What I'm currently doing is:

1 - Lock the surface
2 - Put the pixels while the NES is rendering them (scanlines 0-239)
3 - Unlock and Blt the surface to the primary, then go back to 1 (a rough sketch of this cycle is below)
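
Roughly, that cycle looks like this (a minimal sketch assuming DirectDraw 7 interfaces; lpBack and lpPrimary here are just placeholder names for the off-screen and primary surfaces, and lpSurfacePtr is the base pointer used by DDPutPixel() below):

Code:
DDSURFACEDESC2 ddsd;
ZeroMemory(&ddsd, sizeof(ddsd));
ddsd.dwSize = sizeof(ddsd);

if (SUCCEEDED(lpBack->Lock(NULL, &ddsd, DDLOCK_WAIT | DDLOCK_SURFACEMEMORYPTR, NULL)))
{
    lpSurfacePtr = ddsd.lpSurface;   // cast to your pointer type as needed; ddsd.lPitch is the real row stride (see the replies below)

    // ... render scanlines 0-239 here, one DDPutPixel() call per pixel ...

    lpBack->Unlock(NULL);
    lpPrimary->Blt(NULL, lpBack, NULL, DDBLT_WAIT, NULL);
}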

One thing to mention: since the screen width on the NES is 256, this DDPutPixel() does:

Code:
void DDPutPixel(DWORD x, DWORD y, DWORD Color)
{
    // width is 256, so y * 256 == y << 8
    *((LPDWORD)lpSurfacePtr + (y << 8) + x) = Color;
}


In other words: << 8 to avoid a multiplication. That's the only thing I can optimize :(

Does anyone have a ddraw implementation different from that?
Re: dd routines interface
by on (#13739)
This response isn't specific to emulation, since I haven't written an emulator. Anyway... A putpixel routine is a nice thing to have if you're learning graphics programming and you want to play around with it, but it's basically never efficient in practice.

When you've got to redraw the whole surface using elements as small as pixels, it's generally better to begin at a known location (usually the beginning -- 0) and step forward as you write through the whole surface. You're going to write every pixel anyway. View the surface as a 1-dimensional memory buffer, and write numbers into the current location as you step to get the colors you want. When you get to the end of a row, the next address begins on the next row, so the vertical step occurs naturally. Sometimes this can even allow you to gain extra efficiency by writing more than one pixel in a single write. So: do it this way, and you don't need to keep recalculating your position from x and y.

Sometimes your position is implicit in the way you write the routine, or sometimes you have a variable for the pointer to the current memory address, or sometimes it's a combination of the two. Whatever you do, avoid unnecessary calculations.
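
A minimal illustration of that idea (assuming a tightly packed 256-wide, 32-bit surface -- which, as the next reply points out, you shouldn't actually assume; use the pitch. lpSurfacePtr is the pointer from the first post, and next_pixel_color() is just a stand-in for whatever produces the next color):

Code:
DWORD* p = (DWORD*)lpSurfacePtr;      // view the surface as one flat buffer

for (int i = 0; i < 256 * 240; i++)
    *p++ = next_pixel_color();        // write straight through; no x/y math per pixel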

by on (#13743)
Your code is flawed in that you assume the pitch of the surface to be equal to the width -- which is often not the case.

For plotting individual pixels like that, multiplication is UNAVOIDABLE. However, plotting individual pixels (and thus the multiplication) can be avoided. If you do plot a pixel, the correct way to do it is:

Code:
void* out = ((BYTE*)surface_pointer) + (Y * surface_pitch) + (X * bytes_per_pixel);

*((DWORD*)out) = pixel;  // assuming 32-bit


Note that the surface width is used nowhere in the calculations. You MUST use the pitch to determine where each scanline starts. There's no way around it. Failure to do so will cause terrible effects for many people -- if it's working now, it's just a coincidence.


Anyway -- for something like an NES emulator, where the entire surface can be written to in "like-a-book" fashion -- you don't need this kind of pixel plotting; you can just adjust the surface pointer as you render pixels (as augnober suggested in his post):

vars to keep:
Code:
BYTE* video_out;   // output pointer
int  vid_pitch_add;  // number of bytes of padding between end of one scanline and start of the next


When frame starts (assumes you will always render 256 pixels for every scanline):
Code:
video_out = ddraw_surface_pointer;
vid_pitch_add = surface_pitch - (256 * bytes_per_pixel);


When you render a pixel (assuming 32-bit):
Code:
*((DWORD*)video_out) = pixel;
video_out += 4;  // move pointer to point to next pixel


At the end of a scanline:
Code:
video_out += vid_pitch_add;  // skip over padding to start of next scanline
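
Putting those three snippets together, one frame would look something like this (nes_render_pixel() is just a placeholder here for however you produce the next pixel's color):

Code:
video_out = ddraw_surface_pointer;
vid_pitch_add = surface_pitch - (256 * bytes_per_pixel);   // bytes_per_pixel == 4 here (32-bit)

for (int y = 0; y < 240; y++)
{
    for (int x = 0; x < 256; x++)
    {
        *((DWORD*)video_out) = nes_render_pixel(x, y);     // 32-bit pixel
        video_out += 4;                                    // next pixel
    }
    video_out += vid_pitch_add;                            // skip padding to next scanline
}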

by on (#13748)
You can obtain a pointer to the DirectDraw surface that is linear no matter what video mode you're in. You still have to adjust for pitch, however. Check out how I do it in my emulator (look at the DrawFrame function): http://nemulator.com/websvn/filedetails.php?repname=nemulator&path=%2Fnemulator%2FGraphics.cpp&rev=0&sc=0

Basically, the PPU writes palette values into a 256x240 buffer. When it's time to draw, I iterate over the buffer and the surface pointer, writing the correct color values to the surface. When I reach the end of a line, I increment the surface pointer to adjust for pitch.

edit: I wasn't accounting for surface pitch and modified the code. Thanks, Disch, for the pitch explanation.
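
A rough sketch of that copy loop, assuming a 256x240 buffer of palette indices (ppu_buffer), a DWORD color table (nes_palette) and a 32-bit surface -- the names here are placeholders, not the actual code from the link:

Code:
BYTE* dst = (BYTE*)surface_pointer;
BYTE* src = ppu_buffer;

for (int y = 0; y < 240; y++)
{
    DWORD* row = (DWORD*)dst;
    for (int x = 0; x < 256; x++)
        *row++ = nes_palette[*src++];   // palette index -> 32-bit color
    dst += surface_pitch;               // advance by pitch, not by width * 4
}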

by on (#13751)
Yeah, you have the same problem. You're just assuming that scanlines are tightly packed within the surface -- which oftentimes they're not (though I guess it can be relatively common with surfaces that are a power-of-2 width -- either way, that behavior should definitely not be relied on).

Check this page -- this was the best link I could find on MSDN since DDraw isn't really supported anymore. About 80% of the way down the page there's a brief section titled "Width vs. pitch" which gives the general idea.

by on (#13752)
Cool -- thanks for the link... that picture is worth a thousand words. I did modify the code to account for this (I think it works, but pitch is always 0 for me, so I can't really test it). I haven't looked at this code in a long time and it's kind of a mess. Needs to account for different bit depths, etc.

Anyway, I've been running Linux on my PC for a few months now and miss playing Mega Man :) I started porting nemulator a couple of days ago and am using SDL, so I guess I'll be addressing these issues soon.

by on (#13753)
You're just getting lucky that ignoring the pitch works for you. The reason is that most video cards use the pitch to round textures (surfaces) up to powers of two, and 256 just happens to be 2^8. It's just a minor video card alignment optimization. Still, you should account for it anyway because it's good practice, and you just never know these days. 224 multiplications a second are the least of your problems in emulating an NES, anyway.

Quote:
*((DWORD*)video_out) = pixel;
video_out += 4; // move pointer to point to next pixel


Why use a byte* if you always access video_out as a dword? Use :

Code:
DWORD *video_out;
  ...
  *video_out++ = pixel;


Or if you're going to be using both 16-bit and 32-bit rendering for god knows what reason :

Code:
*(DWORD*)video_out++ = pixel;


No speed hit / loss either way, but less lines == better :D

By the way, you guys really integrate the emu with the host OS, hm? DirectDraw blitting code immediately below the NES palette lookup. That must make porting (even to a different Windows graphics API a la D3D/OGL) a real PITA :/

Quote:
(I think it works, but pitch is always 0 for me, so I can't really test it)


Allocate a DDraw surface of 305x305 instead. It should still work if you did things right (and you properly set your source rect when blitting the surface to the backbuffer / screen).
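
For example (a sketch, assuming an IDirectDraw7 pointer named lpDD and the same placeholder lpBack / lpPrimary surfaces as in the earlier sketches):

Code:
DDSURFACEDESC2 ddsd;
ZeroMemory(&ddsd, sizeof(ddsd));
ddsd.dwSize         = sizeof(ddsd);
ddsd.dwFlags        = DDSD_CAPS | DDSD_WIDTH | DDSD_HEIGHT;
ddsd.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN;
ddsd.dwWidth        = 305;                       // deliberately not a power of two
ddsd.dwHeight       = 305;
lpDD->CreateSurface(&ddsd, &lpBack, NULL);

RECT src = { 0, 0, 256, 240 };                   // only blit the NES-sized area
lpPrimary->Blt(NULL, lpBack, &src, DDBLT_WAIT, NULL);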

by on (#13756)
byuu wrote:
Why use a byte* if you always access video_out as a dword?


In my emu I do it for two reasons:

1) So I can have that 'vid_pitch_add' variable which is the number of bytes to add between scanlines.

2) So I can easily switch between 16 and 32 bit video modes with something like:

Code:
video_out += bytes_per_pixel;


where bytes_per_pixel can be 2 or 4 depending on the mode.


As far as less lines = better -- that's why God created macros ;D


Quote:
Or if you're going to be using both 16-bit and 32-bit rendering for god knows what reason


User configurability I guess is one reason -- although the main reason is that some graphics filters take an input format of a certain type. 2xSaI or something I think only takes 16-bit and HQ2x only takes 32-bit input -- or something like that... I might have them backwards.

Plus I just like to avoid having my code assume that I'm always running under a certain bit depth.

by on (#13759)
Well, DD and D3D, along with every video card made in the past eight years or so, support automatic conversion of bit depths, so whether the user is running at 16bpp or 32bpp, you can blit a 16bpp buffer to the screen. Why does this matter? You halve the video data being transferred from system RAM to the video card. Huge speedup, even with something as intensive as my emulator. That speedup would be even more significant for NES emulators.

As for the pitch-add value (pitch - 256 * bytes_per_pixel): if you're always using 16-bit pixels, just right-shift it by one when you first retrieve it; then you can add that halved value directly to your WORD pointer.
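
In other words, keeping Disch's variables but as WORD-sized units (a sketch, assuming the 256-pixel-wide, 16-bit case):

Code:
WORD* video_out;
int   vid_pitch_add_words;

// at frame start:
vid_pitch_add_words = (surface_pitch - 256 * 2) >> 1;   // padding in WORDs, not bytes

// at the end of each scanline:
video_out += vid_pitch_add_words;                       // pointer math already scales by sizeof(WORD)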

Anyway, yeah. If you support 16 and 32bpp depths, no reason to remove the code that's there. Carry on ;)

Quote:
As far as less lines = better -- that's why God created macros ;D


Now if only god would create good c++ macros XD

Something like :

Code:
macro opw_type1(name, math) {
  void op_%{name}w() {
    rd.l = op_read();
    rd.h = op_read();
    regs.a.w %{math} rd.w;
    flags_%{name}_w();
  }
}

opw_type1(lda, "=");
opw_type1(adc, "+=");
opw_type1(sbc, "-=");


Mmmmm.... being able to embed syntax and not requiring \ after every line.

by on (#13761)
byuu wrote:
Well, DD and D3D, along with every video card made in the past eight years or so, support automatic conversion of bit depths, so whether the user is running at 16bpp or 32bpp, you can blit a 16bpp buffer to the screen. Why does this matter? You halve the video data being transferred from system RAM to the video card.

But if one filter can take only 16-bit data and another filter can take only 32-bit data, then your PlayChoice PPU emulator[1] still needs to be able to generate pixels in more than one format.

Quote:
Now if only god would create good c++ macros XD

If you want Common Lisp, you know where to find it ;-)

Quote:
Something like :

Code:
macro opw_type1(name, math) {
  void op_%{name}w() {
    rd.l = op_read();
    rd.h = op_read();
    regs.a.w %{math} rd.w;
    flags_%{name}_w();
  }
}

I think you might be able to shoehorn that into the C++ template system.

[1] A Famicom or NES generates a composite signal. A Famicom Titler generates an S-video signal, encoding the sum and difference of each pixel's high and low levels on separate wire pairs. Only a PlayChoice or Vs. system generates an RGB signal similar to that generated by the vast majority of PC-based NES emulators.

by on (#13763)
The Famicom Titler has an RGB PPU; there is an external NTSC encoder, as in most game consoles.

by on (#13765)
If your filter is 32bpp-only, e.g. HQ2x, then you rewrite it to be 16-bit :D

Ask blargg; he has a really highly optimized C++ version of the HQ2x filter that outputs at 16bpp. Or get it from bsnes/src/snes/video/filter_hq2x.cpp.

Anyway, I don't think templates will do what I want. I know I can't insert expressions into functions, and I don't even think I can build larger labels from arguments to the templates, e.g.:
void op_<template argument>_w()
would only work with #defines.

Perhaps a nasty #define / template hybrid would do the trick...
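
For reference, the plain #define version of that -- continuations and all -- would be roughly the following; rd, regs, op_read() and the flags_*_w() functions are symbols from the emulator core and aren't defined here, and the operator is passed without quotes:

Code:
#define opw_type1(name, math)  \
  void op_##name##w() {        \
    rd.l = op_read();          \
    rd.h = op_read();          \
    regs.a.w math rd.w;        \
    flags_##name##_w();        \
  }

opw_type1(lda, =)
opw_type1(adc, +=)
opw_type1(sbc, -=)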

by on (#13768)
Thanks again for the pitch info and the code. I think maybe I didn't have problems because I've always used Nvidia cards (at least since I started the emu).