no$sns opengl and directdraw

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
no$sns opengl and directdraw
by on (#106920)
Could some people help me test the OpenGL and DirectDraw video output in no$sns? I've been trying to use hardware acceleration to zoom the SNES game screen, but I've never figured out whether it actually works.

The OpenGL stuff was tested on a few computers, and it did work, but it was even slower than without hardware acceleration, so it appears to be useless and I could as well remove it. Unless it does work on certain computers - does anybody have a PC where the "OpenGL" mode is working faster than the other no$sns video output modes?

The DirectDraw stuff originally didn't work with newer DirectDraw structures, but that should have been fixed in the current no$sns version - did anybody ever try that (or could do so now)? Does it work, and if yes, how fast is it?

For testing, please use these settings:
- Emulation Speed = Unlimited MHz Disaster, 100% (=as fast as possible, but no frameskip)
- Performance Indicator = Show Timings and Frameskips (found in the "Debug" setup page)
- Use the mouse to resize the no$sns game window as big as possible.
- Run a SNES game (best something with constant CPU load, like a static title screen)

And then try the different "Stretching Type" options (StretchDIBits, OpenGL, and DirectDraw) and write down the emulation speed (shown in the caption).

For OpenGL and DirectDraw you can also pick two "Stretching Modes" (Resize/edgy, and Resample/blurry). Resample may look nicer, but for OpenGL it seems to be even slower (judging from current tests). And for DirectDraw I've no clue if and how it's working.

There's also a fourth "Stretching" Type option called "Software", but it's only half implemented (works only on 100% zoom steps, whilst the window can be resized in 50% steps, so it's a bit hard to tell if current window size is matching).
Re: no$sns opengl and directdraw
by on (#106967)
Game used Zero The Kamikaze Squirrel, Options screen
Desktop resolution 1680 x 1050 @ 32bits
2.4GHz P4
Ati Radeon 9250
Win98SE

StretchDIBits - 210% 108/108
When image is blocked speed increases drastically

OpenGL - 16% 8/8
constantly, even when image is not visible. Blurry or Edgy options made no difference

DirectDraw - ~580% 294/294
constantly, even when image is not visible. Image is always blurry regardless of the option.

Software - 154% 76/76
When image is blocked speed doubles
Re: no$sns opengl and directdraw
by on (#106972)
Many thanks for testing!

OpenGL with 8 fps doesn't seem to be optimal. Funny because my neighbor told me that I must use OpenGL and it's the best and fastest thing on earth (when he was telling me, I did feel pretty stupid because, really, I didn't know that), and he claimed that it works under linux and even under windows - well, maybe on newer windows versions - but I'd almost doubt it.

Very nice to know that DirectDraw is running that fast! I've spent some weeks on it, but I've been rather pessimistic, thinking that it might run unstable on most computers (which it maybe does, but at least I now know that it does work on one computer).
The blurring is a bit odd under DirectDraw, one can enable/disable it vertically (which should also work in no$sns, theoretically). But horizontally the DirectDraw specs just say that the hardware can do whatever it wants to do ;-)

Don't know why the Software mode is slower than StretchDIBits, it should have been same speed, or even a bit faster. Well, I'll keep the Software mode in there anyways (and maybe improve it a bit, especially adding blur support to it, too).

If some more people could test it, please go ahead! I'd be really interested in how fast & stable it's working. If it looks fine, then I'd be planning to add it to other no$xxx emulators, too.
Re: no$sns opengl and directdraw
by on (#106973)
OpenGL is fast when you already have the textures in the video card and are drawing 3D models with them. It's potentially not so fast for emulation because you're continuously uploading software-composited textures to the video card, which means you're converting each frame's texture to the video card's own texture format. This may involve texel color format conversions, or it may involve what the Dreamcast homebrew community has referred to as "swizzling" (a form of texel coordinate bit interleaving to make adjacent texel lookups for bilinear filtering more cache-friendly).
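As a rough illustration of the bit interleaving mentioned above, the following C sketch computes a texel's index in a Morton-ordered ("swizzled") layout next to its index in a plain row-major layout. The function names are made up for this example, and the bit order (x on even bits, y on odd bits) is an assumption; real GPUs, the Dreamcast included, may arrange the bits differently.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i. */
static uint32_t spread_bits16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Texel index in a Morton-ordered ("swizzled") square texture:
   x bits land on even positions, y bits on odd positions (assumed order). */
uint32_t swizzled_index(uint32_t x, uint32_t y)
{
    assert(x <= 0xFFFFu && y <= 0xFFFFu);
    return spread_bits16(x) | (spread_bits16(y) << 1);
}

/* Texel index in an ordinary row-major ("linear") texture. */
uint32_t linear_index(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}
```

In a 256-texel-wide linear texture, the texel directly below (0,0) sits 256 entries away, while in the interleaved layout it sits at index 2, so the 2x2 neighbourhood read by a bilinear filter stays close together in memory.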

As for StretchDIBits, I imagine that most video card drivers that include 2D acceleration accelerate it.
Re: no$sns opengl and directdraw
by on (#106977)
What you could try is to use pixel buffer objects (PBOs) for the texture update. It will allow you to asynchronously upload the texture data to VRAM - i.e. glTexSubImage2D should return immediately in this case so that your rendering thread can continue working instead of being blocked by the texture upload (the GPU driver will take care of doing the upload in the background).

During setup you'd do something like this:

Code:
glGenBuffers(1, &pbo);                      /* create one buffer object name */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);  /* bind it as the source for texture uploads */


And during rendering:

Code:
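/* Request a fresh data store each frame so the driver need not wait for the previous upload. */
glBufferData(GL_PIXEL_UNPACK_BUFFER, bufferSize, NULL, GL_STREAM_DRAW);
buf = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (buf != NULL) {  /* mapping can fail, in which case nothing is copied this frame */
    mySuperDuperFastMemCpy(buf, snesFrameBuffer, bufferSize);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
}
/* With a PBO bound, the last argument is a byte offset into the PBO, not a pointer. */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, texFormat, pixFormat, 0);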


If you really wanted to, you could create 2 PBOs and their corresponding data stores during setup, and manage them yourself instead of letting the driver handle it by constantly requesting new data stores. Letting the driver handle it is easier, and there really shouldn't be any performance difference, but it's up to you.
Re: no$sns opengl and directdraw
by on (#106978)
Game used: Skipp and Friends demoscene-style scroller
Desktop resolution 1440x900 @ 32bpp
Core 2 duo P8400 (2.26GHz)
Intel GM45
For all the below, the fraction given was always same/same.

Wine 1.5.6
Software & StretchDIBits - 685-750%
OpenGL - claims 160-170%, but nothing's rendered (grey)
DirectDraw - something's forcing page flipping, so 100% always, but nothing's rendered (black) Uses about 44% of one core of CPU, so probably would be 225%?
Blurry/edgy made no difference for any of the above.

WinXP
Software & StretchDIBits - 165% if any portion of the window is covered no matter how small, but 60% if not (what??)
OpenGL - 10%-12% (edgy) 8-9% (blurry)
DirectDraw - 615-625%
When set to edgy, it's blurry. When set to blurry, it gives an error every frame ("blt_err 88760136")
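The number in that message looks like a hexadecimal Windows HRESULT. A minimal sketch for splitting such a value into its fields follows; `HresultParts` and `decode_hresult` are hypothetical names, and the bit layout is the one used by the HRESULT macros in winerror.h.

```c
#include <stdint.h>

/* Fields of a Windows HRESULT: severity in bit 31, facility in bits 16..28,
   code in bits 0..15 (same layout as the HRESULT_* macros in winerror.h). */
typedef struct {
    unsigned failed;    /* 1 if the severity bit is set */
    unsigned facility;  /* subsystem that produced the error */
    unsigned code;      /* subsystem-specific error number */
} HresultParts;

HresultParts decode_hresult(uint32_t hr)
{
    HresultParts p;
    p.failed   = (hr >> 31) & 1u;
    p.facility = (hr >> 16) & 0x1FFFu;
    p.code     = hr & 0xFFFFu;
    return p;
}
```

For 0x88760136 this gives a failure with facility 0x876 and code 310. Assuming the value really is a DirectDraw HRESULT printed in hex, that appears to correspond to DDERR_NOSTRETCHHW in ddraw.h (no hardware support for stretching), but this should be confirmed against the header before relying on it.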
Re: no$sns opengl and directdraw
by on (#106980)
tepples wrote:
OpenGL is fast when you already have the textures in the video card and are drawing 3D models with them. It's potentially not so fast for emulation because you're continuously uploading software-composited textures to the video card, which means you're converting each frame's texture to the video card's own texture format. This may involve texel color format conversions, or it may involve what the Dreamcast homebrew community has referred to as "swizzling" (a form of texel coordinate bit interleaving to make adjacent texel lookups for bilinear filtering more cache-friendly).


I would strongly suggest that if you can get a performance increase from Direct3D, then you should be able to get something equivalent through OpenGL, though unfortunately the required knowledge may be slightly obscure (most documentation focuses on function more than performance).

As tepples mentions, your bottleneck is likely the process you are using to upload your texture to the GPU each frame. For starters if you are using glTexImage2D, stop doing that and use glTexSubImage2D (the former recreates the texture object each time). Swizzling is probably not your issue unless you're explicitly creating a swizzled texture type (all GPUs can handle linear textures, swizzling is an optional cache optimization which is useless for textures that aren't rotated onscreen). You may get a little bit of juice out of trying BGRA vs RGBA formats, not sure, but I would guess not really. There are more modern ways of getting the texture across as well, the Pixel Buffer Object API is better designed for this, and even has an asynchronous transfer option. If you're using Windows' ancient OpenGL 1.1 implementation, you will need to look into how OpenGL extensions work to get at the Pixel Buffer Object stuff.
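Getting at the PBO entry points on Windows generally means checking the extension list first and then fetching function pointers at run time (for example with wglGetProcAddress). In OpenGL 1.x and 2.x, glGetString(GL_EXTENSIONS) returns one space-separated string, and a plain strstr can be fooled by a longer name that merely starts with the wanted one. The sketch below matches whole tokens only; `has_gl_extension` is a hypothetical helper, not part of OpenGL, and it takes the string as a parameter so it can be tried without a GL context.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a complete space-separated token in `list`.
   `name` is assumed to contain no spaces. */
int has_gl_extension(const char *list, const char *name)
{
    size_t len;
    const char *p;

    if (list == NULL || name == NULL || *name == '\0')
        return 0;

    len = strlen(name);
    p = list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == list) || (p[-1] == ' ');
        int ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;  /* not a whole token, keep searching after this match */
    }
    return 0;
}
```

In a real program the first argument would be the string returned by glGetString(GL_EXTENSIONS) once a context is current.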
Re: no$sns opengl and directdraw
by on (#106985)
Yes, I've been using glTexImage2D, didn't know about glTexSubImage2D. I'll try that in the next version, thanks mic_ and rainwarrior!
For glBindXxxx functions, I was thinking that this would be for "static" textures (updated once), not for the kind of "streaming" textures (updated in every frame). But looks as if I misunderstood that.

---

Thanks for the Wine and WinXP tests! Looks as if both have their odd effects, but at least they do work fast in this or that mode.

> 165% if any portion of the window is covered no matter how small, but 60% if not (what??)
Strange, no idea what is causing that. It doesn't happen under win98.

> When set to blurry, it gives an error every frame
Okay, then I'd assume that the hardware doesn't support blurring... but, as you say that the edgy mode is blurry... then... now that is confusing. No idea what is happening there, and if there's a way to properly detect the hardware capabilities in that case.
But that reminds me that I did add some DirectDraw/OpenGL detection stuff, which can be viewed via "Window" --> "Internal Stats", can you copy/paste the displayed text? Maybe it reveals something useful.
Re: no$sns opengl and directdraw
by on (#106987)
Also please provide a link to the build you want tested. I would have tried it, but I didn't want to go searching.
Re: no$sns opengl and directdraw
by on (#107003)
nocash wrote:
But that reminds me that I did add some DirectDraw/OpenGL detection stuff, which can be viewed via "Window" --> "Internal Stats", can you copy/paste the displayed text? Maybe it reveals something useful.
Done, attached. But that crashes under wine.
Re: no$sns opengl and directdraw
by on (#113942)
I've uploaded four no$sns beta versions with OpenGL related changes:

v1.4a --> same as v1.4, plus SwapBuffers, removed per-frame context recreation
v1.4b --> same as v1.4a, plus using glTexSubImage2D
v1.4c --> same as v1.4b, plus using PBO extension (GL_PIXEL_UNPACK_BUFFER_ARB)
v1.4d --> same as v1.4c, plus using PFD_DOUBLEBUFFER

would be great if 3-4 people could try it on different computers.
Please PM me for info where to download the beta versions.
(They aren't official releases, not found on the webpage).

---

Up to now, I've tried the betas with the Microsoft and MESA opengl software drivers. Basically, the betas should be working, but I have no clue if and how far hardware acceleration is improving the frame rate. Only hope that it can't be worse than MESA (which is reaching one-half-frame-per-second in maximized view).

For the changes: SwapBuffers shouldn't be needed (for single-buffered contexts), but MESA refuses to display anything without that command; maybe the gray screen under Wine was caused by that, too.
Removing the per-frame context recreation (I really don't know WHY I was recreating that every time) should make things a bit faster (around 15% on my computer).
Using glTexSubImage2D instead of glTexImage2D might also give some speedup, main difference is that it allows updating only the required 256x224 pixel area, instead of the whole 256x256 texels (gives only 2% speedup for me, but maybe other opengl drivers have even more overhead in glTexImage2D).
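To put numbers on the 256x224 versus 256x256 difference, here is a small arithmetic sketch. The helper names are hypothetical, and the 4 bytes per texel in the usage note is an assumption (a 32bpp upload format), since the post does not state the texel format.

```c
#include <stddef.h>

/* Bytes transferred when uploading a width x height block of texels. */
size_t upload_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    return width * height * bytes_per_texel;
}

/* Fraction of the allocated texture height that holds picture data.
   This is also the largest vertical texture coordinate the quad should
   sample, so that the unused rows never appear on screen. */
double used_fraction(unsigned used_rows, unsigned allocated_rows)
{
    return (double)used_rows / (double)allocated_rows;
}
```

With 4 bytes per texel, a full 256x256 upload is 262144 bytes and the 256x224 sub-image is 229376 bytes, i.e. 12.5% less data per frame. The used fraction of the texture height is 224/256 = 0.875.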
Using PBOs forces an extra step: the bitmap is first copied from Main RAM to the PBO, and then copied from PBO to texture memory (basically that means more work, but the second step can be handled entirely on the GPU side without pausing the CPU; although - if that is faster - then the driver should be doing that automatically even when not using PBOs. So... no idea if PBOs are improving anything).
And finally, using doublebuffering: That's a must-have for complex 3D scenes, but shouldn't be needed for displaying a single texture bitmap (as done in no$sns). On the other hand, maybe some 3D video cards don't even support singlebuffering. And, microsoft/mesa drivers are splitting the bitmap into two triangles, which looks ugly in animated scenes (especially visible scrolling scenes): there's some noticeable delay between drawing the separate triangles; especially with MESA which needs one second per triangle ;-)
Re: no$sns opengl and directdraw
by on (#114030)
Got some test results via PM from lidnariq...
lidnariq wrote:
Same machine as before. (1440x900 @ 32bpp; Core 2 duo P8400 (2.26GHz); Intel GM45)
Wine 1.4.1
NO$SNS 1.4A, Blurry or edgy: actually works! 248%
NO$SNS 1.4B, Blurry or edgy: nothing shows (black screen). Does draw once when exiting a modal dialog. (Blurry: claims 300%; not: claims 315%)
NO$SNS 1.4C and D, Blurry or edgy: dies with "glGenBuffersARB"

Wine 1.5.29:
1.4A: Same as wine1.4.1-NO$SNS1.4B behavior (blurry and edgy claim 190%)
1.4B: " " " " (blurry claims 200%; edgy claims 190%)
1.4C and D: dies with "glGenBuffersARB"

WinXP
All builds: Blurry: 10%; not: 15%
No discernible difference between builds.

Yeah, that's... frustrating. For WinXP, I no longer believe that ChoosePixelFormat "automatically" selects an accelerated format. I'll try to enumerate all formats via DescribePixelFormat, and add my own scoring system to select a reasonable format.
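One possible shape for such a scoring system is sketched below. It is not the no$sns code: the struct, the function names and the score weights are invented for illustration, and the flag values are copies of the PFD_* constants from wingdi.h under different names so that the sketch compiles without the Windows headers. It relies on the usual convention that a format with PFD_GENERIC_FORMAT set and PFD_GENERIC_ACCELERATED clear belongs to Microsoft's software renderer.

```c
#include <stddef.h>

/* Copies of the PFD_* flag values from wingdi.h (renamed for this sketch). */
enum {
    FMT_DRAW_TO_WINDOW      = 0x0004,
    FMT_SUPPORT_OPENGL      = 0x0020,
    FMT_GENERIC_FORMAT      = 0x0040,
    FMT_GENERIC_ACCELERATED = 0x1000
};

/* The few PIXELFORMATDESCRIPTOR fields this sketch looks at. */
typedef struct {
    unsigned flags;       /* corresponds to dwFlags */
    unsigned color_bits;  /* corresponds to cColorBits */
} FormatInfo;

/* Higher is better; a negative score means the format is unusable. */
int score_format(const FormatInfo *f, unsigned desktop_bits)
{
    int score = 0;

    if (!(f->flags & FMT_DRAW_TO_WINDOW) || !(f->flags & FMT_SUPPORT_OPENGL))
        return -1;

    if (!(f->flags & FMT_GENERIC_FORMAT))
        score += 100;  /* vendor driver, normally hardware accelerated */
    else if (f->flags & FMT_GENERIC_ACCELERATED)
        score += 50;   /* generic front end with partial acceleration */
    /* otherwise: Microsoft software renderer, no bonus */

    if (f->color_bits == desktop_bits)
        score += 10;   /* avoids a color conversion on display */

    return score;
}

/* Returns the 1-based index of the best format, or 0 if none is usable. */
int pick_format(const FormatInfo *formats, size_t count, unsigned desktop_bits)
{
    int best_index = 0, best_score = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        int s = score_format(&formats[i], desktop_bits);
        if (s > best_score) {
            best_score = s;
            best_index = (int)i + 1;
        }
    }
    return best_index;
}
```

In a real program the array would be filled by calling DescribePixelFormat for each index from 1 up to the count that function returns, and the chosen index would then be passed to SetPixelFormat.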
For Wine's unsupported "glGenBuffersARB" function, I am explicitly detecting that the "GL_ARB_pixel_buffer_object" extension is supported (it is), but apparently newer programs are required to use the new "glGenBuffers" core function instead of the old "glGenBuffersARB" extension function. I can do that (use whichever one is present), but older programs that aren't aware of that situation can't do so (ie. the wine driver doesn't appear to attempt to be backwards compatible with older software).
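The "use whichever one is present" idea can be sketched as a lookup that tries several names in turn. Everything here is illustrative: `load_first_available`, the typedefs and the stand-in loader are hypothetical, and in a real program the loader role would be played by wglGetProcAddress with the result cast to the proper function type.

```c
#include <stddef.h>
#include <string.h>

typedef void (*GenericProc)(void);
typedef GenericProc (*ProcLoader)(const char *name);

/* Try each candidate name in order and return the first pointer found,
   or NULL if the driver exports none of them. */
GenericProc load_first_available(ProcLoader load,
                                 const char *const *names, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        GenericProc p = load(names[i]);
        if (p != NULL)
            return p;
    }
    return NULL;
}

/* Stand-in for a driver that exports only the core name, like the
   Wine setup described above. */
static void stub_gen_buffers(void) { }

GenericProc core_only_loader(const char *name)
{
    return strcmp(name, "glGenBuffers") == 0 ? stub_gen_buffers : NULL;
}
```

Called with the names "glGenBuffersARB" and "glGenBuffers", the lookup against the stand-in loader falls through to the core function instead of failing. The same pattern would apply to the other buffer functions (bind, data, map, unmap).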
And Wine's black screen in NO$SNS 1.4b (with SubImage)... I am rather clueless there... especially as the same problem seems to appear even in NO$SNS 1.4a (without SubImage) in Wine 1.5.29. And no idea how this problem can get affected by "exiting a modal dialog" :-/
Re: no$sns opengl and directdraw
by on (#114080)
Other test results, from Dwedit,

Dwedit wrote:
These were tested on my Core 2 duo Windows XP machine with ATI graphics (Radeon X1400):
Main menu screen of Secret of Mana (hi-res screen)

DirectDraw: 958% (905% when maximized)
Software: 850% (107% when maximized)
StretchDibits: 433% (107% when maximized)
OpenGL A: 600% (530% when maximized)
OpenGL B: 765% (570% when maximized)
Version C and D gave a fatal error: "NO$SNS - FATAL: glGenBuffersARB"

Then I tested Super Mario World at Yoshi's house while the game was paused.

DirectDraw: 925% (930% when maximized, actually went faster)
Software: 820% (105% when maximized)
StretchDIBits: 813% (107% when maximized)
OpenGL A: 723% (690% when maximized)
OpenGL B: 745% (717% when maximized)
Version C and D also failed on OpenGL mode.

Also, Resample mode fails on DirectDraw with a "blt_err 88760136" and the 'edgy' mode is actually blurry.

Also tested on a Windows 7 machine (sandy bridge Intel i5 2500k, video card is the Intel processor itself)

Secret of Mana menu screen:

directdraw: 1650% (1303% when maximized)
software: 1605% (1550% when maximized)
stretchdibits: 1582% (1550% when maximized)
opengl b: 1123% (505% when maximized)
opengl a: 850% (442% when maximized)

Super Mario World paused at Yoshi's house:

directdraw: 1535% (1205% when maximized)
software: 1605% (1440% when maximized)
stretchdibits: 1605% (1430% when maximized)
opengl b: 1087% (535% when maximized)
opengl a: 950% (485% when maximized)

Same errors with openGL C and D, and directdraw in blurry mode.

Summary:
DirectDraw is the undisputed speed king on Windows XP, but not so much on Windows 7. Software is really bad when resized on Windows XP, but only slightly slower on Windows 7. OpenGL B beats A, and C and D don't work at all.


That looks less frustrating. The main issue on that PC seems to be replacing the missing "glXyzARB" functions with "glXyz".
Re: no$sns opengl and directdraw
by on (#114121)
Tears, tears, tears. I've run into the next (double-)problem: When using SetPixelFormat with PFD_DOUBLEBUFFER, the opengl driver is allocating memory (for the second buffer), but when trying to resize the window, it doesn't resize the buffer accordingly. For my 256x224 pixel window, it seems to allocate a buffer for 1256x280 pixels. Resizing works fine, unless the height gets bigger than 280pix, which causes a black border to appear at the bottom of the window (border_height=window_height-280). And another black border appears at the right side when exceeding 1256pix width.
Found only two webpages mentioning the problem:
http://stackoverflow.com/questions/1377 ... al-sdl-set
http://www.opengl.org/discussion_boards ... ing-window
The first page claims that it is a viewport related "newbie mistake" (apparently due to misunderstanding the problem), the other page says that it is "a bug in opengl32.dll (the software rendering library)" which cannot be fixed (the 'good' news is that it sounds as if it might happen only with microsoft's generic driver).
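The border sizes described above follow from simple clamped subtraction. A tiny sketch, with hypothetical names, using the 1256x280 buffer size observed in the post:

```c
/* Width and height of the undrawn strips, in pixels. */
typedef struct {
    int right;
    int bottom;
} BorderSize;

/* Amount by which the window exceeds the buffer, or 0 if it fits. */
static int excess(int window, int buffer)
{
    return window > buffer ? window - buffer : 0;
}

BorderSize black_border(int window_w, int window_h, int buffer_w, int buffer_h)
{
    BorderSize b;
    b.right  = excess(window_w, buffer_w);
    b.bottom = excess(window_h, buffer_h);
    return b;
}
```

For example, a 1440x900 window over a 1256x280 buffer would leave a 184-pixel strip on the right and a 620-pixel strip at the bottom, while the initial 256x224 window shows no border at all.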

Trying to delete & recreate the opengl context didn't work (still left me with the 1256x280 pix buffer limit). One thing that DOES work (and which is used in no$sns v1.4a/b/c/d) is completely unloading & reloading the opengl DLL during resizing. But... that brings up the next problem: When dragging the window border, and shaking the mouse back'n'forth: This will successfully resize the window (and reload the DLL) some hundred times, and then... it's crashing: Telling that the Explorer has crashed (ironically, this is always leaving no$sns running intact, instead, it's randomly terminating another process, such as your browser, a text editor, or the dos prompt).
I am suspecting that FreeLibrary does only request to unload the DLL, but without waiting for completely unloading it. So that trying LoadLibrary after FreeLibrary is unstable, is that possible?
Hmmm, either that, or I am having some resource leak, that sums up when reloading the DLL too often.
EDIT: Yes, it is some resource leak, stealing some megabytes each time when resizing the window, and also when opening/closing the game window. Only completely closing no$sns (game+debug window) is deallocating the leaked memory. At least I do now know what to search for.
All I wanted to do is to Display-a-stupid-Bitmap, and ended up with 1600 lines code for OpenGL, plus 1800 lines for DirectDraw.
Re: no$sns opengl and directdraw
by on (#114125)
Quote:
plus 1800 lines for DirectDraw.

My suggestion: throw away DD support. It's been deprecated for years and it doesn't seem likely that anyone would have a machine with DirectX7 but no OpenGL. You'll save yourself a lot of pointless work by focusing on one API.
Re: no$sns opengl and directdraw
by on (#114130)
Throwing away the "undisputed speed king"? As Dwedit has called it.
Gladly, but OpenGL and DirectDraw are both unreliable, I am happy if at least one of them works on different computers.
Unless maybe on newer hardware. But, after running into the non-backwards-compatible glBindBufferARB problem... I would expect more such problems to arise in future, when hardware is getting too new.
And, the main idea behind using hardware acceleration was to display bitmaps at reasonable speed on old computers. Or even not-so-old ones: I've found two more pages mentioning the PFD_DOUBLEBUFFER problem,
http://www.opengl.org/discussion_boards ... OpenGL-1-1
http://support.microsoft.com/kb/q272222
it seems to be an official bug at least up to WinXP; occurring on PCs without proper video drivers installed, including virtual machines accessed via remote desktop. The official "solution" seems to be to use PFD_DRAW_TO_BITMAP instead of PFD_DRAW_TO_WINDOW, but I am unsure what to do then... use windows API to transfer the bitmap to VRAM? Well yeah, one could do that... only, the reason for using OpenGL was not to use the normal API functions.
Anyways, I will give up there. The black borders will look ugly when using the Generic driver, but then... that driver is so slow that it isn't really worth to fix its problems.
Adding my own PixelFormat scoring will hopefully avoid the Generic driver where possible (like on lidnariq's PC), and allowing glGenBuffers as an alternative to the ARB functions will hopefully solve the forward-compatibility problems.