Application Error 1005 Loading SSE DLL

While I was testing my latest release (1.2) of Vibratium I encountered a puzzling error. It took me a while to track this one down. Hopefully someone else can learn from this.

TL;DR: solution is at the end.

Problem:

Vibratium worked fine on my dev box, but on my laptop as soon as I added a Render Object the application would crash.

In the Application event log I got Application Error 1005:

“Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program Vibratium because of this error.”

Research:

So according to the error it looks like the native C++ DLL SSELibrary.dll wasn’t found.

This was interesting because in this release I’d migrated to Visual Studio 2015, statically linked the VC runtime, then added SSELibrary.dll as an embedded resource in Vibratium. Upon launch I extract the DLL to the user’s Temp directory (%USERPROFILE%\AppData\Local\Temp\Vibratium) then call LoadLibrary on it. The end result is that I now have one file to deploy, much like Sysinternals’ Process Explorer. Process Explorer has multiple embedded files to deal with different operating systems and bitness – you only ever see the one file unless you look in the launch directory while running it, or look in Process Explorer itself and notice that it launches an extracted ProcExp64.exe. (… which further extracts and loads a driver that it uses to query the OS. Run ProcMon against ProcExp some time.)

Vibratium is a .NET app, and the first thing I do in my main WinForm Load event is call SSELibrary’s IsPresent() function to confirm that the DLL actually loaded. The entirety of IsPresent is to return true – anything else indicates a DLL load error. The 1005 error happened when I attempted to invoke a different function later on, so it SEEMED that the DLL was loading, even though the error said it wasn’t.

I launched Process Explorer to verify that the DLL was loaded into memory. It wasn’t showing in the handle list on the laptop. So I switched to the desktop and it wasn’t showing there either. Running “handle SSELibrary.dll” returned nothing as well.

So I launched again with ProcMon running. I can see the file being created, written, loaded, then closed. And this is repeated once in the same second. For whatever reason it would appear that calling LoadLibrary on a native DLL doesn’t maintain an open handle or lock on the DLL. Perhaps it gets loaded and read each time it’s needed? I wonder what happens when the working set gets trimmed too low. Interesting.

So it looks like the Application event log error may not be terribly accurate. Surprise! The file is actually being loaded and run. But there’s still some hard crash. Running under a remote debugger shows the SEH error is actually 0x80040005 – so an HRESULT wrapped around an error 5 – access denied.
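For anyone who wants to pick a value like that apart by hand, here is a minimal sketch of splitting an HRESULT into its fields. The struct and function names are made up for illustration; the facility mask mirrors the HRESULT_FACILITY macro in winerror.h. For 0x80040005 the code field comes out as 5, which matches ERROR_ACCESS_DENIED if you read it as a Win32 error. The facility field comes out as 4 (FACILITY_ITF) rather than 7 (FACILITY_WIN32, as in the more familiar 0x80070005), so treating it as a wrapped Win32 error is an interpretation rather than something the value guarantees.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Fields of a 32-bit HRESULT (names here are hypothetical, for illustration).
struct HresultParts {
    bool failure;            // bit 31: 1 = failure, 0 = success
    std::uint32_t facility;  // which subsystem defined the code
    std::uint32_t code;      // low 16 bits: the error code itself
};

// Split an HRESULT into severity, facility, and code.
constexpr HresultParts DecodeHresult(std::uint32_t hr) {
    return HresultParts{ (hr >> 31) != 0,
                         (hr >> 16) & 0x1FFFu,  // same mask as HRESULT_FACILITY
                         hr & 0xFFFFu };        // same mask as HRESULT_CODE
}

// Print the decoded fields of an HRESULT on one line.
void PrintHresult(std::uint32_t hr) {
    const HresultParts p = DecodeHresult(hr);
    std::printf("0x%08X: failure=%d facility=%u code=%u\n",
                static_cast<unsigned>(hr), p.failure ? 1 : 0,
                static_cast<unsigned>(p.facility),
                static_cast<unsigned>(p.code));
}
```

Calling PrintHresult(0x80040005u) and PrintHresult(0x80070005u) side by side shows the same code field with different facilities.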

Anyways, all of this was working on my workstation – the DLL was extracting, loading, and executing properly. It could have something to do with the laptop running Win10 while my dev box runs Win8. So I tested it and it ran just fine in a Win10 VM. It would appear to not be an OS issue.

It might be a processor support issue. So I added a call to Supports_AVX1(). This passes on my dev box (an Intel i5) and on my Win10 VM (the host has an AMD FX-8000), and fails on the laptop (a Core 2 Duo). Okay, we have a relevant difference.
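For reference, this is roughly what an AVX check involves. The sketch below is not the actual Supports_AVX1() from SSELibrary; SupportsAvx is a hypothetical stand-in, and it assumes an x86/x64 build with MSVC, GCC, or Clang (on anything else it simply returns false). The CPU has to advertise AVX, and the OS has to have enabled saving of the wider registers, so both are checked.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>     // __cpuid
  #include <immintrin.h>  // _xgetbv
  #define SKETCH_X86 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>      // __get_cpuid
  #define SKETCH_X86 1
#else
  #define SKETCH_X86 0
#endif

// Hypothetical stand-in for a check like Supports_AVX1().
// Returns true only if both the CPU and the OS support AVX.
bool SupportsAvx() {
#if SKETCH_X86
    std::uint32_t ecx = 0;
  #if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                        // leaf 1: feature flags
    ecx = static_cast<std::uint32_t>(regs[2]);
  #else
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return false;
    ecx = c;
  #endif
    const bool osxsave = ((ecx >> 27) & 1u) != 0;  // OS uses XSAVE/XGETBV
    const bool avx     = ((ecx >> 28) & 1u) != 0;  // CPU implements AVX
    if (!osxsave || !avx) return false;

    // XCR0 bits 1 and 2: the OS saves SSE and AVX register state.
  #if defined(_MSC_VER)
    const unsigned long long xcr0 = _xgetbv(0);
  #else
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    const unsigned long long xcr0 =
        (static_cast<unsigned long long>(hi) << 32) | lo;
  #endif
    return (xcr0 & 0x6u) == 0x6u;
#else
    return false;  // not an x86 build: no AVX to detect
#endif
}
```

A Core 2 Duo would fail the CPUID part of this check, which is consistent with what I saw on the laptop.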

But the function it’s blowing up on doesn’t actually use any AVX intrinsics, it’s entirely SSE 4.1 and lower, which all three processors support.

So I edited the function to return before any intrinsics are called – we’re talking pure C++ code modifying aligned in-memory structs in a for loop.

It still blows up. If I return before actually doing anything significant it succeeds.

There is something in the generated code that is causing this.

At this point I remembered that when I was integrating the updates to the SSELibrary project to the Vibratium solution I WinDiff’ed the projects and copied things and modified line by line. SSELibrary implements AVX1 functions but Vibratium doesn’t use any of them. One of the things you need to do to use the intrinsics is set the /arch flag properly. During the integration with SSELibrary I changed it to AVX, and everything worked just fine since my dev box supports AVX1.

After rebuilding with /arch:SSE2 everything worked as expected.

Okay, what difference does the flag make?

I haven’t dug into it too deeply, but I reviewed the PE COFF 8.3 format and a DLL (an “image” file) doesn’t have any flags set for SSE or AVX or any of the other SIMD instruction sets. I used HexOut and WinDiff to verify that they have (nearly) identical headers. So that leaves the generated code (.text section).

But the function that blew up didn’t even use a single SIMD instruction! This threw me off, but then I remembered auto-vectorization.

http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/12/auto-vectorizer-in-visual-studio-11-overview.aspx

“The Visual Studio 2012 auto-vectorizer tries to make loops in your code run faster by automatically vectorizing your code – that’s to say, using the SSE instructions available in all current mainline Intel and AMD chips. Auto-vectorization is on by-default. You don’t need to request this speedup. You don’t need to throw a compiler switch. You don’t need to set environment variables or registry entries. You don’t need to change your C++ code. You don’t need to insert #pragmas. The compiler just goes ahead and does it. It all comes for free.”

It’s on by default and targets instructions available on all “mainline” chips, but I’d gone and specifically set the flag to allow AVX, which is NOT available on all chips. I suspect that told the compiler to upgrade SSE instructions to AVX ones in that function, and that’s why IsPresent() succeeded and the one with the for loop didn’t. I’d have to disassemble it to be sure. (AVX registers are 256 bits wide versus 128 bits for SSE2+ – definitely an upgrade.) I think when execution hit the AVX instruction the CPU threw an error on the unknown instruction. The error could definitely have been better.
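To make the auto-vectorization point concrete, here is a hypothetical loop of the same general shape as the one that blew up. The struct and function names are invented for illustration and are not taken from SSELibrary. There isn’t a single intrinsic in it, yet the compiler is free to pick whatever instructions the /arch setting allows when it generates code for the loop. As far as I understand it, with /arch:AVX that can include VEX-encoded instructions, which a pre-AVX CPU such as a Core 2 Duo cannot execute.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 16-byte-aligned struct, standing in for the real data.
struct alignas(16) Sample {
    float x, y, z, w;
};

// Plain C++: multiplies every field of every sample by gain, in place.
// No intrinsics here; instruction selection is entirely up to the
// compiler and the /arch (or -m...) flags it was given.
void ScaleSamples(Sample* samples, std::size_t count, float gain) {
    assert(samples != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        samples[i].x *= gain;
        samples[i].y *= gain;
        samples[i].z *= gain;
        samples[i].w *= gain;
    }
}
```

Building something like this once with /arch:SSE2 and once with /arch:AVX, then comparing the disassembly (for example with the /FA listing option), would be one way to confirm the suspicion.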

The compiler warning “ignoring unknown option /arch:AVX” during debug builds kept nagging me as well. I expect all auto-vectorization is off in debug builds.

Solution:

Change your /arch flag to SSE2.

  • Open the properties of your Visual C++ project
  • Go to Configuration Properties / C/C++ / Code Generation
  • Set Enable Enhanced Instruction Set => Not Set
  • Make sure you do this for all non-debug configurations.

If you change the setting to Streaming SIMD Extensions 2 (/arch:SSE2) then you’ll get a compiler warning D9002: “ignoring unknown option ‘/arch:SSE2’” which is essentially the same as Not Set.

If you download and use my SSE Library in your own project make sure you adjust your /arch flag accordingly.
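If you want the build itself to catch this mistake, one option is a small guard based on the __AVX__ macro, which MSVC predefines when /arch:AVX (or higher) is in effect, and which GCC and Clang predefine under -mavx. The sketch below is only a suggestion: SSELIB_REQUIRE_BASELINE, kBuiltForAvx, and SafeToRun are hypothetical names, not part of SSELibrary.

```cpp
#include <cassert>

// True when this translation unit was compiled with AVX code generation.
#if defined(__AVX__)
constexpr bool kBuiltForAvx = true;
#else
constexpr bool kBuiltForAvx = false;
#endif

// Opt-in guard (hypothetical macro): define SSELIB_REQUIRE_BASELINE in
// builds that must run on pre-AVX CPUs to turn a wrong /arch into a
// compile error instead of a crash on the customer's machine.
#if defined(SSELIB_REQUIRE_BASELINE) && defined(__AVX__)
#error "Built with AVX code generation; set /arch to SSE2 or Not Set."
#endif

// Runtime counterpart: given the result of a CPU feature check, say
// whether code compiled with these flags is safe to execute.
constexpr bool SafeToRun(bool cpuHasAvx) {
    return !kBuiltForAvx || cpuHasAvx;
}
```

A runtime check like SafeToRun only helps if the code performing it is itself compiled without AVX; otherwise the check could hit an unsupported instruction before it gets a chance to report anything.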

It seems strange that an unrecognized/invalid instruction would end with a thrown “access denied,” but there it is.

I hope that helps.
Cheers
-george