Wednesday, March 24, 2010

The Little Stored Procedure That Could.

In this corner, we have the production server. A Windows 2003 R2 SP2 virtual machine with 2x3GHz processors, 3GB of RAM, and plenty of disk. It's running SQL Server 2005, build 3325. Residing on that server is a database with corresponding stored procedure that does a couple of 3 table join queries, (one of which is a derived table), UNIONs them, and spits out the results. The only difference between this server and the development server is about .75GB more RAM on the DEV box. PROD and DEV DBs are exact copies of one another. Apologies, but I can't post the procedure here. If anyone wants to see it, let me know via Twitter, and I can show it to you privately.

In the opposite corner, we have a Windows XP SP2 workstation, running a VB.Net application that calls said stored procedure. Or at least, tries to. There are two copies of the application on that machine. One that points to DEV, one that points to PROD. The connection info is in a config file, so this matters little, and we only used the PROD copy for testing in this particular case, just changing it's configuration file to point to the DEV DB where needed.

Here's how the testing went:
  • DBA runs the proc from SSMS on the DEV server over RDP = Pass.
  • DBA runs the proc from SSMS on the PROD server over RDP = Pass.
  • DBA runs the proc from SSMS from his workstation on the DEV DB = Pass.
  • DBA runs the proc from SSMS from his workstation on the PROD DB = Pass.
  • App ID runs the proc from the DBA workstation via SSMS on the DEV DB = Pass.
  • App ID runs the proc from the DBA workstation via SSMS on the PROD DB = Pass.
  • App ID runs the proc from the user workstation via ISQLW on the DEV DB = Pass.
  • App ID runs the proc from the user workstation via ISQLW on the PROD DB = Pass.
  • App ID runs the proc from the user workstation via the app, on the DEV DB = Pass.
  • App ID runs the proc from the user workstation via the app, on the PROD DB = BIG OL' HONKING FAILURE, BUDDY.
Obviously some sort of application issue, right? Well, one of the tables involved in the procedure contains about 4M rows. Not that big, especially considering there's a covering index on it for the particular data we're looking for. However, instead of selecting from there, when we first place the data in a table variable instead, the problem goes away.

I am at a loss to explain this. There's no blocking / locking apparent in the profiler trace. The procedure itself is about as tuned as we can get it. Next steps are going to be to clear the cache, and equalize the RAM between DEV and PROD. After that is going to be a call to PSS.

Any help or pointers at all would be appreciated. Thanks for your time.

Update: The actual error is that the proc does what it's supposed to do, then simply stops executing. It does this every time it's run from the app. After about 16 minutes, it times out.

2 comments:

sqlblindman said...

I don't get how it "does what its supposed to do" and then stops executing. Isn't a sproc supposed to stop executing when it is done doing what it is supposed to do?

David M Maxwell said...

He he... good point. I could have explained that much better.

What happens is, as I run a profiler trace of RPC and SP, I can see the initial RPC:Started, then all the expected SP:StmtStarted and SP:StmtCompleted. After the last expected SP:StmtCompleted, the trace simply stops. 16 minutes later - I get the RPC:Completed, with error. And this only happens when called from the application.