Windows Vista Tips

Unexpected Cluster Switch (due to error 5 in the cluster log)

Matthias
05-16-2008
Hello all,
yesterday one of our cluster systems performed an unexpected cluster switch.

Systeminformation:

HW: ProLiant DL585 G1 / 2x AMD Opteron 2.2 GHz / 16 GB RAM
OS: Microsoft Windows Server 2003 Enterprise x64 Edition
OS Version: 5.2.3790 Service Pack 2 Build 3790

HP ProLiant Support Pack 7.90

Attached to a SAN via FC

Main software: SAP CRM 5.0 SP15 on MS SQL Server 2005
Support software: HP Data Protector / McAfee (Enterprise 8.0.0 Patch 15) / SNARE 3.0.0

MSCS configuration:

User LAN (teaming)
Server LAN (no teaming)
Private LAN (crossover)

Cluster Group / MSDTC group / SAP group / SQL group



The cluster log:

0000098c.00000a64::2008/05/15-15:16:43.912 INFO [DM] DmpGetSnapShotCb: DmpGetDatabase returned 0x00000000
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsGetTempFileName Q:\MSCS\, chkpt, 8011 => Q:\MSCS\chk1F4B.tmp, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] DmpGetSnapshotCb: Checkpoint file name=Q:\MSCS\chk1F4B.tmp Seq#=8011
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsMoveFileEx Q:\MSCS\chk619F.tmp=>Q:\MSCS\chk1F4B.tmp
0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] DmpGetSnapShotCb: Failed to move the temp file to checkpoint file, TempFileName=Q:\MSCS\chk619F.tmp, ChkPtFileName=Q:\MSCS\chk1F4B.tmp, Error=0x00000005
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsDeleteFile Q:\MSCS\chk619F.tmp, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] LogCheckPoint: Callback failed to return a checkpoint
0000098c.00000a64::2008/05/15-15:16:43.912 WARN [LM] LogpReset:: Callback failed to return a checkpoint, error=5
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogClose : Entry LogFile=0x02ad7df0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogFlush : pLog=0x02ad7df0 writing the 1024 bytes for active page at offset 0x00000400
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] WriteFile 99c (....) 1024, status 0 (0=>0)
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsFlushBuffers 99c, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsCloseHandle 99c, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogClose : Exit returning success
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [Qfs] QfsDeleteFile Q:\MSCS\tqu619E.tmp, status 0
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogpReset exit, returning 0x00000005
0000098c.00000a64::2008/05/15-15:16:43.912 INFO [LM] LogReset exit, returning 0x00000005
0000098c.00000a64::2008/05/15-15:16:43.912 ERR [DM]DmpCheckpointTimerCb - Failed to reset log, error=5
0000098c.00000a64::2008/05/15-15:16:44.005 ERR Cluster service suffered an unexpected fatal error at line 2324 of source module d:\nt\base\cluster\service\dm\dmlog.c. The error code was 5.
00000f58.00000f5c::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1, Shutdown = 0.
00000f58.00000f5c::2008/05/15-15:16:45.004 ERR [RM] Active Resource = 00000000
00000f58.00000f5c::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000f58.00000f5c::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown notification.
00000f38.00000f3c::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1, Shutdown = 0.
00000f38.00000f3c::2008/05/15-15:16:45.004 ERR [RM] Active Resource = 00000000
00000f38.00000f3c::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000f38.00000f3c::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown notification.
00000f18.00000f1c::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1, Shutdown = 0.
00000f18.00000f1c::2008/05/15-15:16:45.004 ERR [RM] Active Resource = 00000000
00000f18.00000f1c::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000f18.00000f1c::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown notification.
00000b70.00000b74::2008/05/15-15:16:45.004 WARN [RM] Going away, Status = 1, Shutdown = 0.
00000b70.00000b74::2008/05/15-15:16:45.004 ERR [RM] Active Resource = 00000000
00000b70.00000b74::2008/05/15-15:16:45.004 ERR [RM] Resource State is 1, ""
00000b70.00000b74::2008/05/15-15:16:45.004 INFO [RM] Posting shutdown notification.
00000b70.00000b74::2008/05/15-15:16:45.004 INFO SAP Resource <SAP CPR 00 Instance>: ResourceControl request.
00000f18.00000f34::2008/05/15-15:16:45.019 INFO [RM] NotifyChanges shutting down.
00000f38.00000f54::2008/05/15-15:16:45.019 INFO [RM] NotifyChanges shutting down.
00000f58.00000f74::2008/05/15-15:16:45.019 INFO [RM] NotifyChanges shutting down.
00000b70.00000f08::2008/05/15-15:16:45.035 INFO [RM] NotifyChanges shutting down.
00000b70.00000f10::2008/05/15-15:16:45.050 INFO Physical Disk <Disk H:>: [DiskArb] CompletionRoutine, status 0.
00000b70.00000f10::2008/05/15-15:16:45.050 INFO Physical Disk <Disk H:>:

There are also errors in the Event Log:

Event Type: Error
Event Source: ClusSvc
Event Category: Log Mgr
Event ID: 1016
Date: 15.05.2008
Time: 17:16:43
User: N/A
Computer: NODE1
Description:
Cluster service failed to obtain a checkpoint from the server cluster
database for log file Q:\MSCS\tqu619E.tmp.

Next:

Event Type: Error
Event Source: ClusSvc
Event Category: Database Mgr
Event ID: 1000
Date: 15.05.2008
Time: 17:16:43
User: N/A
Computer: NODE1
Description:
Cluster service suffered an unexpected fatal error at line 2324 of source
module d:\nt\base\cluster\service\dm\dmlog.c. The error code was 5.

A lot of:

Event Type: Warning
Event Source: Ftdisk
Event Category: Disk
Event ID: 57
Date: 15.05.2008
Time: 17:16:45
User: N/A
Computer: NODE1
Description:
The system failed to flush data to the transaction log. Corruption may occur.

And:

Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7031
Date: 15.05.2008
Time: 17:16:45
User: N/A
Computer: NODE1
Description:
The Cluster Service service terminated unexpectedly. It has done this 1
time(s). The following corrective action will be taken in 60000
milliseconds: Restart the service.

I found KB article http://support.microsoft.com/kb/321531/en-us, but I can't believe that our virus scanner is the cause, because we EXCLUDE all recommended drives and files from read and write scanning (e.g. quorum drive, database drives, database log drives, SQL executables, page file, C:\Windows\Cluster, ..\NTDS, ..ntfrs, ..SYSVOL, *.chk, *.edb, *.ldf, *.log, *.mdf, *.ndf, *.stm).


Does anyone have an idea?


br, Matthias
____________________________________________
Matthias Schweifer - Austria
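A minimal Python sketch for triaging a cluster.log like the one above. It assumes each entry has the layout `pid.tid::yyyy/mm/dd-hh:mm:ss.mmm LEVEL [component] message` (inferred from this excerpt, not from a documented format) and pulls out the WARN/ERR entries that report a non-zero error code, so the first failure in the chain (here the failed move of the checkpoint temp file in DmpGetSnapShotCb) stands out from the resource-monitor shutdown noise that follows. The function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Assumed layout of one cluster.log entry, inferred from the excerpt above:
#   <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> [<component>] <message>
# The [component] tag is optional; the fatal-error line has none.
ENTRY_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+"
    r"(?:\[(?P<component>[^\]]+)\]\s*)?"
    r"(?P<message>.*)$",
    re.IGNORECASE,
)

# Different components report the failing code in different wording.
CODE_PATTERNS = [
    re.compile(r"error\s*=\s*(0x[0-9a-f]+|\d+)", re.IGNORECASE),
    re.compile(r"returning\s+(0x[0-9a-f]+)", re.IGNORECASE),
    re.compile(r"error code was\s+(\d+)", re.IGNORECASE),
]


@dataclass
class Entry:
    pid: str
    tid: str
    timestamp: str
    level: str
    component: Optional[str]
    message: str
    error_code: Optional[int]


def _to_int(token: str) -> int:
    # Hex codes appear as 0x00000005, decimal ones as a plain 5.
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def parse_line(line: str) -> Optional[Entry]:
    """Parse one entry; return None for lines that do not match the layout."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    code = None
    for pattern in CODE_PATTERNS:
        found = pattern.search(m["message"])
        if found:
            code = _to_int(found.group(1))
            break
    return Entry(m["pid"], m["tid"], m["ts"], m["level"].upper(),
                 m["component"], m["message"], code)


def failures(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield WARN/ERR entries that report a non-zero error code."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry.level in ("WARN", "ERR") and entry.error_code:
            yield entry
```

Passing an open cluster.log file object to `failures` would apply it to a whole log; lines that do not match the assumed layout (for example wrapped continuation lines) are skipped rather than joined.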

 
Jeff Hughes [MSFT]
      05-16-2008
Error 5 is "access denied", and it occurred while we were checkpointing the
cluster registry to the quorum drive. Check that the cluster service account
has both the 'Back up files and directories' and 'Restore files and
directories' user rights. Also, make sure your antivirus is NOT scanning the
quorum. If it was scanning a quorum file at the time of a checkpoint, that may
explain the error 5.
--
Jeff Hughes, MCSE
Support Escalation Engineer
Microsoft Enterprise Platforms Support (Server Core/Cluster)
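The numeric codes in these logs are Win32 system error codes. Below is a small Python lookup covering only the codes that appear in this thread, paired with the checks suggested above for an access-denied failure during a checkpoint. The table, the checklist, and the helper names are illustrative assumptions, not a Microsoft tool; on a Windows host, `net helpmsg 5` prints the same message text.

```python
from typing import List, Tuple, Union

# Win32 system error codes seen in this thread (names and texts from winerror.h).
WIN32_ERRORS = {
    0: ("ERROR_SUCCESS", "The operation completed successfully."),
    2: ("ERROR_FILE_NOT_FOUND", "The system cannot find the file specified."),
    5: ("ERROR_ACCESS_DENIED", "Access is denied."),
}

# Hypothetical checklist built from the reply above, for error 5 raised
# while the cluster service checkpoints to the quorum drive.
CHECKPOINT_ACCESS_DENIED_CHECKS = [
    "Service account has 'Back up files and directories' (SeBackupPrivilege)",
    "Service account has 'Restore files and directories' (SeRestorePrivilege)",
    "Antivirus is not scanning the quorum drive",
]


def to_code(value: Union[int, str]) -> int:
    """Accept 5, "5" or "0x00000005" as they appear in cluster.log."""
    if isinstance(value, int):
        return value
    value = value.strip()
    return int(value, 16) if value.lower().startswith("0x") else int(value)


def describe(value: Union[int, str]) -> Tuple[str, str]:
    """Return (symbolic name, message text) for a code in the table."""
    code = to_code(value)
    return WIN32_ERRORS.get(code, ("UNKNOWN", f"Win32 error {code} (not in this table)"))


def triage(value: Union[int, str]) -> List[str]:
    """Return the follow-up checks for an access-denied checkpoint failure."""
    return list(CHECKPOINT_ACCESS_DENIED_CHECKS) if to_code(value) == 5 else []


for raw in ("0x00000005", 2):
    name, text = describe(raw)
    print(f"{raw} -> {name}: {text}")
```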

Matthias
      05-16-2008
I am not the backup administrator in our company, but as further information:
there was a full file-system backup running on both nodes (with HP Data
Protector), and the physical quorum disk was backed up as well.
Start: 17:15

Is that a possible reason for the error 5?
Should we exclude the quorum disk from the backup set?
(Is a System State backup sufficient?)

br, matthias
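The question above hinges on whether the backup was running when the checkpoint failed, and the two logs use different clocks: the fatal error is stamped 15:16:43 in cluster.log but 17:16:43 in the Event Log. A Python sketch of the comparison, assuming cluster.log timestamps are UTC and the nodes run on Central European Summer Time (UTC+2) in May; the 17:15 backup start is taken from the post, and the backup's end time is not known.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout used in cluster.log, e.g. "2008/05/15-15:16:43.912".
CLUSTER_LOG_FMT = "%Y/%m/%d-%H:%M:%S.%f"

# Assumption: cluster.log is in UTC and the nodes use CEST (UTC+2) in May,
# which matches the two-hour gap between cluster.log and the Event Log.
LOCAL_TZ = timezone(timedelta(hours=2), "CEST")


def cluster_ts_to_local(ts: str) -> datetime:
    """Convert a cluster.log timestamp (assumed UTC) to local time."""
    utc = datetime.strptime(ts, CLUSTER_LOG_FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL_TZ)


def seconds_after(start_local: datetime, ts: str) -> float:
    """Seconds between a local start time and a cluster.log entry."""
    return (cluster_ts_to_local(ts) - start_local).total_seconds()


backup_start = datetime(2008, 5, 15, 17, 15, tzinfo=LOCAL_TZ)  # "Start: 17:15"
failure_ts = "2008/05/15-15:16:43.912"  # DmpGetSnapShotCb, Error=0x00000005

print(cluster_ts_to_local(failure_ts).strftime("%Y-%m-%d %H:%M:%S %Z"))
print(f"{seconds_after(backup_start, failure_ts):.3f} s after the backup started")
```

A positive value only shows that the failure came after the backup started; whether the backup was reading the quorum files at that exact moment would still have to be confirmed from the backup session log.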

 
Jeff Hughes [MSFT]
      05-20-2008
Yes, if the quorum files were being backed up at the time, that could very
well be why you got an error 5. You do not need to back up the quorum, and it
should be excluded from your scheduled backups. There's nothing there you'd
ever need to recover, since the quorum is only used to maintain a copy of the
cluster database and any checkpointed registry keys, and you can always
recreate those files if needed.
--
Jeff Hughes, MCSE
Support Escalation Engineer
Microsoft Enterprise Platforms Support (Server Core/Cluster)

steffen busch
      06-09-2008
Hello,
I got nearly the same messages as described above, but my error code is 2:

Event Type: Error
Event Source: ClusSvc
Event Category: Database Mgr
Event ID: 1000
Date: 06.06.2008
Time: 14:34:44
User: N/A
Computer: SVREHDWHCLN1
Description:
Cluster service suffered an unexpected fatal error at line 2236 of source module d:\nt\base\cluster\service\dm\dmlog.c. The error code was 2.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Then I got several messages:

The system failed to flush data to the transaction log. Corruption may occur.

After that, only this message appears:

Cluster service is requesting a bus reset for device \Device\ClusDisk0.

The Cluster Service no longer starts:

Server specific error code 5086

The cluster failed over properly and is running on the other node,
but the first node died.

Any ideas?
I do not want to evict the node or reinstall the machine.

Config:

FSC Blade BX630
Win2k3 64 bit
SQL Server 2005 SP2

IBM SVC SAN, FC-connected

Thanks for your help
 
John Toner [MVP]
      06-13-2008
Not enough info here to figure out the problem, but it looks like you might
have lost connectivity to your quorum disk.

Regards,
John

Visit my blog: http://msmvps.com/blogs/jtoner
praveen
      04-21-2011
Hi Jeff,

It would be very helpful if you could provide a solution for an issue I am facing with the same error 5.

I am seeing this error on a Majority Node Set cluster running Exchange 2007.

Cluster service could not write to a file (C:\DOCUME~1\XXX~1\LOCALS~1\Temp\CLS1348.tmp).

From the cluster log:
00000de8.00002fa0::2011/03/17-02:45:19.673 WARN [CP] CppCheckpoint failed to get registry database SYSTEM\CurrentControlSet\Services\MSExchangeIS\ahexclex1 to file C:\DOCUME~1\XXXAHC~1\LOCALS~1\Temp\CLS2D86.tmp error 5

00000de8.00002fa0::2011/03/17-02:45:19.673 WARN [CP] CppRegNotifyThread CppNotifyCheckpoint due to timer failed, reset the timer.



So basically, error 5 means "access denied". We have a Majority Node Set cluster, and I have excluded C:\DOCUME~1\XXXAHC~1\LOCALS~1\Temp and C:\Windows\Cluster from antivirus scanning, but the error still persists.

Please help me understand the possible cause of error 5 in this case.