Alert history with comments

July 7, 2015, 8:01 am

We need a way to generate an alert history report with the comments attached. Specifically, we are looking at the High CPU with Top Processes alert. The notes contain the details on which processes were the cause, and we need to see whether there are similarities across servers and/or patterns.

I was trying to use the web-based reports to do this, but it appears the notes field isn't available there. I'm not a SQL guy, so I'm hoping someone else has done this and can share.

↧

Reports: Y Axis scale in Custom Charts

October 16, 2015, 8:02 am

Is there a way to set the maximum scale value for charts based on a custom query in the Web Based Reports? Specifically, I want to set the max value to 100% for utilization charts rather than have it autoscale.

↧

Volumes not being polled

October 7, 2015, 10:59 am

I am trying to use PowerShell to automate adding nodes. I cannot get the volumes to be monitored. I can add them to the database and they show up under the node's summary page, but the volumes have no data. Searching, I did find and read https://thwack.solarwinds.com/thread/79088?q=Not%20mo, but even though I made sure that VolumeType is set to "Fixed Disk", the VolumeTypeID is still showing up as 0.

Here is the part of the code that gets the drive information from WMI, and then adds them to the database:

$drives = Get-WmiObject -Credential $creds -ComputerName $Hostname -Class Win32_Volume -Filter "DriveType=3"

foreach ($drive in $drives) {
    # Get information from the drive and use it to build the properties SolarWinds requires
    $DriveCaption = "$($drive.Caption) Label:$($drive.Label) $([Convert]::ToString($drive.SerialNumber, 16))";
    $DriveDescrption = "$($drive.Caption) Label:$($drive.Label) Serial Number $([Convert]::ToString($drive.SerialNumber, 16))";

    # Add everything to a hashtable (each key may appear only once in a hash literal)
    $AddDrive = @{
        NodeID = $NodeID;   # NodeID of the node the volume belongs to
        VolumeIndex = "4";
        VolumeTypeID = "4";
        Status = 1;
        #Type = "Fixed Disk";
        VolumeType = "Fixed Disk";
        Icon = "FixedDisk.gif";
        Caption = $DriveCaption;
        VolumeDescription = $DriveDescrption;
        FullName = "$hostname-$DriveCaption";
        PollInterval = 120;
        StatCollection = 15;
        RediscoveryInterval = 30;
        NextRediscovery = [DateTime]::UtcNow;
    }

    # Insert the drive to be monitored into the database
    $driveURI = New-SwisObject $swis -EntityType "Orion.Volumes" -Properties $AddDrive
}

Here is the code that is adding the pollers and the pollers that I am adding:

function AddPoller($PollerType) {
    Write-Verbose "Adding $PollerType to database"

    # Get the first part of the poller type string before the period (.), which SolarWinds needs for its database.
    $NetObjectPrefix = $PollerType.Split(".")

    # Set the values for the database fields ($poller is assumed to be a hashtable created earlier in the script)
    $poller["PollerType"] = $PollerType;
    $poller["NetObject"] = $NetObjectPrefix[0] + ":" + $nodeid;
    $poller["NetObjectType"] = $NetObjectPrefix[0];
    $poller["NetObjectID"] = $nodeid;

    # Add the poller to the database
    $pollerUri = New-SwisObject $swis -EntityType "Orion.Pollers" -Properties $poller
}

AddPoller "I.StatisticsErrors32.SNMP.IfTable"
AddPoller "I.StatisticsTraffic.SNMP.Universal"
AddPoller "I.Status.SNMP.IfTable"
AddPoller "I.Rediscovery.SNMP.IfTable"
AddPoller "N.Status.ICMP.Native"
AddPoller "N.ResponseTime.ICMP.Native"
AddPoller "N.Details.WMI.Vista"
AddPoller "N.CPU.WMI.Windows"
AddPoller "N.Uptime.WMI.XP"
AddPoller "N.Memory.WMI.Windows"
AddPoller "N.AssetInventory.Wmi.Generic"
AddPoller "V.Details.SNMP.Generic"
AddPoller "V.Statistics.SNMP.Generic"
AddPoller "V.Status.SNMP.Generic"

If I add the node via the web portal, everything is monitored. I pulled the list of pollers from that test. If I add them to the database myself, it doesn't seem to work properly.
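One thing that may be worth checking: the AddPoller function above builds NetObject and NetObjectID from the node ID for every poller, including the "I." and "V." ones. The sketch below (in Python for brevity) assumes, without confirmation from this thread, that Orion expects the ID of the matching object type, so a "V." poller would reference a volume ID rather than a node ID. The function name build_poller and its parameters are hypothetical.

```python
def build_poller(poller_type, node_id, volume_id=None, interface_id=None):
    """Build the property set for one poller row.

    Assumption (not confirmed by the post): the NetObject prefix decides
    which ID is used, i.e. N -> node, V -> volume, I -> interface.
    """
    prefix = poller_type.split(".", 1)[0]
    ids = {"N": node_id, "V": volume_id, "I": interface_id}
    if prefix not in ids:
        raise ValueError(f"unknown NetObject prefix: {prefix}")
    object_id = ids[prefix]
    if object_id is None:
        raise ValueError(f"no ID supplied for prefix {prefix}")
    return {
        "PollerType": poller_type,
        "NetObject": f"{prefix}:{object_id}",
        "NetObjectType": prefix,
        "NetObjectID": object_id,
    }


# Hypothetical IDs, for illustration only.
print(build_poller("N.Status.ICMP.Native", node_id=12))
print(build_poller("V.Status.SNMP.Generic", node_id=12, volume_id=345))
```

If that assumption holds, the volume pollers in the original script would be attached to "V:<NodeID>" instead of "V:<VolumeID>", which would be consistent with volumes that appear on the node page but never collect data.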

↧

Misleading Linux File System Available space reported (5% ext buffer ignored in calculation)

October 5, 2015, 8:17 am

How are you Linux gurus handling the disk space reporting discrepancy that can, among other things, hide from Oracle DBAs the fact that their file systems are nearly ready to crash?

Configuration:

SAM 6.2.1
LINUX: RHEL 6.4 6.7
NET-SNMP: 5.5-44 & 5.5-54

Issue: SolarWinds uses user space consumed rather than user+reserve to calculate space remaining

$ df -B1 -P /lv*[69]
Filesystem                            1-blocks         Used   Available Capacity Mounted on
/dev/mapper/vgdata1-lvoraundo      52710469632 49985454080    40660992     100% /lvi3pld06
/dev/mapper/vgbackup1-lvorabackup 369766273024 329342873600 21638115328      94% /lvi3pld09

Math

{1-block} - {Used} != {Available}

52710469632 - 49985454080 = 2725015552 ( != 40660992 )

Difference: 2725015552 - {Available} = 2725015552 - 40660992 = 2684354560 --> roughly 5% of {1-block} (exactly 2.5 GiB)

However,

{1-block} - ( {Used} + {Reserved} ) == {Available}, where {Reserved} is roughly 0.05 * {1-block}

This is due to the EXT file system including the reserved buffer space in its total space, while df reports Available as the space that is truly available to user processes on the system.
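The arithmetic above can be re-run as a short worked example. This is only a sketch using the figures from the df output for /lvi3pld06; the name reserved_bytes is a label chosen here, and the explanation for the share coming out slightly above 5% (df's total presumably excluding file system overhead) is an assumption.

```python
# Figures copied from the `df -B1 -P` output above for /lvi3pld06.
size_bytes = 52710469632    # "1-blocks" column
used_bytes = 49985454080    # "Used" column
avail_bytes = 40660992      # "Available" column

# Space that is neither used nor available to ordinary users.
reserved_bytes = size_bytes - used_bytes - avail_bytes

GIB = 1024 ** 3
reserved_share = reserved_bytes / size_bytes

print(reserved_bytes)                  # 2684354560
print(reserved_bytes / GIB)            # 2.5
print(round(reserved_share * 100, 2))  # 5.09
```

The reserve is exactly 2.5 GiB, which is 5% of a 50 GiB volume and a little over 5% of the total that df reports.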

SNMP OID responses:

.1.3.6.1.2.1.25.2.3.1.1.46 = INTEGER: 46
.1.3.6.1.2.1.25.2.3.1.2.46 = OID: .1.3.6.1.2.1.25.2.1.4
.1.3.6.1.2.1.25.2.3.1.3.46 = STRING: /lvi3pld06
.1.3.6.1.2.1.25.2.3.1.4.46 = INTEGER: 4096 Bytes
.1.3.6.1.2.1.25.2.3.1.5.46 = INTEGER: 12868767
.1.3.6.1.2.1.25.2.3.1.6.46 = INTEGER: 12203480
.1.3.6.1.2.1.25.2.3.1.1.49 = INTEGER: 49
.1.3.6.1.2.1.25.2.3.1.2.49 = OID: .1.3.6.1.2.1.25.2.1.4
.1.3.6.1.2.1.25.2.3.1.3.49 = STRING: /lvi3pld09
.1.3.6.1.2.1.25.2.3.1.4.49 = INTEGER: 4096 Bytes
.1.3.6.1.2.1.25.2.3.1.5.49 = INTEGER: 90274969
.1.3.6.1.2.1.25.2.3.1.6.49 = INTEGER: 80405975
.1.3.6.1.2.1.25.4.2.1.1.49 = INTEGER: 49

Math that SW uses: {.1.3.6.1.2.1.25.2.3.1.5.X} - {.1.3.6.1.2.1.25.2.3.1.6.X} = Available

--> This is the Available value they then use in their free-space calculations, but as the numbers show, the math is off again because it does not account for the reserved buffer space on EXT file systems.
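To put the two views side by side, the following sketch recomputes free space and percent used from the SNMP values for index 46 (allocation unit, size, used) and compares them with what df reports for the same file system. The function names are hypothetical, and the df percentage uses used / (used + available), which is how the Capacity column is normally derived.

```python
def snmp_view(alloc_unit, size_units, used_units):
    """Free bytes and percent used derived from hrStorage size and used only."""
    free_bytes = (size_units - used_units) * alloc_unit
    pct_used = 100.0 * used_units / size_units
    return free_bytes, pct_used


def df_view(used_bytes, avail_bytes):
    """Percent used as df sees it: used / (used + available to users)."""
    return 100.0 * used_bytes / (used_bytes + avail_bytes)


# /lvi3pld06: SNMP index 46 and the matching df line.
snmp_free, snmp_pct = snmp_view(4096, 12868767, 12203480)
df_pct = df_view(49985454080, 40660992)

print(snmp_free)            # 2725015552 bytes "free" according to size - used
print(round(snmp_pct, 2))   # about 94.83 percent used
print(round(df_pct, 2))     # about 99.92 percent used, shown by df as 100%
```

The SNMP-derived figure leaves about 2.7 GB apparently free, while user processes can actually write only about 40 MB.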

Consequence:

Graphs, alerts, etc., all show about 5% more space available than there actually is. This requires quite a bit of training for users who receive alerts: someone who sees a 95% utilization report on a 1T file system may think there is plenty of space left, when there isn't.

How are you Linux gurus handling this through SolarWinds/Orion/SAM?

↧

Monitor external website

October 16, 2015, 5:15 am

This seems like it would be such an easy thing to do, but not for me. :-(

So I created a node called "Solarwinds", used polling address 74.115.12.20, and set it to External Node: No status.

I then created a SAM template called "External Website Check" and added the HTTP and Weblink monitors to it. I then added my "Solarwinds" node to test against.

I get "Unable to connect to the remote server". We have no idea why it doesn't work.

↧

How do you troubleshoot complex problems?

April 21, 2015, 7:57 am

Hi! I'm on the User Experience team at SolarWinds and am trying to learn more about troubleshooting processes. If you could answer the following two quick questions in the comments section, I would greatly appreciate it!

1. How often do you encounter complex problems when troubleshooting incidents? A complex problem in this case would be where you have to identify and keep track of multiple data points in order to identify root cause.

Never
Rarely
Sometimes
Often
Always

2. How do you keep track of the data points/evidence when troubleshooting? Do you use a specific tool, pen and paper, keep it in your head, keep multiple browser tabs open, etc.?

Thanks in advance for the help!

↧

Business hours alert and after hours alert

October 16, 2015, 9:01 am

I want to split an alert into two: one that alerts during business hours and another that alerts after hours.

It looks like I can make one alert to specify the business hours and the other I can specify as Daily, all day and add a schedule to disable that alert during business hours.

Before I activate the alerts I thought I'd post to ensure I got the logic right.
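The logic described above can be sanity-checked with a small sketch: one alert is active only inside the business-hours window, and the other is enabled all day but disabled during that same window, so exactly one of them should be active at any moment. The hours (08:00 to 17:00, Monday to Friday) and the function names are assumptions made for illustration; they are not taken from the post or from the product.

```python
from datetime import datetime, time

# Hypothetical business-hours window; adjust to the real schedule.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)
BUSINESS_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4


def business_alert_active(now):
    """Alert 1: enabled only during business hours."""
    return (now.weekday() in BUSINESS_DAYS
            and BUSINESS_START <= now.time() < BUSINESS_END)


def after_hours_alert_active(now):
    """Alert 2: enabled daily, all day, but disabled during business hours."""
    return not business_alert_active(now)


for moment in (datetime(2015, 10, 14, 10, 0),   # Wednesday morning
               datetime(2015, 10, 14, 22, 0),   # Wednesday night
               datetime(2015, 10, 17, 10, 0)):  # Saturday morning
    print(moment, business_alert_active(moment), after_hours_alert_active(moment))
```

Because the second alert is defined as the complement of the first, there is no gap and no overlap, provided both schedules use the same window boundaries.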

↧

NPM, NCM running on Linux

October 16, 2015, 7:24 am

I would like to use Solarwinds in our environment, but our servers are all Linux, not Windows.

Is there any way I can install Solarwinds products (NPM, NCM) on a Linux server?

Thanks

↧

Steps for migrating NPM instance into another existing NPM instance

October 16, 2015, 9:01 am

I was hoping someone may have done this before, and wanted any/all input I could gather before I build a MOP to complete the task.

Currently, the company has two instances of NPM:

- 1 for sites in North America.

- 1 for sites outside of North America.

We would like to move the outside of North America sites into the existing North America NPM instance. At that time we will then have a single instance of NPM handling all sites and nodes for the company. Then we will decommission the older server.

NPM on both servers is 11.0.1

NTA is 3.11.0

Thanks in Advance.

↧

PIVOT function with SWQL?

May 12, 2015, 7:38 pm

Can anyone comment on PIVOT functionality with SWQL? Perhaps an example of where it's been used previously?

I have a web resource for the dashboard view that uses SWQL and works perfectly, but I would love to see the row data flipped around into columns for easier consumption.

I did exactly what I wanted to do on a web report with a custom SQL query --- and of course I can do a custom SQL query report through report writer and display the report on the view. Works great for me as the admin, but there are known issues with custom SQL reporting and account limitations (it gets broken).

The answer seems simple, to write the view Custom Query resource in SWQL with a pivot table, but I can't for the life of me seem to get it functional.

Anyone run into the same issues previously?

- Matt

↧

How can you poll multiple isolated sites that are in same IP space?

October 15, 2015, 3:59 pm

For example, two remote sites both use the same 10.0.0.0/24 network. Is NPM capable of separating the two sites even though they are in the same IP subnet?

↧

NTA 4.1 install error - Invalid Orion Core Configuration Value. The registry value is not valid.

October 13, 2015, 9:04 am

Has anyone received this error message when performing a fresh install of NTA 4.1? The flow storage server is configured and the connection tests OK. I am getting this error message on the polling server.

NTA 4.1 install error - "Invalid Orion Core Configuration Value. The registry value at HKLM\SOFTWARE\SolarWinds\Orion\Core\Configuration is not valid".

↧

SDK example for CreateNode needs fixing for NPM 11.5.2

October 16, 2015, 3:17 pm

Adding a Node for Monitoring
To add a new node to Orion and enable monitoring of the node, create a new Orion.Nodes entity and
register it for polling the relevant set of information.
First, review the PowerShell script example for adding a node.
# initialize SWIS connection
$swis = Connect-Swis
# add a node
$newNodeProps = @{
EntityType="Orion.Nodes";
IPAddress="10.0.0.1";
IPAddressGUID="0100000a-0000-0000-0000-000000000000";
Caption="";
DynamicIP=$False;
EngineID=1;
Status=1;
UnManaged=$False;
Allow64BitCounters=$False;
SysObjectID="";
MachineType="";
VendorIcon="";
ObjectSubType="SNMP";
SNMPVersion=2;
Community="public";
}

This is insufficient for NPM 11.5.2 -- it leaves the SNMPv3 credentials as NULL, and the Trap manager will not be able to handle traps from the node.

The error report in the logfile looks like this:

2015-10-13 12:32:00,730 [34] ERROR SolarWinds.Orion.Common.ManagedNodeState - Error occured during the getting node with IP [xxx.xxx.xxx.xxx] from Nodes table!

System.Data.SqlTypes.SqlNullValueException: Data is Null. This method or property cannot be called on Null values.

at System.Data.SqlClient.SqlBuffer.get_String()

at System.Data.SqlClient.SqlDataReader.GetString(Int32 i)

at SolarWinds.Orion.Common.ManagedNodeState.AddNewNode(String ip, Boolean getSnmpCred)

'Editing' a node's properties and submitting them without changes replaces the NULL values in the SNMPv3 credentials.
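As a side note on the quoted example, the IPAddressGUID value "0100000a-0000-0000-0000-000000000000" looks like the four bytes of 10.0.0.1 placed at the start of a 16-byte, .NET-style GUID. That reading is inferred from the sample value only; the quoted SDK text does not state it. A minimal sketch in Python, with the hypothetical helper name ip_to_guid:

```python
import ipaddress
import uuid


def ip_to_guid(ip):
    """Pad the address bytes to 16 and read them as a little-endian GUID.

    Assumption: this mirrors how the sample IPAddressGUID was produced.
    """
    raw = ipaddress.ip_address(ip).packed   # 4 bytes for IPv4, 16 for IPv6
    padded = raw.ljust(16, b"\x00")
    return str(uuid.UUID(bytes_le=padded))


print(ip_to_guid("10.0.0.1"))  # 0100000a-0000-0000-0000-000000000000
```

The output matches the value in the example for IPAddress="10.0.0.1", which supports the inference without proving it.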

↧

Hardware Alerting Quandary

October 16, 2015, 3:21 pm

We are processing auto-generated incident tickets for hardware warning and critical conditions for all monitored devices, using a handful of alert rules to try and manage all the variations in behavior from monitored components in servers and network devices.

One difficulty is how to deal with monitored objects that float above and below their monitored thresholds intermittently (like power supply voltage, board battery status and temperature sensors).

We have attempted to compromise with the alert recipients by setting insane levels of suppression in the trigger and reset conditions, but I'd welcome suggestions on a more elegant solution for hardware alerting.

Trigger Condition:

Reset Condition:

↧

NPM 11.5.2 GroupofGroups limitations breaking Information Service v3

October 12, 2015, 4:12 pm

Since upgrading to NPM 11.5.2 last week we've had intermittent, and serious, problems with the information service crashing.

I've uploaded diagnostics and opened two cases with support, one of them as a system down, but I have had no call-back; I've phoned and left messages or been diverted into a voicemail box. (Leapfile is uploading at 1.4Mbps, which is a little frustrating on a 1Gbps network connection.)

2015-10-12 13:07:14,833 [4] ERROR SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.LimitationSnapshotService - (null) System.Data.SqlClient.SqlException (0x80131904): The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()

at System.Data.SqlClient.SqlDataReader.get_MetaData()

at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)

at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)

at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)

at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)

at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)

at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)

at SolarWinds.InformationService.DataProviders.SqlQueryRelation.<GetEnumerator>d__0.MoveNext()

at SolarWinds.Data.Query.PhysicalQueryPlan.ProjectOp.<GetEnumeratorInternal>d__0.MoveNext()

at SolarWinds.Data.Query.PhysicalQueryPlan.PhysicalQueryPlan.<GetEnumerator>d__0.MoveNext()

at SolarWinds.InformationService.Core.InternalQueryResultReader.MoveNext()

at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerMemberFetcher.GetMembers(ICollection`1 definitions, IDictionary`2 members, Int32 containerId, Boolean clearDefinitonsAndRetryOnError)

at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerMemberFetcher.GetMembers(IEnumerable`1 containerIds)

at SolarWinds.Data.Providers.Orion.Containers.DataProvider.MemberCache.GetMembers(IEnumerable`1 containerIds, ContainerCacheContext context)

at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerTable.CreateScanTooples(ICollection`1 containers, IToople tuple, String[] columnProperties, ContainerCacheContext context)

at SolarWinds.Data.Providers.Orion.Containers.DataProvider.ContainerTable.CreateFilteredRelation(IToopleFactory toopleFactory, IStorageEntityReference entityRef, IEnumerable`1 properties, IExpression filter)

at SolarWinds.Data.Providers.Orion.OrionDataProvider.CreateFilteredRelation(IStorageEntityReference entityRef, IEnumerable`1 properties, IExpression filter)

at SolarWinds.Data.Query.PhysicalQueryPlan.FilteredProviderScanOp.CreateRelation(IDataProvider provider, ICollection`1 properties)

at SolarWinds.Data.Query.PhysicalQueryPlan.ProviderScanOp.<GetEnumeratorInternal>d__0.MoveNext()

at SolarWinds.Data.Query.PhysicalQueryPlan.ProjectOp.<GetEnumeratorInternal>d__0.MoveNext()

at SolarWinds.Data.Query.PhysicalQueryPlan.PhysicalQueryPlan.<GetEnumerator>d__0.MoveNext()

at SolarWinds.InformationService.Core.InternalQueryResultReader.MoveNext()

at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()

at SolarWinds.Data.Providers.Orion.Common.EnumerableExtensions.MemoizedEnumerable`1.<GetEnumerator>d__2f.MoveNext()

at SolarWinds.Data.Providers.Orion.Common.EnumerableExtensions.Enumerated[T](IEnumerable`1 enumerable)

at SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.DAL.LimitationSnapshotDAL.GetAllLimitationEntities(LimitationInfo limitation, String entityType)

at SolarWinds.Data.Providers.Orion.Containers.LimitationSnapshotService.LimitationSnapshotService.GetLimitationItems(IDictionary`2 result, LimitationInfo limitation, String entityType)

ClientConnectionId:0e5cc627-55c6-4e94-925c-ae2a96c4e8da

Error Number:8623,State:1,Class:16 executed with:

limitationId:13, entityType:Orion.Groups

As I was writing this, it looks like there is a process generating snapshots of limitations? (Is this new in 11.5.2?)

This error corrupts SWIS (I think it loses the database connection handle, and subsequent queries on the handle fail, which leads to other oddities); eventually SWISv3 crashes and restarts ... This has been going on all day...

It looks to me like the 'group of groups' type of limitation is breaking SWIS when that periodic recalc is performed :-(

Suggestions?

↧

Is the thwack store officially down?

October 16, 2015, 10:55 am

Wanted to check out some additional swag purchases, but I'm getting a Jive error.

Like this:

Oops, the page can't be found

Sorry, the page you requested can't be found. You can go back and try again, or start again at home.

↧

IVIM polling through vCenter for remote ESX hosts

October 16, 2015, 3:08 pm

We're revising our current polling method for monitoring ESX hosts from direct polling (Poll ESX directly using SNMP and SSH) to polling through vCenter.

Many of our ESX hosts reside on the WAN (remote offices) or in data centers or branch offices other than where their managing vCenter server resides.

For instance, we have an ESX host that lives in Dublin, Ireland and is currently managed by a SolarWinds instance there, but its vCenter server lives in Lewisville, TX and is monitored by a SolarWinds instance in Lewisville.

Will I have to move the monitoring of the ESX box into Lewisville or (worse yet) move the monitoring for the vCenter server into the Dublin monitored inventory?

As we start scaling up the "poll-through-vCenter" method to hundreds of ESX servers, I'm concerned that we'll over-utilize our Lewisville monitoring system by moving all the non-Lewisville ESX into its database and polling engines.

Managing remote ESX with vCenter is fairly common in large environments, but how can IVIM scale up to deal with this when one has a large number of remote ESX that are presently in other Orion instances than their vCenter server?

↧

Need t-shirt slogans & ideas for Network Admins!

October 13, 2015, 2:08 pm

I've been digging through some of the geeky slogans that everyone's submitted over the years (which never get old BTW)... Let's take things up a notch and discuss potential t-shirt ideas! Submit yours in the comments below.

↧

Is there a way to use a logarithmic function to transform the results of a UnDP?

October 13, 2015, 7:01 am

Hi all - I have some devices where I am trying to monitor the optical light levels of the optics. The UnDP returns results that are expressed in mW and I would love to display them as dBm. I don't see a function for that in the transform results tool so I am trying:

formula = log({custom_poller_rxPower}/10000)*10

This takes a returned value of 6285 and transforms it to 6.285, as if the log function isn't working. I have validated the formula in Excel, where it works fine: LOG(6285/10000)*10 = -2.02.
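For reference, the expected conversion can be reproduced outside both Excel and the transform tool. This sketch only checks the arithmetic; it assumes, as the formula in the post does, that the raw poller value is divided by 10000 to get milliwatts, and the function name raw_to_dbm is hypothetical. Excel's LOG defaults to base 10, so math.log10 is the matching call.

```python
import math


def raw_to_dbm(raw_value, divisor=10000):
    """Convert a raw poller reading to dBm: 10 * log10(power in mW)."""
    milliwatts = raw_value / divisor
    return 10 * math.log10(milliwatts)


raw = 6285
print(round(raw_to_dbm(raw), 2))                 # -2.02, matches the Excel result
print(round(10 * math.log(raw / 10000), 2))      # natural-log variant, for comparison
print(raw / 1000)                                # 6.285, the value the transform returned
```

Neither logarithm base produces 6.285, whereas a plain division of the raw value by 1000 does, which is consistent with the poster's impression that the log function is not being applied at all.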

Is this just a Solarwinds limitation?

Thanks,

Scot

↧

ignore administratively shutdown interfaces in discover?

October 16, 2015, 4:10 pm

Am I just missing something? When I run a discovery and it gives me the interfaces to select, I want to pick up anything that's not Admin Down, but when I select "Operationally Down", de-select "Administratively Shutdown", and hit "reselect interfaces", it still selects the "Administratively Shutdown" interfaces. Bug? PEBKAC?

↧