SQL Server BDC Hints and Tips: Contained AGs, Ports, containedag_msdb, logs, …

SQL Server BDC Hints and Tips: Contained AGs, Ports, containedag_msdb,​​ logs,​​ 

 

 

This blog​​ post​​ focuses on connecting to the SQL Server BDC,​​ some​​ helpful​​ log files​​ and utility outputs.​​ ​​ Understanding​​ a few basics about a​​ SQL Server​​ BDC and the Contained Availability Group (containedag) makes​​ managing and troubleshooting​​ easier.

https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-high-availability?view=sql-server-ver15​​ 

Cluster IP Address and Port

To connect to a SQL Server BDC use the exposed IP address (FQDN for an AD enabled cluster) and the target’s, listening port.  ​​​​ To locate the IP​​ address​​ and port,​​ use​​ one of​​ the following commands:

kubectl cluster-info

Kubernetes master is running at https://10.193.5.209:6443

KubeDNS is running at https://10.193.5.209:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

azdata bcd endpoint list

{

 ​​ ​​ ​​​​ "description": "SQL Server Master Instance Front-End",

 ​​ ​​ ​​​​ "endpoint": "mssql.aris.local,31433",

 ​​ ​​ ​​​​ "name": "sql-server-master",

 ​​ ​​ ​​​​ "protocol": "tds"

 ​​​​ },

 ​​​​ {

 ​​ ​​ ​​​​ "description": "SQL Server Master Readable Secondary Replicas",

 ​​ ​​ ​​​​ "endpoint": "mssql-secondary.aris.local,31436",

 ​​ ​​ ​​​​ "name": "sql-server-master-readonly",

 ​​ ​​ ​​​​ "protocol": "tds"

 ​​​​ }

kubectl get svc -n mssql-cluster

  • master-svc-external ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​​​  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​​​ NodePort  ​​ ​​​​ 10.99.130.239  ​​ ​​​​ <none>  ​​ ​​ ​​ ​​ ​​ ​​​​ 1433:31433/TCP ​​ 

  • master-secondary-svc-external ​​ ​​​​ NodePort  ​​ ​​​​ 10.100.49.221  ​​ ​​​​ <none>  ​​ ​​ ​​ ​​ ​​ ​​​​ 1433:31436/TCP​​ 

The​​ connection​​ port​​ value​​ depends on​​ the​​ cluster’s​​ High Availability​​ enablement​​ and the desired target​​ of​​ primary or​​ read-only​​ secondary.

Port

Description

31433

When connecting to​​ a​​ SQL Server the default port is​​ commonly​​ 1433. ​​ On a SQL Server BDC 1433 is exposed​​ externally​​ using port 31433​​ (default.)

Example: sqlcmd -UMyUser -PMyPassword -SclusterIP,31433

Non-HA​​ – Connects to the master SQL Server instance.

HA​​ Enabled​​ – Connects to the primary (listener for read and write operations.)

31436

For an HA enabled BDC port 31436 connects to the​​ read-only, high availability replicas.

1433

Locally available from /var/opt/mssql-tools/bin/sqlcmd on the pod

Example: kubectl exec -it -n clusterNamespace master-0 /bin/bash

1533

Locally available from /var/opt/mssql-tools/bin/sqlcmd on​​ a high availability enabled​​ cluster. ​​ 

 

Note:​​ The 1533 port connects​​ to​​ the​​ instance level​​ instead of the high availability listener.

 

Hint:​​ You may want expose 1533 externally so you can connect with SSMS or Azure Data Studio to execute instance specific queries. ​​ Here is an example temporarily exposing 1533 from the three master nodes.

kubectl expose pod master-0 -n mssql-cluster --port=1533 ​​ --type=NodePort --name=sql1533a

kubectl expose pod master-1 -n mssql-cluster --port=1533 ​​ --type=NodePort --name=sql1533b

kubectl expose pod master-2 -n mssql-cluster --port=1533 ​​ --type=NodePort --name=sql1533c

kubectl​​ get svc -n mssql-cluster

 

sql1533a  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​​​ NodePort  ​​ ​​​​ 10.111.117.80  ​​ ​​​​ <none>  ​​ ​​ ​​ ​​ ​​ ​​​​ 1533:8115/TCP

sql1533b  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​​​ NodePort  ​​ ​​​​ 10.111.105.5  ​​ ​​ ​​​​ <none>  ​​ ​​ ​​ ​​ ​​ ​​​​ 1533:31402/TCP

sql1533c  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​​​ NodePort  ​​ ​​​​ 10.101.183.78  ​​ ​​​​ <none>  ​​ ​​ ​​ ​​ ​​ ​​​​ 1533:30367/TCP ​​ 


Contained AG (containedag_master, containedag_msdb)

When a SQL Server BDC is enabled for high availability the contained availability group (containedag) is created and maintained. ​​ A highly available SQL Server BDC​​ installs​​ master-0,​​ master-1​​ and​​ master-2​​ replicas and​​ joins​​ them to the containedag. ​​ The containedag includes the databases containedag_master and containedag_msdb​​ (which are commonly hidden from external view.)

When connected​​ to​​ the listener​​ port,​​ database create and drop commands are upgraded to automatically join/remove​​ the database​​ from the containedag and replicate​​ the state​​ to secondaries.  ​​​​ 

Try this:​​ Connect to the listener​​ port​​ and issue a create database. ​​ Establish a second connection​​ to the read-only port and query the database on a secondary replica. ​​​​ The database exists on all three replicas, automatically​​ added to the containedag​​ availability group.

When​​ connected​​ to the SQL Server BDC​​ over the​​ listener port (default=31433),​​ references to master and msdb are automatically​​ redirected​​ to the containedag_master and containedag_msdb.  ​​​​ For example, if you connect to the listener and create a SQLAgent, TSQL job the job definition and history​​ are​​ physically​​ stored​​ in the containedag_msdb​​ and replicated to secondary replicas.​​ In fact, I wrote a test​​ creating and​​ executing​​ a job on the primary​​ then​​ connecting​​ to the secondary (port=31436)​​ and​​ waiting​​ for the job outcome to arrive in the job history table. ​​ From the​​ user’s​​ perspective it appears as if you are using msdb​​ as you normally would for SQLAgent activities.  ​​​​ 

Try this:​​ Create a job using​​ the​​ listener​​ port​​ and then​​ issue​​ the following​​ query:​​ select * from msdb..sysjobs

  • 31433 – Listener connection: ​​ You will see the new job (you are connected to the listener and msdb references are redirected to containedag_msdb)

  • 1533 – Instance level connection: You will NOT see the job (you are connected to the physical instance and using the physical database msdb)

  • 1533 on a Secondary​​ -​​ Instance level connection: You will NOT see the new job. ​​ Issue a​​ use containedag_msdb​​ command​​ and run the query and you will see the new job has been replicated to the secondary replica​​ in containedag_msdb.

  • 31436 – Read only connection: You will see the new job​​ (you are connected to the​​ read only​​ listener and msdb references are redirected to containedag_msdb)

Note:​​ The target of the connection determines if the job is replicated or the job remains instance specific.

Execute select​​ db_name(database_id) from sys.dm_hadr_database_replica_states

  • Listener connection: You see msdb because of the automatic redirection​​ taking place

  • Instance level connection: You see containedag_msdb because you are looking at the physical data

You are​​ querying​​ at the same database.​​ ​​ The logical referencing of the contained availability group listener redirection makes​​ containedag_msdb appear as msdb.

Who is Primary and Secondary​​ Replicas?

Using the high​​ availability​​ DMVs​​ (sys.availability_groups, sys.dm_hadr_database_replica_states, …) you can query to see which master-* is currently acting as the primary replica.  ​​​​ In fact, login to the listener port and​​ select @@SERVERNAME​​ to quickly determine the​​ server​​ name of the primary replica.

You can also query Kubernetes​​ to​​ view​​ the label assignments for the master-* pods.  ​​​​ When a SQL Server High Availability failover takes place,​​ the labels are updated to indicate the role of primary or secondary.

Example:​​ kubectl edit pod master-0 -n <<clusterNamespace> 

 ​​​​ labels:

 ​​ ​​ ​​​​ MSSQL_CLUSTER: mssql-cluster

 ​​ ​​ ​​​​ app: master

 ​​ ​​ ​​​​ controller-revision-hash: master-79fc98b57

 ​​ ​​ ​​​​ mssql.microsoft.com/sql-instance: master

 ​​ ​​ ​​​​ plane: data

 ​​ ​​ ​​​​ role: master-pool

 ​​ ​​ ​​​​ role.ag.mssql.microsoft.com/master-containedag: secondary

 ​​ ​​ ​​​​ statefulset.kubernetes.io/pod-name: master-0

 ​​ ​​ ​​​​ type: sqlservr

 

Port 31433 Connection Fails on High Availability SQL Server BDC

A connection to port 31433, on a highly available SQL Server BDC, may fail if the availability group is in an unhealthy state.  ​​​​ When a master-* pod is elected as the primary replica it transitions the databases to primary state and listens on port 31433. ​​ If the containedag becomes unhealthy the listener won’t be established and connection attempts fail. ​​ Troubleshoot an unhealthy containedag as you would troubleshoot any SQL Server Availability Group to restore functionality.​​ 

 

The SQL Server DMVs and errorlog are great places to start when troubleshooting an unhealthy availability group.

 

Controller.log

The controller log is a great source of information to follow the availability group and database​​ state​​ transitions. ​​ You can use the following commands to access the controller.log file.

 

kubectl exec -it -n clusterNamespace control-xxxxx /bin/bash

cd /var/log/controller/##/##/##/controller.log

 

azdata bdc debug copy-logs --namespace​​ clusterNamespace​​ --target-folder​​ c:\temp​​ 

 

Search for the​​ log​​ entries containing the keyword “HadrRoleManagerStateMachine” to find entries like​​ the following:

 

2020-01-30 18:04:35.1118 | INFO | [HadrRoleManagerStateMachine: containedag] Transitioning from 'Healthy' to 'StopLeaseRenewal' state.​​ 

2020-01-30 18:04:35.4225 | INFO | [HadrRoleManagerStateMachine: containedag] Transitioning from 'StopLeaseRenewal' to 'Offline' state.​​ 

2020-01-30 18:04:35.4988 | INFO | [HadrRoleManagerStateMachine: containedag] Transitioning from 'Offline' to 'Resolving' state.​​ 

2020-01-30 18:04:35.5689 | INFO | [HadrRoleManagerStateMachine: containedag] Transitioning from 'Resolving' to 'NoPrimary' state.​​ 

2020-01-30 18:04:38.0689 | INFO | [HadrRoleManagerStateMachine: containedag] Transitioning from 'NoPrimary' to 'Resolving' state.​​ 

2020-01-30 18:05:13.1124 | INFO | [HadrRoleManagerStateMachine: containedag] Transitioning from 'Resolving' to 'OnlinePending' state.​​ 

2020-01-30 18:05:13.1452 | INFO | HadrRoleManagerStateMachine:master:containedag:: OnlinePendingTransition:​​ CurPrimary(master-2) ​​​​ 

 

Other information​​ such as a database drop requests​​ are contained in the controller.log as well.

 

2020-01-30 18:05:34.4335 | INFO | DropManagedDatabase request received. StatefulSet: master, Database: StressTREBORClone​​ 

 

mssql-ha-supervisor

Each mssql-ha-supervisor container helps​​ to manage high availability​​ and the associated logs contain additional information.

 

kubectl describe pod master-0 -n​​ clusterNamespace

 

Containers:

mssql-server:

mssql-ha-supervisor:

 

Sample entries from the mssql-ha-supervisor stdout/stderr log files as copied using​​ the​​ azdata copy-logs​​ command.

 

 

[containedag-agent] 2020/02/01 16:37:24​​ lease expired while database disconnected

[supervisor] 2020/02/01 16:37:25 Database operation state for database GiantDB, requestId bea5a9c7-7e2b-43e7-9f3b-98c4dc342ec8

[supervisor] 2020/02/01 16:37:25 Database operation state for database StressTREBOR_830179E1_9451_4E1A_9FB2_75548674A41E, requestId 7b4281f7-820a-4cde-ad2d-33a8a136de0b

[containedag-agent] 2020/02/01 16:37:27 Sent unhealthy heartbeat

[configuration-agent] ERROR: 2020/02/01 16:37:40 Unable to query AG configurations from SQL Server: read tcp 100.64.2.185:57820->100.64.2.185:1533: i/o timeout

 

operator-#### pod

The operator pod​​ listens to and​​ reacts to​​ Kubernetes events. ​​ The operator log is another location for tracking what transpired on a SQL Server BDC.

 

2020/02/08 15:32:00 Got pod event for master-1

2020/02/08 15:32:00 Got pod event for master-2

2020/02/08 15:32:00 Got pod event for master-0

 

Use the SQL Server errorlog, controller, mssql-ha-supervisor, operator and DMV information to view the state of your SQL Server BDC.

 

Mange with notebook:​​ https://docs.microsoft.com/en-us/sql/big-data-cluster/manage-notebooks?view=sql-server-ver15​​ 

Stuart is a colleague of mine and is the ultimate champion for SQL Server BDC notebooks. ​​ I learn more about Notebooks everyday as I use them to help monitor and troubleshoot​​ the 12,​​ SQL Server BDCs​​ in the​​ longhaul​​ lab. ​​ I enjoy the notebooks because they automate​​ many of the​​ data collection​​ and management activities while​​ saving​​ the​​ output. ​​ I also​​ get to​​ view​​ the commands used so I learn how to​​ troubleshoot and​​ manage aspects of the SQL Server BDC I may not be familiar with.

 

I am currently working to convert the SQL Server BDC longhaul test runs to rely solely on notebook collection. ​​ This means our development and support staffs​​ use the same notebooks to troubleshoot a SQL Server BDC we provide to you.

 

 

Manage with controller dashboard :​​ https://docs.microsoft.com/en-us/sql/big-data-cluster/manage-with-controller-dashboard?view=sql-server-ver15​​ 

Another helpful facility is the controller dashboard. ​​ The controller dashboard provides you reports, links​​ and drill-ins​​ for your​​ SQL Server BDC.

​​ 

 

  • Bob Dorr