SQL Server on IoT Edge and Developer Machines – Smaller Footprint

SQL Server will ship Azure SQL Database Edge:
https://azure.microsoft.com/en-us/services/sql-database-edge

With the announcement I can tell you more about one of the things we have been working on; SQL Server running on IoT Edge and Developer machines in under 500MB of memory. The effort goes beyond IoT Edge devices and extends to the common developer experience.

The effort focuses attention on memory usage and disk space requirements of SQL Server. As you can imagine, with a code base as large and vast as SQL Server, there is always room for improvement.

The first tenant: Provide minimal footprint with the broad set of SQL Server features.
The second tenant: Try to make the adjustment beneficial to any SQL Server tenant.
The third tenant: Reducing memory does not mean reducing the feature set.

Some features are still being worked on for IoT Edge but our goal is to provide the same SQL Server feature set. For example, Azure SQL Database Edge can already use TSQL Predict Machine Learning (ML) and Column Storage.

SQL 2017

The reduction effort started when we shipped SQL Server 2017 (https://bobsql.com/sql-server-2017-improved-resource-usage-on-smaller-machines/.) Reducing overhead from the CLR Garbage collector and the SQL Server worker threads allowed SQL Server to run comfortably within 2GB of memory.

Engineering Rigor

There are a bunch of folks with compiler, linker, SQL Server code, testing and other disciplines who provided input.

As we ranked the ideas a pattern emerged, termed ‘Engineering Rigor.‘ For example, going as far back as SQL Server 7.0 lots of work was done to partition by Numa Node, CPU or other meaningful schemes. Partitioning allows SQL Server to scale to the largest systems with record-setting performance. However, partitioning can require more memory.

Let’s use the Sub/Super Latch as an example. A latch is a read/write locking mechanism used by SQL Server. Specifically, each data page stored in Buffer Pool has an associated latch used to protect access to the data on the page. The Buffer Pool has knowledge of how long it should take to acquire the latch (lock it) and if the request is read or write. When a specific page encounters heavy read requests, and the acquire is taking longer than the expected time, a promotion to a super-latch can occur. In short, the single latch is partitioned by CPU and the original latch becomes a proxy. A read request now acquires the latch associated with the worker threads’ CPU partition. Readers only need to acquire their local, CPU partition latch but writers must acquire on all partitions. This means the readers use memory local to their CPU, reducing memory invalidations improving performance.

Assume a latch requires 32 bytes of memory. When promoted to a super-latch on a 64 CPU system the memory requirement becomes 32 + (32 *64) = 2080 bytes.

Magic 8

Testing shows that locking mechanisms, such as spin locks, start to become saturated when shared across 8 to 12 CPUs and where the physics become a direct contributor to partitioned design. However, on smaller systems and VMs the partitioning may not be required to maintain performance. Reference Automatic Soft NUMA.

Areas

When you look closely at ring buffers, latches, memory partitioning, lock manager, and other partitioned components there are savings to be achieved. Here are some of the savings that reduce the overhead of SQL Server.

Area

Description

Adaptation To Smaller Environments

Hash Table Sizes

Large hash tables have wasted space for buckets that are empty on smaller installations.

The size the hash tables can be reduced or made dynamic without impacting performance or functionality.

CLR GC Model

There are two garbage collection models for CLR, server and client. Server mode uses a heap per CPU where-as the client mode uses a single heap.

SQL Server enables client mode garbage collection.

Ring Buffer Sizes

The ring buffers can hold 1000s of records and be created per CPU to optimize performance.

Reduce the number of records per ring buffer to ¼ the size and when possible remove per CPU creations.

Networking Pump Threads

SQLPAL uses a dynamic set of networking, I/O pump threads to handle network requests.

Disable dynamic creation of network, I/O pump threads and cap to a single background thread while allowing all threads to cooperate in network I/O pump activities.

Async Pump Threads

SQLPAL uses a dynamic set of async, I/O pump threads to handle non-network I/O requests.

Disable dynamic creation of async, I/O pump threads and cap to a single background thread while allowing all threads to cooperate in async I/O pump activities.

Super Latching

SQL Server allows promotion of various latches to super latches.

Disable super latching capability.

Memory Object Partitioning

SQL Server allows the internal memory objects to be dynamically partitioned by NUMA node or CPU.

Disable memory object partitioning capabilities.

Ghost Record Cleanup

Ghost record cleanup can occur inline with the query execution or be queued to a background task for processing.

Remove the background task and aggressively reclaim database space during query execution.

Lock Manager

The lock manager has partitions, lock block cache and other structures.

Reduce the number of partitions and size of the cache.

Parallel Queries

SQL Server can elect to use a parallel query to process the request.

Disable parallel activities.

Multiple Log Writers

SQL Server can start multiple log writers on hidden schedulers to process transaction log requests.

Remove the use of hidden schedulers, by log write workers, and reduce the number of log writers.

Minimal Startup (-f)

SQL Server can be started using the -f parameter to allow minimal startup.

Apply minimal startup activities and allow multiple connections.

IO Request Caches

SQLPAL may cache I/O request structures with each thread.

Use per CPU instead of per thread partitioning.

Demand Paging

Demand paging is the ability for the operating system to copy a disk page into physical memory only if an attempt is made to access that page. Demand paging is commonly used for binary images (DLLs, EXEs) and saves committed memory for the process.

Demand paging is useful in reducing the true memory footprint because demand paging only uses RAM when the page is accessed. For example, if you are running SQL Server but not using column store the code associated with column store does not need to be loaded (paged) into memory.

To accommodate demand paging the DLLs and EXEs should be mapped into memory (MapViewOfFile, mmap) allowing the operating system to do the magic. To accommodate mapping the file layout should be aligned with memory pages (usually 4KB) boundaries.

SQL Server 2019 builds DLLs and EXEs with 4KB file alignment to enable better demand paging. Using dumpbin /headers sqllang.dll to extract the header information you can see the change.

SQL Server 2017

SQL Server 2019

OPTIONAL HEADER VALUES

20B magic # (PE32+)

12.10 linker version

1A6BE00 size of code

D08C00 size of initialized data

0 size of uninitialized data

30F5B0 entry point (000000010070F5B0) DllEntryPoint

1000 base of code

100400000 image base (0000000100400000 to 0000000102B78FFF)

1000 section alignment

200 file alignment

OPTIONAL HEADER VALUES

20B magic # (PE32+)

12.10 linker version

1A6BE00 size of code

D08C00 size of initialized data

0 size of uninitialized data

30F5B0 entry point (000000010070F5B0) DllEntryPoint

1000 base of code

100400000 image base (0000000100400000 to 0000000102B78FFF)

1000 section alignment

1000 file alignment

COMDAT Folding

COMDAT folding can also be called COMDAT elimination. Blocks of code that are identical are consolidated into a single copy, reducing the image size. As the compiler and linker construct image, identical blocks are identified and eliminated from the final image. C++ template usage is an example of where COMDAT folding can shine.

SQL Server 2017 uses function address comparisons which the enablement of COMDAT would break. For SQL Server 2019 executables COMDAT is enabled and I made the ~1500 uses of function pointer comparison COMDAT compliant. Enabling COMDAT for the SQL Server images allows size reductions. For example, the sizes of sqllang and sqlmin are each reduced by ~6MB, saving disk and memory space.

Recap

I have highlighted just a few of the ways we are shrinking the footprint of SQL Server and still maintaining SQL Server’s rich feature set. There are many more items on our list that we will continue working on while upholding the tenants.

Bob Dorr – Principal Software Engineer SQL Server

Linux