ScyllaDB is an Apache Cassandra compatible database system, developed with performance in mind (and therefore written in CPP). In this posts-series, I’ll discuss some of the important things you should know when designing a data model for Scylla based applications (most of these things applies for Cassandra as well). In this post, I’ll start by discussing what is a primary key in Scylla, and how to choose one.

Image for post
Image for post
Image by Scylla Inc.

Choosing Primary key, Partitioning and Clustering key

A table in Scylla contains multiple partitions. When defining a table in Scylla, you choose which columns will be part of your primary key. The primary key uniquely identifies a row. …

Developers and DBA’s are often required to design a solution to solve the following problem: there is an application that needs to receive a feed of new/updated data from SQL Server. There can be many scenarios where this requirement rises: ETL process that needs to receive recently-changed data, an application that waits for new information to begin some workflow, etc.

Image for post
Image for post
Common ETL pipeline

When SQL Server is the “source of truth”, many times the immediate solution people come up with is to find a way to “distribute” the new data from MSSQL to the other application that needs it. …

Using ORM for database related work is a common practice in applications developed in the last decade. There are many popular ORM’s. One of the most common ORM’s in the .NET world is Entity Framework.
In this post, I’ll go over some common mistakes when working with Entity Framework with Microsoft SQL Server (although some of the examples are relevant for other ORM’s and DB’s).
All of the examples in this article are based on the Stack Overflow database published by Brent Ozar that I slightly modified.

Image for post
Image for post
The database schema all of the following code snippets are based on (StackOverflow).

#1: The N+1 Problem

How many SQL queries ran as a result of this code? If you said just one, you were wrong. Yes, indeed we can get all this information using a single SELECT statement, but we have many more statements running here: The first one, running as a result of the foreach loop. We want to iterate over all the posts with Title that’s not null, so we query the DB to get those posts. Then, for each post, we show it’s the title (which is the information we already have) and the author’s name. But, we didn’t query for the author’s name at our first query. In that case, Entity Framework uses one of the great features that make it so comfortable for the software developer — lazy loading. Lazy loading means that when we need the information about the post’s author, Entity Framework will generate another query to get this data. And in this code, it’ll do it for every post, and every query will get the author details for just that single post in the current iteration block. Meaning, we’ll have n executions of the additional query for the post’s author (n is the number of posts returned by the first query). …

Shahar Gvirtz

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store