Don’t skimp on your DB

“Premature optimization is the root of all evil” – Donald Knuth

NoSQL

NoSQL databases have been trending for a while, hailed as a cure-all for web-scale and real-time workloads. So people go with the flow and, without asking any questions, pick one of the many NoSQL technologies without really thinking about it (most likely MongoDB). They even do so while skipping one of the most fundamental steps of building an application: schema definition. Speaking of schema definition, NoSQL databases certainly bring a lot to the table, such as:

  • Easy scalability.

  • Easy replication.

  • No need for a specific schema (great for applications with dynamic fields).

  • Less maintenance.

  • Lightning fast!? (though a single misconfiguration can easily ruin that).

But they also bring their own set of problems, for example:

  • They let you defer schema definition (which will make a mess later on).

  • No support for ACID transactions (maybe not a problem for a chat or a social network, but what about a bank or any other business process?).

  • Limited querying capabilities.

  • Lack of maturity in the ecosystem.

  • The schema must be designed around what you will query, not what you want to store.

Before choosing to go NoSQL, remember you are trading features, maturity and flexibility (querying) for speed and a different kind of flexibility (schema creation). And if you need any proof of this, just notice that most of the COOL, NEW and EXCLUSIVE features your favorite NoSQL database has implemented or will implement have been available for a long time in those old and dusty relational databases.
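For instance, if what draws you to NoSQL is flexible, document-style data, relational databases have picked that feature up too. A minimal sketch, assuming PostgreSQL and its jsonb type (the events table and its columns are made-up names for illustration):

-- Assumes PostgreSQL 9.4+; "events" and its columns are hypothetical
CREATE TABLE events (
  id bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT now(),
  payload jsonb NOT NULL -- schemaless, document-style data
);
-- Query inside the document much like you would in a document store
SELECT id, payload->>'user_id' AS user_id FROM events WHERE payload @> '{"type": "signup"}';
-- You can even index inside the document
CREATE INDEX idx_events_payload ON events USING GIN (payload);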

OK, I get it, NoSQL is not the true panacea for data storage, but I still need to go fast

So you have decided you can live without being cutting edge, but you are still in dire need of the speed NoSQL promises. There are several ways to achieve this:

Reduce the number of query calls you make

Try to access the database as little as possible; each query is a round trip that adds to your response time. For example, the following query:

SELECT q.inner_text, q.question_order FROM question q WHERE q.random_field = ?

That query, repeated a hundred times (1 ms per call times 100 calls), can easily be changed to something like:

SELECT q.inner_text, q.question_order FROM question q WHERE q.random_field IN (?)

Other than this, you can use JOINs and subqueries to do the job. My personal recommendation is to keep the number of queries per process under 20, extra points if you can make it in fewer than 5 calls, and god shall hail you if you can make it in ONE.
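For example, two separate round trips (one for the questions and one for their answers) can often be collapsed into a single JOIN. A rough sketch, reusing the question table from above and assuming a hypothetical answer table with a question_id foreign key:

-- One round trip instead of one query per question
SELECT q.inner_text, q.question_order, a.inner_text AS answer_text
FROM question q
JOIN answer a ON a.question_id = q.id
WHERE q.random_field IN (?);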

OK, so reducing the number of queries was not good enough

Now you are trying to bash me, saying that after all that work your app is nowhere near fast enough. Calm your horses. With maturity come features, and with features come options.

Add indexes

If you know the basics of database optimization, you know about indexes. And if you don't, you are in luck.

You can think of an index the same way you think about the index or appendix of a book: a quick reference for finding a given “word” (column value) in your “book” (table). The best part is that they are incredibly easy to create.

CREATE INDEX index_name ON table_name (column_name);
-- If this was slow before, it should fly now
SELECT * FROM table_name WHERE column_name = ?;
-- For character columns, a plain index only helps prefix searches (values starting with the search term)
SELECT * FROM table_name WHERE column_name LIKE CONCAT(?, '%');

NOTE: This is the most basic index you can make; depending on your use case, there may be a specific kind of index that suits you better.
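As one illustration of a more specific option, a composite (multi-column) index can cover queries that filter on several columns at once. A minimal sketch with made-up table and column names:

-- Covers filters on both columns (and on the leading column alone)
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);
-- This query can now use the index for both conditions
SELECT * FROM orders WHERE customer_id = ? AND created_at >= ?;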

I added indexes but it is still too slow

Now we are talking about workloads that may be better suited to a NoSQL solution, but hold your horses. If you have reached this point, it means either something is terribly wrong with your schema or your queries, your database is really busy (you need to think about replication), or you have finally exceeded the workload a relational database can handle and you have a real reason to change (you can always keep replicating for dear life, just look at Facebook).

First, let's check your queries. All major databases ship with a feature usually called EXPLAIN, which will roughly tell you how the database plans to execute your query. The way this plan is displayed varies from database to database, and the specifics are beyond the scope of this article. Just know that if your database is being slowed down by a small number of specific queries, you can most likely rewrite those queries for some extra juice.
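For example, in MySQL or PostgreSQL you simply prefix the query with EXPLAIN (the output format differs between the two, so treat this as a sketch):

-- Ask the database how it plans to run the query
EXPLAIN SELECT q.inner_text, q.question_order FROM question q WHERE q.random_field = 42; -- use a representative literal value
-- Watch out for full table scans where you expected an index to be used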

Then you may need to denormalize some tables. Let's say, for example, you have the tables

  • cities

  • states

and you always join them. Well, you may merge these tables into a single table called states_cities and just use that one. This is fine for catalogs with 1:M or 1:1 relationships, but if you end up needing to merge actual data tables, just be comfortable with repeated data.
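A rough sketch of that merge, assuming cities carries a state_id foreign key and both tables have simple id and name columns:

-- Build the denormalized catalog once from the two source tables
CREATE TABLE states_cities AS
SELECT c.id AS city_id, c.name AS city_name, s.id AS state_id, s.name AS state_name
FROM cities c
JOIN states s ON s.id = c.state_id;
-- Reads now hit a single table, no join needed
SELECT city_name, state_name FROM states_cities WHERE state_id = ?;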

BONUS: if your database supports it and you are willing to, check out materialized views and table partitioning (note that these add some administrative complexity to the system, so tread with care).
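A minimal materialized view sketch, assuming PostgreSQL syntax and a hypothetical orders table with a total column:

-- Precompute an expensive aggregation and store the result like a table
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date_trunc('day', created_at) AS day, SUM(total) AS total_sales
FROM orders
GROUP BY 1;
-- Reads become a simple scan of precomputed data
SELECT * FROM daily_sales WHERE day >= ?;
-- The view goes stale, so refresh it on your own schedule
REFRESH MATERIALIZED VIEW daily_sales;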

BONUS 2: if you have stale, never-used data, you may try deleting it (a bad idea, but if you roll with that), or moving it to another table (see table partitioning).
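Moving stale rows to an archive table could look roughly like this (table names are hypothetical; wrap both statements in a transaction in real life):

-- Copy old rows to an archive table, then remove them from the hot table
INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < ?;
DELETE FROM orders WHERE created_at < ?;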

BONUS 3: Here is a great reference for improving the performance of your specific flavor of SQL database.

Conclusion

While this article mainly favors SQL databases for their flexibility (great for reporting and data mining) and gives you tips for speedups in case of need, if you don't have the time and/or you are willing to make the trade-offs that come with NoSQL, by all means go ahead.

