Published on 2013-10-25 by John Collins.
I am currently evaluating Redis [1] for a number of projects that broadly fit within four use cases: caching, queuing, session storage and long-term storage.
Before addressing those, I first looked at Redis from an architectural and operational perspective, to ensure that it would meet our reliability and security requirements.
Let me first state that I am new to Redis and relatively new to NoSQL in general, so take any statements I make after this point with a pinch of salt: I am not a Redis expert. I am collecting notes on my first encounters with Redis and sharing them here as I learn more about the system.
All of the consumers of my APIs that will wrap Redis will be using PHP, so everything will be evaluated within that context. The projects will handle high traffic and require high reliability.
In the past, I would have tackled my four use cases using various other systems:
The appeal of a system like Redis is that it can handle all four, with varying degrees of success. That is appealing to the system administrator in me, because it means you can focus your efforts on gaining expert knowledge in one service rather than three. It also means you can leverage the same system architecture to handle multiple workloads.
The flip side is that Redis becomes a major single point of failure, which makes reliability especially important.
After some research into deployment architectures for Redis, I found that it only supports master-slave replication, with a single master handling writes and any number of read-only slaves. Replication happens over a TCP/IP port, and password authentication can be enabled between the nodes to increase security. There is a clustering solution in development, but it is not production ready [2].
There are a number of issues with the current Redis master-slave architecture:
"There are several problems that surface when a slave attempts to synchronize off a master that manages a large dataset (about 25GB or more). Firstly, the master server may require a lot of RAM, even up to 3X the size of the database, during the snapshotting. Although also true for small databases, this requirement becomes harder to fulfill the bigger the database grows. Secondly, the bigger the dataset is, the longer and harder it is to fork another process for snapshotting purposes, which directly affects the main server process. This phenomenon is called "latency due to fork" and is explained here and at redis.io."
And from the same article:
"Remember that after all the forking is done, the file needs to be copied from the master by the slave. Regrettably, this is carried over the same interconnect that clients are using to the access database from."
Until clustering comes along, however, this is the architecture and I have to work within its limits. If I stick to Redis as a cache with each entry on a short TTL, the dataset should remain small. If I start to use it as a permanent datastore, these replication costs will increase.
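To put rough numbers on those replication costs, here is a back-of-the-envelope sketch in Python. It takes the "up to 3X" RAM figure from the quote above as a worst case and assumes, purely for illustration, a 1 Gbit/s link and a snapshot file about as large as the in-memory dataset. Both assumptions are simplifications, and `replication_cost` is a made-up helper name.

```python
def replication_cost(dataset_gb, link_gbit_per_s=1.0, ram_multiplier=3.0):
    """Rough cost of one full resync of a slave.

    ram_multiplier=3.0 is the worst case quoted above; link_gbit_per_s is
    an assumed network speed, not a measured one.
    """
    peak_master_ram_gb = dataset_gb * ram_multiplier
    # Gigabytes to gigabits (x8), divided by the link speed in Gbit/s.
    transfer_seconds = dataset_gb * 8 / link_gbit_per_s
    return peak_master_ram_gb, transfer_seconds


for size_gb in (1, 25):
    ram, secs = replication_cost(size_gb)
    print(f"{size_gb} GB dataset: up to {ram:.0f} GB RAM on the master, "
          f"about {secs:.0f} s to ship the snapshot")
```

Even under these generous assumptions the transfer competes with client traffic on the same interconnect, which is the point the quoted article makes.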
As for RAM usage, apart from the rule of allowing up to 3x the size of the database on the master for replication, Redis attempts to keep the whole database in RAM on every node. Once RAM is exhausted, Linux will start to swap to disk, which would kill performance. If I were just using Redis as a cache, however, that would not be a problem, as I can set a hard limit on RAM usage and tell Redis to evict older entries when it hits that limit:
"Alternatively can use the "maxmemory" option in the config file to put a limit to the memory Redis can use. If this limit is reached Redis will start to reply with an error to write commands (but will continue to accept read-only commands), or you can configure it to evict keys when the max memory limit is reached in the case you are using Redis for caching." [4]
Naturally you would only want to run Redis on a private network. Within that network, however, I have many different projects and developers using the same resource, and sadly Redis provides no authentication system beyond a single global password with which to tell them apart [5]. A single rogue client issuing a flushall command, for example, would wipe out the databases of all users of the service. Not cool. My current thinking is that I will have to build an authentication system in front of Redis myself as part of my public API.
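One possible shape for such a layer, sketched in Python under invented assumptions: each client presents a token, is confined to its own key prefix, and may only issue commands on an allowlist. The tokens, prefixes and the `authorize` function are all hypothetical. A real API would also need to store tokens securely and handle commands that take several keys.

```python
# Hypothetical per-client rules: token -> (key prefix, allowed commands).
CLIENTS = {
    "token-shop": ("shop:", {"GET", "SET", "DEL", "EXPIRE"}),
    "token-blog": ("blog:", {"GET", "SET"}),
}


def authorize(token, command, key):
    """Return the namespaced key if the call is allowed, else raise."""
    if token not in CLIENTS:
        raise PermissionError("unknown client")
    prefix, allowed = CLIENTS[token]
    if command.upper() not in allowed:
        # Dangerous commands such as FLUSHALL are simply never allowlisted.
        raise PermissionError(f"{command} is not permitted for this client")
    return prefix + key


print(authorize("token-blog", "get", "post:1"))
```

Because the wrapper rewrites every key with the client's prefix, one project cannot read or overwrite another project's keys even though they share a single Redis password behind the API.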
Redis supports a single-master architecture: you write to the master, and it replicates those writes to a number of slaves. If the master goes offline, you will want to fail over to a new master promoted from the ranks of your existing slaves. This does not happen automatically. There are a number of options for this, and being a PHP programmer, my initial reflex is to do it using logic in my client API. A better option might be to use the Redis Sentinel tool [6], but I have not evaluated that yet.
My favourite tool for monitoring is Monit [7], and monitoring Redis with Monit is as straightforward as adding a new rule like so:
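As a rough illustration of what client-side failover logic has to decide, the Python sketch below picks the reachable slave that has received the most data. The node dictionaries and their "reachable" and "offset" fields are hypothetical. How you would measure them depends on your Redis version and tooling.

```python
def pick_new_master(slaves):
    """Choose the reachable slave that has replicated the most data.

    Each slave is a dict with hypothetical fields: "host", "reachable"
    and "offset" (how far it got through the master's write stream).
    """
    candidates = [s for s in slaves if s["reachable"]]
    if not candidates:
        raise RuntimeError("no reachable slave to promote")
    return max(candidates, key=lambda s: s["offset"])


nodes = [
    {"host": "10.0.0.2", "reachable": True, "offset": 120},
    {"host": "10.0.0.3", "reachable": True, "offset": 180},
    {"host": "10.0.0.4", "reachable": False, "offset": 500},
]
print(pick_new_master(nodes)["host"])
```

The chosen node would then be promoted with the `SLAVEOF NO ONE` command and the remaining slaves repointed at it with `SLAVEOF host port`. The weakness of doing this in each client is that two clients can disagree about whether the master is really down, which is the kind of problem a dedicated tool like Sentinel is meant to address.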
    check process redis-server with pidfile "/var/run/redis.pid"
      start program = "service redis-server start"
      stop program = "service redis-server stop"
      if 2 restarts within 3 cycles then timeout
      if totalmem > 2048 Mb then alert
      if children > 255 for 5 cycles then stop
      if cpu usage > 95% for 3 cycles then restart
      if failed host 127.0.0.1 port 6379 then restart
      if 5 restarts within 5 cycles then timeout
I am sure that whatever tool you are currently using for monitoring will have an option for Redis too. As mentioned previously, there is also Redis Sentinel.
Backing up a Redis database is a file I/O operation, meaning you can use many existing tools to do this. According to the Redis documentation [8]:
"Redis is very data backup friendly since you can copy RDB files while the database is running: the RDB is never modified once produced, and while it gets produced it uses a temporary name and is renamed into its final destination atomically using rename(2) only when the new snapshot is complete."
There is a lot I like about Redis, so I do not want to seem negative in this article: it is my job to be critical while performing an evaluation.
For the caching use case, using Redis is an easy choice. In comparisons that I have read, Redis is just as fast as Memcache while offering greater functionality.
For queuing, I am currently evaluating Resque [9], which was developed by GitHub, and its PHP port, php-resque [10]. I really like what I see so far, and I think managing the PHP worker processes with Supervisor [11] might be the winning solution for me.
Session storage in Redis is trivial once you have the PHP Redis module installed, just like with Memcache. All you need to do is add a few lines to your php.ini file like so:
    session.save_handler = redis
    session.save_path = tcp://127.0.0.1:6379
Finally, we have long-term storage. For me this is the problem child of the bunch, as I do not think that the replication model in Redis will support very large data sets. I know that replication lag will increase with larger data sets, as will the overhead of carrying out the replication. Until the clustered solution becomes production ready, I will not be using Redis as a full replacement for MySQL.