The Problem with the In-Memory Trend
Let’s start by establishing that the in-memory trend in data processing is great. It gives us very specific tools for handling very specific use cases. But, like all trends, we have to remember that it has both advantages and disadvantages.
The idea behind in-memory processing is that we load up a dataset, stream it through a process in which everything is typically hosted in memory, and then either persist or don’t persist the answer on the other end. Real-time dashboards are a good use case, and so is aggregation of data. But what about real-time dashboards or aggregations over clean, blended, governed data? The normal work pattern of the in-memory trend is that we load data into a pipeline and then run some functions over it, really fast. Why so fast? Because there is no persistence.
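A minimal sketch of that pattern, in plain Python, might look like the following; the file name, columns, and functions are purely illustrative and not tied to any particular engine. The whole dataset is pulled into memory, a function runs over it very quickly, and nothing is written back out.

```python
# Minimal sketch of the in-memory work pattern (illustrative only:
# "events.csv" and its column layout are made up, no specific engine implied).

def load_dataset(path):
    # Pull the entire source into memory as a list of rows.
    with open(path) as f:
        return [line.rstrip("\n").split(",") for line in f]

def aggregate(records):
    # Run a function over the in-memory rows -- fast, precisely because
    # nothing is persisted along the way.
    totals = {}
    for key, value, *_ in records:
        totals[key] = totals.get(key, 0.0) + float(value)
    return totals

records = load_dataset("events.csv")  # load everything into memory
answer = aggregate(records)           # compute the answer, really fast
print(answer)                         # persist it, or don't
```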
I hate to be the pessimist in the room, but of course anything that runs only in memory will always be extremely fast. There is a very easy analogy to our human brains. Imagine you could process huge amounts of input very quickly to get an answer: you could read 20 books in a minute and then list all the characters to a friend. Then imagine that, five seconds later, someone asked, “What about the female characters?” You would have to read all 20 books again, because you didn’t persist anything to long-term storage (your brain’s memory).
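Continuing the sketch above (same made-up file and columns), the “read all 20 books again” problem looks like this: because neither the raw rows nor the first answer were persisted, a follow-up question has to start from zero.

```python
# Five seconds later a new question arrives. Nothing was persisted,
# so the full load has to run again before the new function can.
records = load_dataset("events.csv")        # read all "20 books" again
by_region = {}
for key, value, region in records:          # hypothetical third column
    by_region[region] = by_region.get(region, 0.0) + float(value)
print(by_region)

# Had `records` (or the first answer) been persisted to durable storage,
# this second pass could have started from the stored result instead of
# re-reading the whole source.
```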