Social Question

If I have a data table with a million records, and every record is updated every 30 seconds, where should I store all this table data?

Asked by Ranimi23 (1914 points) January 25th, 2010

I have a database problem. I am using MySQL to store the important data, but I also have data that is not important to persist; I just need to use it for my own purposes.

Where should I save this data?

I think saving it in the DB is a bad idea, because the records are updated dynamically every few seconds. I am thinking about 1 million users updating this table every few seconds. It will probably kill my DB.

What do you think?

12 Answers

CyanoticWasp:

What I think is that you’d better be talking to professionals who have a stake in the answer they give, rather than a bunch of random strangers on the Interwebz.

HasntBeen:

I would say “that design isn’t very practical”. If you have a million records being updated that rapidly, the only reasonable design is to keep them in memory. So you’ll need a lot of that, possibly a 64-bit machine if the records are large.

My “secret” opinion (which I will share with you to de-secretize it) is that there’s a design flaw. If you’re updating records at that rate, are they really changing at that rate? In general, you want to reduce update frequencies on large-volume data and only deal with differences. Without knowing more I can’t advise more; that’s just a suspicion.

Ranimi23:

Hi @HasntBeen, thanks for the help.
Do you think it would be a good idea to use caching for that kind of table, which is updated and read a lot?

grumpyfish:

So, you’re updating 1M records every 30 seconds — that’s about 33k updates per second. You can play with the Benchmarking Suite to see how your current setup is performing, which will tell you where you need to improve.

One question is whether you need to do this updating live. E.g., if you’re just doing a LOT of data collection, it may be better to dump all of the writes into a file and process them offline.

The other question is data segmentation: if you can split that into ten groups of 100k updates on 30-second cycles, now each group sees about 3.3k updates per second. It’ll require 10 DB servers, and some sort of master-keeper to tell the servers where the other records they’re requesting live, but it’s a good way to spread things out a bit more.
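
The segmentation idea can be sketched as simple hash-based routing (a Python sketch; the ten-shard count and all names here are illustrative, not a real deployment):

```python
# Route each record to one of NUM_SHARDS servers by hashing its key,
# so each server handles roughly 1/NUM_SHARDS of the update traffic.
NUM_SHARDS = 10

def shard_for(user_id: int) -> int:
    """Pick the shard (DB server) responsible for this user."""
    return hash(user_id) % NUM_SHARDS

# The "master-keeper" collapses to this mapping: any server can compute
# which shard holds a given record without asking around.
shards = {n: {} for n in range(NUM_SHARDS)}

def update(user_id: int, page: str) -> None:
    # Write goes only to the one shard that owns this user.
    shards[shard_for(user_id)][user_id] = page
```

The point of hashing (rather than, say, ranges) is that the load stays roughly even as user IDs grow.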

And yeah, I agree with @HasntBeen that you might have a design flaw if you think records are going to be updated that fast. That is, if you have a million users just entering data, you can queue up the updates (since they don’t need to be in the database right away).

The other thing (as @CyanoticWasp said) is to hire a DBA to design the system—once you get above 10^4 users, things can fall apart REALLY fast if you don’t have a good design.

funkdaddy:

You mention “I also have to save data that is not important to save, just need to use it for my purpose.”

That sounds like a good opportunity to simply use session data storage, unless I’m mistaken. It comes with its own set of problems, but if you’re just storing a few key pieces of info for later use, it sounds like a better solution than writing/reading that information from the database each time it’s needed.
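
To illustrate the idea (a toy Python stand-in for whatever session store the web framework provides — `session()` here is a made-up helper, not a real API):

```python
# Per-user scratch storage keyed by a session id: small, transient values
# live here instead of in the database.
sessions: dict = {}

def session(session_id: str) -> dict:
    """Return (creating if needed) the scratch dict for one session."""
    return sessions.setdefault(session_id, {})

# Store a small, unimportant piece of state without touching the database:
session("abc123")["current_page"] = "1.html"
```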

What sorts of things are you storing? Maybe an example of the “not important” data would shed some light on possible solutions others have used successfully in the past.

Ranimi23:

Hi, thanks everyone for all the help.

Let me tell you what I need to implement in my company’s .NET application:
For every user logged in to the company website, we want to know which page he is on right now. That is, if user X is now on page “1.html”, I need to save this information somewhere. If user X moves to page “2.html” 30 seconds later, I want to quickly update where he is now surfing.

I gave the 1 million users as the worst case. If some day the company website has 1 million logged-in users, I will need to record for every one of them which page he is surfing right now. Maybe I exaggerated, but we do have 45,000 simultaneous users on the company website and the number is growing. I am thinking big so that the solution will hold up for a lot of users.

So, my data object for every user should keep this information:
1. User code (User Number).
2. Which page user is now observing (Page Number).

This object is going to be updated every 30–60 seconds, so updates must be fast. It involves a lot of reading/writing. If the admin wants to see which users are now on page “3.html”, he should be able to read it just by going through the whole array of objects.

I don’t need this information to be saved in the DB, because it doesn’t mean anything. It is temporary and changes A LOT. So I think the right place to keep it is RAM on the company server, using the data-caching option.
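
That in-memory approach could look something like this (a minimal Python sketch, assuming a single server process; the actual .NET caching API would differ):

```python
import threading

# One map from user code -> current page, guarded by a lock because many
# concurrent requests will be writing to it.
_lock = threading.Lock()
current_page = {}

def record_view(user_code: int, page: str) -> None:
    """Called on each page hit: overwrite the user's current location."""
    with _lock:
        current_page[user_code] = page

def users_on(page: str) -> list:
    """The admin query: scan the whole map for users on a given page."""
    with _lock:
        return [u for u, p in current_page.items() if p == page]
```

The admin query is a full scan, which matches the “go through the whole array” description; with a million entries you might want a second index keyed by page instead.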

What do you think?
Am I wrong?

grumpyfish:

@Ranimi23 Well, if all you need to know is what page people are currently looking at, couldn’t you just query the application every time you need the info, rather than having the application constantly write the info out?

Ranimi23:

@grumpyfish Hi, what do you mean by query? I need to store the info somewhere in order to read it when needed, don’t I?

jaguarcy:

Are you just saving the current page each user is looking at? I would assume storing every page a user visits, with a timestamp, would be more useful, because then you’d be able to actually trace each user’s path through your site and analyze exactly what they did.

In this case, I’d say just keep all the data in memory as a flat list. No updating is ever necessary—the list just grows—and then have an offline task like a cron job that periodically takes the in-memory data and dumps it to disk. You could do some simple cache rotation in memory for the dump, and you’ll most definitely need to do file rotation on disk every week (or day, depending on your traffic) so the file doesn’t get too big—there are automated tasks that will do this based on file size or age.

In my experience, this will flat-out beat ANY other approach for speed. Maybe not convenience for analyzing the data later, but it won’t impact your site’s performance.

Ranimi23:

@jaguarcy, what do you mean by a timestamp?

grumpyfish:

@Ranimi23 What he’s proposing is that you just dump into a file (or into memory, then occasionally write to a file) something like this:

1264596777 User237 1.html
1264596803 User221 1.html
1264596823 User237 2.html

etc., etc. The big number at the beginning is some sort of timestamp (here, Unix time—where 1264596777 is around 7:55 AM this morning).

Then to find out what page User237 is on, you search from the bottom up, and it’ll tell you where they were. (It’ll also tell you how long they were on page 1 before going to page 2.)
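
That bottom-up search can be sketched like this (Python, using the sample lines above):

```python
# Scan the log lines in reverse and return the first (i.e. most recent)
# page recorded for the given user, or None if the user never appears.
def last_page(log_lines, user):
    for line in reversed(log_lines):
        ts, who, page = line.split()
        if who == user:
            return page
    return None

lines = [
    "1264596777 User237 1.html",
    "1264596803 User221 1.html",
    "1264596823 User237 2.html",
]
# last_page(lines, "User237") -> "2.html"
```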

Actually, your httpd server probably does this already, you just have to turn it on.
