MySQL database speed : Forums : PythonAnywhere

MySQL database speed

I was wondering if anyone else had been experiencing very long latency in database transactions, particularly over this weekend. I have an admittedly very database-intensive script that is taking several minutes to run; previously it had been running rather briskly. When I open the MySQL console directly, it seems to take quite a while to load, and latency between transactions runs to a couple of seconds each.

Has something changed recently, or is this just the best speed available? Incidentally, many thanks to PA crew for improving speed on the file manager!

geoffham | 55 posts | Dec. 9, 2012, 4:47 p.m. | permalink

Just for further clarification: "database-intensive" means traversing a DB of 80-odd rows, each with an average of 10 columns. With the "show tables" and fetching the column names for each row, we're looking at no more than 900 DB transactions, probably less. Run time seem to be between 1 - 2 minutes.

geoffham | 55 posts | Dec. 9, 2012, 6:16 p.m. | permalink

We had some database issues earlier that seem to have passed. I'll have a look tomorrow to see if I can work out what happened.

glenn | 10036 posts | PythonAnywhere staff | Dec. 9, 2012, 6:38 p.m. | permalink

Many thanks! It still seems quite slow for me, though. I just reloaded the Web-app on the off-chance that may have had something to do with it, but no.

geoffham | 55 posts | Dec. 9, 2012, 6:45 p.m. | permalink

It looks like one of our user databases was getting hit quite hard -- someone's doing some CPU-intensive stuff on it -- and that's leaking over to other users. We're working on getting some kind of a way to insulate everyone a bit better, but in the meantime we're upgrading the database -- this will happen at 10:30 UTC tomorrow.

giles | 12640 posts | PythonAnywhere staff | Dec. 10, 2012, 8:09 p.m. | permalink

Fantastic - thanks very much!

geoffham | 55 posts | Dec. 11, 2012, 1:35 a.m. | permalink

OK, it's upgraded now. Let us know if it helps!

giles | 12640 posts | PythonAnywhere staff | Dec. 11, 2012, 10:50 a.m. | permalink

Thanks very much - it's definitely a lot better, but still takes 20 seconds or so to run the 900-odd transactions. I'm not complaining overly (to be sure, my own code could be tightened up), but wanted to let you know it's still not lightening-fast. Maybe that's just as good as we can expect?

geoffham | 55 posts | Dec. 11, 2012, 10:41 p.m. | permalink

When you say 900 transactions, is that over the same connection or a new connection each time? In the latter case that's definitely something you'll want to fix as creating new connections is quite wasteful of CPU and network resources.

deleted-user-39880 | 669 posts | Dec. 12, 2012, 5:53 p.m. | permalink

Same connection every time. I've drilled down some - it looks the following statement is taking about 75ms to run each time: select * from information_schema.columns where table_name = "whatever". Since that runs 80 times, it appears to be the single cause of the 8 second run-time.

Any idea why that statement would run so slowly compared to other SELECTs? It basically just returns a list of between 5 and 25 column names.

geoffham | 55 posts | Dec. 13, 2012, 5:24 a.m. | permalink

have you tested to see what happens (time-wise), when you specify one or two column names?

rcs1000 | 311 posts | Dec. 13, 2012, 8:35 a.m. | permalink

Or it might be time to add an index to the table? Or caching? 75ms could be mainly network latency. Pesky light speed.

hansel | 459 posts | PythonAnywhere staff | Dec. 13, 2012, 11:58 a.m. | permalink

If network latency is the problem, you might find it's quickier to just do:

SELECT * FROM information_schema

... and do the filtering in Python. Of course, if there are an awful lot of rows then this would be more memory hungry. You could also use Python to build a single query of the form:

SELECT * FROM information_schema WHERE table_name IN ("whatever", "another", ...)

I realise that neither of these approaches would be regarded as ideal by an RDBMS purist, but they tend to worry more about throughput than latency, but that's not necessarily true in today's Internet (due to a lot of factors like bufferbloat).

Anyway, worth a try - not very hard to implement, and might get you better performance. Of course, hansel's suggestions about indicies and caching may also get you bigger wins, but reducing the number of queries can be a useful next step after them, or if they're not appropriate for some reason.

deleted-user-39880 | 669 posts | Dec. 13, 2012, 7:47 p.m. | permalink

It's definitely not network latency - there are literally a dozen select statements that run in between each of those information_schema requests that ran (in total) under 10 ms. In any case, I discovered that it's much easier to get a list of columns names using a "show columns in x" statement - these run in well under 10 ms. Thanks much for the thoughts!

geoffham | 55 posts | Dec. 14, 2012, 1:51 a.m. | permalink

Thanks for letting us know. Definitely an odd one -- there was a period over the weekend and Monday (Saturday 8th at around 9pm UTC until Tuesday 11th at 10am UTC) when the MySQL server you're connected to was very busy; I think someone else was running some heavy queries around then (and we need to insulate people better, any hints on how to do that in MySQL much appreciated :-).

But if it continued past then, it's definitely odd. Did your dataset size spike at around the same time period?

giles | 12640 posts | PythonAnywhere staff | Dec. 14, 2012, 12:06 p.m. | permalink

Nah - the dataset never exceeded 80 tables with no more than 10 columns in a single row each (I know, I know... there's a reason for that design, but it's not a great one). From my research, it seems as though MySQL just isn't very fast at information_schema requests. It's all solved up now, anyway.

geoffham | 55 posts | Dec. 14, 2012, 8:40 p.m. | permalink

OK, thanks for letting us know. I agree, MySQL can be pretty slow on anything apart from data requests -- running a query to see how large each DB on a server is takes a quite ridiculous amount of time, for example...

giles | 12640 posts | PythonAnywhere staff | Dec. 15, 2012, 7:29 p.m. | permalink

@geoffham: If you're willing to share, I'm quite curious why you need to make so many requests of the schema. Typically your application would define the schema and then simply rely on its existence. Are you writing some sort of ORM layer which needs to know?

(Of course if you're not willing to share I'll understand completely - just curious!)

deleted-user-39880 | 669 posts | Dec. 16, 2012, 8:50 p.m. | permalink

@Cartroo: Nothing so deliberate, I'm afraid. It's a function of my (re-)learning SQL at the same time as learning Python and designing a shopping cart system in a hacky way: each customer order has an arbitrary number of purchase items in it, so it seemed to me that the easiest way to go would be to design the DB so that each order is its own table with varying numbers of columns in each, with each column named after the random-number generated item number. The upshot of this is that any query to, say, "show me all the items in order x" involves having to fetch the name of the columns in order to make SELECTs on them to get the data.

I now see about a dozen better ways to implement a system like this, but I'm ploughing on for now until I feel compelled to rewrite the DB code to accommodate a better design. In any case, I'm not dealing with huge amounts of data or a requirement for high speed - but I was surprised at how slow it had been running using the information_schema requests.

geoffham | 55 posts | Dec. 17, 2012, 5:58 a.m. | permalink

Geoff, it sounds like you've come to this conclusion on your own, but just in case anyone passing by should see this thread, we'd like to recommend an alternative implementation.... Basically the idea is to try and avoid schema operations in production!

table CUSTOMER (id, last_name, first_name ...)
table PRODUCT (id, product_name, price...)
table ORDER (id, cust_id, date, ...)
table LINE_ITEM(id, order_id, product_id, quantity)

So an order is described by a combination of a parent ORDER item, and a variable number of LINE_ITEMs, which are linked to the ORDER by foreign key...

so, to generate the shopping cart, you would do a join between LINE_ITEM and PRODUCT, something like:

SELECT product_name, price, quantity 
FROM LINE_ITEM JOIN PRODUCT 
WHERE order_id=<whatever_order_id> 
AND product_id = PRODUCT.id;

Something like that anyway! My own mysql-joining-skills may be a little rusty...

harry | 2710 posts | PythonAnywhere staff | Dec. 17, 2012, 11:44 a.m. | permalink

Many thanks!

geoffham | 55 posts | Dec. 17, 2012, 2:51 p.m. | permalink

@geoffham: I totally understand where you're coming from on not wanting to do a rewrite (why break working code), but I have to agree with harry - I'd seriously recommend re-writing the schema sooner rather than later - RDBMSs in general aren't designed with frequent schema updates in mind, and even ignoring the current speed problems you may well be making more work for yourself tracking down obscure problems down the road.

As you see from the schema that harry posted above, the typical method is to design a set of tables each of which has a unique id column. When you need to build links of arbitrary degree between these tables, you use a separate mapping table, each row of which represents a single link. This is often referred to as a normalised schema, as it minimises duplication of information between tables.

So you'll often see a table specification like this:

CREATE TABLE customers (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    firstName VARCHAR(128) NOT NULL,
    lastName VARCHAR(128) NOT NULL,
    ...
    PRIMARY KEY (id)
) ENGINE=InnoDB;

The AUTO_INCREMENT on the id column means that you should never need to specify a value for it - each time you insert a row, a unique numeric ID is assigned to the id column for you. I don't want to get bogged down in details, but the PRIMARY KEY constraint makes it very efficient to look up rows via their id (which makes table joins faster) and also imposes a UNIQUE constraint (so each row has a unique value in the id column).

The schema for the orders table might be:

CREATE TABLE orders (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer BIGINT UNSIGNED NOT NULL,
    orderTime DATETIME NOT NULL,
    ...
    PRIMARY KEY (id),
    CONSTRAINT fk_customer FOREIGN KEY (customer) REFERENCES customers(id)
) ENGINE=InnoDB;

Note the foreign key constraint - this ensures that MySQL won't allow rows to be created which don't reference a valid customer. This is only possible because we're using the InnoDB engine, the old MyISAM engine doesn't support them. Foreign key constraints are generally a good idea to keep things consistent, but you do have to be careful (for example, MySQL won't let you remove a customer without first removing their orders - this is a Good Thing(tm), but something to bear in mind).

Similar, the lineItems table might be:

CREATE TABLE lineItems (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    order BIGINT UNSIGNED NOT NULL,
    product BIGINT UNSIGNED NOT NULL,
    quantity INT UNSIGNED NOT NULL,
    ...
    PRIMARY KEY (id),
    CONSTRAINT fk_order FOREIGN KEY (order) REFERENCES orders(id),
    CONSTRAINT fk_product FOREIGN KEY (product) REFERENCES products(id),
    CONSTRAINT unq_order_product UNIQUE (order, product)
) ENGINE=InnoDB;

Here we see some more foreign key constraints and also a UNIQUE constraint which ensures that each pair of order and product must be unique - so each order can only have one line item per product type. This is OK because the quantity column allows multiple instances, but might cause problems if you wanted to make the items different prices due to, for example, a buy-two-get-one-free offer. Personally I'd represent such offers in a separate table in the database (and as separate items on the invoice with effectively a negative price), but that's up to you. Anyway, I digress...

The JOIN that harry posted above is fine, but personally I'd be explicit about it being an INNER JOIN - it's easier to figure out what MySQL is doing the more specific you are. I'd also use the table names (or aliases for them) in the result columns, just in case two tables have the same column name. Further, I'd suggest using the ON clause for the JOIN instead of a WHERE because it's generally less ambiguous (see this SO answer for a more complete discussion than I could manage).

SELECT P.name, P.price, L.quantity
    FROM lineItems AS L INNER JOIN products AS P ON L.order=P.id
    WHERE L.order=?;

Hope that discussion was some help - feel free to ask about anything I mentioned if you're not sure about it.

Disclaimer: I'm not, and never have been, a DBA - my knowledge of MySQL is just what I've picked up over a few years of using it. Also, that was all from memory so forgive any syntax errors.

EDIT: Given that the context of this post was the speed of MySQL I should probably point out that the use of InnoDB, FOREIGN KEY and UNIQUE constraints will actually slow things down slightly - their purpose is to ensure consistency within the DB, not increase speed. The reworking of the SELECT statement and the addition of PRIMARY KEY directives, however, should provide some boost to performance. Depending on the queries to be executed it would probably be useful to add some additional secondary indicies to the tables, but this has to be done once the primary use cases are known. It's worth thinking about early, however, as once tables become large then adding an index can become an extremely slow operation which locks tables from use - I've had MySQL lock out a table for several hours adding an index before, admittedly on a table with tens of millions of rows. Of course this is only an issue for busy production sites, but it's always wise to plan for scalability.

deleted-user-39880 | 669 posts | Dec. 17, 2012, 3:50 p.m. | permalink

I just realized that I never thanked Cartroo for that great tutorial. I'm in the re-design phase now and I've had it open on my browser for days now. Thanks much!

geoffham | 55 posts | Jan. 28, 2013, 1:52 a.m. | permalink

You're quite welcome, glad it was useful.

deleted-user-39880 | 669 posts | Jan. 29, 2013, 1:40 p.m. | permalink