James Blogs

最近由于一个频繁的日志记录操作,MySql数据库log表存了近1亿条数据,经分析决定删除8月25号之前的数据。以下是删除过程:

首先查询了一下25号之前的数据量:

select count(id) from log where create_at < '2018-08-25';
+-----------+
| count(id) |
+-----------+
|  18892469 |
+-----------+
1 row in set (46.61 sec)

然后使用limit + where条件分别尝试了几次删除1万到300万调数据,耗时在1分45秒到5分18秒之间,语句如下:

delete from log where create_at < '2018-08-25' order by create_at limit 10000;

因速度太慢,决定尝试使用自增主键作为条件:

# 先查询出分界点主键值
select max(id) from log where create_at < '2018-08-25';
+----------+
| max(id)  |
+----------+
| 32285777 |
+----------+
1 row in set (1 min 34.97 sec)

# 再根据id查询剩余数据,速度大幅度提升
select count(id) from log where id <= 32285777;
+-----------+
| count(id) |
+-----------+
|  14782469 |  # 因之前删过几次,所以数目和第一次查询不一样
+-----------+
1 row in set (12.87 sec)

然后采用id作为条件分批删除:

delete from log where id <= 32285777 limit 1000000;
Query OK, 1000000 rows affected (59.06 sec)

delete from log where id <= 32285777 limit 1000000;
Query OK, 1000000 rows affected (51.32 sec)

delete from log where id <= 32285777 limit 1000000;
Query OK, 1000000 rows affected (46.77 sec)

尝试了一下先排序再删:

delete from ctrip_pricing_log where id <= 32285777 order by id limit 1000000;
Query OK, 1000000 rows affected (59.04 sec)

耗时59.04秒,效果不如不排序。

得到测试结果后,写了个shell语句循环删除:

watch "mysql -h host -u user -p pwd database -e 'delete from log where id <= 32285777 limit 1000000;'"

因该表一直有数据写入所以采取了以上方案,如果是暂时不用的表可以参考一下这篇文章的方案。