Final optimization options for the Discuz forum sessions table

Author: admin, January 17, 2010, 23:00:29

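I've been wrestling with optimizing the dz (Discuz) sessions table lately. With hints from friends in a chat group and generous help from several forums with large data volumes, I've put together the following options. Some are already in use on real forums. Others are ideas of my own that have not been tried under large data volumes or heavy load, so treat them as fallback options. You're welcome to try them in practice.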

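1. Move the table to a separate database

At least two or three forums with millions to tens of millions of registered users use this approach. It is also fairly simple to implement: put the sessions table in a different database, separate from the forum's main database, so that several servers can share the forum's load. Wherever the sessions table is queried, a direct query connects to the database that holds the sessions table, while a join is split into separate queries whose results are then merged.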
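The "split the join, then merge" step can be sketched as follows. This is only an illustration in Python, not Discuz code: two in-memory SQLite connections stand in for the two database servers, and the table names, column names and the `online_members` helper are all hypothetical.

```python
import sqlite3

# Two separate connections stand in for two database servers (assumption:
# in production these would be two MySQL servers).
main_db = sqlite3.connect(":memory:")      # forum main database
session_db = sqlite3.connect(":memory:")   # database holding only sessions

main_db.execute("CREATE TABLE members (uid INTEGER PRIMARY KEY, username TEXT)")
main_db.executemany("INSERT INTO members VALUES (?, ?)",
                    [(1, "alice"), (2, "bob"), (3, "carol")])

session_db.execute(
    "CREATE TABLE sessions (sid TEXT PRIMARY KEY, uid INTEGER, lastactivity INTEGER)")
session_db.executemany("INSERT INTO sessions VALUES (?, ?, ?)",
                       [("s1", 1, 100), ("s2", 3, 250)])


def online_members(since):
    """Replace a sessions-members join with two queries merged in code."""
    # Query 1: goes to the sessions database only.
    rows = session_db.execute(
        "SELECT uid, lastactivity FROM sessions WHERE lastactivity >= ?", (since,)
    ).fetchall()
    if not rows:
        return []
    last_seen = dict(rows)
    # Query 2: goes to the main database, keyed by the uids found above.
    placeholders = ",".join("?" * len(last_seen))
    members = main_db.execute(
        f"SELECT uid, username FROM members WHERE uid IN ({placeholders})",
        list(last_seen),
    ).fetchall()
    # Merge step: combine both result sets by uid.
    return sorted((uid, name, last_seen[uid]) for uid, name in members)
```

The cost of this design is one extra round trip per former join, in exchange for keeping session traffic off the main database.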

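2. Cut features

This approach isn't very practical, but it does work. Look closely at some of dz's SQL statements and you'll see how much performance improves once certain features are cut and the code no longer depends on the sessions table. Cutting features is ultimately about performance, though the real goal is to gain performance without cutting anything. This approach suits forums that don't need much from certain statistics features.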

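3. Store sessions in memcached and compute users' online statistics asynchronously

This approach still removes the forum's built-in online-user statistics, replacing them with a separately developed statistics system. The idea came from our statistics server: if that server has already collected part of the information, there is no need to spend a lot of SQL capacity computing it again. At the time, our colleague Wu, who was in charge of statistics, happened to be developing statistics code for several different systems, so I could submit the data to be counted straight to the statistics server and have it return the results. Computing statistics asynchronously removes the performance bottleneck of the dz sessions table. The asynchronous submission can go to a local database, to a third-party statistics system, or to a small statistics system developed in-house as needed.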

Option 3 is my own original approach. The modified code is in the attachment below: "Asynchronous session statistics sync". Feel free to use it.

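Because dz is developed for the mass market, its features are redundant and its code logic is complex. Even so, optimizing dz is not that hard, and the approach is the same as for other systems: the bottleneck is MySQL. A system that is about to crash, or that crashes frequently, is the best case study for optimization: reduce the number of SQL queries and make the remaining ones more efficient. To stay compatible with as many environments as possible, dz does not pull in much third-party software such as memcached, bdb or ttserver. Interfaces for master-slave replication are reserved internally, but I have not seen any clear documentation for them.
The idea behind option 3 is equally clear: replace slow queries against the sessions table with fast memcached storage, and expose the statistics function as an interface for a third party to implement. That removes the sessions table bottleneck.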
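The shape of option 3 can be sketched roughly as below. This is not the code from the attachment, only a minimal Python illustration under stated assumptions: `FakeMemcached` is an in-process stand-in for a memcached client, and `touch_session`, `stats_worker` and the key format are hypothetical names. The point is that the per-request path does one cache write and one enqueue, while counting happens elsewhere.

```python
import queue
import threading
import time


class FakeMemcached:
    """In-process stand-in for a memcached client (assumption: a real client
    would offer comparable get/set calls with an expiry time)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl):
        self._data[key] = (value, time.time() + ttl)

    def get(self, key):
        item = self._data.get(key)
        if item is None or item[1] < time.time():
            return None
        return item[0]


cache = FakeMemcached()
stats_queue = queue.Queue()   # events waiting to be sent to the stats service
online_users = set()          # what the hypothetical stats service has counted


def touch_session(sid, uid, ttl=900):
    """Per-request path: one cache write, one enqueue, no SQL."""
    cache.set("session:" + sid, {"uid": uid}, ttl)
    stats_queue.put(uid)


def stats_worker():
    """Background consumer standing in for the external statistics server."""
    while True:
        uid = stats_queue.get()
        if uid is None:          # sentinel: stop the worker
            stats_queue.task_done()
            break
        online_users.add(uid)
        stats_queue.task_done()


worker = threading.Thread(target=stats_worker, daemon=True)
worker.start()

touch_session("s1", 1)
touch_session("s2", 2)
touch_session("s1", 1)        # repeat visit by the same user
stats_queue.join()            # wait until the worker has drained the queue
stats_queue.put(None)
worker.join()
```

In a real deployment the queue would more likely be a network call or a message sent to the statistics server. The in-process queue is used here only to keep the sketch self-contained.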

A simple optimization for 尚趣网 (vsuch.com)

Author: admin, June 12, 2009, 15:42:59

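Recently a friend asked me to help optimize Apache. After looking into it, the problem turned out to be with the vsuch.com website.

The vsuch site runs on a mixed LAMP + Windows platform (yikes). Historically, the site was written in .NET and later rebuilt entirely in PHP.

Before the overhaul, MySQL ran on a Linux machine and PHP ran on Windows, with the two connected over the LAN.

The site gets a fair amount of daily traffic, with an Alexa rank of 6800. The Windows platform clearly couldn't cope and kept running into inexplicable problems. (As an aside: the first servers I maintained also ran Windows, and Apache was very unstable on them.)

After looking into it, I felt a single server would be enough. I dropped the Apache option and installed nginx + PHP, keeping the existing database.

After the overhaul, the Linux load rose slightly, the MySQL load was unchanged, and the HTTP load under nginx turned out not to be very high. The migration went smoothly.

Later the company also signed up for a CDN service (they have the money), and speed improved noticeably.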

-----------------------------------------

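Based on the company's current technical setup, I proposed many ideas for optimization and restructuring. I hope every startup website does well.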

-----------------------------------------

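singlekui@gmail.com: Sun, thanks for helping out. Let me give you 100 yuan to buy a pack of cigarettes. :)
Me: ...
singlekui@gmail.com: Is that OK? Please don't take it the wrong way.
Me: Forget it.
Let's just say I did it for fun.
singlekui@gmail.com: Ah...
I did say I'd pay you a little something.
9:55 Me: Let's just say I did it for fun.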

In technical work, you often have to learn to just laugh things off.

MySQL index optimization

Author: admin, February 15, 2009, 14:01:09

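Building complex data reports often means running many SQL statements back to back, which either times out or brings MySQL down. Hardware is cheap these days, though, and this software makes good use of multiple cores and threads: even if one MySQL process is under heavy load, it occupies only one core, other queries can still run on the remaining cores, and the server as a whole does not go down.

MySQL optimization is mainly about indexes. Any query that consumes a lot of resources must get proper index tuning. I took the specific ideas from this book: http://www.douban.com/subject/3039216/ , whose MySQL optimization section is very concise and clear. As for the CPU, the trick is to spread the work out over time so that CPU usage stays fairly even. In practice that means separating the SQL statements with sleep calls, trading time for stability. Some report data never changes once it has been produced. In that case it is best to cache it as a text file so that later queries only read the file.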
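The two techniques in the last paragraph, pausing between statements and caching finished reports as text, can be sketched together. This is a minimal Python illustration, not the author's code: SQLite stands in for MySQL, and `build_report`, the `orders` table and the JSON cache file are hypothetical.

```python
import json
import os
import sqlite3
import tempfile
import time


def build_report(conn, statements, cache_path, pause=0.01):
    """Run report queries one by one with a pause between them, then cache
    the finished result as a text file so later calls skip the database."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)          # cached: no SQL at all
    results = []
    for i, sql in enumerate(statements):
        if i:
            time.sleep(pause)             # spread the load out over time
        results.append([list(row) for row in conn.execute(sql).fetchall()])
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(results, fh)
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("mon", 10), ("mon", 5), ("tue", 7)])
statements = [
    "SELECT day, SUM(amount) FROM orders GROUP BY day ORDER BY day",
    "SELECT COUNT(*) FROM orders",
]
cache_path = os.path.join(tempfile.mkdtemp(), "report.json")

first = build_report(conn, statements, cache_path)
conn.close()                               # the second call must not need it
second = build_report(None, statements, cache_path)
```

The pause length is a tuning knob: a longer pause flattens CPU usage further but makes the report take longer, which is exactly the time-for-stability trade described above.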

Exhausting... a crazy configuration

Author: admin, January 18, 2009, 22:16:05

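The Dell 1950 is finally racked: another awesome system. I stripped CentOS down as far as I could imagine (though it still can't match gentoo). With AMP (apache, php, mysql) all running and no load, memory usage is below 128M.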

    top - 16:54:16 up  3:42,  1 user,  load average: 0.00, 0.00, 0.00
    Tasks:  83 total,   1 running,  82 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 99.9% id,  0.1% wa,  0.0% hi,  0.0% si
    Mem:   4042344k total,   125040k used,  3917304k free,    11136k buffers
    Swap:  6094840k total,        0k used,  6094840k free,    47004k cached

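Some kernel-level patches have not been applied. Anyone who knows how to exploit that class of vulnerability is probably more than I can handle for now anyway. At this stage the priority is server stability. I planned the account permissions for apache, php, mysql, ftp, ssh and so on in detail, and restricted the externally facing services in particular, aiming for the best setup I could manage. PHP was compiled and installed in developer mode, without regard to the actual application, simply enabling as many features as possible (this costs a little performance, but relative to 128M it is negligible). The all-important opcode cache was added, though. CentOS was a minimal install, with the components and libraries it depends on installed individually and nothing extra, to keep the base disk footprint as small as possible too. I used LVM for disk partitioning. That was lazy, and it can cause serious problems, but I am not yet comfortable with manual partitioning and my storage management skills are not there yet. I'll improve this later.

Disk usage is shown below. It still feels a bit large, so I'll clean up the caches later.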

    [root@localhost ~]# df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/mapper/VolGroup00-LogVol00
                         134980848   1214972 126909172   1% /
    /dev/sda1               101086     14864     81003  16% /boot
    none                   2021172         0   2021172   0% /dev/shm

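A very successful system, but I still haven't found time to write the management software. Once that is done, it will be a complete base system plus management suite.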

The legendary ten-million-row optimization

Author: admin, December 30, 2008, 22:10:51

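Job postings routinely ask for "experience optimizing tens of millions of rows". I honestly don't have that experience. For small and medium sites, tens or hundreds of thousands of rows is already a lot, and even data-heavy ones top out at around a million, because beyond a million I split the table. When designing, I also rely on primary-key indexes as much as possible and avoid fuzzy searches and other complex searches.
For simple applications below the million-row level, an ordinary server has no trouble, as long as the code isn't garbage.
Today I had a sudden urge to see what ten million rows looks like. I spent half an hour generating a pile of random numbers, exactly ten million, along with some other data. The main goal was to see what it is like to fetch a small amount of data from a ten-million-row table.
I used MSSQL 2005 and generated the data with a stored procedure, which took half an hour. I haven't tried this on MySQL. The result: fetching a few tens of thousands of rows out of ten million with a single WHERE condition was indeed slow, taking more than 20 seconds. On closer inspection of the indexes and the statement, I found there was no suitable index for the query to use. After I created an index based on the WHERE clause, the time dropped to about 10 seconds. Since the query only returns a few columns, I then tried a covering index that includes those columns, and the speed improved greatly.
Tracing showed that with the covering index in place, fetching the data had become quite fast, but a large share of the time now went to the database connection and to transferring the data. This system has to process large amounts of data, and splitting the data up added the cost of creating more database connections, so it actually got slower. In practice, as long as PHP doesn't time out, this is already usable.
Summary: indexes really are great, but table design should be kept as simple as possible. Simple does not mean simple columns. It means that the SQL needed to fetch the data is simple. Where necessary, add a moderate amount of redundant data and trade space for time.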
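The progression described above (no usable index, then an index on the WHERE column, then a covering index) can be reproduced at small scale. The post used MSSQL 2005. This sketch swaps in SQLite through Python so it runs anywhere, and the `readings` table, its columns and the index names are hypothetical. It does not measure timings, it only shows how the query plan changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, grp INTEGER, val INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO readings (grp, val, note) VALUES (?, ?, ?)",
    ((i % 100, i, "padding") for i in range(10000)),
)

QUERY = "SELECT grp, val FROM readings WHERE grp = 7"


def plan(sql):
    """Return the query-plan text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


# Step 1: no usable index, so the whole table is scanned.
plan_no_index = plan(QUERY)

# Step 2: index on the WHERE column only. The index locates the rows, but
# val still has to be read from the table for each match.
conn.execute("CREATE INDEX idx_grp ON readings (grp)")
plan_where_index = plan(QUERY)

# Step 3: covering index. It holds the WHERE column plus every selected
# column, so the query can be answered from the index alone.
conn.execute("DROP INDEX idx_grp")
conn.execute("CREATE INDEX idx_grp_val ON readings (grp, val)")
plan_covering = plan(QUERY)

rows = conn.execute(QUERY).fetchall()
```

Printing the three plan strings shows the change from a scan, to an index search, to a covering-index search. The trade-off is the one named in the summary: the wider index takes more space and slows writes in exchange for faster reads.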

40 Tips for optimizing your php code

Author: admin, August 5, 2008, 10:22:25
  1. If a method can be static, declare it static. Speed improvement is by a factor of 4.
  2. echo is faster than print.
  3. Use echo’s multiple parameters instead of string concatenation.
  4. Set the maxvalue for your for-loops before and not in the loop.
  5. Unset your variables to free memory, especially large arrays.
  6. Avoid magic like __get, __set, __autoload
  7. require_once() is expensive
  8. Use full paths in includes and requires, less time spent on resolving the OS paths.
  9. If you need to find out the time when the script started executing, $_SERVER['REQUEST_TIME'] is preferred to time()
  10. See if you can use strncasecmp, strpbrk and stripos instead of regex
  11. str_replace is faster than preg_replace, but strtr is faster than str_replace by a factor of 4
  12. If the function, such as string replacement function, accepts both arrays and single characters as arguments, and if your argument list is not too long, consider writing a few redundant replacement statements, passing one character at a time, instead of one line of code that accepts arrays as search and replace arguments.
  13. It’s better to use switch statements than multi if, else if, statements.
  14. Error suppression with @ is very slow.
  15. Turn on apache’s mod_deflate
  16. Close your database connections when you’re done with them
  17. $row['id'] is 7 times faster than $row[id]
  18. Error messages are expensive
  19. Do not use functions inside the condition of a for loop, such as for ($x=0; $x < count($array); $x++). The count() function gets called on each iteration.
  20. Incrementing a local variable in a method is the fastest. Nearly the same as calling a local variable in a function.
  21. Incrementing a global variable is 2 times slower than a local var.
  22. Incrementing an object property (eg. $this->prop++) is 3 times slower than a local variable.
  23. Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
  24. Just declaring a global variable without using it in a function also slows things down (by about the same amount as incrementing a local var). PHP probably does a check to see if the global exists.
  25. Method invocation appears to be independent of the number of methods defined in the class because I added 10 more methods to the test class (before and after the test method) with no change in performance.
  26. Methods in derived classes run faster than ones defined in the base class.
  27. A function call with one parameter and an empty function body takes about the same time as doing 7-8 $localvar++ operations. A similar method call is of course about 15 $localvar++ operations.
  28. Surrounding your string by ' instead of " will make things interpret a little faster since php looks for variables inside "..." but not inside '...'. Of course you can only do this when you don't need to have variables in the string.
  29. When echoing strings it's faster to separate them by comma instead of dot. Note: This only works with echo, which is a language construct that can take several strings as arguments.
  30. A PHP script will be served at least 2-10 times slower than a static HTML page by Apache. Try to use more static HTML pages and fewer scripts.
  31. Your PHP scripts are recompiled every time unless the scripts are cached. Install a PHP caching product to typically increase performance by 25-100% by removing compile times.
  32. Cache as much as possible. Use memcached – memcached is a high-performance memory object caching system intended to speed up dynamic web applications by alleviating database load. OP code caches are useful so that your script does not have to be compiled on every request
  33. When working with strings and you need to check that the string is of a certain length, you would understandably want to use the strlen() function. This function is pretty quick since its operation does not perform any calculation but merely returns the already known length of a string available in the zval structure (internal C struct used to store variables in PHP). However, because strlen() is a function it is still somewhat slow, because the function call requires several operations such as lowercase & hashtable lookup followed by the execution of said function. In some instances you can improve the speed of your code by using an isset() trick. Ex.

    if (strlen($foo) < 5) { echo "Foo is too short"; }

    vs.

    if (!isset($foo[4])) { echo "Foo is too short"; }

    Calling isset() happens to be faster than strlen() because unlike strlen(), isset() is a language construct and not a function, meaning that its execution does not require function lookups and lowercase. This means you have virtually no overhead on top of the actual code that determines the string's length.

  34. When incrementing or decrementing the value of a variable, $i++ happens to be a tad slower than ++$i. This is something PHP specific and does not apply to other languages, so don't go modifying your C or Java code thinking it'll suddenly become faster, it won't. ++$i happens to be faster in PHP because instead of the 4 opcodes used for $i++ you only need 3. Post-incrementation actually causes the creation of a temporary var that is then incremented, while pre-incrementation increases the original value directly. This is one of the optimizations that opcode optimizers like Zend's PHP optimizer perform. It is still a good idea to keep in mind, since not all opcode optimizers perform this optimization and there are plenty of ISPs and servers running without an opcode optimizer.
  35. Not everything has to be OOP, often it is too much overhead, each method and object call consumes a lot of memory.
  36. Do not implement every data structure as a class, arrays are useful, too
  37. Don’t split methods too much, think, which code you will really re-use
  38. You can always split the code of a method later, when needed
  39. Make use of the countless predefined functions
  40. If you have very time consuming functions in your code, consider writing them as C extensions
  41. Profile your code. A profiler shows you which parts of your code consume how much time. The Xdebug debugger already contains a profiler. Profiling gives you an overview of the bottlenecks.
  42. mod_gzip, which is available as an Apache module, compresses your data on the fly and can reduce the data to transfer by up to 80%
  43. Excellent Article about optimizing php by John Lim