Caching a large array: JSON, serialize or var_export?

Published: 2013-09-06 12:42

A very good article by a foreign author:
http://techblog.procurios.nl/k/618/news/view/34972/14863/


Cache a large array: JSON, serialize or var_export?

Monday 06 July 2009

While developing software like our framework, sooner or later you will need to cache a large data array to a file. At that point you need to choose which caching method to use. In this article I compare three methods: JSON, serialization, and var_export() combined with include().

JSON

The JSON method uses the json_encode and json_decode functions. The JSON-encoded data is stored as is into a plain text file.

Code example

  // Store cache
  file_put_contents($cachePath, json_encode($myDataArray));
  // Retrieve cache (pass true so the data comes back as associative arrays, not stdClass objects)
  $myDataArray = json_decode(file_get_contents($cachePath), true);

pros

  • Pretty easy to read when encoded
  • Can easily be used outside a PHP application

cons

  • Only works with UTF-8 encoded data
  • Will not work with objects other than instances of the stdClass class.
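
The stdClass limitation is easiest to see in a short round trip. This is a minimal sketch; the `Point` class and the sample data are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical class used only for this illustration.
class Point
{
    public $x = 1;
    public $y = 2;
}

$data = array('name' => 'cache', 'point' => new Point());

$json = json_encode($data);

// Default decoding: JSON objects become stdClass instances.
$asObjects = json_decode($json);

// Passing true as the second argument yields nested associative arrays.
$asArrays = json_decode($json, true);

var_dump($asObjects instanceof stdClass);      // the outer array comes back as an object
var_dump($asObjects->point instanceof Point);  // the original class is not restored
var_dump(is_array($asArrays['point']));        // associative mode gives plain arrays
```

Either way the `Point` class information is gone after decoding, so this method fits best when the cached data consists of plain arrays and scalars.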

Serialization

The serialization method uses the serialize and unserialize functions. The serialized data is, just like the JSON data, stored as is into a plain text file.

Code example

  // Store cache
  file_put_contents($cachePath, serialize($myDataArray));
  // Retrieve cache
  $myDataArray = unserialize(file_get_contents($cachePath));

pros

  • Does not need the data to be UTF-8 encoded
  • Works with instances of classes other than the stdClass class.

cons

  • Nearly impossible to read when encoded
  • Cannot be used outside a PHP application without writing custom functions
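
The pros and cons above can be illustrated with a small round trip. This is only a sketch; the `Account` class and the sample values are hypothetical, and the label deliberately contains a byte that is not valid UTF-8.

```php
<?php
// Hypothetical class used only for this illustration.
class Account
{
    public $id = 7;
    public $owner = 'alice';
}

// "\xE9" is a single Latin-1 byte, so this string is not valid UTF-8.
$data = array('label' => "caf\xE9", 'account' => new Account());

$encoded = serialize($data);
$decoded = unserialize($encoded);

var_dump($decoded['account'] instanceof Account);  // the class is restored
var_dump($decoded['label'] === $data['label']);    // the raw bytes survive unchanged

// The encoded form is compact but hard to read by eye.
echo $encoded, "\n";
```

Note that unserialize() can only restore the object when the class definition is loaded at decode time, and it should not be fed data from untrusted sources.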

Var_export

This method 'encodes' the data using var_export and loads the data using the include statement (no need for file_get_contents!). The encoded data needs to be in a valid PHP file, so we wrap it in the following PHP code:

  <?php
  return /*var_export output goes here*/;

Code example

  // Store cache
  file_put_contents($cachePath, "<?php\nreturn " . var_export($myDataArray, true) . ";");
  // Retrieve cache
  $myDataArray = include($cachePath);

pros

  • No need for UTF-8 encoding
  • Is very readable (assuming you can read PHP code)
  • Retrieving the cache uses one language construct instead of two functions
  • When using an opcode cache your cache file will be stored in the opcode cache. (This is actually a disadvantage, see the cons list).

cons

  • Needs PHP wrapper code.
  • Cannot be used for objects of classes that lack the __set_state method.
  • When using an opcode cache your cache file will be stored in the opcode cache. If you do not need a persistent cache this is useless: most opcode caches support storing values in shared memory. If you don't mind keeping the cache in memory, use the shared memory directly without writing the cache to disk first.
  • Another disadvantage is that your stored file has to be valid PHP. If it contains a parse error (which could happen when your script crashes while writing the cache) your application will not work anymore.
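
Two of the cons above can be softened in practice. The sketch below is one possible approach, not part of the original article: the `Settings` class, the `writeCacheFile()` helper and the cache path are all hypothetical. It gives a class a `__set_state()` method so var_export output can be loaded again, and it writes to a temporary file before renaming it into place, assuming the temporary file and the target are on the same filesystem where the rename replaces the old file in a single step.

```php
<?php
// Hypothetical class; __set_state() turns var_export() output back into an object.
class Settings
{
    public $theme = 'light';

    public static function __set_state(array $properties)
    {
        $settings = new self();
        $settings->theme = $properties['theme'];
        return $settings;
    }
}

// Hypothetical helper: write to a temporary file in the target directory, then
// rename it into place, so a crash mid-write does not leave a broken PHP file
// at $cachePath.
function writeCacheFile($cachePath, $data)
{
    $tmpPath = tempnam(dirname($cachePath), 'cache');
    if ($tmpPath === false) {
        return false;
    }
    $code = "<?php\nreturn " . var_export($data, true) . ";\n";
    if (file_put_contents($tmpPath, $code) === false) {
        unlink($tmpPath);
        return false;
    }
    return rename($tmpPath, $cachePath);
}

$cachePath = sys_get_temp_dir() . '/example-cache.php';  // assumed location

$settings = new Settings();
$settings->theme = 'dark';

writeCacheFile($cachePath, array('settings' => $settings, 'ids' => array(1, 2, 3)));
$myDataArray = include $cachePath;
unlink($cachePath);  // clean up the demo file

var_dump($myDataArray['settings'] instanceof Settings);
var_dump($myDataArray['settings']->theme);
```

The design choice here is that readers only ever see either the old complete file or the new complete file; a half-written file can only exist under the temporary name.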

Benchmark

In my benchmark I used 5 data sets of different sizes (measured in memory usage): 904 B, ~18 kB, ~290 kB, ~4.5 MB and ~72.7 MB. For each of these data sets I ran the following routine for each encoding method:

  1. Encode the data 10 times
  2. Calculate the string length of the encoded data
  3. Decode the encoded data 10 times
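
The routine above could be sketched roughly as follows. This is not the original benchmark script: the `timeIt()` helper and the sample data are made up, and the sketch assumes that JSON and serialized data are decoded from a string already in memory while var_export data is decoded by including a file. Whether the original measurements included disk reads is not stated.

```php
<?php
// Hypothetical helper: call $callback $iterations times, return total elapsed seconds.
function timeIt($callback, $iterations = 10)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback);
    }
    return microtime(true) - $start;
}

// Made-up sample data; the article's data sets are not available.
$data = array();
for ($i = 0; $i < 1000; $i++) {
    $data['key' . $i] = array('id' => $i, 'name' => 'item ' . $i);
}

$results = array();

// JSON
$encoded = json_encode($data);
$results['json'] = array(
    'length' => strlen($encoded),
    'encode' => timeIt(function () use ($data) { json_encode($data); }),
    'decode' => timeIt(function () use ($encoded) { json_decode($encoded, true); }),
);

// Serialization
$encoded = serialize($data);
$results['serialize'] = array(
    'length' => strlen($encoded),
    'encode' => timeIt(function () use ($data) { serialize($data); }),
    'decode' => timeIt(function () use ($encoded) { unserialize($encoded); }),
);

// var_export / include: decoding means including the generated PHP file
$cachePath = tempnam(sys_get_temp_dir(), 'bench');
$encoded = "<?php\nreturn " . var_export($data, true) . ";";
file_put_contents($cachePath, $encoded);
$results['var_export'] = array(
    'length' => strlen($encoded),
    'encode' => timeIt(function () use ($data) { var_export($data, true); }),
    'decode' => timeIt(function () use ($cachePath) { include $cachePath; }),
);
unlink($cachePath);

foreach ($results as $method => $row) {
    printf("%-12s length=%d encode=%.6fs decode=%.6fs\n",
        $method, $row['length'], $row['encode'], $row['decode']);
}
```

Absolute timings from such a script depend heavily on the machine, the PHP version and whether an opcode cache is active, so only the relative ordering is worth comparing.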

Results

Yay, results! The result tables show the length of the encoded string, the total time used for encoding and the total time used for decoding. The benchmark was done on my laptop: 2.53 GHz, 4 GB, Ubuntu Linux, PHP 5.3.0RC4.

904 B array
            JSON                  Serialization          var_export / include
  Length    105                   150                    151
  Encoding  0.0000660419464111    0.00004696846008301    0.00014996528625488
  Decoding  0.0011160373687744    0.00092697143554688    0.0010221004486084

18.07 kB array
            JSON                  Serialization          var_export / include
  Length    1965                  2790                   3103
  Encoding  0.0005040168762207    0.00035905838012695    0.001352071762085
  Decoding  0.0017290115356445    0.0011298656463623     0.0056741237640381

290.59 kB array
            JSON                  Serialization          var_export / include
  Length    31725                 45030                  58015
  Encoding  0.0076849460601807    0.0057480335235596     0.02099609375
  Decoding  0.014955997467041     0.010177850723267      0.030472993850708

4.54 MB array
            JSON                  Serialization          var_export / include
  Length    507885                720870                 1059487
  Encoding  0.13873195648193      0.11841702461243       0.38376498222351
  Decoding  0.29870986938477      0.21590781211853       0.53850317001343

72.67 MB array
            JSON                  Serialization          var_export / include
  Length    8126445               11534310               19049119
  Encoding  2.3055040836334       2.7609040737152        6.2211949825287
  Decoding  4.5191099643707       8.351490020752         8.7873070240021

We ran the same benchmark on eight other machines, including Windows and Mac OS machines and some web servers running Debian. Some of these machines had PHP 5.2.9 installed; others had already switched to 5.3.0. All showed the same (relative) results, except for a MacBook on which serialize was faster at encoding the largest data set.

Conclusion

As you can see the var_export (without opcode cache!) method doesn't come out that well and serialize seems to be the overall winner. What bothered me though was the largest dataset in which JSON became faster than serialize. Wondering whether this was a glitch or a trend I fired up my OpenOffice spreadsheet and created some charts:

The charts show the relative speed of each method compared to the fastest method (so 100% is the best a method can do). As you can see, both JSON and var_export become relatively faster when the data set gets big (arrays of 70 MB and bigger? Maybe you should reconsider the structure of your data set :)). So when using a sanely sized data array: use serialize. When you want to go crazy with large data sets: use anything you like, disk I/O will become your bottleneck.

