30 August 2011

Home network: Monolith vs beehive


The first choice I am faced with is: a lot of small specialized devices, one large device or something in between.

One large server

Having one large server is a viable solution, albeit a classic one. A single 8 disk server running in RAID0 has more than enough storage space. It can serve as a SSH server and VPN server. A large server is also cost effective and easily(ish) upgradeable.

However, there are a few problems with one colossal server: power usage, noise and coolness factor.

A large machine requires lots of power. Now, granted, a lot of small machines add up to lots of power, however small machines are much simpler to switch off when not needed. Even though you can spin down drives in a server and clock down CPU it still costs you SOME power. And even if you do spin down drives and clock down CPU the PSU is still rated for 500W or so, so operating at 40W is not exactly efficient.

Large classic server also has a bit more moving parts and a larger box. This affects the noise both ways. You can suspend drives with elastic to reduce noise, pad out the box, use larger fans that spin on lower RPM and put in other sound improvements. What you can't do, is hide it. It's a big box that has to be connected to your TV and sound system. Of course you can drag cables or try wifi, but those are either ugly, require construction work or cost a lot. And that is if you have somewhere to hide a box of that size, which I don't really have.

Also, it's much more cool to build a large network of small machines. Much more SF :D

How would one large server look like?

For comparison I decided to look into a large server assembly as well. First off, some performance requirements.

Storage

I am looking at around 100GB of 3x replicated storage, 1TB of 2x replicated storage and around 2TB of non replicated storage. On top of that, some small amount of storage is used up by the OS, let's say 10GB SSD.
To get replication I can use RAID or simply make backup of everything on different drives. RAID is good because it comes with performance improvements, is mostly hassle free and is considered the way to go.
I however, do not think RAID is appropriate for home networks. In a typical home network, instantaneous replication is not really all that important. In fact, sometimes it's not wanted. If, for example, I would accidentally delete a folder RAID would be of no use to me at all. On the other hand, if I made automated backups to another disk every 6 hours, I would have at most 6 hours to rescue my data. Of course this means I don't have the performance benefits or RAID, and I have to take care not to ware out one drive more than other, but that can be coded away mostly. And performance is not such a big issue, once the initial copying is done.
Keeping that in mind, my storage needs can be satisfied by 3 2TB drives. Currently one of the best on the market are WD Caviar Green disks. They are low power, low noise and cost effective.

A WD Caviar Green 2TB drive is around 6.5W.

CPU/Motherboard

The most CPU intensive action for this machine would be HD video encoding/decoding. Today, that is not such a big thing. Both Intel and AMD offer "standard" and low power platforms. After going through the roundups at http://www.silentpcreview.com/, a site with lots of great reviews for silent builds, I picked out:
  • Undervolted Intel Core i5 2400 as it has the best performance per watt and is midrange in performance per dollar.
  • Intel Core i3 2100T has the best absolute power usage, rated at 35W
Both of these would easily power all my needs. Of course, a motherboard is needed as well, Asus P8P67 would do. Or any of that lot. Intel motherboards are said to be the most energy efficient ones. Intel DH57JG (not the correct socket for i3 2100 but....) is one of the better ones and puts CPU+MB at around 75W.

GPU and other

Is not needed. The integrated GPU in Intel chips is good enough for video playback. A stand alone GPU could easily use 100W or more, and would be the most expensive part in the whole box.


Total

So in total we have worst case power consumption at (75 + 3*6.5) = 94.5W. Truth be told, the drives will be idle most of the time, so will the CPU and MB so the number is somewhere in the 40-50W area. However, we do need a good PSU. Rated at 100W, at least. And since those are actually not so easy to find and can be costly, we probably end up with something along the lines of Enermax PRO II 425W, so 40W will actually be quite outside the optimal operation area.

With the modular design I am aiming at 10W while "idle", with a max of 54W on full usage. Also, perfect setup should be with 0 moving parts when the disks are spun down.

Now, I actually already run one such installation with a single large box. I will measure in full detail the power consumption on it and compare it to the eventual beehive design I plan to put up in my apartment. So at the end, there will be some real data here as well.

28 August 2011

Home Network: Initial requirements analysis

Introduction

I decided to pimp up my home network, and this time do it properly. It turns out, a lot of analysis goes into doing so, so I decided to keep it as a series of blogposts. This is mostly going to serve me as a central place for all my plans, but if someone finds my results usable, even better.

So the first step is to list everything I would like my network to do.

Use cases

The following is a list of use cases. You might notice that they are broken down into small, manageable chunks:
  1. Data storage - photo
  2. Data storage - audio
  3. Data storage - video
  4. Data storage - documents
  5. Music player
  6. Video player
  7. Permanent SSH server (DynDNS)
  8. VPN server
Some details for each case:

Data storage - Photos, Audio, Video, Documents

You might wonder why data storage was broken down in 4 components. The reason is lifetime. Storing photos is for life, or longer. I want to store all my photos, have automated backup system within the network, health checks and also manually assisted remote backup either to the cloud (Picassa proved usable so far) or to some other network under my control. This requires a paranoid level setup. Luckily, photos are usually on the order of 1-10MB size. Worst case for me are RAW 14.1MB photos.

Storing audio files is for decades. Some people like to imagine large collections of music. Perfectly sorted, tagged and ready for usage. I am one of them. Naturally, it takes quite a lot of time and energy to create such collections. Therefore it's a long lasting project. Years. Maybe decade or two. So I want my audio storage to be backed up within the network as well. File sizes are similar to photos for compressed, and a bit larger for lossless files.

Documents are also for decades. However, they are usually quite small. I can safely asume an average of 1MB per document.

Videos on the other hand, do not need such extensive backup and safety measures. They are however much much much larger.

Music & Video player

All that data would be pointless if unused. A small HTPC, media center or some other form of music & video player is needed. HD is probably required.

SSH server

SSH server is very important for two reasons:
  1. It allows me to use the network even when outside
  2. It allows me to grant access to the network to other people
I have a reasonable cable internet connection so bandwidth should not be a problem. Latency might be, but only in peak hours, and even then only for latency sensitive tasks such as editing files.

VPN server

Since I plan to have this kickass setup, I might as well use it to secure myself when I am out of the safety of the network. A VPN server would do wonders here.

Other stuff

Aside from what I would like the network to enable me to do, there are some other things I would like it to be. On top of the list is SILENT. I wan't the devices to be as silent as possible. What I am looking for is in the dark of the night, 3am, to be able to be in the same room with them and not notice anything. Luckily, this goes hand in hand with the next requirement: ultra low power. Some of the parts are going to be operational 24/7 and I do not want them to draw more power than is absolutely necessary. Next in line is cost. I understand that what I am going for is not the cheapest option out there, but I'm not ready to pay for overpriced hardware either. In the end, size matters. And visual design.

26 March 2011

Research results: Can JavaScipt getTime() be used for measuring performance of JavaScript code?

Recently, during research for my masters thesis I had to answer the question "Can we use JavaScirpt getTime() function in our benchmarks and get portable benchmarks that are valid?".

For the tl;dr crowd here is a short answer: Yes on Linux (and OS X probably) - not really on Windows.

After some research on the Internet, I found this blog post from 2008. by John Resig that explores this issue. He concludes that getTime() is basically useless on Windows XP irregardless of browser choice, while being safe to use it on OS X. I wanted to update his results to 2011 and explore the issue a bit more.

My goal was to measure the precision and not the accuracy (the difference is nicely explained in this Wikipedia article) of the getTime() function. We could do some analysis of accuracy, but opt not to because it's not so easy to setup a good experiment for accuracy. Precision on the other hand is nicely quantified in terms of the standard deviation of the experiment. Smaller standard deviation is better. The best possible outcome would be for the results to follow a very narrow Gaussian distribution.

The setup

The code for most benchmarks in JavaScript can be summarized in these three lines:
var start = (new Date).getTime()
doStuff()
var diff = (new Date).getTime() - start

so this is the scenario I decided to test. The obvious question of course is what to use as a sample load. It's important for the test load to have deterministic duration that can be easily controlled, and is too complicated for the browser to simply optimize away. A nice way to do this is to have a O(n^2) math function that does some of the slowest operations, such as the modulus operation. I used the following code:
    function sample(size) {
var result = 0;
for (var i = 0; i < size; i++) {
innerSample(i, result);
}
}

function innerSample(i, result) {
result = i * i + i + (0x09F91102 % i) / 2;
for (var j = 0; j < i; j++)
result = (result * result) % (0x09F91102);
return result;
}

The reason it's split in two is because I had a pre calibration step first. Since we are not interested in comparing relative browser speeds we do not need to use the same number of iterations of the sample function on all browsers. What would be much better is to have the same duration on all browsers. This is why, before testing, I run a pre calibration step that determines the number of iterations I should use so that the average duration of one sample is a bit under 75ms. Since we are not measuring accuracy a rough estimate will do just fine, as long as the duration is long enough so as to allow for anomalous shorter run times.

This completes the setup.

The experiment

The experiment is performed by looping 500 times over the sample code with the pre calibrated number of steps. After that the standard deviation is calculated. This is repeated a few times if possible to see if the results are stable.

The results

I tested Firefox 3.6 and 4.0 and Chrome 9 and 10 on Linux (Arch Linux 2.6.37 kernel), Windows XP fully patched and Windows 7 fully patched. I did run the Linux test on four machines: an Intel i3 desktop, two different Core2Duo laptops and Atom netbook and the results did not differ significantly. This shows that the hardware itself does not influence the tests. As for the Windows tests, I don't even own a Windows box so I used volunteer help for those tests (if you were one of these unfortunate "volunteers" I am so so sorry, I promise not to spam you all again). They were run on a wide variety of machines. There were significant variations on Windows machines, these are analysed below.

Mac OS X is omitted because I do not have access to a Mac OS X machine at the moment. In future the results will be appended. I also plan to include Opera and Safari. What will never be included is Internet Explorer, for two reasons: the script runtime limit in Firefox and Chrome is based on actual time whereas IE limits the number of operations you can execute between events (MSDN site sets the limit as ~5.000.000 operations) and, more importantly, you can switch OFF the runtime limit in Firefox and Chrome.

The results were pretty much consistent with the previous results from 2008. Here is a summary table of standard deviations:


Linux Windows XP Windows 7
Firefox 3.6 0.5ms
6.8ms [1.2ms, 7.3ms]
Firefox 4.0 0.4ms 37.4ms [3.6ms, 7.9ms]
Chrome 9 1.2ms 12.7ms [4.7ms, 17.3ms]
Chrome 10 1.7ms 33.8ms [4.9ms, 19.7ms]

As you can see from the results, Firefox on Linux is the most precise with only 0.4 ms of deviation on 75ms test load. Chrome works OK on Linux too. It's worse than Firefox but still excellent. The results on Linux were stable across many different machines and are quite good. I did not test it against older kernels, but I don't think this is a major problem since all major Linux distributions do have up-to-date kernels.

This is a normalized graph of all 4 browsers on Linux. The X-axis values are omitted on purpose since each series is normalized so that the smallest value is 10. Since we are not interested in accuracy this gives a nicer and equally valid image:



As you can see the results are neatly packed with more than 95% of results packed in a 2ms range.

The results on Windows were much more inconclusive. You will notice that the results for Windows 7 are given as intervals. This is because the results varied wildly between different machines. The results were quite stable on any given machine, so it's not an artifact of the experiment itself. If you look at the best results, they are larger than on Linux, but still usable: 4.9 ms is not that bad. It's an order of magnitude bigger than 0.5, but still usable. In my personal opinion (since extensive tests should be done for a stronger result) good results on Windows were rare. Only 2 out of 10 Windows 7 machines produced good results. A typical distribution for a Windows machine is displayed below:



This chart was produced on Chrome 10 tested on Windows 7. Interesting observation is that the first few measurements were on the 75ms mark, but the execution slowed down afterwards moving the median value up to 125ms. This slow down was not accounted for or explained. It's left as an open question.

The conclusion

Using getTime() on Linux in the manner described here is VALID on both browsers. Using getTime() on Windows will give some results, but you can never be sure that another user will be able to repeat them on his machine.

The future

These results are a good start but there is much room for expansion. If time permits I plan to answer some of these questions in the future:
  • How does the standard deviation behave on longer loads, say 1000ms?
  • What are the possible factors affecting precision on Windows?
  • How do other browser place?
  • Do Windows results follow a pattern? Can some results be discarded and attributed to "start up" problems?
  • Test on more machines
The update

The code used can be found on GitHub https://github.com/ivankovic/cccm-benchmark. There is also a GitHub Page that can be used at once http://ivankovic.github.com/cccm-benchmark/

One possible reason for Windows is that the precision of the GetSystemTime system call is in the 10ms - 16ms range. This however does not explain why sometimes the results of Windows were good. Or how to deal with benchmarks on Windows.

09 February 2011

Science in pursuit of lazyness: Cleaning a silver ring in 60 seconds

I'm gonna go ahead and start with this: Following this advice will make you put metal inside a microwave. You probably should not be following this unless you can understand the full implications of that.

What is the best way to clean a silver ring with cubic zirconia ornaments? I have no idea.

There is however a lazy way to do it under 60 seconds. For this you will need:
  1. The ring
  2. Citric acid (the household variety, it's dirt cheap)
  3. Small glass
  4. Water
  5. Microwave oven
Put enough water in the glass so that you can fully submerse the ring. You might want to leave at least some space between the water level and glass edge, because boiling could happen and cleaning the microwave would be very un-lazy.

Pour citric acid in the water. Make a fine solution. I have no idea how much I used, but I'm going to guess it as around 5-10%. Drop the ring in the acid. Put all of that in the microwave. Let her rip with 700W on 30 seconds, or until everything starts boiling and you think it would be a good idea to pull it out.

Carefully take the glass out, extract the ring, cool it, wipe it dry. Enjoy the no effort sparkliness.

Things to note:

Too much water is better than too little watter.

Citric acid is, well, acidic. Also, HOT citric acid will burn you. Use some gloves man!

The ring is made of metal, and metal in microwave is not a good idea. However, there should be no danger. In fact, we are counting on the fact that the metal will act as an antenna, causing it to absorb microwaves which in turn will heat up the metal much faster than the water around it causing:
  1. The acidic reaction around the ring to speed up
  2. Formation of water vapour bubbles on the ring surface that will physically remove dirt particles of the ring.

22 March 2010

Darts during storms

My brother and I tried to play darts outdoors a few days ago during heavy winds. Needles to say, it didn't exactly go as planned. It did however bring forth a crazy idea for a brand new sport.

The ballista darts!

Think about it, each team must construct a ballista from scratch. Then they fire comically sized darts into a 10-meter-radius dartboard. All other rules would be like standard darts, except everything would be HUGE.

It would be awesome. Until somebody loses an eye, that is. Than it would be just hilarious.

09 February 2010

Simple fast edit distance

Edit distance (or Levenshtein distance) is a measure of similarity between two strings. Given two strings A and B, if you are allowed to delete any character in B, insert a new character in B (at any position) or substitute any character in B what is the minimal number of operations required to turn B into A?

The solution is simple dynamic programming. Suppose we want to know how much operations it takes to make the first i characters of A equal the first j characters of B. If A[i] equals B[j] than it's the same number of operations it takes to make the first i - 1 characters of A equal the first j - 1 characters of B. If not, than its one more operation than making:
  1. The first i - 1 characters of A equal the first j characters of B (this correspond to deleting a character from B)
  2. The first i characters of A equal the first j - 1 characters of B (this correspond to inserting a character in B)
  3. The first i - 1 characters of A equal the first j - 1 characters of B (this correspond to substituting a character in B)
Whichever number is the smallest, plus one, gives the answer. Using this we can solve the problem bottom up or top down. The worst case complexity of this algorithm is O(n * m) where n and m are lengths of strings A and B respectively.

Now, in certain areas this problem is expanded to: given a string A and a dictionary D. Find the closest string (with minimal edit distance) to A in D. The simple solution is to run the algorithm described above and just take the minimal distance. This however, can be greatly improved upon.

Our first observation is this, if the length of string A is n and the length of some string B in D is m. The distance between A and B is at least the absolute value of n - m. Now suppose that we have processed some words in D and the current solution is x. We now know there is no point in even considering any word whose length is more than x away from A.

But suppose we have a string B whose length is less than x away from A. Can we use the current minimum solution to our advantage here? It turns out we can. If you use top-down approach for the algorithm described above, you can maintain the current minimum number of operations. If this number increases above x, you can terminate that entire branch without fear. This reduces the complexity from O(n * m) to O(n * x). If we can rapidly decrease x to small numbers (say 1,2,3) this is quite a speedup.

The best way to rapidly decrease x is to apply the first observation again. The only words who can have edit distance 0 in D are those that have exactly the same number of characters as A. So it makes sense to first start with them and try to get x as small as possible. If we do not find a match, it makes sense to try words who have 1 character more/less first. Then 2. Then 3. Then 4. So by preprocessing D, grouping by word length, we can greatly increase the speed.

Another thing you might notice is that it makes sense to prefer the diagonal top-down move (using i - 1, j - 1 to calculate i, j) first since diagonals are more likely to yield smaller solutions.

If you would like to try out some of the things described here you can try this facebook puzzle. I was able to speed up my solution by a factor of 15 using the observations above.

No code this time, solve the puzzle yourself :D

09 January 2010

Puzzled

Tom Bohman, Oleg Pikhurko, Alan Frieze and Danny Sleator from Carnegie Mellon run a cool puzzle site The Puzzle Toad.

They present a series of quite interesting and challenging puzzles. A friend of mine raised some questions about the solution to one of their puzzles called "Seating".

Without further ado here is a monte-carlo simulation confirming their result.

On a side note the current puzzle "Let there be light" is fascinating and difficult and that same friend of mine and I are trying to storm it as you read this.