The Rampal Etienne Project is about calculating the probability of a phylogeny. The algorithm to do so is described in [1].
A program that performs the same algorithm already existed, written by Mr X. In The Rampal Etienne Project I have improved the program, relative to that of Mr X:
The Rampal Etienne Project has been developed by (sorted alphabetically on first name):
Rampal Etienne: researcher
Richel Bilderbeek: programming
The Rampal Etienne Project was developed from May to about October 2009.
The project started with the published C code of Mr X. The results of this program were assumed to be correct and used as a control during the entire project. Rampal Etienne supplied Richel Bilderbeek with expert knowledge.
Richel Bilderbeek started by converting the code to standards-conforming C++ using the STL, Boost and VCL libraries, in the C++ Builder 6.0 and Qt Creator IDEs. Each version of the program by Richel Bilderbeek was tested to yield the same results as the program of Mr X. Using the Shiny C++ profiler, the speed-critical sections of the code were improved. Measurements were performed with a handcrafted analysis tool; Excel and VCL's TChart component were used for plotting.
All measurements have been performed on two different computers. Comparisons in speed, however, were performed on the same computer. By default a theta of 10.0 was used.
For 162 different complexities (from 24 to 866052) the results of four kinds of simulations have been compared. The probabilities found do not differ much. Differences are attributed to differences in precision in the results: Mr X uses single-precision floating-point values, whereas Richel Bilderbeek uses double-precision values.
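The effect of precision can be illustrated with a minimal C++ sketch. It is not taken from either program: the names sum_repeated and summation_error are hypothetical, and repeated addition merely stands in for the many floating-point operations a probability calculation performs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of either program:
// adds 'value' to a running total 'n' times, using type T throughout.
template <class T>
T sum_repeated(const T value, const int n)
{
  assert(n >= 0);
  T sum = 0;
  for (int i = 0; i != n; ++i) { sum += value; }
  return sum;
}

// Absolute error of summing 0.1 'n' times, compared to the exact answer n / 10.
// T is the floating-point type used for the summation (float or double).
template <class T>
double summation_error(const int n)
{
  const double exact = static_cast<double>(n) / 10.0;
  const double found = static_cast<double>(sum_repeated<T>(static_cast<T>(0.1), n));
  return std::abs(found - exact);
}
```

Comparing summation_error<float>(1000000) with summation_error<double>(1000000) shows the single-precision total drifting further from the exact value than the double-precision total, which is the kind of small difference described above.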
The four simulation types are:
The maximum complexity of the program by Mr X was exactly one million, because this is hard-coded.
The maximum complexity of the program by Richel Bilderbeek is theoretically infinite. In practice, the program is limited by the computer's memory. The maximum complexity solved is 97,656,250 (in a memory use measurement).
It takes more time to calculate the probability of more complex phylogenies.
The relationship between complexity and execution time should be constant-time, logarithmic or linear, but not exponential.
The relationship found (which depends on the computer used) between complexity and execution time is:
T = 1.0543*C - 4.5147
where T = LOG10(time(sec))
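The fit can be turned into a rough time estimate. The sketch below assumes that C denotes LOG10(complexity), which the text does not state explicitly; the function name predict_time_sec is hypothetical. Under that assumption, a slope close to 1 on the log-log scale corresponds to a roughly linear relationship between complexity and execution time.

```cpp
#include <cassert>
#include <cmath>

// Estimated execution time in seconds for a phylogeny of the given complexity.
// ASSUMPTION: C in the fitted line is LOG10(complexity).
// The coefficients are those of the fit reported in the text and
// depend on the computer used for the measurements.
double predict_time_sec(const double complexity)
{
  assert(complexity > 0.0);
  const double c = std::log10(complexity);
  const double t = (1.0543 * c) - 4.5147; // T = LOG10(time(sec))
  return std::pow(10.0, t);
}
```

A call such as predict_time_sec(1000000.0) then gives an estimate for a complexity of one million, valid only for the computer on which the fit was made.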
The program by Mr X uses about 200,000,000 bytes of memory, independent of the complexity of the phylogeny, because this amount is hard-coded.
The memory used by the program by Richel Bilderbeek depends on the complexity of the phylogeny. The program is limited by the amount of memory that can be addressed. The maximum amount of memory used is 792,237,104 bytes.
The memory used to store a single phylogeny also differs. Mr X has hard-coded a phylogeny to consist of 200 characters, independent of the actual size of the phylogeny (note: the program of Mr X cannot handle phylogenies larger than this size). In the program by Richel Bilderbeek this memory use depends on the size of the phylogeny. In theory, the version of Richel Bilderbeek can handle infinite-size phylogenies.
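The difference between the two storage strategies can be sketched in C++. This is an illustration only, not the code of either program: the type names FixedPhylogeny and DynamicPhylogeny and the function fits_fixed are hypothetical, and storing a phylogeny as characters in a std::string is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Size taken from the text: 200 characters per phylogeny
const std::size_t fixed_phylogeny_size = 200;

// Hypothetical fixed-size storage: always occupies 200 characters,
// regardless of how large the stored phylogeny actually is
struct FixedPhylogeny
{
  char data[fixed_phylogeny_size];
};

// Hypothetical dynamic storage: grows with the phylogeny it holds
// and is limited only by the available memory
struct DynamicPhylogeny
{
  std::string data;
};

// Does a phylogeny of this many characters fit in the fixed-size storage?
bool fits_fixed(const std::size_t n_characters)
{
  return n_characters <= fixed_phylogeny_size;
}
```

With the fixed-size type, a small phylogeny wastes the unused characters and a phylogeny of 201 characters cannot be stored at all; the dynamic type avoids both problems at the cost of a memory allocation.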
The relationship found (which is independent of the computer used) between complexity and memory use is:
M = 1.0387*C + 0.6278
where M = LOG10(memory use(byte))
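The memory fit can also be inverted, to estimate at which complexity a given amount of memory is reached. The sketch below assumes that C denotes LOG10(complexity), which the text does not state explicitly; the function names predict_memory_byte and complexity_at_memory_byte are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Estimated memory use in bytes for a phylogeny of the given complexity.
// ASSUMPTION: C in the fitted line is LOG10(complexity).
double predict_memory_byte(const double complexity)
{
  assert(complexity > 0.0);
  const double c = std::log10(complexity);
  const double m = (1.0387 * c) + 0.6278; // M = LOG10(memory use(byte))
  return std::pow(10.0, m);
}

// Inverse of the fit: the complexity at which the fitted memory use
// equals 'memory_byte'. Relies on the same ASSUMPTION about C.
double complexity_at_memory_byte(const double memory_byte)
{
  assert(memory_byte > 0.0);
  const double m = std::log10(memory_byte);
  const double c = (m - 0.6278) / 1.0387;
  return std::pow(10.0, c);
}
```

For example, complexity_at_memory_byte(200000000.0) estimates the complexity below which the program by Richel Bilderbeek would use less memory than the fixed 200,000,000 byte of the program by Mr X, provided the fit and the assumption hold.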
The program of Richel Bilderbeek was tested for large values. It was confirmed that the values in phylogenies do not matter, whereas the total complexity does.
The source code is not downloadable yet.
[1] I will not give the reference yet.