# (Project) The Rampal Etienne Project

## Introduction

The Rampal Etienne Project is about calculating the probability of a phylogeny. The algorithm to do so is described in a reference that is not given yet (see References).

A program that performs the same algorithm already existed, written by Mr X. In The Rampal Etienne Project I have improved this program so that, compared to the version by Mr X, it can:

• Solve complexities at least 97 times higher
• Solve a phylogeny faster
• Solve a more complex phylogeny exponentially faster
• Use only the memory needed
• Use memory more efficiently

The Rampal Etienne Project has been developed by (sorted alphabetically on first name):

• Rampal Etienne
• Richel Bilderbeek

The Rampal Etienne Project was developed from May to about October 2009.

## Table of contents

• Procedure
• Measurements
• Same results
• Solve higher complexities
• Speed comparison
• Memory use
• Large values
• Results
• Downloads
• References

## Procedure

The project started with the published C code of Mr X. The results of this program were assumed to be correct and used as a control during the entire project. Rampal Etienne supplied Richel Bilderbeek with expert knowledge.

Richel Bilderbeek started by converting the code to standard-conforming C++, using the STL, Boost and VCL libraries and the C++ Builder 6.0 and Qt Creator IDEs. Each version of the program by Richel Bilderbeek was tested to yield the same results as the program of Mr X. Using the Shiny C++ profiler, the speed-critical sections of the code were improved. Measurements were performed with a handcrafted analysis tool, and Excel and VCL's TChart component were used for plotting.

## Measurements

All measurements were performed on two different computers; speed comparisons, however, were always performed on the same computer. Unless stated otherwise, a theta of 10.0 was used.

## Same results

For 162 different complexities (from 24 to 866,052), the results of four kinds of simulations were compared. The probabilities found differ only slightly. These differences are attributed to differences in precision: Mr X uses single-precision floating-point values, whereas Richel Bilderbeek uses double-precision values.

The four simulation types are:

• MrXExecutable: Mr X's Windows executable
• MrXLinux: Mr X's Linux executable
• Storage<SortedNewick>: Bilderbeek's Windows executable
• Storage<SortedNewick> Linux: Bilderbeek's Linux executable

## Solve higher complexities

The maximum complexity that the program by Mr X can solve is exactly one million, because this limit is hard-coded.

The maximum complexity of the program by Richel Bilderbeek is theoretically unlimited; in practice, the program is limited by the computer's memory. The highest complexity solved is 97,656,250 (in a memory use measurement).

## Speed comparison

It takes more time to calculate the probability of more complex phylogenies.

The relationship between complexity and execution time should be constant, logarithmic or linear, but not exponential.

The relationship found between complexity and execution time (which depends on the computer used) is:

```
T = 1.0543 * C - 4.5147
```

where

```
T = LOG10(time (sec))
C = LOG10(complexity)
```

## Memory use

The program by Mr X always uses about 200,000,000 bytes of memory, independent of the complexity of the phylogeny, because this amount is hard-coded.

The memory used by the program by Richel Bilderbeek depends on the complexity of the phylogeny. The program is limited by the amount of memory that can be addressed. The maximum amount of memory used was 792,237,104 bytes.

The memory used to store a single phylogeny also differs. Mr X has hard-coded a phylogeny to consist of 200 characters, independent of the size of the actual phylogeny (note: the program of Mr X cannot handle phylogenies larger than this size). In the program by Richel Bilderbeek, this memory use depends on the size of the phylogeny. In theory, the version of Richel Bilderbeek can handle phylogenies of any size.

The relationship found between complexity and memory use (which is independent of the computer used) is:

```
M = 1.0387 * C + 0.6278
```

where

```
M = LOG10(memory use (byte))
C = LOG10(complexity)
```

## Large values

The program by Richel Bilderbeek was tested with phylogenies containing large values. This confirmed that the magnitude of the individual values in a phylogeny does not matter, whereas the total complexity does.

## Results

• The maximal complexity of phylogenies that can be solved is at least 97 times higher in the version by Richel Bilderbeek
• The execution time of the version by Mr X increases exponentially with increasing complexity
• The execution time of the version by Richel Bilderbeek increases approximately linearly with increasing complexity
• The memory used by the version of Mr X is always the same, about 200,000,000 bytes
• The memory used by the version of Richel Bilderbeek depends on the complexity of the phylogeny
• The maximum amount of memory addressed is about four times higher in the version of Richel Bilderbeek

## Downloads

The source code is not downloadable yet.

## References

 I will not give the reference yet. 