First, download the fillGenotype program and the sample data.

1. Downloading the program
Because the program is written in C, the compiled executable differs from platform to platform. If you use a Linux operating system with an x86_64 CPU, the download address of the compiled program is as follows:

2. Downloading the sample data
The sample data comes from chromosome 04 of rice. It contains 55819 SNP sites across 950 inbred lines. The download address is exam.input.gz

Both files are compressed with gzip, so after downloading them, use the gunzip command to extract them.
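If gunzip is not available (for example on Windows), the extraction step can be sketched in Python using only the standard library. This is a minimal illustration, not part of the fillGenotype distribution: the helper name gunzip is hypothetical, and the archive name fillGenotype.gz is an assumption based on the executable's name. Unlike the gunzip command, this sketch leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(gz_path):
    """Decompress a .gz file next to itself and return the new path.

    Hypothetical helper: mirrors `gunzip FILE.gz`, but keeps the archive.
    """
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # exam.input.gz -> exam.input
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are fine
    return out_path


if __name__ == "__main__":
    # "fillGenotype.gz" is an assumed archive name; adjust to your download.
    for name in ("exam.input.gz", "fillGenotype.gz"):
        if Path(name).exists():
            print("extracted", gunzip(name))
```

Files that are not present are skipped, so the script can be run before both downloads have finished.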
Second, set the parameters and run the program.

After extracting the program's gz file, use the chmod command to change its permissions:

    chmod +x fillGenotype

This step makes the program executable on Linux. Then use the command below to list the program's parameters:

    ./fillGenotype --help

You will see the following help information:

    usage: ./fillGenotype -w <window-size> -k <K-value> -p <noequal-punish> -r <percent-ratio> -i <input-fle> -o <output-file>

The values of the -w, -k, -p and -r parameters are chosen to make the imputation of missing genotypes more accurate. For our example file we used:

    -w 80 -k 5 -p -7 -r 0.7

The usual command is:

    ./fillGenotype -w 80 -k 5 -p -7 -r 0.7 -i example.input -o example.output

Here example.input and example.output stand for your input and output file names; the sample data extracts to exam.input.

The parameters must always be given in the same order:
first parameter: -w [PARAMETER_VALUE_SPECIFIED]
second parameter: -k [PARAMETER_VALUE_SPECIFIED]
third parameter: -p [PARAMETER_VALUE_SPECIFIED]
fourth parameter: -r [PARAMETER_VALUE_SPECIFIED]
fifth parameter: -i [INPUT_FILE]
sixth parameter: -o [OUTPUT_FILE]

When the program has finished, the result file is as follows: exam.output. On our test Linux server, imputing the missing genotypes took 142 minutes. If that is too long for you, use the head command to select the first 2000 lines of the input file:

    head -n 2000 exam.input > input.file

Third, to make the output file easier to inspect, we wrote a Perl script that converts it into an image. Download address: showMatrix.pl.gz

Whatever your operating system (Linux or Windows), the command for this script is:

    perl showMatrix.pl example.input example.output 5 output_image.jpg

A sample image follows:
In the image, base 'A' is red, 'T' is green, 'C' is pink and 'G' is blue; a missing base (one that was not sequenced) is white. You can of course modify the script to use whatever colors you prefer.
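Because fillGenotype expects its parameters in a fixed order, it can help to build the command line programmatically instead of typing it each time. The following Python sketch does that and also reproduces the head step for cutting the input down to its first 2000 lines. The function names build_command and take_first_lines are hypothetical and not part of fillGenotype; the default values are the ones used for the example file above.

```python
from itertools import islice


def build_command(input_file, output_file, w=80, k=5, p=-7, r=0.7,
                  program="./fillGenotype"):
    """Return the argument list with options in the required fixed order:
    -w, -k, -p, -r, -i, -o."""
    return [program,
            "-w", str(w),
            "-k", str(k),
            "-p", str(p),
            "-r", str(r),
            "-i", input_file,
            "-o", output_file]


def take_first_lines(src, dst, n=2000):
    """Copy the first n lines of src to dst, like `head -n 2000 src > dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(islice(fin, n))


if __name__ == "__main__":
    cmd = build_command("input.file", "example.output")
    print(" ".join(cmd))
    # To actually run the program (assumes ./fillGenotype is executable):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the command can be passed directly to subprocess.run without shell quoting, and the order of the options is fixed in one place.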