In this project, we extracted more than 3,000 literature records from both the UniProt database (www.uniprot.org) and the rice database constructed by the China National Rice Research Institute (CNRRI) (http://www.ricedata.cn/index.htm) to annotate rice gene functions. We also extracted expression data for more than 34,000 genes across the eight stages of rice growth from the RiceXPro database (http://ricexpro.dna.affrc.go.jp/) and processed them into an easily readable form. This helps us quickly locate the growth stage and the part of the plant in which a gene is expressed, which sheds light on the gene's function. We also identified homologues of all rice genes in NCBI (www.ncbi.nlm.nih.gov) and retrieved the literature reports for those homologues. Because proteins with similar sequences tend to have similar functions, the functions of homologous genes also give insight into rice gene function.
Raw sequencing data for a total of 1,529 rice accessions, comprising 1,083 O. sativa and 446 O. rufipogon lines, were collected from the rice HapMap3 database of the National Center for Gene Research (NCGR) (www.ncgr.ac.cn). We called about 2,472,943 single nucleotide polymorphisms (SNPs) for the wild rice accessions and 1,345,418 SNPs for the cultivated rice accessions. A total of 275,698 insertions and deletions (INDELs) were also detected for the wild rice accessions. These SNPs and INDELs not only reveal sequence changes that may affect gene function but can also be used as markers in linkage mapping. Comparing the SNPs of cultivated rice with those of wild rice also sheds light on how each gene changed during domestication.
Correlated expression among genes is evidence that they are involved in the same process. We downloaded 2,030 rice-related Affymetrix microarrays from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo/).
We extracted expression values from the raw microarray data, calculated the pairwise expression correlation of about 37,532 genes across all 2,030 arrays, and kept the 100 most highly correlated genes for each gene.
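The per-gene ranking of correlated partners can be illustrated with a small sketch. It is not the project's actual pipeline: the text does not say which correlation measure was used, so the Pearson coefficient, the ranking by signed value, the function names `pearson` and `top_correlated`, and the toy gene names are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sum(a * b for a, b in zip(dx, dy)) / denom


def top_correlated(expression, k=100):
    """Map each gene to its k most correlated partners, best first.

    `expression` maps a gene ID to its expression values, one per array.
    Assumption: partners are ranked by the signed coefficient.
    """
    genes = list(expression)
    result = {}
    for gene in genes:
        scores = [
            (pearson(expression[gene], expression[other]), other)
            for other in genes
            if other != gene
        ]
        scores.sort(key=lambda s: s[0], reverse=True)
        result[gene] = [(other, round(r, 3)) for r, other in scores[:k]]
    return result


# Toy data: four hypothetical genes measured on five arrays.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 5.0, 4.0],
}
top = top_correlated(toy, k=2)
print(top["geneA"])
```

This all-against-all loop is quadratic in the number of genes, so at the scale described here (about 37,532 genes on 2,030 arrays) a vectorized matrix computation would likely be needed in practice.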
To give a genome-wide view of rice genes, we added a genome browser that makes it easy to inspect any rice gene. We also added a BLAST application to this project. Users can run BLAST against the cultivated rice genome (IRGSP version 4.0, http://rgp.dna.affrc.go.jp/IRGSP/) to check sequence information, or against the assembled wild rice accession W1943 (O. rufipogon). This makes it possible to find sequence differences between wild rice and cultivated rice.
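As a rough illustration of comparing the two BLAST searches, the sketch below parses results in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score) and reports the best hit per query against each genome. The text does not say which output formats the platform offers, so tabular output is an assumption, and the function names, subject names (`chr06`, `scaffold12`) and hit values are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity
    length: int      # alignment length
    mismatch: int    # number of mismatches
    gapopen: int     # number of gap openings
    bitscore: float


def parse_tabular(text):
    """Parse 12-column BLAST tabular text into Hit records."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[4]), int(f[5]), float(f[11])))
    return hits


def best_hits(hits):
    """Keep the highest bit-score hit for each query."""
    best = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def compare(cultivated_text, wild_text):
    """Summarize best hits for queries found in both searches."""
    cult = best_hits(parse_tabular(cultivated_text))
    wild = best_hits(parse_tabular(wild_text))
    report = {}
    for q in sorted(cult.keys() & wild.keys()):
        report[q] = {
            "cultivated_identity": cult[q].pident,
            "wild_identity": wild[q].pident,
            "cultivated_mismatch": cult[q].mismatch,
            "wild_mismatch": wild[q].mismatch,
        }
    return report


# Hypothetical hits for one query against each genome.
cultivated = "query1\tchr06\t100.00\t500\t0\t0\t1\t500\t1000\t1499\t0.0\t924\n"
wild = "query1\tscaffold12\t99.20\t500\t4\t0\t1\t500\t200\t699\t0.0\t902\n"
report = compare(cultivated, wild)
print(report["query1"])
```

A query that matches the cultivated genome perfectly but shows mismatches against W1943 points to positions worth inspecting in the alignments themselves.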
In total, this project integrates genome variant information, gene expression data, literature annotation and homologous sequence information into a platform that is easily accessible to all genetics researchers.
References:
[1] Huang X, et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 2012; 490: 497-501.
[2] Huang X, et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nature Genet. 2012; 44: 32-39.
[3] Huang X, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nature Genet. 2010; 42: 961-967.