Significantly different median gene counts per lineage

PRIVATE DATA: please contact i5K project leaders for usage permissions.
Please contact Evgeny Zdobnov with any questions (Evgeny dot Zdobnov at unige dot ch)

METHODS:
[1] OrthoGroups from the OrthoDB8 clustering and mapping of i5K species to the Metazoa level were assessed for median gene copy numbers per lineage.
[2] InterPro domain counts for all species were assessed for median gene copy numbers per lineage.
Only widespread OrthoGroups / InterPro domains were assessed: those present in >2 Diptera, >2 Lepidoptera, >2 Coleoptera, >2 Hymenoptera, >2 Paraneoptera, >2 non-holometabolous Insecta, and >2 Arachnida.
Wilcoxon rank-sum (Mann-Whitney) tests were performed between all pairs of lineages that each had at least 3 member species with orthologues / domains.
The test assesses whether the copy-number medians of two sets of species differ significantly; species with zero copies are included in the comparison.
OrthoGroups / InterPro domains with significant test results (p < 1e-5) for at least 3 tested pairs of lineages are reported.
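
The testing and reporting steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the analysis. The function names (mann_whitney_p, significant_pairs, reportable) are hypothetical. The p-value comes from a tie-corrected normal approximation with continuity correction rather than an exact test, so values may differ from those in the result files. "At least 3 member species with orthologues / domains" is read here as at least 3 species with a non-zero count. The sketch does not treat nested lineages specially.

```python
import math
from itertools import combinations
from statistics import median


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values identical: no evidence of a difference
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return math.erfc(z / math.sqrt(2.0))


def significant_pairs(counts_by_lineage, alpha=1e-5, min_species=3):
    """Test every pair of eligible lineages; zeros stay in the comparison."""
    hits = []
    for a, b in combinations(sorted(counts_by_lineage), 2):
        x, y = counts_by_lineage[a], counts_by_lineage[b]
        # Assumed reading: a lineage is eligible with >= 3 non-zero species.
        if min(sum(c > 0 for c in x), sum(c > 0 for c in y)) < min_species:
            continue
        p = mann_whitney_p(x, y)
        if p < alpha:
            if median(x) < median(y):
                a, b, x, y = b, a, y, x
            hits.append((a, median(x), b, median(y), median(x) - median(y), p))
    return hits


def reportable(counts_by_lineage, min_pairs=3):
    """Return the significant pairs only if there are at least min_pairs."""
    hits = significant_pairs(counts_by_lineage)
    return hits if len(hits) >= min_pairs else []


# Made-up copy numbers for one OrthoGroup; not taken from the i5K data.
toy_counts = {
    "lineage_a": [7, 8] * 10,
    "lineage_b": [1] * 20,
    "lineage_c": [4] * 20,
}
for large, lmed, small, smed, diff, p in reportable(toy_counts):
    print(large, lmed, small, smed, diff, f"{p:.3e}")
```

Each reported tuple mirrors the columns of the result tables: larger clade, its median, smaller clade, its median, the difference, and the p-value.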

Tested lineages: nematocera brachycera diptera lepidoptera amphiesmenoptera diplep coleoptera cucujiformia
diplepcol formicidae apoidea aculeata hymenoptera holometabola heteroptera hemiptera paraneoptera nonholoinsecta
insecta hexapoda crustacea pancrustacea mandibulata acari araneae arachnida arthropoda outgroups all

RESULTS: download and view in Excel for easy sorting etc.
Tab delimited results with all 863 OrthoGroups with a difference in median counts between lineages of at least 1
Tab delimited results with all 105 OrthoGroups with a difference in median counts between lineages of at least 4

Tab delimited results with all 855 InterPro domains with a difference in median counts between lineages of at least 1
Tab delimited results with all 429 InterPro domains with a difference in median counts between lineages of at least 4
Tab delimited results with all 97 InterPro domains with a difference in median counts between lineages of at least 4, excluding 'outgroups'
Excel spreadsheet results with all 97 InterPro domains with a difference in median counts between lineages of at least 4, excluding 'outgroups'
Boxplots of results with all 97 InterPro domains with a difference in median counts between lineages of at least 4, excluding 'outgroups'
Example R code for boxplots: example_R_boxplot_code.txt

Example OrthoGroup: lysozyme genes, Diptera >> Hymenoptera
OrthoGroup  LargeClade  LMed  SmallClade   SMed  Diff  Pvalue
EOG8WHB2D   diptera     7.5   aculeata     1     6.5   3.735e-06
EOG8WHB2D   diptera     7.5   hymenoptera  1     6.5   8.877e-07
EOG8WHB2D   diplep      5     aculeata     1     4     2.889e-07
EOG8WHB2D   diplep      5     hymenoptera  1     4     3.420e-08
EOG8WHB2D   diplepcol   5     aculeata     1     4     6.820e-08
EOG8WHB2D   diplepcol   5     hymenoptera  1     4     5.207e-09
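
As an alternative to sorting in Excel, a tab-delimited results file could be loaded and filtered with a few lines of Python. This is a minimal sketch that assumes each file starts with a header line using the column names shown in the example table above; the actual file layout has not been verified here. The helper names (load_results, top_differences) are hypothetical, and the two demo rows are copied from the example table.

```python
import csv
import io

NUMERIC_COLUMNS = ("LMed", "SMed", "Diff", "Pvalue")


def load_results(handle):
    """Read tab-delimited rows and convert the numeric columns to float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def top_differences(rows, min_diff=4.0):
    """Keep rows with Diff >= min_diff, largest difference first."""
    kept = [r for r in rows if r["Diff"] >= min_diff]
    return sorted(kept, key=lambda r: (-r["Diff"], r["Pvalue"]))


# Two rows from the lysozyme example, written out as tab-delimited text.
demo = io.StringIO(
    "OrthoGroup\tLargeClade\tLMed\tSmallClade\tSMed\tDiff\tPvalue\n"
    "EOG8WHB2D\tdiplep\t5\thymenoptera\t1\t4\t3.420e-08\n"
    "EOG8WHB2D\tdiptera\t7.5\thymenoptera\t1\t6.5\t8.877e-07\n"
)
demo_rows = top_differences(load_results(demo))
for r in demo_rows:
    print(r["OrthoGroup"], r["LargeClade"], r["SmallClade"], r["Diff"], r["Pvalue"])
```

For a real file, the in-memory demo would be replaced by an opened file handle, for example open(path, newline=""), where path is wherever the downloaded file was saved.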



Example InterPro domain: IPR006028 (Gamma-aminobutyric acid A receptor/Glycine receptor alpha) genes, Arachnida >> Hexapoda
InterProID  LargeClade  LMed  SmallClade    SMed  Diff  Pvalue
IPR006028   arachnida   28    hexapoda      10    18    5.054e-06
IPR006028   arachnida   28    holometabola  10    18    5.991e-06
IPR006028   arachnida   28    insecta       10    18    5.128e-06