Abstract
In this paper, we analyze dog genotypes (positions in DNA sequences that often vary between dogs) in order to predict the corresponding phenotypes (unique characteristics that result from differences in genetic code). More specifically, given chromosome data from a dog, we aim to predict its breed category, height, and weight. We explore a variety of linear and non-linear classification and regression techniques to accomplish these three tasks. We show that linear methods generally match or outperform non-linear methods for breed classification, whereas the reverse holds for height and weight regression. We also evaluate how the performance of each method depends on the number of input features used. We conduct experiments using different fractions of the full genomic sequences and demonstrate that phenotypes can be predicted with as few as 0.5% of the input features available for our analysis, and that dog breeds can be classified with 50% balanced accuracy with as few as 0.02% of the features.
- Paper page at EMBC 2022 site.
- Paper at bioRxiv.
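
For readers unfamiliar with the metric named in the abstract, the sketch below shows one common definition of balanced accuracy (the mean of per-class recall) together with a simple way to pick a fraction of the input features. This is only an illustration: the helper names `balanced_accuracy` and `sample_feature_indices`, the uniform random selection of features, and the toy labels are assumptions, not the paper's actual pipeline or data.

```python
import random
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare breeds weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def sample_feature_indices(n_features, fraction, seed=0):
    """Pick a random subset of feature (column) indices; keeps at least one.

    Uniform random selection is an assumption made for this sketch; the
    paper may choose its feature subsets differently.
    """
    k = max(1, round(n_features * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_features), k))


# Toy labels (hypothetical, not taken from the paper).
y_true = ["beagle", "beagle", "beagle", "pug"]
y_pred = ["beagle", "beagle", "pug", "pug"]
print(balanced_accuracy(y_true, y_pred))

# Keeping 0.5% of 10,000 hypothetical features leaves 50 columns.
print(len(sample_feature_indices(10_000, 0.005)))
```

In the toy example, plain accuracy is 3 out of 4, while balanced accuracy averages a recall of 2/3 for the first class and 1 for the second. That gap is why balanced accuracy is often preferred when some classes have far fewer samples than others.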