One of the biggest challenges in genetics is to determine the relationship between genetic variants and phenotypes. To learn more about these relationships, researchers often sequence the DNA of individuals and then analyze the DNA variant data. For this exam, you will be provided with DNA variant data for four individuals from the same family: a mother, a father, a son, and a daughter.

The DNA variant data are provided in Variant Call Format (VCF) files. As we have discussed in the course, VCF files contain multiple “metadata” or header lines that each start with one or more # characters. The remaining lines contain variant data, with each variant listed on a separate line. The CHROM column (headed #CHR or #Chr in the examples below) indicates the chromosome name, the POS column (headed POSITION in the examples below) indicates the chromosome position, the REF column indicates the reference allele, and the VAR column indicates the alternative (variant or mutated) allele. So, if I say an “A” is mutated to a “T”, then “A” will appear in the REF column and “T” in the VAR column. The FILTER column indicates whether each variant passed the quality-control test: variants that passed have a value of PASS, and variants that failed have a value of NO PASS.

The variant-data columns are tab-delimited. Below are two (small) example VCF files. (Even though the columns may not line up perfectly on the printed page, the variant columns are separated by single tabs.) You do not need to do any error checking on command lines, and you may assume that all files are tab-delimited. Do not assume that the characters in a file are consistently uppercase or consistently lowercase.
Example VCF files (note that on the computer, the columns may not line up perfectly visually, even though they are still separated by a single tab):

VCF_file1.vcf

##header line 1
##other stuff that you don’t have to know
##another header line
##blah blah blah
#CHR	POSITION	REF	VAR	FILTER
chr1	3675	a	g	PASS
chr1	3789	T	G	pass
chr7	787879	T	C	NO PASS
chr7	787882	C	A	PASS
CHR10	6321	A	C	PASS
chr11	55	T	C	PASS

VCF_file2.vcf

##header garbage
##other stuff that you don’t have to know and is really annoying
##another header line
##blah blah blah
##who thought of this file format anyway
#Chr	POSITION	REF	VAR	FILTER
chr1	3675	A	G	PASS
chr1	3789	T	G	PASS
chr7	787879	T	C	PASS
chr7	787883	C	A	PASS
chr11	55	T	C	PASS
chr22	54321	G	C	NO PASS
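To make the file layout concrete, here is a minimal Python sketch of one way to read this simplified format. It is an illustration under stated assumptions, not a required or official solution: the function name read_passing_variants and the EXAMPLE text are hypothetical, a lowercase "pass" is assumed to count as PASS, and folding chromosome names to lowercase and alleles to uppercase is just one reasonable way to handle mixed case.

```python
import io


def read_passing_variants(handle):
    """Yield (chrom, pos, ref, var) for each variant whose FILTER is PASS.

    Assumes the five tab-delimited columns described above. Case is
    normalized (an assumption): chromosome names to lowercase, alleles
    to uppercase, and the FILTER value is compared case-insensitively.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and header lines (one or more leading # characters).
        if not line or line.startswith("#"):
            continue
        chrom, pos, ref, var, filt = line.split("\t")[:5]
        # "NO PASS" is not equal to "PASS", so failed variants are skipped.
        if filt.strip().upper() != "PASS":
            continue
        yield (chrom.lower(), int(pos), ref.upper(), var.upper())


# A shortened, hypothetical excerpt in the style of VCF_file1.vcf.
EXAMPLE = (
    "##header line 1\n"
    "#CHR\tPOSITION\tREF\tVAR\tFILTER\n"
    "chr1\t3675\ta\tg\tPASS\n"
    "chr1\t3789\tT\tG\tpass\n"
    "chr7\t787879\tT\tC\tNO PASS\n"
    "CHR10\t6321\tA\tC\tPASS\n"
)

variants = list(read_passing_variants(io.StringIO(EXAMPLE)))
for variant in variants:
    print(variant)
```

With a real file, the same function could be called on an open file handle, for example `with open("VCF_file1.vcf") as handle:`. Returning plain tuples keeps later comparisons between family members simple, since tuples can be placed directly in a set.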