A team of sociologists at Florida State University is gathering information on the number of homeless people in the southern region of the United States. These researchers are __________.
When you are finished with this program (and ready to move on to the extra credit program, if you want), please copy and paste “Question 2 of the Proctored Final Exam is complete and ready for grading.” in the text box for this question.

Program 2: Find Shared Variants

If multiple people carry the same DNA variant as well as the same phenotype (for example, a disease), it may be that this variant caused the phenotype. Your task is to search multiple VCF files and identify the variants that are shared across all of those VCF files. For a variant to be considered shared, the exact same variant line must appear in all the files and the value in the FILTER column must be PASS in all the files.

As an illustration, suppose you were looking at VCF_file1.vcf and VCF_file2.vcf (shown above). You would want to find the following three shared variants:

chr1	3675	A	G	PASS
chr1	3789	T	G	PASS
chr11	55	T	C	PASS

These variants are shared because the exact same line appears in both input files and the value in the FILTER column is PASS in both files. A fourth variant (chr7, 787879) appears in both files. However, in the first file, the FILTER value is NO PASS, so this variant does not count as a shared variant.

Write a Python script that uses sys.argv to accept the following five arguments:

1. The name of the mother's VCF file.
2. The name of the father's VCF file.
3. The name of the daughter's VCF file.
4. The name of the son's VCF file.
5. The name of an output file that your code will need to create.

Your Python script should search the four VCF files and identify the variants that are shared across all four individuals (mother, father, daughter, son). After identifying the shared variants, write the data to the specified output file. This should be a tab-delimited file with four columns that correspond to the CHR, POS, REF, and VAR columns in the VCF files.
The output file should look the same as the modified VCF files used in this final, except there should be no metadata lines or header lines, all output should be uppercase, and it should not include the FILTER column. You should write the variants to output.txt in the same order they appear in the first input file. (In the example below, that is the order the variants appear in VCF_file1.vcf.) All columns in the output file are tab-delimited.

For example, if the server were to execute your code (using only the two VCF files, VCF_file1.vcf and VCF_file2.vcf, for brevity), it would look like the following:

python studentcode.py VCF_file1.vcf VCF_file2.vcf output.txt

Expected output (tab-delimited and all uppercase):

CHR1	3675	A	G
CHR1	3789	T	G
CHR11	55	T	C

You may assume we will always give you exactly four VCF files.
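The Program 2 requirements can be sketched roughly as follows. This is one possible approach, not the official solution. It assumes each input uses the simplified five-column, tab-delimited layout (CHR, POS, REF, VAR, FILTER) and that metadata and header lines start with "#". The function names read_pass_variants, shared_variants, and write_variants are made up for illustration. Because only PASS rows are kept, matching on the first four columns is treated as equivalent to matching the whole line.

```python
import sys


def read_pass_variants(path):
    """Return (CHR, POS, REF, VAR) tuples for PASS rows, in file order.

    Assumption: tab-delimited columns CHR, POS, REF, VAR, FILTER, with
    metadata/header lines starting with '#'.
    """
    variants = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue  # skip blank, metadata, and header lines
            fields = line.split("\t")
            # Keep only rows whose FILTER column is exactly PASS.
            if len(fields) < 5 or fields[4].strip() != "PASS":
                continue
            variants.append(tuple(fields[:4]))
    return variants


def shared_variants(paths):
    """Variants present (with PASS) in every file, ordered as in the first file."""
    first = read_pass_variants(paths[0])
    others = [set(read_pass_variants(p)) for p in paths[1:]]
    return [v for v in first if all(v in s for s in others)]


def write_variants(variants, out_path):
    """Write the variants as uppercase, tab-delimited lines without FILTER."""
    with open(out_path, "w") as out:
        for variant in variants:
            out.write("\t".join(variant).upper() + "\n")


def main(argv):
    # Every argument except the last is a VCF file; the last is the output file.
    *vcf_paths, out_path = argv[1:]
    write_variants(shared_variants(vcf_paths), out_path)


if __name__ == "__main__":
    # Run only when at least two VCF files and an output file are given.
    if len(sys.argv) >= 4:
        main(sys.argv)
```

Storing the later files as sets keeps each membership check cheap, while iterating over the first file's list preserves the required output order. A variant listed twice in the first file would be written twice; whether that can occur in the exam data is not stated.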
When you are finished with this program, please copy and paste “Extra Credit Question 3 of the Proctored Final Exam is complete and ready for grading.” in the text box for this question.

Program 3: Annotate Variants (5 Points Extra Credit Possible)

After identifying shared variants, in order to determine if one of them might be causing the phenotype, it’s necessary to figure out which gene harbors each of the shared mutations. Your task is to take a file formatted the same as your output in Question 2 (list of shared variants) and determine which gene each of the mutations is from. You will not be given the exact file you created in Question 2, just a file that’s formatted the same: four columns, where column 1 is the chromosome, column 2 is the chromosome position, column 3 is the reference allele, and column 4 is the variant, or mutated, allele.

You will also be provided with a gene annotations file, which has four columns: chromosome, start position (inclusive), stop position (inclusive), and gene name. Your task is to determine which gene each variant is located in and create a new file exactly the same as the shared variants file except it will have another column with gene name, or “no gene” if the mutation isn’t located in a known gene. To be located in a gene, a mutation should be located on the same chromosome and in a position within the range defined by the genes file.

Your program should accept three files from the command line (in the following order): shared variants file, output file where you’ll write your new file, and the genes file. It is possible that more than one mutation will be in the same gene, some mutations will not be located in a gene, and not all genes from the gene annotations file will be used.
Following is an example, assuming I have the two following files:

shared_variants.txt:

chr1	3675	A	G
chr1	3789	T	G
chr11	55	T	C

gene_annotations.txt:

chr1	3700	6000	GeneA
chr2	3300	10000	GeneB
chr2	11000	12000	GeneC
chr11	55	4500	GeneD

Example #1

If I execute the following command:

python studentcode.py shared_variants.txt gene_annotations.txt annotated.txt

Your program should create the following file, annotated.txt (all uppercase and tab-delimited):

CHR1	3675	A	G	NO GENE
CHR1	3789	T	G	GENEA
CHR11	55	T	C	GENED
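A minimal sketch of the annotation step, offered as one possible approach rather than the expected solution. It assumes both input files are whitespace-delimited with no header line and that gene names contain no spaces. The function names load_genes, find_gene, and annotate are made up for illustration. The command-line order used here (variants file, genes file, output file) follows the example command; that ordering is an assumption, since the written instructions list the output file second.

```python
import sys


def load_genes(path):
    """Read gene annotations as (chromosome, start, stop, name) tuples."""
    genes = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank or malformed lines (fewer than 4 columns, non-numeric start/stop).
            if len(fields) < 4 or not (fields[1].isdigit() and fields[2].isdigit()):
                continue
            chrom, start, stop, name = fields[:4]
            genes.append((chrom.upper(), int(start), int(stop), name.upper()))
    return genes


def find_gene(chrom, pos, genes):
    """Return the first gene whose inclusive range contains pos, else 'NO GENE'."""
    for gene_chrom, start, stop, name in genes:
        if gene_chrom == chrom and start <= pos <= stop:
            return name
    return "NO GENE"


def annotate(variants_path, genes_path, out_path):
    """Copy each variant line in uppercase and append a gene-name column."""
    genes = load_genes(genes_path)
    with open(variants_path) as src, open(out_path, "w") as out:
        for line in src:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            chrom, pos, ref, var = (f.upper() for f in fields[:4])
            gene = find_gene(chrom, int(pos), genes)
            out.write("\t".join([chrom, pos, ref, var, gene]) + "\n")


if __name__ == "__main__":
    # Assumed order, taken from the example command: variants, genes, output.
    if len(sys.argv) == 4:
        annotate(sys.argv[1], sys.argv[2], sys.argv[3])
```

A linear scan over the gene list is enough for small annotation files. If genes overlapped, this sketch would report only the first match; the task does not say how that case should be handled.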