The following questions are about database security.

Consider a health database that stores information about patients who have rare diseases. For each patient, information such as zip code and age is stored to facilitate research about the prevalence of these diseases.
(a) Give an example of an inference attack that can be used to find the exact rare disease of an individual when aggregate queries are allowed. Explain what auxiliary public information is used to successfully carry out such an attack. (4+4 pts.)
(b) Assume the system returns how many people have a specific rare disease in an area only when this number is greater than k and less than N-k, where N is the total number of patients. Does this defense make the attack in (a) impossible? Explain your answer. (5 pts.)

A de-identified and anonymized database D has been made public. Each quasi-identifier (QID) in D appears in at least n rows. Furthermore, values of the sensitive data elements in D are all unique (no two tuples have the same value for the sensitive element). If D satisfies k-anonymity and l-diversity requirements, what are the values of k and l when n=10 and the total number of tuples is 1,000,000? How will the utility of D change when l is increased? (5+5 pts.)

macOS and iOS send information from our devices to Apple that could have an impact on our privacy. According to some reports, for information collected by iOS, Apple claims to provide differential privacy guarantees with epsilon = 14. There are approximately 300 million people in the United States. Assume a certain sensitive property about a person of interest is true only for a single user when no information is shared with Apple. Thus, the random chance that Alice is the person of interest is 1/300000000 in this case. Assume that the information carried by iOS could be used with available auxiliary information to answer the question of whether Alice is the person of interest.
With differential privacy and epsilon = 14, what is the probability that Alice is the person of interest when she shares her information with Apple? First, write the formula that relates the probabilities of an inference when Alice chooses to share and not to share her information. Then estimate the probability of the inference attack for the given value of epsilon. You can assume that e^14 is approximately 1.2 million. Do you consider 14 a reasonable value for epsilon? (2+3+2 pts.)
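
The arithmetic for the last question can be sketched as follows. This is only one common reading of the differential privacy guarantee, namely that Pr[inference | Alice shares] <= e^epsilon * Pr[inference | Alice does not share]; it is not an official solution. The function name posterior_upper_bound and the constant US_POPULATION are hypothetical names chosen for illustration, and the cap at 1 is added because a probability cannot exceed 1.

```python
import math

# Assumption: the epsilon-DP guarantee is read as a multiplicative bound
#   Pr[inference | share] <= e^epsilon * Pr[inference | not share].
# The names below are illustrative and do not come from the question text.

US_POPULATION = 300_000_000  # approximate figure given in the question


def posterior_upper_bound(prior: float, epsilon: float) -> float:
    """Return the upper bound e^epsilon * prior, capped at 1."""
    return min(1.0, math.exp(epsilon) * prior)


# Chance that Alice is the person of interest when nothing is shared.
prior = 1 / US_POPULATION

# Bound on the inference probability when Alice shares, for epsilon = 14.
bound = posterior_upper_bound(prior, 14)

print(f"e^14 is about {math.exp(14):,.0f}")
print(f"upper bound on the inference probability: {bound:.4f}")
```

Under this reading the bound comes out near 1.2 million / 300 million, or roughly 0.4%, which is about a million times the baseline chance. A comparison with a small value such as epsilon = 1 (a factor of about 2.7) shows how weak the guarantee is at epsilon = 14.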