Talk:Differential privacy
This article is rated C-class on Wikipedia's content assessment scale.
Content could be extracted
The content may have been extracted from another source, judging by its format and by the single massive initial edit. Utopiah (talk) 19:04, 16 January 2010 (UTC)
Style Guide
I doubt that the style guide should be part of an encyclopedic article on differential privacy, but it still might be useful to follow the suggested capitalization rules for article writers. If I hear no objections, I'm planning to remove the style guide from the article and add it here. 141.201.13.113 (talk) 12:39, 26 November 2018 (UTC)
"coins of the algorithm"
What does "coins of the algorithm" mean? Is this correct usage? 129.6.223.113 (talk) 22:36, 27 February 2015 (UTC)
It is correct usage within the cryptography community, but I agree that it is super-confusing. I'll be taking it out over the next few weeks, as I continue my total rewrite of this section. Simsong (talk) 02:30, 18 September 2018 (UTC)
Please finish example
Please finish the example and show how to make the diabetes database differentially private. — Preceding unsigned comment added by 129.6.223.113 (talk) 22:40, 27 February 2015 (UTC)
I'm probably going to replace this with a different example, one that is less loaded and easier to understand. Simsong (talk) 02:31, 18 September 2018 (UTC)
Lay people cannot read the formula
The example should be understandable without the mathematical formula. — Preceding unsigned comment added by 87.78.208.167 (talk) 22:02, 13 August 2015 (UTC)
Agreed. After reading the article I still have no idea what differential privacy is or how it is going to counter the given de-anonymization efforts. 84.245.149.53 (talk) 13:39, 31 August 2016 (UTC)
2001:4898:80E8:C:0:0:0:563 (talk) 18:44, 2 May 2017 (UTC) In particular, I found the equation (1/4)(1-p) + (3/4)p = (1/4) + p/2 to be weird. The (1/4) + p/2 is reasonably obvious just from the description of the coin flips, but the longer equation on the left side isn't obvious at all. Worse, the obvious equation people want is not, given a p, what is the result, but rather, given the result, what is the p.
Perhaps this is a better formulation: Thus, if p is the true proportion of people with A, we can expect an actual result of 1/4 just from getting a tails on the first coin flip and then a tails, and p/2 when we get a heads on the first coin flip. Reversing the equation, given an actual result R, the best estimate of p is (R - 1/4) * 2.
2001:4898:80E8:C:0:0:0:563 (talk) 18:44, 2 May 2017 (UTC)
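To make the arithmetic in this thread concrete, here is a minimal simulation sketch of the two-coin scheme. It assumes the convention that a first-flip heads means "answer honestly" and a first-flip tails means "let a second flip decide"; the function names, the seed, and the true proportion of 0.3 are illustrative choices, not anything taken from the article.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the truth on a first-flip heads; otherwise report a second flip."""
    if rng.random() < 0.5:       # first coin is heads: answer honestly
        return truth
    return rng.random() < 0.5    # first coin is tails: second coin decides the answer

def estimate_proportion(reports) -> float:
    """Invert E[R] = 1/4 + p/2, giving p = 2 * (R - 1/4)."""
    observed = sum(reports) / len(reports)
    return 2 * (observed - 0.25)

# Illustrative run: 100,000 respondents, assumed true proportion 0.3.
rng = random.Random(0)
true_p = 0.3
population = [rng.random() < true_p for _ in range(100_000)]
reports = [randomized_response(x, rng) for x in population]
estimate = estimate_proportion(reports)
print(f"observed yes-rate: {sum(reports) / len(reports):.3f}, estimated p: {estimate:.3f}")
```

The observed "yes" rate should land near 1/4 + p/2, and the inverted estimate near the true proportion, although any single run carries sampling noise.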
Definition of ε-stability
The definition of ε-stability assumes two datasets that differ only on a single element. It is not specified what kind of difference this is: whether that element is present in one set and absent from the other, or whether its attributes simply differ. However, the formula for the exact definition explicitly establishes an ordering in which the probability related to one dataset is bounded by that related to the other. Why this ordering?
Besides, I agree with some of the statements above that the term "the coins of the algorithm" is not clear. Maybe a link to the Wikipedia page that clarifies this would help. Elferdo (talk) 08:36, 19 August 2015 (UTC)
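For reference, the definition as it is usually stated can be written as follows. The symbols $\mathcal{A}$ (the randomized algorithm), $D_1$, $D_2$ (the two datasets) and $S$ (a set of possible outputs) are assumed here for illustration and may not match the article's notation exactly.

```latex
\Pr[\mathcal{A}(D_1) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(D_2) \in S]
\quad \text{for all neighboring } D_1, D_2 \text{ and all } S \subseteq \operatorname{im}\mathcal{A}
```

Because the condition must hold for every ordered pair of neighboring datasets, it also holds with $D_1$ and $D_2$ swapped, so the apparent ordering is only notational. The probability is taken over the algorithm's own randomness, which is what "the coins of the algorithm" refers to. Whether "differ on a single element" means adding or removing a record or changing one record's attributes depends on which notion of neighboring datasets is chosen.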
Another famous example?
AOL search logs - does this count as another example? — Preceding unsigned comment added by Adsah98 (talk • contribs) 17:02, 7 February 2016 (UTC)
No, it is not a good example. It should not be in this article. It should be in an article on de-identification. The original author of this article was confused between the two concepts. Simsong (talk) 02:32, 18 September 2018 (UTC)
Link to reference not working anymore
The link to reference 21, differential privacy at iOS, does not work anymore. 185.87.72.149 (talk) 14:17, 21 August 2017 (UTC)
PATE algorithm and utility/privacy trade-off
The authors of the PATE algorithm (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/0e08bda44d22e076d15edc45afcb2e1a7a231a84.pdf) claim that their approach can answer an unlimited number of queries, by training an ancillary model with a finite number of privacy-preserving queries to an ensemble of models (the ancillary model gets only an obfuscated consensus among the members of the ensemble). Further queries to the ancillary model, according to them, do not imply additional loss of privacy. This seems to me to increase the applicability of DP.
Do we really want this article to reference every DP algorithm? There are hundreds of them now. Simsong (talk) 02:33, 18 September 2018 (UTC)
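For readers of this thread unfamiliar with the "obfuscated consensus" step, the following is a rough sketch of a noisy majority vote over an ensemble. It assumes Laplace noise with scale 1/gamma added to each per-class vote count; the function names, the parameter gamma, and the vote data are hypothetical, and the details in the linked paper may differ.

```python
import random
from collections import Counter

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_majority_label(teacher_votes, num_classes: int, gamma: float,
                         rng: random.Random) -> int:
    """Return the class whose noisy vote count is largest.

    Assumption: each class count receives independent Laplace noise of scale 1/gamma.
    """
    counts = Counter(teacher_votes)
    noisy = [counts[c] + laplace_noise(1.0 / gamma, rng) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])

# Hypothetical ensemble of 100 models: 90 vote for class 1, 10 for class 0.
rng = random.Random(42)
votes = [1] * 90 + [0] * 10
label = noisy_majority_label(votes, num_classes=2, gamma=1.0, rng=rng)
print("label released to the ancillary model:", label)
```

In this sketch only the noisy winning label leaves the ensemble, so each such query spends some privacy budget, while later queries to a model trained on those labels do not touch the ensemble again.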
Synopsis
The current version of this differential privacy article begins by attributing differential privacy to a patent application by Dwork and McSherry.
The correct attribution is:
Dwork C., McSherry F., Nissim K., Smith A. (2006) Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi S., Rabin T. (eds) Theory of Cryptography. TCC 2006. Lecture Notes in Computer Science, vol 3876. Springer, Berlin, Heidelberg[1]
In terms of the date, the submission deadline for this publication was September 6, 2005.[2] That is over 3 months before the patent was submitted.
After differential privacy took off, this original paper was revised and republished in 2017.[3] In the 2017 version, the history of differential privacy is spelled out: "In the initial version of this paper, differential privacy was called indistinguishability. The name “differential privacy” was suggested by Michael Schroeder, and was first used in Dwork (2006)." (page 13, section 2.2)
The odd history of the name probably explains the confusion, but the Wikipedia article should be corrected. --73.93.186.114 (talk) 05:56, 29 January 2019 (UTC)
I have deleted the synopsis. After removing the inaccurate info, there was nothing left. 73.93.186.114 (talk) 04:51, 30 January 2019 (UTC)
References
Differential privacy doesn't contradict the "coin toss" example
While it is true that differential privacy is sometimes considered to be designed to protect the identity of the participants of the database, that is more of a design question than a property of differential privacy. The core difference arises from the notion of a neighboring database. One can define the neighboring databases as those where the only change is in the users' private information, and not in their identifying information (a reasonable use case being where the identity of the participants is widely known). Furthermore, a common variation of differential privacy, known as local differential privacy, talks about mechanisms that behave exactly like the "coin flip" mechanism, i.e. where users randomize responses themselves (instead of, for example, a trusted curator). Finally, the concept that differential privacy requires combining the data into a single output is misinterpreted in this paragraph, as this hints that things like synthetic databases do not conform to the differential privacy requirements, which is visibly not true: differential privacy techniques for generating a synthetic database are commonly studied, and there exist provably working solutions for this problem. Vexlerneil (talk) 16:15, 4 April 2019 (UTC)
What algorithms are we talking about? Do you mean database queries? You mean omission of variables is an algorithm? Are you sure they are "algorithms"?
Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. 68.134.243.51 (talk) 14:22, 7 August 2022 (UTC)
Hypman
Hype 102.90.45.231 (talk) 11:18, 21 March 2023 (UTC)
@ 2A01:5EC0:1801:C567:FD00:AA66:596E:C57D (talk) 07:14, 26 March 2023 (UTC)
This article should not contain mathematical proofs; it is too technical
Wikipedia is not a math textbook. This article contains mathematical proofs of things like the Laplace notation, and it is far too technical. It needs dramatic simplification to be useful to the general Wikipedia reader. Simsong (talk) 16:57, 30 September 2023 (UTC)
Randomized response should not be the first example of using DP
DP does really badly in the local model and with randomized response. We should have better examples as the initial examples. Simsong (talk) 16:57, 30 September 2023 (UTC)
Divvi Up
The Internet Security Research Group has started a project to construct a "privacy-respecting system for aggregate statistics", which they call "Divvi Up". See DivviUp.org for details. Firefox v108, just released, has experimental support for this facility; Mozilla are going to run trials of this "Privacy Preserving Attribution" as "a non-invasive alternative to cross-site tracking" for advertisers. As well as the ISRG and Mozilla, Cloudflare are contributing to this project.
People are actively writing standards and Rust code for a large-scale Differential Privacy facility.
There are two IETF Drafts in progress:
- Verifiable Distributed Aggregation Functions (VDAF) (IETF page; from the Crypto Forum Research Group)
- Distributed Aggregation Protocol (DAP) (IETF page; built on VDAF; proposed standard; first product of the IETF Privacy Preserving Measurement Working Group)
For whatever it's worth, 4 authors of the VDAF spec are from Google and Cisco as well as ISRG and Cloudflare.
ISTM that
- some heavyweight players in Web technologies are investing significant resources in a new approach to web advertising
- this approach is intended to compete with Google's Privacy Sandbox efforts.
If this project succeeds, we will probably want to mention it in this article, and it might even merit a new article of its own. (We can hope that it gets a better name than "Divvi Up".)
Cheers -- CWC 12:17, 13 July 2024 (UTC)
- This depends on how much coverage it gets in the news, etc., but it could probably go in the implementations section. It could definitely fit in List of implementations of differentially private analyses in any case. Mrfoogles (talk) 17:37, 14 July 2024 (UTC)