Simon Willison's Weblog: privacy

Quoting Frederik Braun

2024-08-26T20:26:31+00:00

In 2021 we [the Mozilla engineering team] found “samesite=lax by default” isn’t shippable without what you call the “two minute twist” - you risk breaking a lot of websites. If you have that kind of two-minute exception, a lot of exploits that were supposed to be prevented remain possible.

When we tried rolling it out, we had to deal with a lot of broken websites: Debugging cookie behavior in website backends is nontrivial from a browser.

Firefox also had a prototype of what I believe is a better protection (including additional privacy benefits) already underway (called total cookie protection).

Given all of this, we paused samesite lax by default development in favor of this.

— Frederik Braun

Tags: mozilla, browsers, security, cors, cookies, privacy, firefox, samesite

Quoting Anthropic

2024-06-20T19:19:00+00:00

One of the core constitutional principles that guides our AI model development is privacy. We do not train our generative models on user-submitted data unless a user gives us explicit permission to do so. To date we have not used any customer or user-submitted data to train our generative models.

— Anthropic

Tags: anthropic, ethics, privacy, ai, llms

Private Cloud Compute: A new frontier for AI privacy in the cloud

2024-06-11T15:38:15+00:00

Here are the details about Apple's Private Cloud Compute infrastructure, and they are pretty extraordinary.

The goal with PCC is to allow Apple to run larger AI models that won't fit on a device, but in a way that guarantees that private data passed from the device to the cloud cannot leak in any way - not even to Apple engineers with SSH access who are debugging an outage.

This is an extremely challenging problem, and their proposed solution includes a wide range of innovations in private computing.

The most impressive part is their approach to technically enforceable guarantees and verifiable transparency. How do you ensure that privacy isn't broken by a future code change? And how can you allow external experts to verify that the software running in your data center is the same software that they have independently audited?

When we launch Private Cloud Compute, we’ll take the extraordinary step of making software images of every production build of PCC publicly available for security research. This promise, too, is an enforceable guarantee: user devices will be willing to send data only to PCC nodes that can cryptographically attest to running publicly listed software.

These code releases will be included in an "append-only and cryptographically tamper-proof transparency log" - similar to certificate transparency logs.
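To make "append-only and tamper-proof" a little more concrete, here is a minimal sketch of a hash-chained log. It is not Apple's design, and it is simpler than the Merkle trees used by certificate transparency. The `AppendOnlyLog` class and the build names are invented for illustration.

```python
import hashlib


class AppendOnlyLog:
    """Toy hash-chained log: each entry's hash covers the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # list of (payload, chained_hash) tuples

    def append(self, payload: bytes) -> str:
        previous = self.entries[-1][1] if self.entries else self.GENESIS
        digest = hashlib.sha256(previous.encode() + payload).hexdigest()
        self.entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        previous = self.GENESIS
        for payload, digest in self.entries:
            expected = hashlib.sha256(previous.encode() + payload).hexdigest()
            if expected != digest:
                return False
            previous = digest
        return True


log = AppendOnlyLog()
log.append(b"hypothetical-build-1")
log.append(b"hypothetical-build-2")
print(log.verify())
```

Anyone holding the latest hash can detect a rewritten history, because changing an earlier entry changes every hash after it.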

Tags: apple, security, ethics, generative-ai, privacy, ai, llms, certificates, apple-intelligence

Thoughts on the WWDC 2024 keynote on Apple Intelligence

2024-06-10T20:19:13+00:00

Today's WWDC keynote finally revealed Apple's new set of AI features. The AI section (Apple are calling it Apple Intelligence) started over an hour into the keynote, which is available as an archived YouTube livestream.

There's also a detailed Apple newsroom post: Introducing Apple Intelligence, the personal intelligence system that puts powerful generative models at the core of iPhone, iPad, and Mac.

There are a lot of interesting things here. Apple have a strong focus on privacy, finally taking advantage of the Neural Engine accelerator chips in the A17 Pro chip on iPhone 15 Pro and higher and the M1/M2/M3 Apple Silicon chips in Macs. They're using these to run on-device models - I've not yet seen any information on which models they are running and how they were trained.

On-device models that can outsource to Apple's servers

Most notable is their approach to features that don't work with an on-device model. At 1h14m43s:

When you make a request, Apple Intelligence analyses whether it can be processed on device. If it needs greater computational capacity, it can draw on Private Cloud Compute, and send only the data that's relevant to your task to be processed on Apple Silicon servers.

Your data is never stored or made accessible to Apple. It's used exclusively to fulfill your request.

And just like your iPhone, independent experts can inspect the code that runs on the servers to verify this privacy promise.

In fact, Private Cloud Compute cryptographically ensures your iPhone, iPad, and Mac will refuse to talk to a server unless its software has been publicly logged for inspection.

There's some fascinating computer science going on here! I'm looking forward to learning more about this - it sounds like the details will be public by design, since that's key to the promise they are making here.

Update: Here are the details, and they are indeed extremely impressive - more of my notes here.

An ethical approach to AI generated images?

Their approach to generative images is notable in that they're shipping an on-device model in a feature called Image Playground, with a very important limitation: it can only output images in one of three styles: sketch, illustration and animation.

This feels like a clever way to address some of the ethical objections people have to this specific category of AI tool:

If you can't create photorealistic images, you can't generate deepfakes or offensive photos of people
By having obvious visual styles you ensure that AI generated images are instantly recognizable as such, without watermarks or similar
Avoiding the ability to clone specific artists' styles further helps sidestep ethical issues about plagiarism and copyright infringement

The social implications of this are interesting too. Will people be more likely to share AI-generated images if there are no awkward questions or doubts about how they were created, and will that help make it more socially acceptable to use them?

I've not seen anything on how these image models were trained. Given their limited styles it seems possible Apple used entirely ethically licensed training data, but I'd like to see more details on this.

App Intents and prompt injection

Siri will be able to both access data on your device and trigger actions based on your instructions.

This is the exact feature combination that's most at risk from prompt injection attacks: what happens if someone sends you a text message that tricks Siri into forwarding a password reset email to them, and you ask for a summary of that message?

Security researchers will no doubt jump straight onto this as soon as the beta becomes available. I'm fascinated to learn what Apple have done to mitigate this risk.

Integration with ChatGPT

Rumors broke last week that Apple had signed a deal with OpenAI to use ChatGPT. That's now been confirmed: here's OpenAI's partnership announcement:

Apple is integrating ChatGPT into experiences within iOS, iPadOS, and macOS, allowing users to access ChatGPT’s capabilities—including image and document understanding—without needing to jump between tools.

Siri can also tap into ChatGPT’s intelligence when helpful. Apple users are asked before any questions are sent to ChatGPT, along with any documents or photos, and Siri then presents the answer directly.

The keynote talks about that at 1h36m21s. Those prompts to confirm that the user wanted to share data with ChatGPT are very prominent in the demo!

This integration (with GPT-4o) will be free - and Apple don't appear to be charging for their other server-side AI features either. I guess they expect the supporting hardware sales to more than cover the costs of running these models.

Tags: apple, ethics, privacy, security, trust, ai, openai, prompt-injection, generative-ai, chatgpt, llms, apple-intelligence

Update on the Recall preview feature for Copilot+ PCs

2024-06-07T17:30:40+00:00

This feels like a very good call to me: in response to widespread criticism Microsoft are making Recall an opt-in feature (during system onboarding), adding encryption to the database and search index beyond just disk encryption and requiring Windows Hello face scanning to access the search feature.

Via Wired: Microsoft Will Switch Off Recall by Default After Security Backlash

Tags: trust, windows, security, privacy, ai, microsoft, recall

Quoting Zac Bowden

2024-06-07T17:23:54+00:00

In fact, Microsoft goes so far as to promise that it cannot see the data collected by Windows Recall, that it can't train any of its AI models on your data, and that it definitely can't sell that data to advertisers. All of this is true, but that doesn't mean people believe Microsoft when it says these things. In fact, many have jumped to the conclusion that even if it's true today, it won't be true in the future.

— Zac Bowden

Tags: windows, trust, ai, microsoft, recall, privacy

Stealing everything you’ve ever typed or viewed on your own Windows PC is now possible with two lines of code — inside the Copilot+ Recall disaster

2024-06-01T07:48:04+00:00

Recall is a new feature in Windows 11 which takes a screenshot every few seconds, runs on-device OCR on it and stores the resulting text in a SQLite database. This means you can search back through your previous activity, against local data that has remained on your device.
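The general pattern of storing OCR text in SQLite and searching it can be sketched in a few lines. This is only an illustration: the `captures` table, its columns and the sample rows are invented here and are not Recall's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical table of OCR'd screenshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE captures (captured_at TEXT, ocr_text TEXT)")
    conn.executemany(
        "INSERT INTO captures VALUES (?, ?)",
        [
            ("2024-06-01T09:00:00", "Quarterly budget spreadsheet"),
            ("2024-06-01T09:00:05", "Email draft about the budget review"),
            ("2024-06-01T09:00:10", "Weather forecast for Saturday"),
        ],
    )
    return conn


def search(conn, term):
    """Return (timestamp, text) rows whose OCR text contains the search term."""
    rows = conn.execute(
        "SELECT captured_at, ocr_text FROM captures "
        "WHERE ocr_text LIKE ? ORDER BY captured_at",
        (f"%{term}%",),
    )
    return rows.fetchall()


conn = build_demo_db()
for captured_at, text in search(conn, "budget"):
    print(captured_at, text)
```

The convenience cuts both ways: anything able to read the database file can run the same kind of query.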

The security and privacy implications here are still enormous because malware can now target a single file with huge amounts of valuable information:

During testing this with an off the shelf infostealer, I used Microsoft Defender for Endpoint — which detected the off the shelve infostealer — but by the time the automated remediation kicked in (which took over ten minutes) my Recall data was already long gone.

I like Kevin Beaumont's argument here about the subset of users this feature is appropriate for:

At a surface level, it is great if you are a manager at a company with too much to do and too little time as you can instantly search what you were doing about a subject a month ago.

In practice, that audience’s needs are a very small (tiny, in fact) portion of Windows userbase — and frankly talking about screenshotting the things people in the real world, not executive world, is basically like punching customers in the face.

Via @GossiTheDog

Tags: privacy, security, sqlite, microsoft, recall

Quoting Molly White

2024-05-24T01:19:01+00:00

But increasingly, I’m worried that attempts to crack down on the cryptocurrency industry — scummy though it may be — may result in overall weakening of financial privacy, and may hurt vulnerable people the most. As they say, “hard cases make bad law”.

— Molly White

Tags: blockchain, privacy, molly-white

Text Embeddings Reveal (Almost) As Much As Text

2024-01-08T05:22:25+00:00

Embeddings of text—where a text string is converted into a fixed-length array of floating point numbers—are demonstrably reversible: “a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly”.

This means that if you’re using a vector database for embeddings of private data you need to treat those embedding vectors with the same level of protection as the original text.

Tags: ai, privacy, security, embeddings

Google was accidentally leaking its Bard AI chats into public search results

2023-09-27T19:35:23+00:00

I’m quoted in this piece about yesterday’s Bard privacy bug: it turned out the share URL and “Let anyone with the link see what you’ve selected” feature weren’t correctly setting a noindex parameter, and so some shared conversations were being swept up by the Google search crawlers. Thankfully this was a mistake, not a deliberate design decision, and it should be fixed by now.

Tags: bard, privacy, google, llms

Don't trust AI to talk accurately about itself: Bard wasn't trained on Gmail

2023-03-22T03:13:02+00:00

Earlier this month I wrote about how ChatGPT can’t access the internet, even though it really looks like it can. Consider this part two in the series. Here's another common and non-intuitive mistake people make when interacting with large language model AI systems: asking them questions about themselves.

(When I say "large language model AI systems" I mean tools like ChatGPT, Bing and the brand new Google Bard.)

Asking questions like this feels like a natural thing to do: these bots use "I" pronouns (I really wish they wouldn't) and will very happily answer questions about themselves - what they can do, how they work, even their own opinions (I really wish they wouldn't do that).

These questions are likely to produce realistic but misleading answers. They carry a surprisingly high risk of hallucination.

What harm could possibly come from asking them how they work?

No, Bard was not trained on text from private Gmail accounts

The following screenshot started circulating on Twitter earlier today. I'm not linking to it directly because I don't want to encourage it to spread further without extra context.

The user asked:

where does Bard's dataset come from?

Bard replied like this - emphasis mine:

Bard's dataset comes from a variety of sources, including:

Publicly available datasets: These include datasets of text and code from the web, such as Wikipedia, GitHub, and Stack Overflow.

Google's internal data: This includes data from Google Search, Gmail, and other products.

Data from third-party companies: This includes data from companies that have partnered with Google to provide data for Bard's training.

The specific sources of Bard's dataset are not publicly available, but Google AI has said that the dataset is "as comprehensive as possible".

Unsurprisingly, the idea that Bard might have been trained on internal data from Gmail sparked a lot of concern!

Here's an official denial:

Bard is an early experiment based on Large Language Models and will make mistakes. It is not trained on Gmail data. -JQ
- Google Workspace (@GoogleWorkspace) March 21, 2023

(I have some supporting arguments below in case the official denial isn't convincing enough for you.)

Bard was not trained on Gmail. So why on earth did Bard say that it was?

Language models have no concept of "self"

As always with language models, the trick to understanding why they sometimes produce wildly inappropriate output like this is to think about how they work.

A large language model is a statistical next-word / next-sentence predictor. Given the previous sequence of words (including the user's prompt), it uses patterns from the vast amount of data it has been trained on to find a statistically satisfying way to continue that text.

As such, there's no mechanism inside a language model to help it identify that questions of the form "how do you work?" should be treated any differently than any other question.
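A toy version of "statistical next-word predictor" makes the point. This is a sketch using simple bigram counts, which is vastly cruder than a real language model. The training text and function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def complete(counts, prompt, max_words=3):
    """Extend the prompt by repeatedly picking the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = counts.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


corpus = (
    "this includes data from google search "
    "this includes data from gmail "
    "this includes data from google maps"
)
model = train_bigrams(corpus)
print(complete(model, "this includes"))
```

The completion is chosen purely because those words tended to follow each other in the training text. Nothing in the procedure checks whether the resulting sentence is true.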

We can give it hints: many chatbot models are pre-seeded with a short prompt that says something along the lines of "You are Assistant, a large language model trained by OpenAI" (seen via a prompt leak).

And given those hints, it can at least start a conversation about itself when encouraged to do so.

But as with everything else about language models, it's an illusion. It's not talking about itself, it's completing a sentence that starts with "I am a large language model trained by ...".

So when it outputs "Google's internal data:", the obvious next words might turn out to be "This includes data from Google Search, Gmail, and other products" - they're statistically likely to follow, even though they don't represent the actual truth.

This is one of the most unintuitive things about these models. The obvious question here is why: why would Bard lie and say it had been trained on Gmail when it hadn't?

It has no motivations to lie or tell the truth. It's just trying to complete a sentence in a satisfactory way.

What does "satisfactory" mean? It's likely been guided by RLHF - Reinforcement Learning from Human Feedback - which the ChatGPT development process has excelled at. Human annotators help train the model by labelling responses as satisfactory or not. Google apparently recruited the entire company to help with this back in February.

I'm beginning to suspect that the perceived difference in quality between different language model AIs is influenced much more heavily by this fine-tuning level of training than it is by the underlying model size and quality itself. The enormous improvements the Alpaca fine-tuning brought to the tiny LLaMA 7B model have reinforced my thinking around this.

I think Bard's fine-tuning still has a long way to go.

Current information about itself couldn't have been in the training data

By definition, the model's training data must have existed before the model itself was trained. Most models have a documented cut-off date on their training data - for OpenAI's models that's currently September 2021. I don't believe Google have shared the cut-off date for the LaMDA model used by Bard.

If it was trained on content written prior to its creation, it clearly can't understand details about its own specific "self".

ChatGPT can answer pretty detailed questions about GPT-3, because that model had been iterated on and written about publicly for several years prior to its training cut-off. But questions about its most recent model, by definition, cannot be answered just using data that existed in its training set.

But Bard can consult data beyond its training!

Here's where things get a bit tricky.

ChatGPT is a "pure" interface to a model: when you interact with it, you're interacting with the underlying language model directly.

Google Bard and Microsoft Bing are different: they both include the ability to consult additional sources of information, in the form of the Google and Bing search indexes.

Effectively, they're allowed to augment their training data with additional information fetched from a search.

This sounds more complex than it actually is: effectively they can run an external search, get back some results, paste them invisibly into the ongoing conversation and use that new text to help answer questions.

(I've built a very simple version of this pattern myself a couple of times, described in How to implement Q&A against your documentation with GPT3, embeddings and Datasette and A simple Python implementation of the ReAct pattern for LLMs.)
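The "search, then paste the results into the conversation" idea can be sketched as below. This is not how Bard or Bing are actually implemented. The `search` and `complete` parameters are hypothetical callables standing in for a search index and a language model, and the stand-ins at the bottom exist only so the example runs.

```python
def answer_with_search(question, search, complete, max_results=3):
    """Build a prompt that includes search results, then ask the model to continue it."""
    results = search(question)[:max_results]
    context = "\n".join(f"- {result}" for result in results)
    prompt = (
        "Use the search results below to answer the question.\n"
        f"Search results:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return complete(prompt)


# Stand-ins for illustration only: a canned search result and a "model"
# that simply echoes the prompt it was given.
def fake_search(query):
    return ["LaMDA was pre-trained on public dialog data and other public web documents."]


def echo_model(prompt):
    return prompt


print(answer_with_search("Where does Bard's dataset come from?", fake_search, echo_model))
```

The model never sees the search engine. It only sees extra text that was placed in front of the question before it started completing.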

As such, one would hope that Bard could offer a perfect answer to any question about itself. It should be able to do something like this:

User: Where does Bard's dataset come from?

Bard: (invisible): search Google for "Bard dataset"

Bard: (invisible): search results said: ... big chunk of text from the Google indexed documents ...

Bard: My underlying model LaMDA was trained on public dialog data and other public web documents.

Clearly it didn't do that in this case! Or if it did, it summarized the information it got back in a misleading way.

I expect Bard will have a much better answer for this question within a day or two - a great thing about running models with augmented data in this way is that you can improve their answers without having to train the underlying model again from scratch every time.

More reasons that LaMDA wouldn't be trained on Gmail

When I first saw the claim from that original screenshot, I was instantly suspicious.

Taking good care of the training data that goes into a language model is one of the most important and challenging tasks in all of modern AI research.

Using the right mix of content, with the right mix of perspectives, and languages, and exposure to vocabulary, is absolutely key.

If you train a model on bad sources of training data, you'll get a really badly behaved model.

The problem is that these models require far more text than any team of humans could ever manually review.

The LaMDA paper describes the training process like so:

LaMDA was pre-trained to predict the next token in a text corpus. Unlike previous dialog models trained on dialog data alone, we pre-trained LaMDA on a dataset created from public dialog data and other public web documents. Therefore, LaMDA can be used as a general language model prior to fine-tuning.

The pre-training dataset consists of 2.97B documents, 1.12B dialogs, and 13.39B dialog utterances, for a total of 1.56T words

1.56 trillion words!

Appendix E has more details:

The composition of the data is as follows: 50% dialogs data from public forums; 12.5% C4 data t5; 12.5% code documents from sites related to programming like Q&A sites, tutorials, etc; 12.5% Wikipedia (English); 6.25% English web documents; and 6.25% Non-English web documents.

"C4 data t5" I believe relates to Common Crawl.

So why not mix in Gmail too?

First, in order to analyze the training data you need to be able to have your research team view it - they need to run spot checks, and build and test filtering algorithms to keep the really vile stuff to a minimum.

At large tech companies like Google, the ability for members of staff to view private data held in trust for their users is very tightly controlled. It's not the kind of thing you want your machine learning training team to be poking around in... and if you work on those teams, even having the ability to access that kind of private data represents a substantial personal legal and moral risk.

Secondly, think about what could go wrong. What if a language model leaked details of someone's private life in response to a prompt from some other user?

This would be a PR catastrophe. Would people continue to trust Gmail or other Google products if they thought their personal secrets were being exposed to anyone who asked Bard a question? Would Google ever want to risk finding out the answer to that question?

The temptations of conspiratorial thinking

Are you still not convinced? Are you still suspicious that Google trained Bard on Gmail, despite both their denials and my logic as to why they wouldn't ever want to do this?

Ask yourself how much you want to believe that this story is true.

This modern AI stuff is deeply weird, and more than a little frightening.

The companies involved are huge, secretive and are working on technology which serious people have grave concerns about.

It's so easy to fall into the trap of conspiratorial thinking around this stuff. Especially since some of the conspiracies might turn out to be true!

I don't know how to best counter this most human of reactions. My best recommendation is to keep in mind that humans, like language models, are pattern matching machines: we jump to conclusions, especially if they might reinforce our previous opinions and biases.

If we're going to figure this stuff out together, we have to learn when to trust our initial instincts and when to read deeper and think harder about what's going on.

Tags: bing, ethics, gmail, google, privacy, ai, generative-ai, chatgpt, bard, llms, training-data

Let websites framebust out of native apps

2022-08-10T22:29:42+00:00

Adrian Holovaty makes a compelling case that it is Not OK that we allow native mobile apps to embed our websites in their own browsers, including the ability for them to modify and intercept those pages (it turned out today that Instagram injects extra JavaScript into pages loaded within the Instagram in-app browser). He compares this to frame-busting on the regular web, and proposes that the X-Frame-Options: DENY header which browsers support to prevent a page from being framed should be upgraded to apply to native embedded browsers as well.

I’m not convinced that reusing X-Frame-Options: DENY would be the best approach—I think it would break too many existing legitimate uses—but a similar option (or a similar header) specifically for native apps which causes pages to load in the native OS browser instead sounds like a fantastic idea to me.

Via @adrianholovaty

Tags: browsers, privacy, security, adrian-holovaty

Quoting Tim Cook

2021-01-31T18:00:49+00:00

Technology does not need vast troves of personal data stitched together across dozens of websites and apps in order to succeed. Advertising existed and thrived for decades without it, and we're here today because the path of least resistance is rarely the path of wisdom.

— Tim Cook

Tags: apple, privacy, advertising

Quoting Nat Friedman

2020-12-17T19:44:20+00:00

At GitHub, we want to protect developer privacy, and we find cookie banners quite irritating, so we decided to look for a solution. After a brief search, we found one: just don’t use any non-essential cookies. Pretty simple, really. 🤔

So, we have removed all non-essential cookies from GitHub, and visiting our website does not send any information to third-party analytics services.

— Nat Friedman

Tags: cookies, privacy, github

Using achievement stats to estimate sales on steam

2018-08-09T09:03:11+00:00

Really interesting data leak exploit here: Valve’s Steam API was showing the percentage of users that gained a specific achievement up to 16 decimal places—which inadvertently leaked their exact usage statistics, since if 0.012782207690179348 percent of players get an achievement the only possible input is 8 players out of 62,587.
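The recovery step can be sketched with Python's `fractions` module: treat the published percentage as a ratio and look for the closest fraction with a small denominator. The 100,000-player cap and the `recover_counts` name are assumptions made for this example, not details from the original write-up.

```python
from fractions import Fraction


def recover_counts(percent_text, max_players):
    """Find the simplest achievers/players ratio matching a published percentage."""
    share = Fraction(percent_text) / 100  # exact decimal value of the percentage
    best = share.limit_denominator(max_players)
    return best.numerator, best.denominator


achievers, players = recover_counts("0.012782207690179348", 100_000)
print(achievers, players)
```

`limit_denominator` returns the fraction in lowest terms, so a multiple such as 16 players out of 125,174 would look identical. The more decimal places are published, the fewer candidate player counts remain.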

Via Ars Technica

Tags: privacy, security

Cookies-over-HTTP Bad

2018-04-07T14:39:06+00:00

Mike West from the Chrome security team proposes a way for browsers to start discouraging the use of tracking cookies sent over HTTP—which represent a significant threat to user privacy from network attackers. It’s a clever piece of thinking: browsers would slowly ramp up the forced expiry deadline for non-HTTPS cookies, further encouraging sites to switch to HTTPS cookies while giving them ample time to adapt.

Via @mikewest

Tags: privacy, cookies, https

Protecting Against HSTS Abuse

2018-03-19T22:21:57+00:00

Any web feature that can be used to persist information will eventually be used to build super-cookies. In this case it’s HSTS—a web feature that allows sites to tell browsers “in the future always load this domain over HTTPS even if the request specified HTTP”. The WebKit team caught this being exploited in the wild, by encoding a user identifier in binary across 32 separate sub domains. They have a couple of mitigations in place now—I expect other browser vendors will follow suit.
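The encoding itself is plain binary. A rough sketch, assuming one hypothetical sub domain per bit (`bit00.example.com` to `bit31.example.com`), with the browser's HSTS state modelled as a set of host names:

```python
def id_to_hosts(user_id, bits=32, domain="example.com"):
    """Hosts that would get an HSTS entry: one for each 1 bit in the identifier."""
    return [
        f"bit{i:02d}.{domain}"
        for i in range(bits)
        if (user_id >> i) & 1
    ]


def hosts_to_id(upgraded_hosts, bits=32, domain="example.com"):
    """Rebuild the identifier from the set of hosts that were upgraded to HTTPS."""
    value = 0
    for i in range(bits):
        if f"bit{i:02d}.{domain}" in upgraded_hosts:
            value |= 1 << i
    return value


hosts = id_to_hosts(5)
print(hosts)
print(hosts_to_id(set(hosts)))
```

On a later visit the tracker would request all 32 hosts over plain HTTP and note which ones the browser silently upgrades. No cookie is involved, which is why clearing cookies does not remove the identifier.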

Via @troyhunt

Tags: ssl, privacy, security, webkit

Quoting Stuart Langridge

2018-02-01T14:03:08+00:00

What we need to do is come up with a way to help people understand that there are ways to never be lost again, and to listen to any music you want, and to video chat with someone on the other side of the world, without them having to feel disquieted about it. That it's not OK that you're made to feel weirded out. That it's possible for there to be alternatives. That having to feel someone rooting around in your life is not a price you should have to pay.

— Stuart Langridge

Tags: stuart-langridge, privacy

Facebook's Instant Personalization: An Analysis of Fundamental Privacy Flaws

2010-10-02T23:53:00+00:00

Oh FFS. “Instant Personalization” means you visit one of Facebook’s “partner websites” and Facebook instantly tells them your full identity and gives them access to full Facebook connect functionality—without you performing any action other than visiting the site. This will not end well.

Via Hacker News

Tags: cookies, facebook, privacy, recovered

Why do some people disable JavaScript in their browser?

2010-08-25T13:37:00+00:00

My answer to Why do some people disable JavaScript in their browser? on Quora

For security reasons.

Many (most?) web applications have security vulnerabilities due to JavaScript. The most common are XSS, where malicious JavaScript can be injected into a page, stealing cookies and forcing users to perform actions that they did not intend, and CSRF, where unprotected forms can be abused to again perform unintended actions.

It's amazing how many developers are unaware of CSRF - and quite a lot still don't fully understand the consequences of XSS.
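For developers on the other side of this problem, the usual server-side defences look roughly like the sketch below: escape untrusted text before putting it in HTML, and require a secret per-session token on every form submission. The function names are invented for this example, and a real application would normally rely on its framework's built-in protections instead.

```python
import hmac
import secrets
from html import escape


def render_comment(text):
    """XSS defence: escape user-supplied text so it cannot become markup."""
    return f"<p>{escape(text)}</p>"


def new_csrf_token():
    """CSRF defence: a random token stored in the session and embedded in each form."""
    return secrets.token_urlsafe(32)


def is_valid_csrf(session_token, submitted_token):
    """Accept a form submission only if it carries the session's token."""
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())


print(render_comment("<script>alert(1)</script>"))
token = new_csrf_token()
print(is_valid_csrf(token, token), is_valid_csrf(token, "forged"))
```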

Disabling JavaScript in your browser makes the above attacks much harder to pull off. Once you understand how serious and widespread these problems are, it becomes very tempting to disable JavaScript for unknown sites and only enable it for sites you think have extremely skilled development teams.

That said, I don't personally disable JS in my browsers - but that's only because I haven't been bitten badly yet.

Tags: csrf, javascript, privacy, security, xss, quora

The Evolution of Privacy on Facebook

2010-05-09T11:53:00+00:00

Brilliant infographic showing exactly how the visibility of different aspects of your Facebook profile has changed in increments since 2005. Also a nice example of Processing.js in action.

Tags: facebook, infographics, privacy, processing, processingjs, recovered

The new Facebook API exposes the events you attend to anyone on the Internet

2010-04-26T12:08:27+00:00

I’m generally impressed by the new set of Facebook APIs—they’re a whole lot easier to work with than the older stuff—but they’re also clearly a bit half-baked and the privacy model needs some urgent work. The Graph API allows you to see all “open” events that any user has attended or is attending, which can expose things like their friends’ home addresses. Yes, this means you can stalk Mark Zuckerberg.

Tags: ka-ping-yee, graphapi, facebook, privacy

A new Buzz start-up experience based on your feedback

2010-02-14T10:12:28+00:00

Buzz is switching to the more obvious model: use existing Gmail behaviour to suggest a list of people to follow, rather than auto-following them. It feels pretty clear to me that this is how following recommendations should work.

Tags: follow, following, privacy, google-buzz, buzz

WARNING: Google Buzz Has A Huge Privacy Flaw

2010-02-11T11:30:27+00:00

Interesting one this: by default, Buzz creates a public profile for you that lists the people you follow—but your default set of followers is derived from the people you contact most frequently using Gmail. This means users of Buzz may inadvertently reveal their most frequent contacts, which is an issue for people like journalists with anonymous sources, unhappy employees seeking new work or even people having e-mail based affairs.

Tags: privacy, buzz, google, followers, gmail

Google Dashboard

2009-11-05T14:03:56+00:00

New Google product which shows exactly how much information Google have stored against your account, all on one page. This is a really useful tool, and hopefully will help set a powerful precedent for other sites to follow.

Tags: google, dashboard, privacy

You Deleted Your Cookies? Think Again

2009-08-17T15:23:32+00:00

Flash cookies last longer than browser cookies and are harder to delete. Some services are sneakily “respawning” their cookies—if you clear the regular tracking cookie it will be reinstated from the Flash data next time you visit a page.

Via Bruce Schneier

Tags: cookies, privacy, security, flash, respawning

TOSBack | The Terms-Of-Service Tracker

2009-06-07T10:49:33+00:00

Fantastic idea (and implementation) from the EFF—a site that currently tracks 44 website policy documents and highlights changes to them using a diff engine (from Drupal). A global RSS feed is available—it would be useful if individual feeds for different sites and organisations were also provided.
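The core idea of diffing two versions of a policy is easy to try with Python's standard `difflib`. This is only a sketch of the concept, not TOSBack's Drupal-based implementation, and the policy text is invented.

```python
import difflib


def policy_diff(old, new, name="policy"):
    """Return unified-diff lines describing what changed between two versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"{name} (old)",
            tofile=f"{name} (new)",
            lineterm="",
        )
    )


old_text = "We collect your email address.\nWe never share your data."
new_text = "We collect your email address.\nWe may share your data with partners."
print("\n".join(policy_diff(old_text, new_text)))
```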

Tags: tosback, eff, privacy, diff, drupal

On the Anonymity of Home/Work Location Pairs

2009-05-24T13:14:04+00:00

Most people can be uniquely identified by the rough location of their home combined with the rough location of their work. US Census data shows that 5% of people can be uniquely identified by this combination even at just census tract level (1,500 people).
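Measuring this kind of uniqueness is a counting exercise: a person is identifiable if nobody else shares their home/work pair. A small sketch, using made-up region labels rather than census data:

```python
from collections import Counter


def unique_fraction(pairs):
    """Fraction of people whose (home, work) pair is shared with nobody else."""
    counts = Counter(pairs)
    unique = sum(1 for pair in pairs if counts[pair] == 1)
    return unique / len(pairs)


# One (home_region, work_region) tuple per person; the labels are invented.
people = [("A", "X"), ("A", "X"), ("A", "Y"), ("B", "X")]
print(unique_fraction(people))
```

Coarser regions put more people into each pair and lower the fraction, which is why the granularity of the location data matters so much.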

Tags: bruce-schneier, privacy, location, census

Quoting Marc Hedlund

2009-05-13T08:41:37+00:00

For the record, I'm a noted privacy freak and I don't pretend to speak for anyone else on this topic. I know that resistance is futile. I continue to believe that there is a great divide on sensitivity about privacy - you've either had your identity stolen or been stalked or had some great intrusion you couldn't fend off, or you haven't. I'm in the former camp and it colors the way I view and think about privacy online. It makes me indescribably sad to see how clearly I and others in my camp are losing this battle.

— Marc Hedlund

Tags: privacy, marchedlund, identitytheft

eval() Kerfuffle

2008-07-02T21:24:39+00:00

The ability to read supposedly private variables in Firefox using a second argument to eval() will be removed in Firefox 3.1.

Tags: firefox, eval, security, privacy, javascript, john-resig