A district's data team publishes the annual enrollment report to the board. Names are removed. Everyone agrees it's a "de-identified" summary safe to post on the district website.
Six months later, a local reporter uses the report to identify a single student. The report didn't name the student. It didn't have to.
This is the re-identification problem, and it's why FERPA's definition of personally identifiable information explicitly includes combinations of attributes that could identify a student "with reasonable certainty," even if no single field names anyone.
The Intuition Behind K-Anonymity
K-anonymity is a simple idea with a fancy name. The "k" stands for a number. A dataset is "k-anonymous" if, for every row in the data, at least k rows (counting that row itself) share the exact same combination of quasi-identifying attributes — fields like grade, gender, or program status that name no one on their own but can identify someone in combination.
If k = 5, then every row describes a group of at least 5 students who look identical in the data. Even a very determined person with access to the dataset can only narrow any individual down to "one of 5." That's considered reasonably safe.
If k = 1, every row is unique. Each row describes exactly one student. The data is, in effect, identified — just with the names stripped off.
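The definition can be checked mechanically: group the rows by their quasi-identifier combination and take the size of the smallest group. A minimal Python sketch, using made-up student-level records (the fields and values here are illustrative):

```python
from collections import Counter

# Hypothetical de-identified records. Each tuple is a quasi-identifier
# combination: (grade, gender, english_learner).
records = [
    ("1", "F", "No"), ("1", "F", "No"), ("1", "F", "No"),
    ("1", "M", "No"), ("1", "M", "No"),
    ("2", "M", "Yes"),  # a unique combination: this student has k = 1
]

def k_anonymity(rows):
    """k for the dataset: the size of the smallest group of rows that
    share the exact same quasi-identifier combination."""
    counts = Counter(rows)
    return min(counts.values())

print(k_anonymity(records))  # -> 1, because one row is unique
```

If the unique `("2", "M", "Yes")` record were generalized or removed, k would rise to 2, the size of the next-smallest group.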
A Worked Example
Suppose a district publishes the following enrollment breakdown for one of its elementary schools:
| Grade | Gender | English Learner | Special Education | Free/Reduced Lunch | Students |
|---|---|---|---|---|---|
| K | Female | No | No | No | 22 |
| K | Male | No | No | No | 19 |
| 1 | Female | Yes | No | Yes | 12 |
| 1 | Male | Yes | Yes | Yes | 1 |
| 2 | Female | No | Yes | No | 3 |
| 2 | Male | No | No | Yes | 18 |
The combination in row 4 — first grade, male, English Learner, receiving special education services, free/reduced lunch — describes exactly one student.
Anyone with access to this report can go to the school, look at the first-grade boys, and (knowing which families are new to the country, which student has an IEP, and which families qualify for free lunch) narrow down to that one child with high confidence. The dataset is technically "de-identified" — no names are listed. But for this row, k = 1, and the student is effectively named.
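For an aggregate table like this one, the published count in each cell *is* the group size, so checking k reduces to scanning for small counts. A quick sketch over the table above (the threshold of 5 is chosen here only for illustration):

```python
# The published table, as (grade, gender, EL, SpEd, FRL) -> student count.
# In an aggregate release, the count in each cell IS k for that group.
table = {
    ("K", "F", "No",  "No",  "No"):  22,
    ("K", "M", "No",  "No",  "No"):  19,
    ("1", "F", "Yes", "No",  "Yes"): 12,
    ("1", "M", "Yes", "Yes", "Yes"):  1,
    ("2", "F", "No",  "Yes", "No"):   3,
    ("2", "M", "No",  "No",  "Yes"): 18,
}

K_MIN = 5  # illustrative threshold
risky = {cell: n for cell, n in table.items() if n < K_MIN}
for cell, n in sorted(risky.items()):
    print(cell, "-> k =", n)
```

Two cells fail: the first-grade male cell with k = 1, and the second-grade female special-education cell with k = 3 — which is also too small to publish under most guidance.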
Why Small Cells Are Dangerous
Educational researchers have used 10 as a rough safe threshold for decades. The U.S. Department of Education's own guidance recommends not publishing any cell with fewer than 10 students, and suppressing adjacent cells when a single cell would be suppressed (because otherwise subtraction can reveal the suppressed value).
The problem compounds when tables disaggregate by multiple dimensions:
- District-level totals are almost always safe
- School-level totals are usually safe
- Grade-by-gender breakdowns are sometimes safe
- Grade-by-gender-by-demographic-by-program cross-tabulations are frequently unsafe
The more dimensions you add to a breakdown, the smaller the cells get, and the more likely one of them drops to 1, 2, or 3.
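A back-of-envelope calculation shows why. Assuming 600 students spread evenly across the dimensions from the example table (real enrollments are skewed, so the smallest cells are smaller still than this average):

```python
# Average cell size under a uniform spread. Real data is skewed, so the
# minimum cell is typically far below this average.
students = 600
dims = {"grade": 6, "gender": 2, "english_learner": 2,
        "special_ed": 2, "frl": 2}

cells = 1
for levels in dims.values():
    cells *= levels

print(cells, students / cells)  # 96 cells, averaging 6.25 students each
```

Five modest dimensions already push the *average* cell below 10; the skewed tail (small programs, small demographic groups) is where k = 1 cells appear.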
Common Places Small Cells Appear
1. Board Reports With Demographic Breakdowns
Tables like the one above are routine in board packets. They're informative for trustees. They can also be publicly posted as part of the meeting materials, at which point a small cell becomes a permanent public record.
2. State Data Transparency Portals
State-level dashboards often publish school-level breakdowns. Some states apply suppression rules automatically; others leave it to the district. Districts that copy state-portal data onto their own websites without checking suppression rules can end up publishing more granular data than the state itself.
3. Discipline and Attendance Reports
A report titled "Suspensions by Grade and Demographic Group, Third Quarter" can easily contain cells with single-student counts, especially at smaller schools. "One Asian American female in grade 7 was suspended" is not an anonymous statement in a 600-student middle school.
4. Open Data Releases
Open data initiatives are valuable. But published datasets that include student-level records — even with names removed — are essentially always re-identifiable if they contain enough attributes. The classic academic finding is that knowing a person's ZIP code, birth date, and gender uniquely identifies about 87% of Americans. Student datasets often contain much richer information than that.
What Districts Can Do
1. Apply a Minimum Cell Size Rule
Adopt a policy that any publicly reported cross-tabulation of student data must have cells of at least 10 (some districts use 11 or higher). When a cell would be smaller, either combine categories or suppress the cell and its complement.
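A minimal sketch of such a rule applied before publication, using hypothetical grade-level counts and a threshold of 10 (the marker string is a common convention, not a mandate):

```python
# Primary suppression: replace any cell below the minimum with a marker
# instead of publishing the count.
MIN_CELL = 10
counts = {"K": 41, "1": 13, "2": 21, "3": 4}  # hypothetical counts

published = {grade: (n if n >= MIN_CELL else "<10")
             for grade, n in counts.items()}
print(published)  # {'K': 41, '1': 13, '2': 21, '3': '<10'}
```

Note that this alone is not enough when a total is also published — that is the complementary-suppression problem addressed below under "Think About Adjacent Rows."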
2. Apply the Rule to Published Files, Not Just Reports
It's one thing to apply suppression rules when the data team prepares a formal board report. It's another thing entirely to apply them to ad hoc spreadsheets that staff members link from staff pages, bury in PDFs, or share with community groups. Those are frequently the leakiest artifacts.
3. Think About Adjacent Rows
Suppressing a single cell is insufficient if the row total and the remaining cells allow the suppressed value to be calculated by subtraction. "Complementary suppression" — suppressing a second cell to prevent back-calculation — is part of doing this right.
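A sketch of complementary suppression for a single row, assuming the row total is published alongside it (the threshold, categories, and helper name are all illustrative):

```python
# Complementary suppression: if exactly one cell in a row is suppressed
# and the row total is public, the hidden value is recoverable by
# subtraction -- so suppress the next-smallest cell as well.
MIN_CELL = 10

def suppress_row(cells, total_published=True):
    """cells: dict of category -> count. Returns a dict with suppressed
    counts replaced by None."""
    out = dict(cells)
    primary = [c for c, n in cells.items() if n < MIN_CELL]
    for c in primary:
        out[c] = None
    if total_published and len(primary) == 1:
        # One suppressed cell + a published total = back-calculable.
        remaining = [c for c in out if out[c] is not None]
        second = min(remaining, key=lambda c: cells[c])
        out[second] = None
    return out

row = {"Asian": 4, "Black": 35, "Hispanic": 61, "White": 112}
print(suppress_row(row))
# {'Asian': None, 'Black': None, 'Hispanic': 61, 'White': 112}
```

The smallest remaining cell is suppressed as the complement because it leaks the least information; production disclosure-control tools also handle the harder case where suppressions must stay consistent across multiple overlapping tables.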
4. Use Statistical Testing Tools
Manual k-anonymity checking is tedious. Statistical disclosure control tools (including the analysis built into SchoolScan) can test entire datasets and flag rows where k falls below a threshold — both rows that are risky on their own and rows that become risky only when combined with other published data.
5. Think About What's Already Online
Re-identification risk compounds across datasets. A single published file may be fine in isolation, but combined with a second file elsewhere on the district website, it becomes identifiable. Districts that audit their entire public data footprint — not just individual releases — catch risks that are invisible at the single-file level.
The Deeper Point
"De-identification" in K-12 is not as simple as deleting the name column. A dataset can be de-identified in name and still identified in effect. FERPA recognizes this: the regulatory definition of personally identifiable information explicitly includes "other information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community... to identify the student with reasonable certainty."
K-anonymity gives districts a concrete, measurable way to evaluate whether a published dataset actually meets that standard. It's not the only tool — differential privacy and synthetic data are more rigorous alternatives — but it's the most accessible one for a typical K-12 data team.
The goal isn't perfection. The goal is not publishing a table where k = 1.
Want to know where re-identification risks live in your district's data?
SchoolScan automatically runs k-anonymity analysis on published spreadsheets and CSV files across your web presence, flagging rows that describe individual students.
Request a Demo