Designed and built an interactive radial taxonomy browser for the NSF-funded Neotoma Paleoecology Database, using grounded theory interviews, activity theory modeling, and think-aloud usability evaluation with domain experts.
The Neotoma Paleoecology Database is an NSF-funded, open-access platform holding 60,000+ taxonomic records used by Earth scientists studying climate change, biodiversity, and biogeochemical cycles. The data follows FAIR principles (Findable, Accessible, Interoperable, Reusable), but browsing the taxonomy itself was anything but.
Scientists navigated taxonomic hierarchies through raw SQL queries or flat text lists. There was no visual interface to explore how species, genera, families, and phyla related to each other, no way to trace synonym histories, and no tool to understand where a taxon sat within the larger classification structure.
Through Phase I expert interviews (N=7), I identified three systemic contradictions using Activity Theory:
1. The broken toolchain. Rich metadata entered in data entry tools (Tilia) became invisible in discovery tools. Analysis moved "off-books" to R scripts, preventing data cleaning feedback loops.
2. The lumping vs. splitting paradox. Data entry rules forced "splitting" (preserving original taxonomic names from authors), but analysis required "lumping" (standardized regional comparisons). Current tools had no dynamic aggregation capability.
3. Taxonomy as "history of science." Taxonomic names aren't stable labels. They carry histories of reclassification, synonymy, and uncertainty (e.g., "cf.", "aff.", "type sp."). Existing interfaces treated them as flat strings.
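To make the third contradiction concrete, here is a minimal sketch of what treating a name as structured data rather than a flat string could look like. It is illustrative only and not the project's code: the `ParsedTaxonName` shape, the `parseTaxonName` helper, and the short qualifier list are all assumptions, and only the two unambiguous markers ("cf." and "aff.") are handled.

```typescript
// Hypothetical shape for a parsed name; field names are illustrative,
// not Neotoma's actual schema.
interface ParsedTaxonName {
  raw: string;          // the name exactly as entered
  baseName: string;     // the name with uncertainty qualifiers stripped
  qualifiers: string[]; // uncertainty markers found, in order of appearance
}

// Assumption: a deliberately short list; a real one would be longer.
const UNCERTAINTY_QUALIFIERS: string[] = ["cf.", "aff."];

function parseTaxonName(raw: string): ParsedTaxonName {
  const qualifiers: string[] = [];
  const kept: string[] = [];
  for (const token of raw.trim().split(/\s+/)) {
    const lower = token.toLowerCase();
    if (UNCERTAINTY_QUALIFIERS.indexOf(lower) !== -1) {
      qualifiers.push(lower); // record the marker instead of discarding it
    } else if (token.length > 0) {
      kept.push(token);
    }
  }
  return { raw, baseName: kept.join(" "), qualifiers };
}

// Example: the qualifier is separated from the name it modifies.
const exampleParsed = parseTaxonName("Picea cf. glauca");
console.log(exampleParsed.baseName, exampleParsed.qualifiers);
```

Keeping `raw` alongside `baseName` preserves the "split" name as entered while exposing a normalized form that could support "lumped" comparisons.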
The project followed a user-centered design model: needs assessment from expert interviews, prototype design informed by grounded theory findings, and formative evaluation through think-aloud testing.
I conducted semi-structured interviews with 7 domain experts (data stewards, researchers, database developers) to understand their actual taxonomy workflows. Using inductive grounded theory coding (open, axial, selective), I surfaced patterns that wouldn't have emerged from a feature wishlist.
The key insight: taxonomy in paleoecology isn't a stable reference system. It's a living record of scientific disagreement. A single pollen grain might be classified differently by two researchers based on regional conventions. The visualization needed to surface this uncertainty, not hide it.
I modeled the taxonomy workflow as an Activity Theory system (Subject, Object, Tools, Rules, Community, Division of Labor) and identified systemic contradictions. The broken toolchain between Tilia (data entry) and Explorer (discovery) meant that metadata quality degraded as data moved through the pipeline. This wasn't a UI problem. It was an infrastructure problem that a visualization tool could partially address by making hierarchical relationships and synonym histories visible.
The core visualization is a radial dendrogram (D3.js cluster layout) representing the full taxonomic hierarchy: Phylum, Class, Order, Family, Genus, Species. The default view shows major taxon groups as terminal nodes. Selecting a group drills into its full sub-tree. This two-level approach prevents the "wall of nodes" problem while preserving the ability to explore any branch in full detail.
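The geometry behind a radial dendrogram can be sketched without any library. The version below is a simplified approximation of what a cluster layout computes, not the project's code and not D3's implementation: leaves are spaced evenly around the circle, each parent sits at the mean angle of its children, and all leaves share the outer radius. The `TaxonNode` and `PlacedNode` types and the `layoutRadial` function are hypothetical names.

```typescript
interface TaxonNode {
  name: string;
  children?: TaxonNode[];
}

interface PlacedNode {
  name: string;
  angle: number;  // radians, measured clockwise from the top
  radius: number; // 0 at the root, outerRadius at every leaf
  x: number;
  y: number;
}

function countLeaves(node: TaxonNode): number {
  const kids = node.children || [];
  if (kids.length === 0) return 1;
  let total = 0;
  for (const kid of kids) total += countLeaves(kid);
  return total;
}

function layoutRadial(root: TaxonNode, outerRadius: number): PlacedNode[] {
  const step = (2 * Math.PI) / countLeaves(root);
  const draft: { name: string; angle: number; height: number }[] = [];
  let nextLeaf = 0;

  // Post-order walk: leaves get evenly spaced angles, parents get the mean
  // angle of their children. Height is the distance to the deepest leaf.
  function visit(node: TaxonNode): { angle: number; height: number } {
    const kids = node.children || [];
    let angle = 0;
    let height = 0;
    if (kids.length === 0) {
      angle = nextLeaf * step;
      nextLeaf += 1;
    } else {
      let angleSum = 0;
      for (const kid of kids) {
        const placedKid = visit(kid);
        angleSum += placedKid.angle;
        height = Math.max(height, placedKid.height + 1);
      }
      angle = angleSum / kids.length;
    }
    draft.push({ name: node.name, angle, height });
    return { angle, height };
  }

  const rootHeight = visit(root).height;
  return draft.map((d) => {
    // Height 0 (a leaf) maps to the outer ring; the root maps to the center.
    const radius = rootHeight === 0 ? 0 : outerRadius * (1 - d.height / rootHeight);
    return {
      name: d.name,
      angle: d.angle,
      radius,
      x: radius * Math.cos(d.angle - Math.PI / 2),
      y: radius * Math.sin(d.angle - Math.PI / 2),
    };
  });
}
```

Under this sketch, the default view could be produced by passing a truncated copy of the tree in which each major group has no children. Drilling in would then mean re-running the layout on that group's full sub-tree.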
Users can search any taxon by name or ID. The system highlights the full ancestor path (root to leaf), auto-rotates the tree to center the result, and displays the complete lineage in a side panel. Synonym-aware search surfaces related names and their validity status, directly addressing the "lumping vs. splitting" contradiction from Phase I.
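The two mechanics described here, tracing the ancestor path and rotating the tree to center a result, can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the `TaxonRecord` shape with a `parentId` field, the function names, and the sample records are all hypothetical.

```typescript
interface TaxonRecord {
  id: number;
  taxonName: string;
  parentId: number | null; // null marks the root
}

// Walk parent links from the target up to the root and return the path
// ordered root-first, which is the set of nodes and links to highlight.
function ancestorPath(records: TaxonRecord[], targetId: number): TaxonRecord[] {
  const byId: { [id: number]: TaxonRecord } = {};
  for (const record of records) byId[record.id] = record;

  const path: TaxonRecord[] = [];
  let current: TaxonRecord | undefined = byId[targetId];
  // The length check guards against malformed data with a parent cycle.
  while (current !== undefined && path.length <= records.length) {
    path.unshift(current);
    current = current.parentId === null ? undefined : byId[current.parentId];
  }
  return path;
}

// Smallest signed rotation (degrees, in [-180, 180)) that moves a node
// from its current angle to the target angle.
function shortestRotation(nodeAngleDeg: number, targetAngleDeg: number): number {
  const diff = targetAngleDeg - nodeAngleDeg;
  return ((diff % 360) + 540) % 360 - 180;
}

// Illustrative records only; not real database rows.
const sampleRecords: TaxonRecord[] = [
  { id: 1, taxonName: "Plantae", parentId: null },
  { id: 2, taxonName: "Pinaceae", parentId: 1 },
  { id: 3, taxonName: "Picea", parentId: 2 },
];
console.log(ancestorPath(sampleRecords, 3).map((r) => r.taxonName).join(" > "));
// prints "Plantae > Pinaceae > Picea"
console.log(shortestRotation(350, 0)); // prints 10
```

Choosing the shortest rotation keeps the animation from spinning the tree the long way around when the result sits just past the top.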
Rendering a 60,000-node tree in the browser required deliberate performance engineering. I used PostgreSQL materialized views to precompute ancestor paths (eliminating expensive recursive queries at render time), angle-based label culling that progressively reveals text on zoom, and a modular architecture separating data processing, layout, highlighting, and interaction into independent modules. The result: smooth rotation, zoom, and pan on a tree that would crash a naive D3 implementation.
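One way to approximate the angle-based label culling is to keep a label only when the arc between it and the last visible label is wide enough on screen at the current zoom. The sketch below illustrates that idea and is not the project's code: the `LabelCandidate` type, the `visibleLabels` function, and the pixel threshold are assumptions. It expects labels sorted by ascending angle and ignores wrap-around at the end of the circle.

```typescript
interface LabelCandidate {
  text: string;
  angle: number; // radians; input must be sorted in ascending order
}

// Keep a label only if the on-screen arc since the last kept label
// is at least minGapPx. A larger zoom widens every arc, so more pass.
function visibleLabels(
  labels: LabelCandidate[],
  radius: number,
  zoom: number,
  minGapPx: number
): LabelCandidate[] {
  const shown: LabelCandidate[] = [];
  let lastAngle = 0;
  for (const label of labels) {
    const isFirst = shown.length === 0;
    // Arc length in screen pixels = angle difference * radius * zoom factor.
    const arcPx = (label.angle - lastAngle) * radius * zoom;
    if (isFirst || arcPx >= minGapPx) {
      shown.push(label);
      lastAngle = label.angle;
    }
  }
  return shown;
}
```

With eight labels spaced 0.01 radians apart on a 500-pixel ring and a 12-pixel minimum gap, this keeps three labels at zoom 1 and all eight at zoom 3, which is the progressive reveal described above.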
I designed three evaluation tasks mapping to real research workflows: (1) exploring the hierarchy to find a taxon's placement, (2) investigating synonym and validity status, and (3) determining higher-level context for a given species. Five participants completed the tasks while thinking aloud, with screen recording and mouse tracking.
Task 3 (interpreting context) had the highest success rate and lowest completion time (~15.5s). The radial layout made hierarchical context immediately visible. Task 2 (synonym/validity) had the lowest success rate (82.9%), revealing that domain-specific terminology created a "wall of jargon" for non-experts.
Beyond task metrics, I analyzed mouse movement heatmaps to understand where users' attention went. The search bar dominated early exploration, confirming a qualitative finding: users defaulted to search as a compensatory strategy when visual navigation became cluttered or overwhelming. This fed directly into design refinements: reducing visual noise in the default view, improving label readability, and making the search-to-tree connection more prominent.
Open, axial, and selective coding of the think-aloud transcripts surfaced three themes: the "wall of jargon" problem (non-experts struggled with domain text), search as compensatory strategy (visual navigation abandonment), and visual ergonomics (physical discomfort with dense label regions leading to avoidance of advanced features). Each theme mapped to specific interface refinements in the next iteration.
Choosing a radial dendrogram over a sunburst or a flat list wasn't an aesthetic decision. It was an information architecture (IA) decision about how taxonomic depth, branching factor, and synonym relationships should be encoded spatially. The layout is the information architecture. Every encoding choice (radius for rank, color for group, link highlighting for ancestry) determines which questions users can answer at a glance.
In most product work, you can skip formal theory and go straight to user stories. In paleoecology, where workflows span decades of disciplinary convention, Activity Theory gave me a framework to name contradictions that users experienced but couldn't articulate. "The toolchain is broken" is actionable in a way that "I wish it were easier" is not.
Materialized views, label culling, and modular architecture are not engineering details separate from design. They are design decisions. A tree that takes 8 seconds to render is a tree nobody will use. Making 60,000 nodes feel instant required design-level thinking about what to show and when, not just how fast the code runs.