How It Works

Methodology

A transparent account of how UCInsights collects, processes, and analyzes patient-reported data and scientific literature to generate its insights.

01
Data Collection

Multi-Source Data Aggregation

UCInsights draws from two complementary sources. The community layer comprises 15,600 patient-reported experiences gathered from Reddit r/UlcerativeColitis, dedicated IBD forums, and publicly available patient registries. All identifiable information is stripped during ingestion — only symptom descriptions, food mentions, medication references, and sentiment signals are retained.

The scientific layer comprises 20,000+ peer-reviewed abstracts and full-text papers sourced exclusively from PubMed, covering clinical trials, epidemiological studies, and systematic reviews on Ulcerative Colitis published by research institutions worldwide.

Reddit r/UlcerativeColitis | IBD Forums | PubMed API | Patient Registries

02
Processing & Classification

Anonymization & Categorization

Each patient report passes through a multi-step cleaning pipeline. Named entities (usernames, locations, personal names) are removed. Reports are then tagged by: primary symptom mentioned, food items referenced, medication named, and overall sentiment polarity (positive / negative / neutral).
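
The cleaning and tagging steps above can be sketched roughly as follows. This is a minimal illustration, not the UCInsights pipeline: the word lists, the `USERNAME` pattern, and the names `anonymize`, `tag_report`, and `TaggedReport` are all assumptions made for the example. It masks only username-style handles; removing locations and personal names would need a trained NER model. It also keeps every matched symptom rather than picking a single primary one.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabularies, for illustration only.
SYMPTOMS = {"cramping", "bloating", "urgency", "fatigue"}
FOODS = {"rice", "dairy", "coffee", "oatmeal"}
MEDICATIONS = {"mesalamine", "prednisone", "infliximab"}
POSITIVE = {"better", "helped", "improved", "relief"}
NEGATIVE = {"worse", "flare", "pain", "triggered"}

# Assumed handle formats: Reddit-style "u/name" and "@name".
USERNAME = re.compile(r"(?<!\w)(?:u/|@)[\w-]+")


@dataclass
class TaggedReport:
    text: str
    symptoms: list[str]
    foods: list[str]
    medications: list[str]
    sentiment: str  # "positive", "negative", or "neutral"


def anonymize(text: str) -> str:
    """Replace username-style handles with a placeholder."""
    return USERNAME.sub("[user]", text)


def tag_report(raw: str) -> TaggedReport:
    """Anonymize a report, then tag it by simple keyword lookup."""
    text = anonymize(raw)
    words = set(re.findall(r"[a-z]+", text.lower()))
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        sentiment = "positive"
    elif neg > pos:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return TaggedReport(
        text=text,
        symptoms=sorted(words & SYMPTOMS),
        foods=sorted(words & FOODS),
        medications=sorted(words & MEDICATIONS),
        sentiment=sentiment,
    )
```

Keyword matching is used here only to keep the example self-contained; a production classifier would handle negation, misspellings, and brand names.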

Scientific abstracts are indexed by MeSH terms, study type, sample size, and outcome measures. Conflicting findings between sources are flagged rather than silently averaged.
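
One way to picture "flagged rather than silently averaged" is the sketch below. It assumes each finding has already been reduced to a direction of association (+1, -1, or 0); that encoding, the `Finding` record, and `flag_conflicts` are hypothetical and not taken from the UCInsights codebase.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    topic: str      # e.g. "dairy -> bloating" (illustrative label)
    source: str     # "community" or "literature"
    direction: int  # +1 association, -1 inverse association, 0 none


def flag_conflicts(findings: list[Finding]) -> list[str]:
    """Return topics where findings point in opposite directions."""
    directions: dict[str, set[int]] = {}
    for f in findings:
        directions.setdefault(f.topic, set()).add(f.direction)
    # A topic conflicts when both +1 and -1 were reported for it.
    return sorted(t for t, d in directions.items() if 1 in d and -1 in d)
```

Averaging +1 and -1 would yield 0 and hide the disagreement, which is why the sketch reports the topic instead of combining the values.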

NER Anonymization | MeSH Indexing | Sentiment Classification | Conflict Detection

03
AI Analysis

NLP & Pattern Recognition

Claude AI performs sentiment analysis, co-occurrence detection, and statistical correlation across symptoms, foods, medications, and lifestyle factors. For each detected pattern, a confidence interval is calculated based on report count and variance. Patterns with fewer than 50 supporting reports are excluded from public-facing data.

The symptom correlation matrix uses Pearson correlation on binary co-occurrence vectors. Each food safety score is the number of positive-sentiment mentions divided by the total mentions of that food item, smoothed with a Laplace prior to handle sparse data.
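
Both calculations in the paragraph above can be sketched in a few lines. The function names are illustrative, and two details are assumptions: the smoothing strength `alpha = 1` with two outcome classes (positive vs. not positive), and returning 0.0 when a vector is constant, where Pearson correlation is undefined.

```python
import math


def pearson_binary(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length 0/1 vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat the undefined case as no correlation
    return cov / math.sqrt(vx * vy)


def food_safety_score(positive: int, total: int, alpha: float = 1.0) -> float:
    """Laplace-smoothed share of positive-sentiment mentions."""
    return (positive + alpha) / (total + 2 * alpha)
```

With this smoothing, a food with no mentions scores 0.5 rather than causing a division by zero, and a food with 8 positive mentions out of 10 scores 0.75 rather than 0.8, pulling sparse items toward the middle.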

Claude AI (Anthropic) | Pearson Correlation | Co-occurrence Analysis | Confidence Intervals

04
Personalization

Personal Pattern Summary

When a user logs data in the app, their personal entries feed into a multi-factor symptom pattern model. The model incorporates stress level (1–5), bowel frequency, symptom severity, dietary inputs, and sleep duration. A simplified Mayo Score (UCAI) is computed daily to provide a self-monitoring reference — this is not a clinical score and should not be used for medical decision-making.

Multi-factor Model | Simplified Mayo Score | On-device Processing

Update Frequency

Community dataset figures are recalculated quarterly. The PubMed index is refreshed monthly. Personal pattern data updates in real time on-device.

Minimum Thresholds

No statistic is displayed unless backed by at least 50 unique reports. Medication outcome rates require a minimum of 200 mentions across the dataset.

Correlation vs. Causation

All associations reported by UCInsights are correlational. No causal claims are made, because self-reported community data cannot establish causality.

Privacy by Design

No personal data leaves your device without explicit consent. Community data is aggregated and anonymized before analysis. No individual can be re-identified.

Known limitations

Ethics note: UCInsights does not scrape or store any personally identifiable information. All community data is sourced from publicly accessible posts. The project complies with the terms of service of all data sources and follows responsible AI usage guidelines.