Multi-Source Data Aggregation
UCInsights draws from two complementary sources. The community layer comprises 15,600 patient-reported experiences gathered from Reddit's r/UlcerativeColitis, dedicated IBD forums, and publicly available patient registries. All identifiable information is stripped during ingestion; only symptom descriptions, food mentions, medication references, and sentiment signals are retained.
The scientific layer comprises 20,000+ peer-reviewed abstracts and full-text papers sourced exclusively from PubMed, covering clinical trials, epidemiological studies, and systematic reviews on Ulcerative Colitis published by research institutions worldwide.
Anonymization & Categorization
Each patient report passes through a multi-step cleaning pipeline. Named entities (usernames, locations, personal names) are removed. Reports are then tagged by: primary symptom mentioned, food items referenced, medication named, and overall sentiment polarity (positive / negative / neutral).
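The cleaning and tagging steps above can be sketched roughly as follows. This is an illustration only, not the production pipeline: the regex, the three keyword lexicons, the `TaggedReport` record, and the rule that the first symptom mentioned is the primary one are all assumptions made for the example. Removing personal names and locations would need a trained named-entity model, which a regex cannot replace, and sentiment is taken here as an input from an upstream classifier.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumption: only structured handles (u/name, @name) are caught here.
# Personal names and locations would require a real NER model.
USERNAME_RE = re.compile(r"(?<!\w)(?:u/|@)[A-Za-z0-9_-]+")

# Hypothetical keyword lexicons, for illustration only.
SYMPTOMS = {"bloating", "cramping", "urgency", "fatigue"}
FOODS = {"rice", "coffee", "dairy", "oatmeal"}
MEDICATIONS = {"mesalamine", "prednisone", "infliximab"}


@dataclass
class TaggedReport:
    text: str
    primary_symptom: Optional[str] = None
    foods: list = field(default_factory=list)
    medications: list = field(default_factory=list)
    sentiment: str = "neutral"


def scrub(text):
    """Replace username-like handles with a placeholder token."""
    return USERNAME_RE.sub("[USER]", text)


def _matches(tokens, lexicon):
    """Return lexicon hits in order of first mention, without duplicates."""
    return list(dict.fromkeys(t for t in tokens if t in lexicon))


def tag(text, sentiment="neutral"):
    """Scrub a report, then tag symptom, foods, and medications."""
    clean = scrub(text)
    tokens = re.findall(r"[a-z]+", clean.lower())
    symptoms = _matches(tokens, SYMPTOMS)
    return TaggedReport(
        text=clean,
        # Assumption: the first symptom mentioned is treated as primary.
        primary_symptom=symptoms[0] if symptoms else None,
        foods=_matches(tokens, FOODS),
        medications=_matches(tokens, MEDICATIONS),
        sentiment=sentiment,
    )
```

For example, `tag("Thanks u/gut_guy, coffee gives me cramping but rice is fine", sentiment="negative")` would yield a record whose text no longer contains the handle, with `cramping` as the primary symptom and `coffee` and `rice` as the referenced foods.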
Scientific abstracts are indexed by MeSH terms, study type, sample size, and outcome measures. Conflicting findings between sources are flagged rather than silently averaged.
NLP & Pattern Recognition
Claude AI performs sentiment analysis, co-occurrence detection, and statistical correlation across symptoms, foods, medications, and lifestyle factors. For each detected pattern, a confidence interval is calculated based on report count and variance. Patterns with fewer than 50 supporting reports are excluded from public-facing data.
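As a rough illustration of how an interval can follow from report count and variance, the sketch below uses a normal-approximation interval around the mean of per-report values. The 95% level, the use of the sample variance, and the function name `pattern_interval` are assumptions for the example; the text does not specify which interval method is used.

```python
import math
from statistics import mean, variance

MIN_REPORTS = 50  # exclusion threshold stated in the text
Z_95 = 1.96       # assumption: 95% level, normal approximation


def pattern_interval(values):
    """Return (low, high) for the mean of `values`, or None if too few reports.

    `values` holds one number per supporting report, e.g. 1 if the report
    shows the pattern and 0 if it does not.
    """
    n = len(values)
    if n < MIN_REPORTS:
        return None  # pattern is excluded from public-facing data
    m = mean(values)
    standard_error = math.sqrt(variance(values) / n)
    return (m - Z_95 * standard_error, m + Z_95 * standard_error)
```

The interval narrows as the report count grows and widens as the variance grows, which is the behaviour the paragraph describes. For proportions close to 0 or 1, a Wilson interval would likely behave better than this approximation.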
The symptom correlation matrix uses Pearson correlation on binary co-occurrence vectors, which for binary data is equivalent to the phi coefficient. Food safety scores are computed as the number of positive-sentiment mentions divided by total mentions for that food item, with Laplace (additive) smoothing applied to handle sparse data.
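Both calculations in this paragraph can be sketched in a few lines. The function names, the smoothing strength `alpha=1.0`, the choice of two outcome classes (positive versus not positive), and the decision to return 0.0 when a vector is constant are assumptions for the example, not confirmed details of the implementation.

```python
import math


def pearson_binary(x, y):
    """Pearson correlation of two equal-length 0/1 vectors.

    Each position is one report: 1 if the symptom is mentioned, else 0.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector.
        # Assumption: report it as 0.0 rather than raising.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def food_score(positive, total, alpha=1.0):
    """Laplace-smoothed share of positive-sentiment mentions.

    Assumption: two outcome classes, so alpha is added once to the
    numerator and twice to the denominator.
    """
    return (positive + alpha) / (total + 2 * alpha)
```

With these assumptions, a food that has no mentions scores 0.5 instead of causing a division by zero, and a food with 8 positive mentions out of 10 scores 9/12 = 0.75 instead of 0.8. The smoothing pulls sparse items toward the middle and has little effect once mention counts are large.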
Personal Pattern Summary
When a user logs data in the app, their personal entries feed into a multi-factor symptom pattern model. The model incorporates stress level (1–5), bowel frequency, symptom severity, dietary inputs, and sleep duration. A simplified Mayo Score (UCAI) is computed daily to provide a self-monitoring reference. It is not a clinical score and should not be used for medical decision-making.
Update Frequency
Community dataset figures are recalculated quarterly. The PubMed index is refreshed monthly. Personal pattern data updates in real time on-device.
Minimum Thresholds
No statistic is displayed unless backed by at least 50 unique reports. Medication outcome rates require a minimum of 200 mentions across the dataset.
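A display gate implementing these thresholds might look like the sketch below. The function name, the `kind` label, and the reading that medication outcome rates must satisfy both thresholds at once are assumptions for the example.

```python
MIN_UNIQUE_REPORTS = 50        # stated minimum for any statistic
MIN_MEDICATION_MENTIONS = 200  # stated minimum for medication outcome rates


def is_displayable(unique_reports, mentions=0, kind="general"):
    """Return True if a statistic meets the minimum display thresholds.

    Assumption: medication outcome rates must meet the general
    unique-report minimum as well as the mention minimum.
    """
    if unique_reports < MIN_UNIQUE_REPORTS:
        return False
    if kind == "medication_outcome" and mentions < MIN_MEDICATION_MENTIONS:
        return False
    return True
```

Under this reading, a medication outcome rate backed by 120 unique reports but only 150 mentions would be withheld, while a food statistic backed by 50 unique reports would be shown.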
Correlation vs. Causation
All associations reported by UCInsights are correlational. No causal claims are made. Community data cannot establish causality.
Privacy by Design
No personal data leaves your device without explicit consent. Community data is aggregated and anonymized before analysis. No individual can be re-identified.
Known Limitations
- Self-reported data is subject to recall bias and social desirability effects. Patients experiencing severe flares may be underrepresented, as they are less likely to post online.
- The patient cohort skews toward English-speaking users on Reddit. Generalizability to non-English-speaking populations is limited.
- Medication response rates reflect community sentiment, not clinical remission as defined by endoscopic or histological criteria.
- Food tolerability varies significantly between individuals and disease stages. Aggregate scores should not override personal experience or clinical guidance.
- The simplified Mayo Score implemented in the app is not validated for clinical use and is intended solely for personal self-monitoring awareness.