Transparency

Data Sources

A full account of every data source powering UCInsights — patient communities, scientific databases, clinical registries, and the AI tools used to analyze them.

Community Data
Patient-Reported Sources

15,600 anonymized reports gathered from publicly accessible patient communities. No personal data is stored, and no report is linked to an individual.

💬
Reddit — r/UlcerativeColitis
Online Community
The largest English-language UC patient community on Reddit. Posts and comments filtered for symptom mentions, medication discussions, and food experiences. Usernames stripped on ingestion.
~9,200 reports · Public posts · Anonymized
🩺
IBD Forums & Communities
Dedicated IBD Platforms
Structured patient forums dedicated to Crohn's disease and ulcerative colitis, including community threads focused on treatment experiences and quality-of-life topics.
~4,100 reports · Public posts · Anonymized
📋
Patient Registries
Structured Registries
Publicly available aggregated registry data from IBD patient advocacy organizations. Used for demographic and prevalence cross-validation against community reports.
~2,300 entries · Aggregate data · No PII
Scientific Literature
Research Databases

20,000+ peer-reviewed abstracts and full-text papers indexed from the world's leading biomedical literature databases.

🔬
PubMed / MEDLINE
Primary Scientific Source
The National Library of Medicine's database of biomedical literature. All peer-reviewed scientific content in UCInsights is sourced exclusively from PubMed, with abstracts fetched via the Entrez API.
20,000+ papers · Monthly refresh · NLM API
📰
Google News RSS — UC Feed
Live Research News
Real-time UC and IBD research news aggregated from Google News via RSS. Powers the live research news feed in the app. Updated continuously; notifications triggered for new clinical findings.
Live feed · Auto-refresh · RSS
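A feed like this can be read with nothing more than an XML parser. The sketch below is illustrative only: it parses a made-up RSS 2.0 sample and picks out items whose links have not been seen before, which is one plausible way to decide when a notification is due. The sample feed, the function names `parse_items` and `new_items`, and the link-based deduplication are assumptions, not a description of the app's actual feed handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be downloaded over HTTP.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>UC research news (sample)</title>
  <item><title>A</title><link>https://example.org/a</link>
        <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate></item>
  <item><title>B</title><link>https://example.org/b</link>
        <pubDate>Tue, 02 Jan 2024 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_items(xml_text):
    """Return a list of {title, link, published} dicts from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def new_items(items, seen_links):
    """Keep only items whose link has not been seen on an earlier refresh."""
    return [item for item in items if item["link"] not in seen_links]


items = parse_items(SAMPLE_RSS)
fresh = new_items(items, seen_links={"https://example.org/a"})
```

Here `fresh` holds only the second item, since its link was not in the set of previously seen links.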
🏥
Clinical Trial Data (ClinicalTrials.gov)
Trial Registry
Referenced for ongoing and completed UC clinical trials. Used to contextualize medication data and identify emerging therapies discussed in the community before PubMed publication.
Reference only · Public registry
Dataset Summary
Source                        | Volume    | Type                 | Update cycle
Reddit r/UlcerativeColitis    | 9,200+    | Community reports    | Quarterly
IBD Forums                    | 4,100+    | Community reports    | Quarterly
Patient Registries            | 2,300+    | Structured registry  | Annually
PubMed / MEDLINE              | 20,000+   | Scientific abstracts | Monthly
Google News RSS               | Live      | News articles        | Real-time
Personal tracker (on-device)  | User data | Health logs          | Real-time
AI & Technology
Tools Used for Analysis

The AI and computational tools applied to transform raw data into platform insights.

🤖
Claude AI (Anthropic)
Primary analysis engine
Used for sentiment analysis, natural language understanding, co-occurrence detection, and synthesis of patient reports with scientific literature. Also powers the in-app AI search and community assistant.
🔢
Statistical Analysis (Python)
Correlation & prevalence
Pearson correlation matrices, confidence interval calculation, Laplace-smoothed food safety scores, and minimum-threshold filtering implemented in Python (scipy, pandas, numpy).
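To make these terms concrete, here is a minimal sketch of the four techniques using only the Python standard library (the platform itself is described as using scipy, pandas, and numpy). Every specific is an assumption made for illustration: the add-one prior, the Wilson form of the confidence interval, the threshold of 5 reports, the function names, and the counts. None of them are UCInsights' actual parameters.

```python
import math


def laplace_score(safe_reports, total_reports, alpha=1.0):
    """Add-alpha (Laplace) smoothed share of reports that describe a food as tolerated."""
    return (safe_reports + alpha) / (total_reports + 2 * alpha)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (sx * sy)


def score_foods(counts, min_reports=5):
    """Score each food, skipping any with fewer than min_reports reports.

    counts maps food name -> (safe_reports, total_reports).
    """
    results = {}
    for food, (safe, total) in counts.items():
        if total < min_reports:
            continue  # minimum-threshold filter: too few reports to score
        results[food] = {
            "score": laplace_score(safe, total),
            "ci95": wilson_interval(safe, total),
            "n": total,
        }
    return results


# Hypothetical counts: "kale" is dropped because it has only 2 reports.
scored = score_foods({"rice": (8, 10), "kale": (1, 2)})
```

Smoothing pulls scores based on few reports toward 0.5, so a food with 2 of 2 positive reports does not outrank one with 80 of 100; the threshold then removes foods with too little data to report at all.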
📡
PubMed Entrez API
Literature retrieval
NCBI's Entrez API used to fetch abstracts, MeSH terms, and metadata for all scientific papers. Queries scoped to Ulcerative Colitis MeSH headings to ensure relevance.
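As a rough illustration of a MeSH-scoped query, the sketch below builds request URLs for the public E-utilities `esearch` and `efetch` endpoints without sending them. The helper names, the `retmax` value, and the exact search term are assumptions; the real pipeline's queries are not documented here.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# "Colitis, Ulcerative" is the MeSH descriptor for ulcerative colitis.
UC_MESH_TERM = '"Colitis, Ulcerative"[MeSH Terms]'


def build_esearch_url(term=UC_MESH_TERM, retmax=100):
    """URL for an ESearch request returning PubMed IDs that match the term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids):
    """URL for an EFetch request returning plain-text abstracts for the given PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


search_url = build_esearch_url()
fetch_url = build_efetch_url(["1", "2"])  # placeholder IDs, not real citations
```

In practice the IDs returned by the search step would be passed to the fetch step in batches, keeping to NCBI's published rate limits.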
🧬
NER Pipeline
Anonymization & entity extraction
Named-entity recognition used to strip identifying information from community posts before ingestion, and to extract symptom names, medication names, and food mentions for structured tagging.
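The two jobs described here, scrubbing identifiers and tagging entities, can be sketched with simple rules. This is a deliberately simplified stand-in: it uses regular expressions and a tiny hand-written lexicon rather than a trained NER model, and the patterns, placeholder tokens, lexicon entries, and function names are all hypothetical.

```python
import re

# Emails are replaced first so the "@" in an address is not read as a handle.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Reddit-style "u/name" or "/u/name", and "@name" handles.
USERNAME_RE = re.compile(r"(?<!\w)(?:/?u/|@)[A-Za-z0-9_-]+")

# Hypothetical mini-lexicon; a real pipeline would use far larger vocabularies.
LEXICON = {
    "symptom": {"bloating", "urgency", "cramping"},
    "medication": {"mesalamine", "prednisone"},
    "food": {"oatmeal", "coffee", "rice"},
}


def scrub(text):
    """Replace email addresses and user handles with neutral placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return USERNAME_RE.sub("[user]", text)


def tag_entities(text):
    """Return the lexicon terms found in the text, grouped by entity type."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {label: sorted(terms & tokens) for label, terms in LEXICON.items()}


clean = scrub("Thanks u/gut_guy, mail me at a.b@example.com")
tags = tag_entities("Coffee gave me cramping; mesalamine helps.")
```

Rules like these miss names written in free text ("my doctor, Dr. Smith"), which is the gap a statistical NER model is meant to close.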
Data ethics: UCInsights does not collect, store, or sell any personally identifiable information. Community data is sourced exclusively from publicly accessible posts in compliance with each platform's terms of service. Personal health data logged in the app remains on your device unless you explicitly choose to share it. We do not use personal tracker data to train models.