Community Data
Patient-Reported Sources
15,600 anonymized reports gathered from publicly accessible patient communities. No personal data is stored or linked to individuals.
Reddit, r/UlcerativeColitis
Online Community
The largest English-language ulcerative colitis patient community on Reddit. Posts and comments are filtered for symptom mentions, medication discussions, and food experiences. Usernames are stripped on ingestion.
IBD Forums & Communities
Dedicated IBD Platforms
Structured patient forums dedicated to Crohn's disease and ulcerative colitis, including community threads focused on treatment experiences and quality-of-life topics.
Patient Registries
Structured Registries
Publicly available aggregated registry data from IBD patient advocacy organizations. Used for demographic and prevalence cross-validation against community reports.
Scientific Literature
Research Databases
20,000+ peer-reviewed abstracts and full-text papers indexed from the world's leading biomedical literature databases.
PubMed / MEDLINE
Primary Scientific Source
The National Library of Medicine's database of biomedical literature. All peer-reviewed literature in Ulcerative Colitis Insights is sourced exclusively from PubMed, with abstracts fetched via the NCBI Entrez API.
Google News RSS, Ulcerative Colitis Feed
Live Research News
Real-time ulcerative colitis and IBD research news aggregated from Google News via RSS. Powers the live research news feed in the app. Updated continuously; notifications triggered for new clinical findings.
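A minimal sketch of how an RSS-driven news feed with new-item notifications could work, using only the Python standard library. The function names (`parse_items`, `new_items`) and the idea of tracking already-seen links in a set are assumptions made for illustration; the app's actual feed handling is not described in this document.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml: str) -> list[dict]:
    """Extract title, link, and publication date from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
            }
        )
    return items


def new_items(items: list[dict], seen_links: set[str]) -> list[dict]:
    """Return only the items whose link has not been seen before.

    Hypothetical helper: these would be the candidates for a notification.
    """
    return [it for it in items if it["link"] not in seen_links]
```

Keying on the link rather than the title is a design choice in this sketch: headlines are sometimes edited after publication, while the link usually stays stable.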
Clinical Trial Data (ClinicalTrials.gov)
Trial Registry
Referenced for ongoing and completed ulcerative colitis clinical trials. Used to contextualize medication data and identify emerging therapies discussed in the community before PubMed publication.
Dataset Summary
Source | Volume | Type | Update cycle
Reddit r/UlcerativeColitis | 9,200+ | Community reports | Quarterly
IBD Forums | 4,100+ | Community reports | Quarterly
Patient Registries | 2,300+ | Structured registry | Annually
PubMed / MEDLINE | 20,000+ | Scientific abstracts | Monthly
Google News RSS | Live | News articles | Real-time
Personal tracker (on-device) | User data | Health logs | Real-time
AI & Technology
Tools Used for Analysis
The AI and computational tools applied to transform raw data into platform insights.
Custom-Trained Large Language Model (LLM)
Primary analysis engine
Used for sentiment analysis, natural language understanding, co-occurrence detection, and synthesis of patient reports with scientific literature. Also powers the in-app AI search and community assistant.
Statistical Analysis (Python)
Correlation & prevalence
Pearson correlation matrices, confidence interval calculation, Laplace-smoothed food safety scores, and minimum-threshold filtering implemented in Python (scipy, pandas, numpy).
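The statistical steps named above can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual code: it assumes each food report is a binary outcome (tolerated or not), uses add-one smoothing (`alpha=1.0`), picks a Wilson interval as the confidence interval, and uses a hypothetical minimum of 5 reports. It relies on the standard library only so that it runs anywhere; with scipy available, `scipy.stats.pearsonr` computes the same correlation coefficient.

```python
import math


def laplace_score(safe_reports: int, total_reports: int, alpha: float = 1.0) -> float:
    """Laplace-smoothed share of reports that describe a food as tolerated."""
    return (safe_reports + alpha) / (total_reports + 2 * alpha)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson confidence interval for a proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def score_foods(counts: dict[str, tuple[int, int]], min_reports: int = 5) -> dict:
    """Score each food, skipping those below the minimum report threshold.

    `counts` maps a food name to (safe_reports, total_reports).
    """
    results = {}
    for food, (safe, total) in counts.items():
        if total < min_reports:
            continue  # too few reports to say anything reliable
        results[food] = {
            "score": laplace_score(safe, total),
            "ci": wilson_interval(safe, total),
        }
    return results
```

Smoothing pulls scores for rarely mentioned foods toward 0.5, so a food with 2 of 2 positive reports does not outrank one with 80 of 100; the threshold then drops the sparsest entries altogether.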
PubMed Entrez API
Literature retrieval
NCBI's Entrez API used to fetch abstracts, MeSH terms, and metadata for all scientific papers. Queries scoped to Ulcerative Colitis MeSH headings to ensure relevance.
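A hedged sketch of how MeSH-scoped retrieval through the Entrez E-utilities might look. It only builds request URLs for the public `esearch` and `efetch` endpoints and does not send them; the function names, the page size, and the exact query string are assumptions for illustration, since the document does not give the queries actually used.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed query: restrict results to the Ulcerative Colitis MeSH heading.
UC_MESH_QUERY = '"Colitis, Ulcerative"[MeSH Terms]'


def build_esearch_url(term: str = UC_MESH_QUERY, retmax: int = 100, retstart: int = 0) -> str:
    """URL for an esearch call that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retstart": retstart,  # offset used for paging through results
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids: list[str]) -> str:
    """URL for an efetch call that returns records for the given PubMed IDs as XML."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "xml",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

In practice the two calls are chained: `esearch` yields a page of IDs, and `efetch` retrieves the abstract, MeSH terms, and metadata for those IDs. NCBI also asks clients to respect its request-rate limits, which this sketch leaves out.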
NER Pipeline
Anonymization & entity extraction
Named-entity recognition used to strip identifying information from community posts before ingestion, and to extract symptom names, medication names, and food mentions for structured tagging.
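To make the two roles of this step concrete, here is a deliberately simplified stand-in. It is not a trained NER model: it uses regular expressions to mask Reddit-style usernames and email addresses, and a small dictionary lookup to tag entities. The patterns, the placeholder tokens, and the tiny `LEXICON` are all hypothetical and chosen only for illustration.

```python
import re

# Reddit-style handles such as u/name or /u/name, and simple email addresses.
USERNAME_RE = re.compile(r"(?<!\w)/?u/[A-Za-z0-9_-]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Illustrative lexicon only; a real pipeline would use a far larger vocabulary
# or a statistical model instead of exact word matching.
LEXICON = {
    "medication": {"mesalamine", "prednisone", "infliximab"},
    "symptom": {"bloating", "urgency", "cramping"},
    "food": {"rice", "dairy", "coffee"},
}


def anonymize(text: str) -> str:
    """Replace email addresses and usernames with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return USERNAME_RE.sub("[USER]", text)


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return the lexicon terms found in the text, grouped by entity type."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {label: sorted(tokens & terms) for label, terms in LEXICON.items()}
```

Running `anonymize` before `extract_entities` mirrors the order described above: identifying details are removed first, and only the cleaned text is tagged and stored.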
Data ethics: UCInsights never sells or shares your personal health data. Community research data is sourced exclusively from publicly accessible posts in compliance with each platform's terms of service. Your personal health logs are stored in your encrypted account and are never used to train AI models or shared with third parties.