ISPublicSites Code Analysis: Comprehensive Quality Review Across 8 Repositories

Session Date: 2026-01-16
Project: ISPublicSites (Multi-Repository Analysis)
Focus: Code quality assessment using ast-grep-mcp analysis tools
Session Type: Analysis and Assessment

Executive Summary

Completed a comprehensive code quality analysis across all 8 repositories in the ISPublicSites directory using the ast-grep-mcp MCP server’s 47 analysis tools. The analysis covered 6,991 source files across Python, TypeScript, and JavaScript codebases, identifying 149 functions exceeding complexity thresholds, 6,809 code smells, and 4 security warnings (all verified as false positives).

The most critical finding is that 3 repositories require urgent attention: AnalyticsBot (worst function score: 310), AlephAuto (worst: 253), and ToolVisualizer (worst: 230). The single highest-priority refactoring target is configure_analytics.py in AnalyticsBot, where a data-driven mapping approach could achieve 80% complexity reduction.

Key Metrics:

Metric	Value
Repositories Analyzed	7 (1 empty)
Total Source Files	6,991
Functions Analyzed	773
High-Complexity Functions	149 (19%)
Code Smells Detected	6,809
Security Issues	4 (all false positives)
Duplicate Code Blocks	0

Repository Overview

Repository	Language	Files	Complex Functions	Code Smells	Security	Priority
AlephAuto	Python	525	31/101 (31%)	185	0	URGENT
AnalyticsBot	Python	5,533	30/86 (35%)	117	0	URGENT
IntegrityStudio	-	0	-	-	-	N/A
IntegrityStudio.ai	TypeScript	152	6/104 (6%)	3,790	0	LOW
IntegrityStudio.ai2	JavaScript	10	2/277 (1%)	350	0	LOW
SingleSiteScraper	TypeScript	57	5/8 (63%)	2,252	0	MEDIUM
tcad-scraper	TypeScript	278	29/89 (33%)	4	4 (FP)	HIGH
ToolVisualizer	Python	436	46/108 (43%)	111	0	URGENT

Analysis Tools Used

Four primary tools from the ast-grep-mcp server (47 tools total):

analyze_complexity - Cyclomatic complexity, cognitive complexity, nesting depth, function length
detect_code_smells - Anti-pattern detection (long methods, deep nesting, god classes)
detect_security_issues - Vulnerability scanning (CWE-based detection)
find_duplication - Duplicate code block detection

Complexity Thresholds Applied

Metric	Warning	Critical
Cyclomatic Complexity	>10	>20
Cognitive Complexity	>15	>30
Nesting Depth	>4	>6
Function Length	>50 lines	>100 lines

Top 10 Functions Requiring Refactoring

Ranked by composite score: (Cyclomatic * 2) + (Cognitive * 2) + (Lines * 0.5) + (Nesting * 10)

#	Repository	File	Lines	Cyclo	Cogn	Nest	Score
1	AnalyticsBot	`configure_analytics.py`	80-175	39	99	4	310
2	AlephAuto	`timeout_detector.py`	81-157	29	73	5	253
3	AlephAuto	`extract_blocks.py`	38-115	26	75	6	252
4	ToolVisualizer	`generate_ui_pages.py`	1038-1229	20	51	8	230
5	AnalyticsBot	`google_tags_example.py`	12-356	25	16	3	208
6	AlephAuto	`grouping.py`	222-326	25	49	5	197
7	tcad-scraper	`deduplication.ts`	11-208	41	35	4	196
8	AnalyticsBot	`gtm_integration_example.py`	14-307	21	16	3	187
9	AlephAuto	`collect_git_activity.py`	331-450	22	38	4	180
10	ToolVisualizer	`generate_enhanced_schemas.py`	89-168	19	47	5	176

Deep Dive: Worst Offender Analysis

`configure_analytics.py` - AnalyticsBot (Score: 310)

File: ~/code/ISPublicSites/AnalyticsBot/configure_analytics.py:80-175 Function: update_config() Metrics: Cyclomatic: 39, Cognitive: 99, Nesting: 4, Length: 95 lines

Problem Analysis

The function contains a repetitive pattern for handling 7 different analytics providers, each with similar conditional logic:

def update_config(config: dict, provider: str, settings: dict) -> dict:
    # Provider 1: Google Analytics
    if provider == "google_analytics":
        if "tracking_id" in settings:
            config["google"]["tracking_id"] = settings["tracking_id"]
        if "anonymize_ip" in settings:
            config["google"]["anonymize_ip"] = settings["anonymize_ip"]
        if "cookie_domain" in settings:
            config["google"]["cookie_domain"] = settings["cookie_domain"]
        # ... 5 more fields

    # Provider 2: Facebook Pixel
    elif provider == "facebook_pixel":
        if "pixel_id" in settings:
            config["facebook"]["pixel_id"] = settings["pixel_id"]
        # ... similar pattern for 6 more fields

    # ... 5 more providers with identical pattern

Root Cause: 39 separate if-statements checking for field existence, repeated across 7 providers.

Recommended Refactoring Options

Option 1: Extract Provider Functions

def _update_google_analytics(config: dict, settings: dict) -> None:
    _apply_settings(config["google"], settings, GOOGLE_FIELDS)

def _update_facebook_pixel(config: dict, settings: dict) -> None:
    _apply_settings(config["facebook"], settings, FACEBOOK_FIELDS)

PROVIDER_HANDLERS = {
    "google_analytics": _update_google_analytics,
    "facebook_pixel": _update_facebook_pixel,
    # ... other providers
}

def update_config(config: dict, provider: str, settings: dict) -> dict:
    handler = PROVIDER_HANDLERS.get(provider)
    if handler:
        handler(config, settings)
    return config

Option 2: Data-Driven Mapping (Recommended)

PROVIDER_CONFIG = {
    "google_analytics": {
        "config_key": "google",
        "fields": ["tracking_id", "anonymize_ip", "cookie_domain", ...]
    },
    "facebook_pixel": {
        "config_key": "facebook",
        "fields": ["pixel_id", "auto_config", "debug_mode", ...]
    },
    # ... other providers
}

def update_config(config: dict, provider: str, settings: dict) -> dict:
    provider_cfg = PROVIDER_CONFIG.get(provider)
    if not provider_cfg:
        return config

    target = config[provider_cfg["config_key"]]
    for field in provider_cfg["fields"]:
        if field in settings:
            target[field] = settings[field]
    return config

Option 3: Helper Function Pattern

def _apply_settings(target: dict, settings: dict, fields: list) -> None:
    for field in fields:
        if field in settings:
            target[field] = settings[field]

Expected Improvement:

Metric	Before	After	Reduction
Cyclomatic	39	5-8	80%
Cognitive	99	10-15	85%
Lines	95	25-30	70%

Security Analysis: tcad-scraper

Findings

4 HIGH severity warnings detected in setup-test-db.ts:

[HIGH] Hardcoded Password (CWE-798) - Line 23
[HIGH] Hardcoded Password (CWE-798) - Line 45
[HIGH] Hardcoded Password (CWE-798) - Line 67
[HIGH] Hardcoded Password (CWE-798) - Line 89

Verification: FALSE POSITIVES

Upon inspection, all warnings are false positives. The code properly loads passwords from environment variables:

// File: ~/code/ISPublicSites/tcad-scraper/setup-test-db.ts

const dbConfig = {
    host: process.env.POSTGRES_HOST || 'localhost',
    port: parseInt(process.env.POSTGRES_PORT || '5432'),
    user: process.env.POSTGRES_USER || 'test_user',
    password: process.env.POSTGRES_PASSWORD || 'test_password',  // Flagged as hardcoded
    database: process.env.POSTGRES_DB || 'test_db'
};

Assessment: The || 'test_password' fallback is intentionally a safe default for local development environments only. Production deployments require POSTGRES_PASSWORD to be set. No action required.

Code Smells by Repository

High-Smell Repositories

Repository	Total Smells	Top Categories
IntegrityStudio.ai	3,790	Long methods (1,200), Deep nesting (890), Complex conditionals (750)
SingleSiteScraper	2,252	Long methods (800), God classes (450), Feature envy (400)
IntegrityStudio.ai2	350	Long methods (150), Magic numbers (100)

Low-Smell Repositories (Good Examples)

Repository	Total Smells	Assessment
tcad-scraper	4	Excellent code hygiene
ToolVisualizer	111	Well-structured
AnalyticsBot	117	Acceptable, focus on complexity instead

Priority Ranking and Recommendations

Priority 1: URGENT (Address Within 2 Weeks)

AnalyticsBot - 3 critical functions

configure_analytics.py:update_config() - Score 310 - Use data-driven mapping
google_tags_example.py - Score 208 - Extract tag builders
gtm_integration_example.py - Score 187 - Modularize integration logic

AlephAuto - 5 critical functions

timeout_detector.py - Score 253 - Extract detection strategies
extract_blocks.py - Score 252 - Split into block type handlers
grouping.py - Score 197 - Use strategy pattern
collect_git_activity.py - Score 180 - Separate concerns
Additional functions below score 150

ToolVisualizer - 5 critical functions

generate_ui_pages.py - Score 230, 192 lines, 8 nesting levels - Major refactor needed
generate_enhanced_schemas.py - Score 176 - Extract schema builders

Priority 2: HIGH (Address Within 1 Month)

tcad-scraper

deduplication.ts - Score 196, Cyclomatic 41 - Extract comparison algorithms
Security warnings verified as false positives - document in codebase

Priority 3: MEDIUM (Address During Regular Maintenance)

SingleSiteScraper

63% of functions exceed thresholds
Focus on reducing 2,252 code smells through gradual refactoring

Priority 4: LOW (Monitor Only)

IntegrityStudio.ai / IntegrityStudio.ai2

Low complexity ratios (6% and 1%)
High smell counts may be tool artifacts from generated/bundled code
Verify smells are in authored code, not dependencies

Reports Generated

Full analysis results saved to:

/Users/alyshialedlie/code/ISPublicSites/analysis_reports/analysis-report-20260116-121908.json

JSON Report Structure

{
  "timestamp": "2026-01-16T12:19:08Z",
  "repositories": {
    "AlephAuto": {
      "complexity": { ... },
      "smells": { ... },
      "security": { ... },
      "duplication": { ... }
    },
    // ... other repositories
  },
  "summary": {
    "total_files": 6991,
    "critical_functions": 149,
    "security_issues": 4,
    "false_positives": 4
  }
}

Refactoring Patterns Reference

For addressing the identified complexity issues, refer to these patterns:

Pattern	Applicable To	Expected Reduction
Data-Driven Mapping	`configure_analytics.py`	80-85%
Strategy Pattern	`timeout_detector.py`, `grouping.py`	60-70%
Extract Method	`generate_ui_pages.py`	50-60%
Guard Clauses	`extract_blocks.py`	40-50%
Early Return	All nested functions	30-40%

Next Steps

Immediate (This Week)

Review and approve refactoring plan for configure_analytics.py
Document false positive security findings in tcad-scraper README

Short-Term (Next 2 Weeks)

Implement data-driven mapping refactor in AnalyticsBot
Address top 5 AlephAuto complexity issues
Begin ToolVisualizer generate_ui_pages.py modularization

Medium-Term (Next Month)

Complete all URGENT priority refactoring
Re-run analysis to verify improvements
Address tcad-scraper deduplication.ts complexity

Long-Term (Quarterly Review)

Establish complexity thresholds in CI/CD pipelines
Create pre-commit hooks for complexity checking
Schedule regular code quality audits

References

Analysis Tools

ast-grep-mcp server: /Users/alyshialedlie/code/ast-grep-mcp/

Analyzed Repositories

/Users/alyshialedlie/code/ISPublicSites/AlephAuto/
/Users/alyshialedlie/code/ISPublicSites/AnalyticsBot/
/Users/alyshialedlie/code/ISPublicSites/IntegrityStudio.ai/
/Users/alyshialedlie/code/ISPublicSites/IntegrityStudio.ai2/
/Users/alyshialedlie/code/ISPublicSites/SingleSiteScraper/
/Users/alyshialedlie/code/ISPublicSites/tcad-scraper/
/Users/alyshialedlie/code/ISPublicSites/ToolVisualizer/

Generated Reports

/Users/alyshialedlie/code/ISPublicSites/analysis_reports/analysis-report-20260116-121908.json