May 22, 2024
A fast-growing lending platform integrated with 12 third-party financial connectors: credit bureaus, bank account verifiers, KYC providers, and disbursement rails. As the business scaled 100× in 18 months, so did the volume and variety of integration failures. The existing L2 support team of 10 was overwhelmed, SLA breaches were rising, and the on-call engineers were fielding the same classes of failure repeatedly with no systemic fix.
The team now handles over 100,000 support issues per month with zero SLA breaches. Automation absorbs the long tail of repetitive failures, freeing engineers to focus on new connector classes and edge cases. As the business continues its 100× growth trajectory, the support infrastructure scales with it rather than against it.
May 22, 2024
A large enterprise had customer data siloed across 8 separate systems: CRM, ERP, billing, identity, support, product usage, contract management, and communications. Every internal team queried each system independently. The result was inconsistent customer records, duplicated API calls hammering upstream systems, and customer-facing reps unable to see a complete picture of any account in under 10 minutes.
Customer-facing teams now have a complete account view in under 3 seconds. Upstream API call volume dropped by 70% due to caching. Data inconsistencies between systems surfaced and were remediated as a side effect of building the normalisation layer. The gateway became the foundation for every new internal tool the organisation built thereafter.
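The drop in upstream call volume comes from answering repeated lookups out of a cache instead of re-querying each source system. A minimal sketch of that idea, assuming a simple time-to-live (TTL) policy; the `TTLCache` class, the `fetch_account` stub, and the 60-second TTL are hypothetical and not taken from the engagement:

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches results per key and re-fetches only after `ttl_seconds` elapse."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.upstream_calls = 0  # counts how often the upstream is actually hit

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from cache, no upstream call
        self.upstream_calls += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


def fetch_account(account_id: str) -> dict:
    # Hypothetical stand-in for a call to one upstream system (e.g. the CRM).
    return {"id": account_id, "source": "crm"}


cache = TTLCache(fetch_account, ttl_seconds=60.0)
for _ in range(10):
    cache.get("acct-1")
print(cache.upstream_calls)  # 10 lookups, 1 upstream call
```

The actual saving depends on how often the same account is requested within the TTL and on how stale a record each consumer can tolerate.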
Aug 16, 2023
A fintech aggregator needed to access customer banking data from dozens of banks that did not offer open APIs. The only path was to act on behalf of the customer through the bank's own web and mobile interfaces, and to do so reliably, at scale, across banks with wildly different session management, CAPTCHA implementations, and 2FA flows.
The client successfully accessed account data across all target banks without any open API dependency. Thousands of concurrent sessions run reliably with automatic failover. The system handles the full spectrum of authentication complexity, from simple password login to OTP + CAPTCHA + security questions, without manual intervention.
Aug 16, 2023
A payments company needed a dedicated, auditable store for customer card data: one that could be integrated by any internal service without those services ever touching raw card numbers, while meeting PCI-DSS requirements without requiring every consuming service to enter scope.
The vault passed PCI-DSS audit on first submission. Consuming services integrated in days using the gRPC client libraries. Card data exposure risk was eliminated across the platform: no service outside the vault ever processes a raw card number. The system runs cost-efficiently in production with headroom for significant traffic growth.
Aug 16, 2023
A credit analytics company received bank statements as PDFs from customers, each bank with its own layout, font, column arrangement, and formatting quirks. They needed to extract structured transaction data from millions of these PDFs reliably, fast enough to support real-time credit decisions, and accurately enough that errors couldn't slip through to underwriting.
100% parse accuracy across all supported bank templates. A typical 4–5 page statement is parsed in under 200ms. The system processes millions of statements per hour in production, enabling the client to deliver real-time credit decisions based on fully structured, verified transaction history.
Aug 16, 2023
A consumer fintech wanted to aggregate credit card and banking offers from across 50+ banks into a single discovery platform. Each bank published offers differently: some through static web pages, some through JavaScript-rendered portals, some behind login walls. Manual curation was too slow and error-prone for daily freshness.
Offers from 50+ banks are crawled daily and made available to customers on a single platform. Adding a new bank requires writing a configuration file, not writing new code. The platform supports both static and dynamic sites without architectural changes. Customers now discover offers they would have missed by checking bank sites individually.
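The "configuration file, not new code" approach can be pictured as a small registry of fetch strategies selected by a per-bank config entry. The sketch below is illustrative only: the config fields (`name`, `url`, `render`, `offer_pattern`), the bank names, the example URLs, and both fetcher stubs are assumptions, not the platform's actual schema, and the fetchers return canned HTML instead of making network calls.

```python
import re
from typing import Callable, Dict, List

# Hypothetical per-bank configuration; in practice this might live in YAML or JSON files.
BANK_CONFIGS: List[dict] = [
    {"name": "bank_a", "url": "https://bank-a.example/offers",
     "render": "static", "offer_pattern": r'<li class="offer">(.*?)</li>'},
    {"name": "bank_b", "url": "https://bank-b.example/deals",
     "render": "dynamic", "offer_pattern": r'<div data-offer>(.*?)</div>'},
]


def fetch_static(url: str) -> str:
    # Stub: a real implementation would issue a plain HTTP GET.
    return '<ul><li class="offer">5% cashback</li><li class="offer">No annual fee</li></ul>'


def fetch_dynamic(url: str) -> str:
    # Stub: a real implementation would render the page in a headless browser.
    return '<div data-offer>10% off travel</div>'


# One fetch strategy per site type; the config picks which one to use.
FETCHERS: Dict[str, Callable[[str], str]] = {
    "static": fetch_static,
    "dynamic": fetch_dynamic,
}


def crawl(config: dict) -> List[str]:
    """Fetches a bank's page with the configured strategy and extracts offers."""
    html = FETCHERS[config["render"]](config["url"])
    return re.findall(config["offer_pattern"], html, flags=re.DOTALL)


offers = {cfg["name"]: crawl(cfg) for cfg in BANK_CONFIGS}
print(offers)
```

Under these assumptions, onboarding another bank means appending one entry to `BANK_CONFIGS`. A production crawler would likely use a proper HTML parser or selectors in place of regular expressions.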
Aug 16, 2023
Log management at scale breaks most commercial and open-source tools. Lucene-based systems (Elasticsearch, OpenSearch) build indexes that consume 10–50% of the original log volume in storage; at terabyte scale, that becomes hundreds of gigabytes of index overhead. Query latency degrades as index size grows. The cost of running these systems at billion-event-per-day volumes is prohibitive for most organisations.
Episilia delivers full observability at a fraction of the infrastructure cost of Elasticsearch or Splunk. A terabyte of logs becomes 100GB stored, with a 10GB index. Query performance remains consistent as data volume grows because the index never balloons. Organisations running billions of log events per day can do so on commodity hardware rather than enterprise storage clusters.
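The storage figures quoted for this entry can be put side by side with a few lines of arithmetic. This is only a restatement of the numbers in the text, taking 1 TB = 1000 GB as an assumption; the variable names are illustrative.

```python
RAW_GB = 1000  # 1 TB of raw logs, taking 1 TB = 1000 GB (assumption)

# Lucene-style index overhead quoted in the text: 10-50% of raw volume.
lucene_index_low_gb = RAW_GB * 10 // 100
lucene_index_high_gb = RAW_GB * 50 // 100

# Figures quoted for Episilia.
episilia_stored_gb = 100
episilia_index_gb = 10

compression_ratio = RAW_GB / episilia_stored_gb
index_pct_of_raw = 100 * episilia_index_gb / RAW_GB

print(f"Lucene-style index: {lucene_index_low_gb}-{lucene_index_high_gb} GB")
print(f"Episilia data reduction: {compression_ratio:.0f}x")
print(f"Episilia index as share of raw logs: {index_pct_of_raw:.0f}%")
```

On these numbers the index is about 1% of the raw volume, against 10–50% for the Lucene-based systems described above.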
Aug 16, 2023
A data-heavy SaaS company had built its analytics stack on Snowflake but was hitting query timeouts on their largest tables and paying $4,000/month in compute credits. The ingestion pipeline was batching data inefficiently, table clustering was absent, and analytical queries were doing full micro-partition scans on every run. Costs and latency were both trending upward.
Monthly Snowflake spend dropped from $4,000 to $1,000, a 4× cost reduction, while query throughput improved 8–10× and timeouts were eliminated entirely. The client's analysts moved from scheduling long-running queries overnight to running them interactively during the day. The same architecture has since scaled to 3× the original data volume without regression.
Aug 16, 2023
A large financial institution ran its subledger PnL calculations as nightly batch jobs. With growing transaction volumes, the batch window was expanding to the point where results were not available until mid-morning, too late for risk desks that needed end-of-day positions at market open. The existing system was single-tenant, stateful, and could not be horizontally scaled without a fundamental redesign.
The institution moved from nightly batch PnL results delivered mid-morning to real-time PnL available within minutes of transaction confirmation. 100 million records are processed and persisted in under 10 minutes. The stateless cluster scales horizontally to meet peak demand. Risk desks now have accurate end-of-day positions available at market open rather than hours later.
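As a rough sense of scale, the "100 million records in under 10 minutes" figure implies a minimum sustained throughput, which a stateless cluster can divide across workers. The worker counts below are hypothetical and the even split is an idealisation; neither comes from the engagement.

```python
RECORDS = 100_000_000
WINDOW_SECONDS = 10 * 60  # "under 10 minutes", so this is a lower bound on throughput

min_records_per_second = RECORDS / WINDOW_SECONDS
print(f"{min_records_per_second:,.0f} records/s")  # about 166,667 records/s


def per_worker_rate(workers: int) -> float:
    """Average records/s each worker must sustain if load is spread evenly."""
    return min_records_per_second / workers


for workers in (8, 16, 32):  # hypothetical cluster sizes
    print(workers, round(per_worker_rate(workers)))
```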
Deep-dives into client engagements, engineering challenges solved, and lessons from building production software at scale.