Table Of Contents
Last updated: May 13, 2026 · By Akhil Lodha
Brokerage statements are the most overlooked compliance data source in a wealth management firm. Every household sends them in. Every advisor reads them once. Then they sit in a folder, structured data trapped inside an unstructured PDF, while the firm runs supervision, cost-basis review, and allocation reconciliation from a different system that does not always agree with what the custodian reported. AI brokerage statement extraction closes that gap by turning the statement back into the structured data it was before it became a printable document — lot-level, audit-citable, and ready for the supervision queue.
This guide is written for the COO or Operations lead at a mid-market RIA or broker-dealer evaluating how AI changes statement processing — from Schwab and Fidelity holdings tables through Pershing multi-page activity blocks to lot-level cost basis, fee reconciliation, and the downstream supervision the firm runs on top of that data. Statement extraction itself sits with operations; the IPS, share-class, and books-and-records workflows it feeds are typically shared with the CCO.
TL;DR A monthly brokerage statement carries dozens of distinct field types per account — holdings, lots, transactions, fees, distributions, beneficiaries, restrictions — multiplied across positions. AI extraction normalizes that data from PDF into structured records the firm can supervise against, reconcile against custody, and defend under Rule 204-2. For mid-market and enterprise firms, the harder problem is not the statement itself — it is the seamless flow from statement to Good Order check to Suitability data in the CRM to portfolio supervision against the IPS, without re-keying between three point tools.
What is actually inside a brokerage statement
Pulled apart, a monthly statement is far more structured than it looks. FINRA's own guide walks through twelve distinct sections — from statement period, account information, contact information, and clearing firm through account summary, income summary, fees, account activity, margin, and portfolio detail to disclosures and definitions. Each section maps to a specific operational or compliance question the firm has to answer at least quarterly.
| Statement section | Data points | Compliance use case |
|---|---|---|
| Holdings / portfolio detail | Symbol, CUSIP, quantity, market value, cost basis, unrealized gain/loss, asset class | Allocation reconciliation, concentration screening, IPS drift |
| Activity | Trades, transfers, journaled shares, dividends, distributions, fees | Trade supervision, fee reasonableness, suitability evidence |
| Income and expense | Interest, dividends, realized gains, advisory fees, 12b-1 fees | Form 1099-B reconciliation, fee-only / wrap account verification |
| Account-level information | Account type, ownership, restrictions, beneficiaries, clearing firm | Suitability mapping, beneficiary-on-file confirmation, custodian attribution |
| Margin | Loan balance, interest paid, maintenance status | Leverage supervision, Reg T verification |
The data is all there. The PDF strips away the structure that made it usable. AI brokerage statement extraction puts it back — lot-by-lot, transaction-by-transaction, with a citation to the page it came from.
AI brokerage statement extraction is not OCR or template parsing
OCR converts pixels to text. Template parsing bolts rules onto OCR output and breaks every quarter. AI brokerage statement extraction uses layout-aware document AI to read the document by structure — absorbing custodian template revisions without per-statement engineering.
Most readers conflate three different technologies under "AI extraction." Treat them as the same and you will buy a tool that solves the wrong problem.
- OCR (optical character recognition) converts pixels to text. Modern OCR is highly accurate at the character level on clean, machine-generated PDFs — but it does not know that the number in column 4 is cost basis rather than market value. OCR alone leaves you with strings that still have to be parsed.
- Template-based parsing bolts hand-built rules onto OCR output: "for a Schwab statement, the holdings table starts at page 3 and cost basis is column 7." It works until the custodian changes its template — which Schwab, Fidelity, and Pershing all do at least once a year. Every change forces a rebuild.
- Document AI extraction uses layout-aware vision models trained on thousands of statements to read the document the way an operations analyst does. It identifies the holdings table by structure, understands that "Long-term covered" is a tax-lot term classifier, and absorbs new layouts without per-custodian engineering.
For a fuller treatment across the platform, see ai document data extraction and the companion ai powered data extraction. For the vendor comparison on extraction across tax, estate, and brokerage segments, see the best financial document extraction software for RIAs. This post focuses on brokerage statements specifically.
Why custodian-specific layout drift breaks rules-based brokerage statement extraction
A Schwab statement is not a Fidelity statement. Template-tuned parsers degrade sharply across heterogeneous custodian formats — and that degradation is the operational risk firms with five or more custodians pay every quarter.
A Schwab statement looks nothing like a Fidelity statement, which looks nothing like a Pershing statement. The differences are not cosmetic — they shape what the firm can actually reconcile.
- Schwab exposes lot-level cost basis with acquisition date and short-term / long-term classification through its tax lot statement and position-detail views, alongside the regular monthly statement. A parser that ingests only the monthly statement loses the term distinction and the harvest signal that sits in the lot data — the tax lot statement is the one to reach for.
- Fidelity itemizes management fees, 12b-1 fees, and program charges inside the activity section, by date and account, alongside the holdings and transaction tables. Rules-based parsers tuned only to the holdings table skip this region entirely — and that is where the share-class violations live.
- Pershing distributes activity tables across multi-page blocks that on a large household statement can span many tens of pages, with account-level subtotals that must reconcile across page breaks. A model that does not understand the document as a whole drops or duplicates rows at the page boundary.
- Raymond James and LPL typically present position-level summaries with the underlying lot detail and cost-basis breakdown in adjacent or supplemental sections. The supplemental data is what supervision needs; the summary is what an unsophisticated parser picks up.
Layout-agnostic document AI delivers materially higher field-level accuracy across heterogeneous custodian formats than template-based parsers, which can match it on the one custodian they were tuned to and degrade sharply elsewhere. For a firm with five or more custodians, document AI is the only architecture that does not require an engineering ticket every quarter.
The four AI brokerage statement extraction problems that actually matter
The hard work in AI brokerage statement extraction is not OCR accuracy — it is layout drift across custodians, multi-page tables, lot-level cost basis with term classification, and the fee and corporate-action edge cases that supervision most needs surfaced.
"AI for documents" is a crowded space. For brokerage statements specifically, the work breaks down into four problems. A platform that solves two of them does not solve all four.
- Layout drift across custodians. Schwab, Fidelity, Pershing, LPL, Raymond James, Goldman Sachs custody, J.P. Morgan, and the long tail of independent custodians each issue a different format and update on their own cadence. AI models trained across thousands of statement layouts absorb drift; rules-based parsers do not.
- Multi-page tables that span sections. The holdings table on a 60-page Pershing household statement does not respect page boundaries, and account totals appear in three places that have to reconcile. Models that do not understand document structure silently drop or duplicate lines at page breaks.
- Lot-level cost basis with term classification. Short-term versus long-term, harvest candidates, wash-sale exposure under IRS Section 1091, accurate cost basis on the next 1099-B — all require lot-level data with acquisition date and term per the IRS Form 1099-B instructions. A platform that returns only position totals is solving the easier version of the problem.
- Fee, distribution, and corporate-action edge cases. Advisor fees, share-class conversions, special dividends, return-of-capital adjustments — the line items most easily lost in extraction are the ones compliance most needs surfaced with a page citation.
Why extraction is only half the job — the data has to flow somewhere
Extraction in isolation is a feature. The reason the firm is doing it is to feed the workflows that depend on the data: Good Order review on the paperwork itself, Suitability data into the CRM, IPS drift in the compliance queue. A point tool that extracts but stops there leaves the operations team manually pushing data into three downstream systems — which is the work the firm hired the tool to remove.
Allocation reconciliation and IPS drift
The policy says 60/40 with a 5% band. The custody feed and the statement do not always agree on the actual allocation, especially across multi-account households with held-away assets and journaled shares. Extracted statement data, normalized across custodians, becomes the source of truth the firm supervises against. Our guide to ai investment policy statement software covers the IPS half of the workflow; portfolio supervision ria ips intelligence covers how the supervision queue runs against that data.
Cost-basis review and tax-aware supervision
Wash-sale detection across substantially identical positions in multiple accounts, harvest-candidate identification, short-term versus long-term sleeve management — all require lot-level data the custody portal does not always expose cleanly, and the 1099-B does not always reconstruct correctly across brokers. Our piece on wash sale rule ria compliance covers why this matters at exam.
Fee verification and share-class supervision
Statements show what the custodian actually charged. That is where compliance verifies advisor fees match the agreement, share classes match the IPS, and no surprise 12b-1 fees have crept in mid-quarter. The share-class case is covered in 12b 1 fees ria compliance problem and mutual fund share class violations how to fix; continuous monitoring is covered in share class monitoring software rias broker dealers.
The compliance backbone: 204-2, 206(4)-7, and the 2026 exam cycle
Brokerage statements are not just a data source — they are the books-and-records evidence of every account the firm supervises. Three regulatory anchors make this concrete.
- Rule 204-2 (Books and Records) requires advisers to keep, at minimum: under (a)(3) a memorandum of each order for the purchase or sale of any security with terms and account identification; under (a)(7) all written communications relating to recommendations, advice, fund or securities movements, and transaction orders; under (a)(1) journals of original entry; and under (a)(10) all written client agreements. Records must be preserved for not less than five years from the end of the fiscal year of the last entry, with the first two years on premises per (e)(1). Brokerage statements are the supporting evidence behind (a)(3) order memoranda — without them the chain of evidence breaks.
- Rule 206(4)-7 requires written policies reasonably designed to prevent violations and an annual review confirming those policies work. Extracted statement data makes that review evidence-based rather than narrative; compliance stack 206 4 7 portfolio supervision gap walks through the gap that opens when the review cannot point to underlying account data.
- SEC and FINRA broker-dealer recordkeeping under SEA Rules 17a-3 and 17a-4 covers the parallel obligations — trade blotters, customer account ledgers, order tickets — with the 2022 amendments to 17a-4 permitting an audit-trail electronic storage alternative to the WORM format. Statement-derived structured data fits naturally into the audit-trail model.
The 2026 SEC exam cycle reinforces all of this. The Division of Examinations 2026 priorities, released in November 2025, identify fiduciary duty, the custody rule, compliance programs, and the 2024 amendments to Regulation S-P as core focus areas, and explicitly flag firms' use of artificial intelligence and other automated technologies as an emerging risk — expecting written policies, designed and implemented, around AI use including in back-office operations. Statement-level supervision sits inside both the examination focus and the books-and-records defense.
How to evaluate AI brokerage statement extraction
Strong AI brokerage statement extraction handles layout drift, lot-level cost basis, custody reconciliation, Good Order checks, audit-trail evidence, and seamless flow into downstream supervision. The six dimensions below score whether a platform can sustain those at quarter-end load.
This is the framework we use to score a platform. Six dimensions, 1–5 each. Anything below 3 on dimensions 1, 2, 4, or 6 is a workflow risk you will pay for at the next exam.
- Custodian coverage and layout resilience. Does it work across the custodians your book actually uses — including the long tail of independent custodians — with no per-statement engineering when the custodian changes its template?
- Lot-level cost basis fidelity. Does it return lot-level cost basis with acquisition date and short-term / long-term classification, not just position totals? Does it preserve wash-sale adjustment markers when the 1099-B reports them?
- Reconciliation against custody feed. When the statement and the custody feed disagree, does the platform surface the discrepancy with the underlying evidence on both sides — or silently pick one and move on?
- Good Order checks on the paperwork. Does the platform run automated Good Order review — signatures, dates, custodial-form-version match, required fields — so the operations team works exceptions instead of every document?
- Audit trail and reviewer evidence. Does each extraction produce a per-field confidence score, source citation back to the PDF page, and reviewer attribution? Under Rule 204-2(a)(3) and (a)(7), you will need to defend the data the firm acted on.
- Seamless flow into downstream supervision. Does the extracted statement data flow automatically into the CRM, the Suitability record, and the IPS supervision queue — or does the operations team re-key it into three downstream systems? The reconciliation work is where most operations teams burn out at scale.
How StratiFi runs brokerage statement extraction as one seamless flow
StratiFi treats statement extraction as one stage of a connected workflow, not a one-shot service. Three modules, one data lineage — the differentiator for mid-market and enterprise firms scaling past 10+ advisors who have felt the cost of running three reconciled point tools.
- OperationsIQ — statement extraction and Good Order. OperationsIQ ingests statements from any custodian an RIA or broker-dealer actually uses (the major US custodians plus the long tail of independents), returns positions with lot detail, acquisition date, and term classification where the source statement carries it, and runs AI-powered Good Order checks on the paperwork itself. It then extracts the Suitability fields the firm uses for Reg BI — Investment Objective, Risk Tolerance, Investment Experience, Asset Allocation, Time Horizon, Liquidity Needs, Tax Considerations, Restrictions, Beneficiary info — from IMAs, IAAs, New Account Applications, custodial paperwork, mutual fund forms, and client update forms. New layouts are absorbed by the model, not built by hand. Parsed data lands in the CRM on the household record where the advisor already opens it.
- ComplianceIQ — supervision on top of the extracted data. ComplianceIQ monitors the portfolio against the IPS every business day using the holdings + Suitability data OperationsIQ produces. Material breaches — bands, concentrations, restricted-security violations, distribution-constraint conflicts — land in the CCO's queue with the underlying evidence attached: the source statement, the page citation, the reviewer attribution. The 206(4)-7 annual review runs against evidence rather than narrative.
- AdvisorIQ — the upstream sales-stage workflow. For the firm's own onboarding, AdvisorIQ scans the prospect's brokerage statements, tax returns, estate documents, and insurance policies at the advisor stage and generates the IPS off real client data. The same data lineage OperationsIQ extends from after the prospect becomes a client.
This seamless flow — advisor sales workflow into firm-level extraction and Good Order into compliance supervision — is what mid-market and enterprise firms find most valuable as they scale. Point tools that handle one slice leave the operations team manually reconciling three data sets quarterly. StratiFi removes that work by design.
Two customer examples make this concrete. Fortis Capital Advisors moved proposal delivery from 24–48 hours to same-day across 10K accounts — 10x faster turnaround on the document-to-supervision flow. Creekmur Wealth Advisors saves 50–200 hours per acquisition transition by running the same flow against an acquired book's statements and Suitability paperwork. Both case studies are published in full with the workflow detail.
The principle is the one we hold across the platform: human judgment amplified by institutional-grade intelligence. The operations team still owns the exceptions — the platform makes sure they are looking at the right ones, in one system instead of three.
See OperationsIQ and ComplianceIQ run on your custodians
A 30-minute working session with anonymized statements from your actual custody mix. We will show you the lot-level reconciliation report, the Good Order check on a sample IMA, and the Suitability fields flowing into the CRM and the supervision queue.
Book a walkthroughWhat the first 90 days look like
Start with the data, not the workflow. Get statements extracted, Good Order checks running, Suitability data flowing into the CRM, and the supervision queue reading from the same lineage — before redesigning any process around the new data.
- Days 1–30: connect custody feeds, ingest the last 12 months of statements for the full book. The output is a normalized holdings, lot-level cost-basis, and activity dataset for every account — the source of truth the supervision layer reads from.
- Days 31–60: turn on reconciliation against the custody feed and run Good Order checks on the firm-level paperwork backlog (IMAs, IAAs, New Account Applications, custodial forms). The operations team works through the discrepancy and exception queues once; from then on they stay at manageable size because new documents are processed as they arrive.
- Days 61–90: connect the extracted data to IPS supervision, share-class monitoring, and cost-basis review. The annual 206(4)-7 review for the next cycle now runs against evidence the firm can defend — account, position, date, page citation — all on one data lineage.
Key takeaways
- A monthly brokerage statement carries dozens of field types per account across holdings, lots, transactions, fees, and account-level data; AI brokerage statement extraction returns those fields as structured data the firm can supervise against.
- OCR, template parsing, and document AI are three different technologies; only document AI absorbs custodian layout drift across Schwab, Fidelity, Pershing, LPL, Raymond James, and the long tail without per-statement engineering.
- Lot-level cost basis with acquisition date and term is the data point that drives wash-sale detection, harvest identification, and accurate 1099-B reconciliation.
- Rule 204-2(a)(3), (a)(7), and (e)(1) make extracted statements the supporting evidence behind every order, communication, and journal entry the firm has to defend at exam.
- The 2026 SEC exam cycle treats AI use — including in back-office operations — as a supervised activity. Firms need designed, written policies, not just deployed tools.
- For mid-market and enterprise firms, the differentiator is the seamless flow from extraction to Good Order to Suitability data in the CRM to portfolio supervision — one data lineage, no re-keying between point tools.
Frequently asked questions
What is AI brokerage statement extraction?
It is software that uses layout-aware document AI — not bare OCR, not template parsing — to read brokerage statement PDFs and return structured data: holdings, lot-level cost basis, transactions, fees, distributions, account-level constraints. The firm can then reconcile that data against custody and supervise it against the investment policy, with a per-field citation back to the source PDF page.
How is document AI different from OCR or template parsing?
OCR converts pixels to text and stops there. Template parsing bolts hand-built rules onto OCR output and breaks every time a custodian changes its template. Document AI uses vision-and-language models trained on thousands of statement layouts to read the document by structure, so it absorbs Schwab, Fidelity, and Pershing template revisions without re-engineering. In practice that translates to materially higher field-level accuracy across heterogeneous custodian formats than template parsers can sustain.
Which custodians does AI extraction need to cover?
Schwab, Fidelity, and Pershing handle most of the volume at mid-market RIAs, but coverage of LPL, Raymond James, Goldman Sachs custody, J.P. Morgan, and the long tail of independent custodians is what separates a usable platform from a partial one. The evaluation question is not "how many custodians" but what happens at the next layout change. Model-based extractors absorb it; rules-based parsers require an engineering ticket.
Does AI extraction return lot-level cost basis?
The strongest implementations do. Lot-level cost basis with acquisition date and short-term / long-term classification is essential for wash-sale screening under IRS Section 1091, tax-loss harvest identification, and accurate 1099-B reconciliation. A platform that returns only position totals does not support tax-aware supervision.
How does StratiFi connect extraction to supervision?
OperationsIQ extracts the brokerage statement, runs Good Order checks on the paperwork, and pulls Suitability data (Investment Objective, Risk Tolerance, Investment Experience, Asset Allocation) from the supporting IMAs, IAAs, and account applications into the CRM. ComplianceIQ then monitors the portfolio against the IPS continuously using that data. AdvisorIQ generates the IPS upstream at the advisor sales stage. Three modules, one data lineage — no re-keying between systems.
How does extracted statement data support Rule 206(4)-7 supervision?
Rule 206(4)-7 requires annual review of the firm's compliance policies based on evidence those policies worked at the account level. Extracted statement data — reconciled against custody and the investment policy — makes the review defensible: it cites specific accounts, positions, and statement dates rather than narrative summaries.
What records does extracted statement data create for SEC examination?
Under Rule 204-2, advisers must keep order memoranda (a)(3), written communications relating to recommendations and transactions (a)(7), and journals of original entry (a)(1) for at least five years, the first two on premises per (e)(1). AI extraction produces a structured copy of the supporting statement data with per-field confidence scores and citations back to the source PDF page — one click from the number to the page, not a manual reconstruction weeks later.
Talk to StratiFi about your statement extraction workflow
If you are evaluating AI brokerage statement extraction, the most useful first step is a working session on your actual custody mix. We will run lot-level reconciliation on a sample of accounts, Good Order checks on a sample IMA, and show how the Suitability data flows into your CRM and the supervision queue — one connected system, not three.
Book a walkthrough