Microsoft Copilot: Where It Wins vs. Where It's a Constraint
Where Copilot genuinely wins
- Meeting summaries in Teams — this alone saves hours weekly. Embedded, no setup, people already use it.
- Email triage and drafting in Outlook — good at "draft a reply to this thread" type work.
- Excel/Power BI data questions — "what drove the variance in DACH margin this quarter?" works surprisingly well when data is already in SharePoint.
- Document summarization — drop a 40-page supplier contract into Word, ask for key terms. Solid.
Where it's a genuine constraint — and what I do instead
Complex multi-step reasoning. Copilot is a single-turn assistant — you ask, it answers, done. It can't plan a 10-step analysis, execute it, review its own work, and iterate. Real examples from the last month alone:
- Market analysis: AI researched the entire CLO market (Cardlytics, Dosh, Figg, Kard), analyzed competitive positioning, modeled revenue scenarios, stress-tested assumptions, and produced a board-ready recommendation. It took 3 hours of autonomous work — deep web research, financial modeling, competitive analysis, second opinions from 3 different AI models. A consulting firm would charge $50-100K and take 4 weeks. Copilot can't even start this.
- Legal risk assessment: a question about consent checkbox requirements across EU markets. AI read our current implementation, researched case law, analyzed 6 competitor approaches, and produced a legal risk assessment with specific recommendations per market. Our legal team validated it and said "this is better than what we'd get from external counsel for a first pass."
- Architecture mapping: we needed to understand how a specific checkout service works across 84 systems, 1,425 containers, and 386 services. AI read the codebases, mapped dependencies, identified bottlenecks, and produced a technical architecture document. An engineering team would need 2 weeks to do this manually.
- Hiring assessment: for a senior hire, AI analyzed the candidate's output quality against our operating principles, compared it with internal benchmarks, and produced a structured assessment. Not replacing human judgment — augmenting it with systematic analysis that no interviewer has time to do.
- Iterative drafting: I ran 4 rounds of drafting with second opinions from 3 AI models at each round. Each model caught different blind spots. The final output was stress-tested against 12 counter-arguments before I showed it to the team.
Cross-system orchestration. Copilot lives inside each M365 app. It can't read your email, check your calendar, pull Jira tickets, search Confluence, and synthesize all of that into a weekly briefing in one flow. My Chief of Staff (CoS) setup does exactly this — 36 active projects, each with its own context, knowledge base, and workflow. AI moves between them seamlessly.
Deep research and second opinions. I run the same question through 3 different AI models (Claude, GPT, Gemini) and synthesize where they agree and disagree. Copilot gives you one answer from one model. That's not enough for decisions that matter. When all 3 models independently flag the same risk — that's high confidence. When they disagree — that's where the interesting strategic questions are.
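Here's what that pattern looks like as a minimal Python sketch. The model names, prompts, and synthesis step are illustrative assumptions, not a fixed recipe:

```python
# Second-opinion pattern: the same question to three models, then a synthesis
# pass that separates consensus from disagreement. Assumes the anthropic,
# openai, and google-generativeai packages are installed and the usual
# ANTHROPIC_API_KEY / OPENAI_API_KEY / GOOGLE_API_KEY env vars are set.
# Model names are illustrative and will drift over time.
import os
import anthropic
import google.generativeai as genai
from openai import OpenAI

QUESTION = "What are the top 3 risks in entering the CLO market?"

def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def ask_gpt(prompt: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_gemini(prompt: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative
    return model.generate_content(prompt).text

answers = {
    "claude": ask_claude(QUESTION),
    "gpt": ask_gpt(QUESTION),
    "gemini": ask_gemini(QUESTION),
}

# Synthesis pass: where all three agree is high confidence; where they
# disagree is where the interesting strategic questions are.
synthesis_prompt = (
    f"Question: {QUESTION}\n\n"
    + "\n\n".join(f"--- {name} ---\n{text}" for name, text in answers.items())
    + "\n\nList (1) risks all three flag independently, (2) points of disagreement."
)
print(ask_claude(synthesis_prompt))
```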
Custom workflows. I have "skills" — reusable workflow templates. "Run the weekly CEO review" triggers a 15-step process: pull Asana tasks, read 5/15 reports, assess against operating principles, draft newsletter. Copilot can't do this.
The exception process I'd design
Don't call it "exceptions." Call it "approved tools for specific use cases." Frame it as risk management, not rebellion.
- Create a simple registry: Tool → Use Case → Data Classification → Owner → Approved By
- Rule: anything touching customer PII or financials stays in the Microsoft stack. Everything else is open for evaluation.
- Require a 2-week proof of value: show me the output, show me the time saved, show me the risk assessment. If it works, it's approved.
- The political move: position it as "extending Copilot's capabilities" not "replacing Microsoft." HQ cares about control and audit trail, not which LLM processed the text.
Minimum Connectivity for Chief of Staff Behavior
In order of impact:
- Email (read-only is enough to start) — 60% of the value. AI reads your inbox, categorizes by urgency, surfaces what needs your attention. Without this, you're still manually triaging. (A minimal read-only sketch follows this list.)
- Calendar — AI needs to know your schedule to be useful. "Prep me for my 2pm with the retail partner" requires knowing what's at 2pm.
- Task/project management (Planner, Jira, Asana — whatever you use) — This is where AI goes from reactive to proactive. "What's overdue? Who hasn't reported? What's at risk?" requires access to the task system.
- Documents/SharePoint — AI needs to read your strategy docs, pricing frameworks, previous analyses. Without this, it can't build on institutional knowledge.
- Data/BI (Power BI, Excel) — For "what happened and why" questions. Hardest to connect but most transformative.
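To make the email item concrete: a minimal read-only triage sketch over IMAP. The host, credentials, and the ask_model() call are placeholders; note that Microsoft 365 now requires OAuth for IMAP, so basic auth is shown only for brevity:

```python
# Read-only inbox triage: fetch unread headers over IMAP and hand them to a
# model for urgency ranking. Nothing is moved, deleted, or sent. Host and
# credentials are placeholders; ask_model() stands in for your LLM call
# (see the multi-model sketch above).
import email
import imaplib
import os

def fetch_unread_subjects(host: str, user: str, password: str) -> list[str]:
    imap = imaplib.IMAP4_SSL(host)
    imap.login(user, password)           # M365 needs OAuth in practice
    imap.select("INBOX", readonly=True)  # readonly: no flags get changed
    _, data = imap.search(None, "UNSEEN")
    subjects = []
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(BODY.PEEK[HEADER])")  # PEEK keeps it unread
        msg = email.message_from_bytes(msg_data[0][1])
        subjects.append(f"{msg['From']}: {msg['Subject']}")
    imap.logout()
    return subjects

subjects = fetch_unread_subjects(
    "outlook.office365.com", os.environ["MAIL_USER"], os.environ["MAIL_PASS"]
)
prompt = "Rank these by urgency; flag anything needing a reply today:\n" + "\n".join(subjects)
# print(ask_model(prompt))  # placeholder for your LLM call
```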
Three Questions That Corner HQ Into Action
These work because they force specificity and timelines, not strategy theater:
"What is the measurable productivity improvement we expect from our current AI investment, and by when?"
This forces them either to admit they don't have targets (which means they're spending without accountability) or to state them, at which point you can say "great, let me run an experiment to hit them faster."
"If a competitor in our category achieves 30% faster speed-to-market through AI tooling we've restricted, what is our response plan?"
This reframes the risk. Right now "risk" means "what if AI does something bad." This puts the risk on the other side: "what if NOT using AI does something bad." In FMCG, speed-to-shelf is everything. Make them feel that risk.
"Can I run a 30-day controlled experiment with [specific tool] on [specific use case] with [specific success metric], reporting results to [specific person]?"
This is impossible to say no to without looking like you're blocking innovation. It's time-bound, measurable, governed, and transparent. If they say no, ask them to put the reason in writing. That usually changes the answer.
Three Things to STOP This Quarter
1. Stop weekly status reporting as a document exercise
If people are spending 2-3 hours writing status reports that someone else spends 30 minutes reading, that's at least a 4:1 waste ratio. Replace the document with a structured 15-minute input (5 bullet points: what happened, what didn't, what's next, what's blocked, one achievement). AI synthesizes the rest.
The result is a complete leadership package ready for managers: per-person trend analysis, department-level health, cross-departmental patterns, escalation flags, and AI adoption tracking — all generated from 15 minutes of each person's input. A sketch of the input structure and synthesis prompt follows.
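The field names below mirror the five bullets above; ask_model() is a placeholder for whatever LLM call you use:

```python
# The 5-bullet weekly input as a structure, and the synthesis prompt built
# from many of them. ask_model() is a placeholder for your LLM call.
from dataclasses import dataclass

@dataclass
class WeeklyInput:
    person: str
    what_happened: str
    what_didnt: str
    whats_next: str
    whats_blocked: str
    achievement: str

def synthesis_prompt(reports: list[WeeklyInput]) -> str:
    body = "\n\n".join(
        f"## {r.person}\nDone: {r.what_happened}\nMissed: {r.what_didnt}\n"
        f"Next: {r.whats_next}\nBlocked: {r.whats_blocked}\nWin: {r.achievement}"
        for r in reports
    )
    return (
        "From these weekly inputs, produce: per-person trend notes, "
        "department-level health, cross-departmental patterns, and "
        "escalation flags. Quote specific evidence for every claim.\n\n" + body
    )

# print(ask_model(synthesis_prompt(collected_reports)))
```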
2. Stop alignment meetings that exist because information doesn't flow
Most "sync" meetings exist because systems don't talk to each other. If AI can read email, tasks, and documents — it can generate the alignment brief. The meeting becomes 15 minutes of decisions, not 45 minutes of updates. Kill the update meetings. Keep the decision meetings.
3. Stop manual post-mortem analysis on pricing/promo
If you're doing promo post-mortems manually — pulling data, comparing to benchmarks, writing slides — that's exactly the kind of structured analytical work AI does better and faster. Build it as a repeatable skill/template: input = promo parameters + results data, output = structured analysis with recommendations. Run it every time, automatically.
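A minimal sketch of that skill as a parameterized template; the input fields, example values, and output headings are illustrative, not a finished methodology:

```python
# Promo post-mortem as a repeatable template: same inputs, same output
# structure, every cycle. ask_model() is a placeholder for your LLM call;
# all example values below are placeholders.
def promo_postmortem_prompt(params: dict, results: dict, benchmarks: dict) -> str:
    return f"""You are running our standard promo post-mortem.

Promo parameters: {params}
Actual results: {results}
Benchmark data: {benchmarks}

Produce exactly these sections:
1. What worked (with numbers)
2. What didn't (with numbers)
3. Why (mechanism, not description)
4. Recommendation for the next cycle (one decision, one owner)
"""

prompt = promo_postmortem_prompt(
    params={"market": "DE", "discount": "-20%", "duration_weeks": 2},
    results={"uplift": "+34%", "margin_impact": "-3.1pp"},
    benchmarks={"median_uplift_similar_promos": "+28%"},
)
# print(ask_model(prompt))
```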
Ship in 14 Days — Undeniable Value
Every Monday morning, your 6-8 direct reports get a personalized briefing: their team's key metrics from last week, open action items, upcoming deadlines, flagged risks, and 3 questions you want answered by Friday. Generated automatically from email, tasks, and BI data.
Why this works
- It's visible to everyone immediately (not a backend tool)
- It saves each person 30-60 min of self-briefing on Monday
- It demonstrates AI reading across systems (the "wow" moment)
- It creates accountability (the questions force action)
- It's repeatable — runs every week, gets better over time
Definition of done (smallest version)
- Briefing generated for 3 direct reports (not all, just 3 willing pilots)
- Contains: last week's top 3 events from their area + this week's 3 priorities + 1 open question from you
- Delivered by email before 8am Monday
- Took less than 5 minutes of your input on Sunday evening
- At least 2 of 3 pilots say "this is useful, keep it running"
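And a sketch of that smallest version. The three get_* functions are stubs to swap for your real email/task/BI connectors, and the SMTP details are placeholders:

```python
# Smallest-version Monday briefing: 3 pilot reports, assembled from stub
# data pulls and emailed before 8am. Everything marked "stub" or
# "placeholder" is an assumption for illustration.
import smtplib
from email.message import EmailMessage

PILOTS = ["anna@example.com", "marek@example.com", "lena@example.com"]  # placeholders

def get_top_events(person: str) -> list[str]:
    return ["event 1", "event 2", "event 3"]           # stub: last week's top 3 events

def get_priorities(person: str) -> list[str]:
    return ["priority 1", "priority 2", "priority 3"]  # stub: this week's 3 priorities

def get_open_question(person: str) -> str:
    return "What unblocks the delayed listing?"        # stub: your 1 open question

def build_briefing(person: str) -> str:
    return (
        "Last week:\n" + "\n".join(f"- {e}" for e in get_top_events(person))
        + "\n\nThis week:\n" + "\n".join(f"- {p}" for p in get_priorities(person))
        + f"\n\nOpen question: {get_open_question(person)}\n"
    )

with smtplib.SMTP("smtp.example.com", 587) as smtp:  # placeholder SMTP host
    smtp.starttls()
    smtp.login("me@example.com", "app-password")     # placeholder credentials
    for person in PILOTS:
        msg = EmailMessage()
        msg["Subject"] = "Monday briefing"
        msg["From"] = "me@example.com"
        msg["To"] = person
        msg.set_content(build_briefing(person))
        smtp.send_message(msg)
```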
Operating Rulebook for "One Owner per Outcome"
Not a team. Not a function. A person. "Who owns the DACH pricing review?" has one answer. If the answer is "the pricing team" — you don't have an owner.
The owner can consult anyone, but the decision is theirs. If they want consensus, they can seek it. But they don't need it. "I consulted finance and supply chain, and I decided X" is a complete sentence.
Committees are for governance (audit, risk, compliance). Everything else gets an owner. If you need a "steering committee" for a project, the project doesn't have clear enough ownership.
The owner escalates when they're stuck, not when it's a big decision. Big decisions made by the right owner are fine. Small decisions escalated to the wrong level are not.
The danger is real — AI can generate more slides, more analysis, more options, more documents. That accelerates bureaucracy. The rule: AI works FOR the owner to make their decision faster, not for the committee to have more material to discuss.
Your CLAUDE.md Equivalent — The BU/Cluster Constitution
Here's what mine contains, adapted for a BU:
## Context
- Who we are, what we manage, key metrics
- Current strategic priorities (max 5)
- What "good" looks like this quarter (specific numbers)
## Operating Principles
- Your non-negotiable rules (e.g., "simplicity first", "no laziness", "goal-driven execution")
- The same rules adapted for your context (e.g., "retailer impact first", "speed over perfection for <€50K decisions")
## Tone & Communication
- How we communicate: direct, data-backed, no hedging
- What language to use with different audiences (HQ, retailers, team)
- What's forbidden: corporate jargon, unsubstantiated claims, passive voice
## Risk Rules
- What requires human approval (pricing above X, commitments above Y)
- What AI can do autonomously (internal analysis, drafts, data synthesis)
- PII/confidential data handling rules
## Verification Requirements
- Financial numbers: always cross-check against source system
- Market claims: require at least 2 independent sources
- Recommendations to HQ: always get second opinion from different AI model
- Customer-facing content: human review mandatory
## Workflow
- Plan before act (always)
- Second opinion on important outputs
- Verify before declaring done
Forcing "Plan Before Act"
This is the hardest behavioral change. Here's what actually works:
Non-negotiable steps in my workflow
- Plan mode is default. For anything with 3+ steps or real consequences, AI enters plan mode first. It thinks through the approach, identifies risks, and presents the plan before doing anything. I approve or revise. Only then does execution start.
- "What's your plan?" is the first question. When someone brings me a task, I don't say "do it." I say "what's your plan?" If they don't have one, we make one. Same with AI — I never start with "do X." I start with "plan how to do X."
- Second opinion before shipping. Every important output gets reviewed by a different AI model. Not because the first one is wrong — but because different models catch different blind spots. This forces a pause between "I have output" and "I'm done."
- Kill the anti-pattern: "I asked ChatGPT and it said..." is the enemy. The output of a single AI query is a starting point, not an answer. The rule: if you're going to use AI output in a decision, you need to show the prompt, the output, and your judgment on top of it.
How to enforce with teams
- Make the plan a deliverable. Before the analysis, before the deck — I want the plan. One page: what are we trying to answer, what data do we need, what's the approach, what does "done" look like.
- Celebrate plans that changed. "I planned X but discovered Y and pivoted to Z" is the best outcome. It means the plan was useful.
- Punish paste-and-present. If someone presents AI output without visible thinking on top, send it back. "What's YOUR take on this? Where do you agree and disagree with the AI?"
First Skills to Standardize
1. Meeting notes → action items
Input: meeting notes (even rough ones), attendee list, context
Output: structured action items with owners, deadlines, and dependencies + follow-up email draft
Why first: everyone has meetings, everyone loses actions, the pain is universal and immediate. Time saved: 15-20 min per meeting × dozens of meetings per week across the team.
2. Promo post-mortem
Input: promo parameters, actual results, benchmark data
Output: structured analysis (what worked, what didn't, why, recommendation for next time)
Why second: this is high-value, high-frequency in FMCG. Every promo cycle generates learnings that get lost. Making this a repeatable skill means institutional memory builds automatically.
3. Negotiation prep
Input: retailer profile, historical performance, current terms, your objectives
Output: negotiation brief with talking points, BATNA analysis, anticipated pushback + responses, recommended opening position
Why third: this directly impacts commercial outcomes. A well-prepped negotiation versus an underprepared one is worth real money.
How to keep skills reusable
- Each skill is a markdown file with clear steps, not a prompt in someone's head
- Skills are version-controlled — when someone improves one, everyone benefits
- Skills reference templates for output format — consistency matters
- Skills are invocable by name: "run the promo post-mortem" — not "hey AI, can you analyze this promo for me" (see the sketch after this list)
- Review and update skills quarterly based on what's actually being used
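Here's what "invocable by name" can look like in practice: a minimal runner that resolves a skill name to its markdown file and hands it to the model with the inputs. The skills/ layout and ask_model() are assumptions for illustration:

```python
# Skill runner: "run the promo post-mortem" resolves to a version-controlled
# markdown file containing the steps and output template, which is prepended
# to the inputs. ask_model() is a placeholder for your LLM call.
from pathlib import Path

SKILLS_DIR = Path("skills")  # assumed layout: one .md file per skill

def run_skill(name: str, inputs: str) -> str:
    skill = (SKILLS_DIR / f"{name}.md").read_text()  # e.g. skills/promo-post-mortem.md
    prompt = f"{skill}\n\n---\nInputs:\n{inputs}\n\nFollow the skill steps exactly."
    return ask_model(prompt)  # placeholder

# report = run_skill("promo-post-mortem", "params=..., results=..., benchmarks=...")
```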
Why Claude Code Specifically
And why it sidesteps your Microsoft problem
It runs locally on your machine. It's a CLI tool — runs in your terminal, on your laptop. No cloud portal, no browser tab. It operates under YOUR user session with YOUR credentials. What you can access on your computer, Claude Code can access. Your files, your email (via MCP connectors), your calendar, your task manager, your data — if it's on your machine or reachable from it, AI can read it and act on it.
It's agentic, not conversational. This is the category difference. Copilot answers questions. Claude Code executes multi-step workflows autonomously. "Process this week's reports from 84 people, assess them against our 5 operating principles, identify company-wide themes, flag escalations, and generate an HTML dashboard" — and it does all of that. Reads files, calls APIs, writes outputs, reviews its own work, iterates. Not one prompt-response cycle — a full autonomous workflow.
It has persistent memory and skills. Every project has a CLAUDE.md file — essentially a constitution that tells Claude the context, goals, rules, and accumulated learnings. When I open a project, Claude already knows what we did last session, what decisions were made, what patterns to follow. Skills are reusable workflow templates — invoke by name, execute consistently. This is institutional memory that gets better over time.
It plays nice with Microsoft. Claude Code doesn't replace M365 — it orchestrates it. Through MCP servers (open protocol connectors), it can read Outlook, access SharePoint, query calendars. Your Microsoft data stays where it is. Claude Code just becomes the intelligent layer that connects it all. No migration, no replacement — just amplification.
Real Example: CEO Weekly Review
How I run a company of 1,200 people through AI weekly.
The input
Each person submits a weekly 5/15 report via Asana (15 minutes to write, 5 minutes to read). That's it — their raw input into the system.
What AI does with it (fully automated)
- Fetches all 84 reports from Asana via API — no manual copy-pasting (a fetch sketch follows this list)
- Per-person operating principles assessment — each person is evaluated against 5 operating principles (Extreme Ownership, Speed Over Comfort, Impact Obsessed, Simplify to Scale, Disciplined). Not a checkbox — a written assessment with specific evidence from their report.
- Department-level synthesis — 13 departments, each gets a section: key achievements, blockers, escalations, notable patterns
- Company-wide theme extraction — AI identifies cross-cutting themes that no single person can see.
- Escalation flagging — 31 items last week that need CEO attention, ranked and categorized
- AI adoption tracking — who's building with AI, who's using it for productivity, who hasn't started. The gap is widening weekly and the system makes it visible.
- HTML dashboard — interactive, department-by-department drill-down. I review the entire company in under 15 minutes.
Examples of what it surfaced last week:
- "AI adoption velocity is highly uneven across departments" — with a 3-tier breakdown
- "Data quality is the single biggest infrastructure bottleneck for AI scaling" — flagged by 6 independent teams
- "Sales force structure under fundamental pressure" — with 5 converging signals from different departments
What this gives me that no human analyst could
- No information loss. When a human summarizes 84 reports, they filter. AI reads every word and surfaces patterns across all of them.
- Cross-departmental pattern matching. 6 different teams in 6 different departments independently flagged data quality as their blocker. No human would catch that — AI does it instantly.
- Operating principles as a living measurement. Every person, every week, assessed against the same 5 principles. Over time: who's growing, who's plateauing, who needs intervention.
- Mood and energy detection. The language people use in their reports reveals a lot. AI picks up on tone shifts, frustration signals, energy drops.
- Speed. 84 reports, assessments, themes, dashboard — done in 30-40 minutes. A human team would need days.
What this means for you
You probably have 6-8 direct reports, each managing teams across DACH&CEE. If each submits a structured weekly input (15 minutes of their time), AI can give you:
- A synthesized view of your entire cluster every Monday morning
- Operating principles health check per leader
- Cross-market pattern matching (what's happening in DE that's also happening in CZ but nobody connected the dots?)
- Early warning signals on team energy and execution quality
- Automated follow-up tracking (who committed to what last week, and did it happen?)
The Meta-Point
The biggest unlock isn't any single tool or technique. It's the shift from "AI as a tool I use sometimes" to "AI as an operating layer that's always on."
My CoS doesn't wait for me to ask questions. It processes my inputs, tracks my projects, remembers my preferences, and builds institutional memory across sessions. Every correction I make, it learns from. Every project I finish, it carries the learnings forward.
The starting point isn't technology. It's one person (you) deciding to run their own operating rhythm through AI for 2 weeks, proving the value with real outputs, and then expanding from there.