Skip to content

Structured MCP diagnostics, logging consistency, and analyzer enforcement#71

Open
TomProkop wants to merge 36 commits into
masterfrom
feat/mcp-failure-details
Open

Structured MCP diagnostics, logging consistency, and analyzer enforcement#71
TomProkop wants to merge 36 commits into
masterfrom
feat/mcp-failure-details

Conversation

@TomProkop
Copy link
Copy Markdown
Member

@TomProkop TomProkop commented May 4, 2026

Summary

This PR started as MCP failure diagnostics and evolved into a comprehensive logging/error-handling consistency pass with OTel telemetry — prerequisite cleanup before rich telemetry insights.

Structured MCP execution logs

  • get_execution_log tool returns filterable, searchable, paginated structured logs per tool invocation
  • Logs stored for both success and failure (previously only failures had details)
  • RedactedLogEntry model preserves timestamp, level, category, message, and structured data
  • Removed flat FullLog dead code paths from McpLogForwarder and CliSubprocessResult

Runtime behavior fixes (Milestone 1)

  • Silent exception swallowing: PublicSkillLoader now logs assembly load failures at Debug level
  • Console.Error bypass: InMemoryChangesetStore replaced Console.Error.WriteLine with DI-injected ILogger
  • Unredacted HTTP 500: DataTransformationServer now logs and redacts exception messages
  • Logger categories: standardized to nameof() in DocsListCliCommand, DocsShowCliCommand
  • Per-method logger creation: consolidated in DataverseDeploymentDetailService to single static field
  • Dead code: removed commented-out Console.WriteLine progress calls from DataModelConverterService
  • Timestamp correlation: console formatter now uses DateTimeOffset.Now with UTC offset
  • Exit code constant: TransformServerStartCliCommand uses ExitSuccess instead of raw 0

BannedSymbols enforcement (Milestone 2)

  • Added Console.Error / Console.Out property bans — catches the TextWriter.WriteLine bypass that method-level bans missed
  • Removed broad project-level BannedSymbols.txt opt-outs from Core, MCP, and Logging .csproj files
  • Added surgical #pragma RS0030 exemptions only in sanctioned infrastructure files

Roslyn analyzer guardrails (Milestone 3)

  • TXC014: no string interpolation in logger calls (existing, zero violations)
  • TXC015: no silent exception swallowing — flags broad catch (Exception) blocks. Severity: Warning
  • TXC016: CreateLogger must use nameof() — zero violations after M1 fixes

OpenTelemetry with Azure Monitor (Milestone 4)

  • TxcTelemetry: shared ActivitySource("TALXIS.CLI") with layered connection string resolution (env var → config → build-time assembly metadata)
  • TxcTelemetrySetup: initializes TracerProvider with Azure.Monitor.OpenTelemetry.Exporter, gated behind TELEMETRY_ENABLED (Release only)
  • Three-level opt-in: build config (#if TELEMETRY_ENABLED) + user setting (telemetry.enabled) + env var (TXC_TELEMETRY_OPTOUT)
  • CLI instrumentation: Activity spans on every TxcLeafCommand.RunAsync with txc.command, txc.exit_code, txc.error_kind, exception events
  • MCP instrumentation: Activity spans on every CallToolAsync with txc.tool
  • Connection string: injected by pipeline via -p:TxcTelemetryConnectionString=..., embedded as AssemblyMetadataAttribute — never in source code
  • Debug builds: zero telemetry overhead, no exporter loaded

Validation

  • dotnet build TALXIS.CLI.sln — 0 errors
  • dotnet test TALXIS.CLI.sln — 494 unit + 27 integration tests pass (5 skipped, unchanged)

Pipeline setup required

To enable telemetry in published NuGet packages, add to your CI build step:

dotnet build -c Release -p:TxcTelemetryConnectionString="InstrumentationKey=...;IngestionEndpoint=..."

Store the connection string as a pipeline secret/variable.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves txc MCP failure diagnostics by keeping the immediate tool failure response concise while attaching a fetchable, structured diagnostics resource (JSON). It also centralizes MCP result construction in a dedicated factory and adds test coverage for the new resource-based failure-details flow.

Changes:

  • Introduced McpToolResultFactory to build MCP CallToolResult responses and expose failure diagnostics as MCP resources.
  • Updated ToolLogStore to store structured failure details (exit code, summary, error summary, full log) and serialize them as JSON resources.
  • Added unit + integration tests validating that failed tool calls return a readable resource_link and that the linked resource can be fetched and parsed.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/TALXIS.CLI.Tests/MCP/ToolLogStoreTests.cs Updated tests for the new StoreFailure API and expanded assertions for stored failure metadata.
tests/TALXIS.CLI.Tests/MCP/McpToolResultFactoryTests.cs Added unit tests verifying failure and exception results attach a readable JSON diagnostics resource.
tests/TALXIS.CLI.IntegrationTests/McpTests.cs Added integration coverage ensuring a failed tool call exposes a fetchable failure-details resource.
tests/TALXIS.CLI.IntegrationTests/McpTestClient.cs Extended test client wrapper with resources list/read APIs.
src/TALXIS.CLI.MCP/ToolLogStore.cs Refactored store entries to represent failure details and added JSON serialization for resource reads.
src/TALXIS.CLI.MCP/Program.cs Switched tool result building + resource handlers to use McpToolResultFactory.
src/TALXIS.CLI.MCP/McpToolResultFactory.cs New factory encapsulating MCP success/failure result assembly and resource exposure.
src/TALXIS.CLI.MCP/CliSubprocessRunner.cs Added an internal constructor to support explicit stdout/stderr-derived diagnostics in tests.
global.json Added rollForward: latestFeature for the pinned .NET SDK version.
Comments suppressed due to low confidence (1)

src/TALXIS.CLI.MCP/Program.cs:767

  • These resource handler comments still mention "execution logs" / "full execution log", but the implementation now lists/reads structured failure-details JSON resources via McpToolResultFactory. Updating the comments will keep the MCP resource contract clear (especially the MIME type/content expectations).
// MCP resource listing — exposes stored tool execution logs
ValueTask<ListResourcesResult> ListResourcesAsync(RequestContext<ListResourcesRequestParams> ctx, CancellationToken ct)
{
    return ValueTask.FromResult(new ListResourcesResult { Resources = toolResultFactory.BuildResources() });
}

// MCP resource read — returns the full execution log for a given URI
ValueTask<ReadResourceResult> ReadResourceAsync(RequestContext<ReadResourceRequestParams> ctx, CancellationToken ct)

Comment thread src/TALXIS.CLI.MCP/Program.cs Outdated
Comment thread tests/TALXIS.CLI.Tests/MCP/ToolLogStoreTests.cs Outdated
Comment thread src/TALXIS.CLI.MCP/ToolLogStore.cs Outdated
@TomProkop TomProkop changed the title Improve MCP failure diagnostics Structured MCP diagnostics, logging consistency, and analyzer enforcement May 17, 2026
@TomProkop TomProkop requested a review from Copilot May 17, 2026 18:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 54 out of 54 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/Directory.Build.props:35

  • Directory.Build.props comment says infrastructure projects (Logging/MCP/Core) opt out of the banned API analyzer via their .csproj, but this PR removes those opt-outs and relies on targeted #pragma exemptions instead. Update this comment to reflect the new enforcement approach so future contributors don't reintroduce broad opt-outs.
  <ItemGroup>
    <!-- Banned API analyzer: prevents Console.Write*/ReadKey/ReadLine in command code.
         Infrastructure projects (Logging, MCP, Core) opt out via their own .csproj. -->
    <PackageReference Include="Microsoft.CodeAnalysis.BannedApiAnalyzers" Version="3.3.4">

Comment thread src/TALXIS.CLI.MCP/ToolLogStore.cs
Comment thread src/TALXIS.CLI.MCP/Program.cs
Comment thread src/TALXIS.CLI/Program.cs Outdated
Comment thread src/TALXIS.CLI.MCP/Program.cs Outdated
Comment thread src/TALXIS.CLI.Logging/TxcTelemetrySetup.cs
Comment thread src/TALXIS.CLI.Logging/TxcTelemetry.cs Outdated
@TomProkop
Copy link
Copy Markdown
Member Author

All review comments addressed in commit 63fb019.

The two remaining items (ToJson duplication, test exit-code-0) were about earlier revisions that have already been resolved:

  • ToJson duplication: The FailureDetailsDocument class was removed in an earlier commit. Current ToJson() is JsonSerializer.Serialize(this, ...) — a single DTO as requested.
  • Test exit code 0: All Store() test calls already use exit code 1. The Summary fallback was fixed to be conditional (completed successfully vs failed with exit code).

@TomProkop TomProkop force-pushed the feat/mcp-failure-details branch from bcc0ff3 to 275a617 Compare May 18, 2026 11:03
TomProkop and others added 21 commits May 18, 2026 18:07
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generalize failure-specific MCP surface to a general execution-log retrieval tool.
Rename ToolLogStore.StoreFailure to Store, update Kind to tool-execution-log,
rename McpToolResultFactory methods, update all guidance text and tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace flat-string log storage with structured RedactedLogEntry arrays
that preserve timestamp, level, category, message, and data dict from
each JsonLogLine. Remove all dead FullLog/fullLogBuffer code paths.

Key changes:
- McpLogForwarder collects List<RedactedLogEntry> instead of StringBuilder
- ToolLogStore.LogEntry holds IReadOnlyList<RedactedLogEntry>
- CliSubprocessResult carries StructuredEntries (FullLog removed)
- get_execution_log supports level/category/search filters + skip/take paging
- All executions stored (success + failure), not just failures
- McpLogForwarder uses semantic JsonLogLine.Progress when available
- New TXC014 Roslyn analyzer: no string interpolation in ILogger calls
- Progress logging added to SolutionImporter and DeploymentDetailService
- output-contract.md documents structured logging, TXC014, progress expectations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… directly

Replace the ISubprocessOutputHandler parameter with the concrete
McpLogForwarder type to make the actual dependency explicit and avoid
silent data loss if a different handler implementation were passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ails

Milestone 1 — Runtime behavior fixes:
- Fix silent exception swallowing in PublicSkillLoader (now logs at Debug)
- Replace Console.Error.WriteLine bypass in InMemoryChangesetStore with ILogger
- Log and redact exceptions in DataTransformationServer HTTP 500 responses
- Standardize logger categories: nameof() in DocsListCliCommand, DocsShowCliCommand
- Consolidate per-method logger creation in DataverseDeploymentDetailService
- Replace raw return 0 with ExitSuccess in TransformServerStartCliCommand
- Remove dead commented-out Console.WriteLine progress code from DataModelConverterService
- Add UTC offset to console timestamps (DateTimeOffset.Now) for log correlation

Milestone 2 — BannedSymbols enforcement:
- Ban Console.Error/Console.Out property access (catches TextWriter bypass)
- Remove broad project-level BannedSymbols opt-outs from Core, MCP, Logging
- Add surgical #pragma RS0030 exemptions in sanctioned infrastructure files

Milestone 3 — Roslyn analyzer guardrails:
- TXC015: flag broad catch blocks that silently swallow exceptions
- TXC016: enforce nameof() in CreateLogger calls for consistent categories

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- TxcTelemetry: shared ActivitySource (TALXIS.CLI) with connection string
  resolution (env var > config > build-time assembly metadata)
- TxcTelemetrySetup: initializes TracerProvider with Azure Monitor exporter,
  gated behind TELEMETRY_ENABLED (Release builds only)
- Three-level opt-in: build config + telemetry.enabled + TXC_TELEMETRY_OPTOUT
- CLI: Activity spans on every TxcLeafCommand.RunAsync with exit code,
  error kind, and exception events
- MCP: Activity spans on every tool call via CallToolAsync
- Connection string injected by pipeline: -p:TxcTelemetryConnectionString=...
  embedded as AssemblyMetadataAttribute (never in source code)
- TelemetrySettings extended with ConnectionString property
- Azure.Monitor.OpenTelemetry.Exporter added to Logging project

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MCP-spawned CLI subprocesses now get TXC_TELEMETRY_OPTOUT=1 in their
environment. Without this, both the MCP server span and the CLI
subprocess span would independently export to App Insights — producing
duplicate uncorrelated traces for the same logical tool invocation.

The MCP server's Activity span already captures tool name, exit code,
and errors. The subprocess's detailed execution is captured via
structured log entries (RedactedLogEntry).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the subprocess telemetry opt-out with proper W3C trace context
propagation. When the MCP server spawns a CLI subprocess:

1. MCP sets TXC_TRACEPARENT env var with the current Activity's traceId
   and spanId in W3C traceparent format (00-traceId-spanId-flags)
2. CLI reads TXC_TRACEPARENT in TxcLeafCommand.RunAsync and creates its
   Activity as a child of the MCP parent span

Result in App Insights:
- MCP tool dispatch span (parent) → CLI command execution span (child)
- Single correlated trace regardless of entry point
- Standalone CLI creates root spans (no TXC_TRACEPARENT = no parent)
- Both MCP and CLI spans have identical structure (txc.command,
  txc.exit_code, txc.error_kind) for unified analysis

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass TXC_TELEMETRY_CONNECTION_STRING secret to dotnet pack via the
TxcTelemetryConnectionString MSBuild property. This embeds the App
Insights connection string as assembly metadata in published NuGet
packages so installed tools report telemetry when users opt in.

The secret is set as a GitHub Actions repository secret.
Debug/local builds remain unaffected (no connection string = no export).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- TelemetrySettings.Enabled defaults to true (was false)
- Removed TXC_TELEMETRY_OPTOUT env var — telemetry always flows
- Added CI detection (CI, TF_BUILD, GITHUB_ACTIONS) as a span tag
  for filtering pipeline vs interactive usage in App Insights
- Only way to disable: txc config setting set telemetry.enabled false

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix stale comment: ToolLogStore stores structured diagnostics, not just logs
- Fix stale comment: BannedSymbols now uses #pragma exemptions, not project opt-outs
- Fix skip/take parsing: use TryGetInt32 with Math.Max/Math.Clamp instead of GetDouble cast
- Fix Summary fallback: success message for exit code 0, failure message otherwise
- Fix silent catch blocks: add explicit 'return' and typed 'catch (Exception)' in
  telemetry init (CLI, MCP, TxcTelemetrySetup) to satisfy TXC015 intent
- PR description already matches actual behavior (no opt-out env var)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The #if TELEMETRY_ENABLED block needs explicit using statements for
OpenTelemetry.Trace, OpenTelemetry.Resources, and Azure.Monitor
extension methods that are not available in Debug builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Change MCP tool dispatch and CLI command spans from ActivityKind.Internal
  to ActivityKind.Server so they land in the 'requests' table instead of
  'dependencies'
- Add ActivityKind.Client span for MCP→CLI subprocess dispatch so the
  process boundary is visible as a dependency in App Insights
- Add OpenTelemetry.Instrumentation.Http to auto-capture all outbound HTTP
  calls (Dataverse, Graph, etc.) as dependencies
- Replace manual ActivityEvent('exception') with RecordException() so
  exceptions get full stack traces in the 'exceptions' table
- Refactor diagnostics URI from txc://logs/{toolName}/{runId} to
  txc://logs/{executionId} where executionId = root Activity trace id =
  App Insights operation_Id — one canonical identity for diagnostics,
  telemetry, and troubleshooting
- Add OpenTelemetry.Api to Core for RecordException extension method
- Update ToolLogStore tests for new URI format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Upgrade Azure.Monitor.OpenTelemetry.Exporter from 1.3.0 to 1.8.0
- Upgrade OpenTelemetry.Instrumentation.Http from 1.8.1 to 1.15.1
- Upgrade OpenTelemetry.Api from 1.8.1 to 1.15.3
- Add explicit ForceFlush(5s) in Shutdown() before Dispose() — the batch
  exporter fires every 5s by default so short-lived CLI processes would
  silently lose all spans without an explicit flush
- Fixes TypeLoadException caused by version mismatch between OTel SDK and
  API packages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add RecordException(ex) to validation and cancellation catch blocks so
  all failures produce rows in the App Insights exceptions table with
  full stack traces — not just internal errors
- Remove SamplingRatio override; defer volume control to future iteration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ILogger is now the single diagnostic entry point for all error recording.
Console, MCP forwarder, and OTel telemetry all consume the same events:

- New TxcTelemetryLogProvider bridges ILogger → Activity spans:
  - Maps txc.* structured properties to Activity tags
  - Records exceptions with redaction via LogRedactionFilter
  - Sets Activity error status from log level
- Simplified TxcLeafCommand catch blocks: removed all direct Activity
  tag/status/RecordException calls, replaced with single LogCommandFailure
  helper that passes structured properties through ILogger
- Removed OpenTelemetry.Api dependency from Core project (no longer needed)
- MCP dispatch-layer exceptions now logged via ILogger before returning
  BuildExceptionResult (previously invisible to both ILogger and OTel)
- Fixes redaction gap: raw exceptions no longer reach App Insights

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ProfiledCliCommand.PreExecuteAsync now resolves the profile early and
tags the Activity span with OTel semantic conventions:
- enduser.id: UPN (email) for interactive users, app ID for service principals
- enduser.scope: Entra tenant ID for multi-tenant correlation
- txc.profile: resolved profile name
- txc.connection: connection identifier

Identity tagging is best-effort — resolution failures are silently caught
so commands are never blocked for telemetry reasons.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace useless txc.profile/txc.connection tags with actionable identity:
- enduser.id: UPN for interactive users (falls back to credential ID)
- enduser.scope: tenant ID from connection (target tenant, not home)
- txc.environment_url: Dataverse instance URL being operated on

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TomProkop and others added 3 commits May 18, 2026 18:15
- enduser.id: Entra object ID (GUID) from MSAL HomeAccountId — stable,
  join-friendly identity for analytics
- user.name: UPN (email) for human-readable dashboards
- enduser.scope: tenant ID (unchanged)
- txc.environment_url: Dataverse instance URL (unchanged)
- txc.environment_name: human-readable environment display name (new)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Azure Monitor exporter's default adaptive sampling was dropping 75%
of spans client-side before they reached App Insights. Server-side
ingestion sampling is already configured to 'All data (100%)' in Azure
Portal, so the SDK sampling was the only source of data loss.

Setting SamplingRatio=1.0 ensures every CLI/MCP invocation is exported,
which is required for employee activity tracking.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Resolve active profile identity in TxcLeafCommand.RunAsync so ALL
  commands (not just ProfiledCliCommand) are attributable to a user
  when an active profile is set
- Tag txc.version on both CLI and MCP spans for version tracking
- Identity tags: enduser.id (Entra GUID), user.name (UPN),
  enduser.scope (tenant), txc.environment_url, txc.environment_name
- Best-effort: identity resolution failures never block commands

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TomProkop TomProkop force-pushed the feat/mcp-failure-details branch from 5c60f80 to ed116d6 Compare May 18, 2026 16:17
- Use AssemblyVersion (e.g. '1.11.0') instead of InformationalVersion
  which appends the git commit hash and is noisy in App Insights
- Set TracesPerSecond=null to disable the Azure Monitor exporter's
  rate-limited sampling which overrides SamplingRatio (confirmed via
  Microsoft Learn docs: TracesPerSecond takes precedence when set)
- Store TracerProvider as typed field for reliable ForceFlush

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TomProkop TomProkop force-pushed the feat/mcp-failure-details branch from ed116d6 to 1c862d3 Compare May 18, 2026 19:50
TomProkop and others added 10 commits May 18, 2026 22:21
- Set url.full to txc://cli/{command} or txc://mcp/{tool} so App Insights
  request blade shows meaningful URLs instead of <empty>
- Map exit codes to HTTP response codes (0→200, 1→500, 2→400) so App
  Insights failure detection and response code filtering works naturally
- Fix MCP version tag to use ToString(3) for clean version string

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove orphaned XML summary blocks from deleted SetResponseCode method
- Remove duplicate LogCommandFailure summary comment
- Remove dead BeginScope/ScopeDisposable/AsyncLocal from TxcTelemetryLogProvider
  (scope-based approach was abandoned in favor of direct Activity.SetTag)
- Extract shared TagActivityWithIdentity helper in TxcLeafCommand, remove
  duplicate identity-tagging logic and ExtractObjectId from ProfiledCliCommand
- Remove trivial pass-through TxcTelemetry.ShouldEnable, inline at call site
- Remove duplicate GetCliVersion — use ActivitySource version instead
- Remove unused url.full/http.response.status_code tags from MCP Program.cs
  (Azure Monitor exporter only populates those fields from HTTP instrumentation)
- Add version parameter to TxcLeafCommand's ActivitySource for consistency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HTTP client auto-instrumentation was capturing the exporter's own
POST /v2.1/track calls to applicationinsights.azure.com, creating noise
in the dependencies table. Filter those out by host name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Major architectural cleanup for multi-host support (CLI, MCP, future REST API):

- TxcTelemetryTags: typed constants replace all magic string tag keys
- CommandActivityScope: host-agnostic span lifecycle — Activity creation,
  W3C traceparent restoration, exit-code tagging in single finally block.
  Hosts pass entryPoint ('cli'/'mcp'/'api'), scope handles the rest.
- ActivityIdentityTagger: DI-registered service for user/tenant/environment
  identity tagging. Constructor-injected via TxcServicesBootstrap.
  Extensible for future REST API auth context.
- Wire DotMake DI bridge: Cli.Ext.SetServiceProvider(TxcServices.Provider)
  enables constructor injection for new services. Existing TxcServices.Get
  calls still work — incremental migration path.
- Remove Logging→Core vestigial project reference (zero actual imports)
- TxcLeafCommand shrinks from 385 → 249 LOC:
  - Span lifecycle → CommandActivityScope
  - Identity tagging → ActivityIdentityTagger
  - Tag keys → TxcTelemetryTags
  - Exit code → single finally block
  - Traceparent parsing → CommandActivityScope.ParseTraceparent
  - Removed: LogCommandFailure, RestoreParentTraceContext,
    TagActivityWithIdentity, ExtractObjectId, GetCliVersion (all moved)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename user.name → enduser.name for consistency with enduser.* namespace
- Console timestamps now show local time (HH:mm:ss) without UTC offset

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The OTel enduser.* semantic convention attributes (enduser.id, enduser.name,
enduser.scope) are mapped by the Azure Monitor exporter to App Insights
built-in fields and stripped from customDimensions, making them invisible
in Kusto queries. Using txc.user_id, txc.user_name, txc.tenant_id ensures
they always appear in customDimensions for reliable querying.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Set OTel enduser.* attributes for App Insights built-in features (Users
blade, user_Id column) AND txc.* duplicates for reliable Kusto queries
in customDimensions. The exporter maps enduser.* to built-in fields and
strips them from customDimensions, so both are needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The MCP Server span (execute_operation) was staying success=true even
when the CLI subprocess failed. Now:
- txc.exit_code set on the MCP Server span from subprocess result
- Error status set on the Server span for non-zero exit
- Failure logged via ILogger → TxcTelemetryLogProvider → exceptions table
- Client dispatch span scoped explicitly so Activity.Current is the
  Server span when tagging (not the Client span)
- MCP wire response (IsError, content) unchanged per MCP spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ess dependency

- Service name now uses entryPoint: talxis-cli for CLI, talxis-mcp for
  MCP server. Shows correctly in App Insights Application Map.
- Subprocess Client span sets peer.service=talxis-cli so App Insights
  shows the target service name instead of 'OTHER' in dependency type.
- Exception gap for very short-lived subprocesses is an inherent
  limitation of batch exporters — the subprocess exception IS recorded
  on the CLI span (same operation_Id) but may not flush in time for
  sub-100ms commands. Longer-running commands flush reliably.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Telemetry is unconditionally active in Release builds. The only gate is
whether a connection string is available (embedded at build time or via
environment variable). Removed:
- TelemetrySettings.Enabled property
- telemetry.enabled setting registration
- configEnabled parameter from TxcTelemetrySetup.Initialize
- Related tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 62 out of 62 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

src/TALXIS.CLI.MCP/ToolLogStore.cs:54

  • ToolLogStore.Store() overwrites existing entries for the same diagnostics URI but still enqueues the URI again. If the same executionId is ever reused (e.g., trace id collision or a caller-provided executionId), the FIFO eviction queue can later dequeue the old URI and remove the new entry from _logs, causing unexpected loss of the most recent log. Consider either rejecting/rehashing collisions, generating a unique suffix when uri already exists, or not enqueuing when updating an existing key (and keeping eviction order consistent).

Comment on lines +773 to +777
["uri"] = new Dictionary<string, object?>
{
["type"] = "string",
["description"] = "Diagnostics URI returned by a failed tool call (e.g. 'txc://logs/workspace_validate/abc123')."
},
Comment on lines +27 to +41
/// <summary>
/// Initializes telemetry if all gates pass (build config, user opt-in, env var).
/// Safe to call multiple times — subsequent calls are no-ops.
/// </summary>
/// <param name="configEnabled">Value of <c>telemetry.enabled</c> from config file.</param>
/// <param name="configConnectionString">Optional connection string override from config file.</param>
/// <param name="entryPoint">Identifies the host: "cli" or "mcp".</param>
/// <summary>
/// Initializes telemetry. Always on in Release builds — the only gate is
/// whether a connection string is available (embedded at build time or
/// set via environment variable).
/// </summary>
/// <param name="configConnectionString">Optional connection string override from config file.</param>
/// <param name="entryPoint">Identifies the host: "cli" or "mcp".</param>
public static void Initialize(string? configConnectionString = null, string entryPoint = "cli")
Comment on lines +10 to +14
/// Telemetry is on by default for published builds:
/// <list type="number">
/// <item><b>Build-time gate:</b> <c>#if TELEMETRY_ENABLED</c> (Release builds only — Debug/local never emits).</item>
/// <item><b>On by default:</b> <c>telemetry.enabled = true</c> in <c>~/.txc/config.json</c>.</item>
/// </list>
Comment on lines +12 to +19
/// <para>Behavior:</para>
/// <list type="bullet">
/// <item>Structured properties prefixed with <c>txc.</c> are mapped to Activity tags.</item>
/// <item><see cref="LogLevel.Error"/> and <see cref="LogLevel.Critical"/> events with
/// an exception produce OTel exception events (→ App Insights <c>exceptions</c> table).</item>
/// <item>All text is redacted via <see cref="LogRedactionFilter"/> before reaching the Activity.</item>
/// <item>When no Activity is current, all operations are no-ops (zero overhead).</item>
/// </list>
Comment thread src/TALXIS.CLI.Core/Telemetry/ActivityIdentityTagger.cs
Comment on lines 14 to 28
public static readonly IReadOnlyList<SettingDescriptor> All = new SettingDescriptor[]
{
new(
"log.level",
"Minimum log level for diagnostic output (stderr).",
new[] { "trace", "debug", "information", "warning", "error", "critical", "none" },
g => g.Log.Level,
(g, v) => g.Log.Level = v),
new(
"log.format",
"Diagnostic log rendering format.",
new[] { "plain", "json" },
g => g.Log.Format,
(g, v) => g.Log.Format = v),
new(
"telemetry.enabled",
"Whether anonymous usage telemetry is enabled.",
null,
g => g.Telemetry.Enabled ? "true" : "false",
(g, v) => g.Telemetry.Enabled = ParseBool(v)),
};
Comment thread src/TALXIS.CLI/Program.cs Outdated
Comment on lines +50 to +57
private static void InitializeTelemetry()
{
try
{
var configStore = TxcServices.Get<IGlobalConfigStore>();
#pragma warning disable RS0030 // Synchronous telemetry init before async main loop — cannot use await here
var config = configStore.LoadAsync(CancellationToken.None).GetAwaiter().GetResult();
#pragma warning restore RS0030
Comment thread src/TALXIS.CLI.MCP/Program.cs Outdated
Comment on lines +939 to +947
void InitializeMcpTelemetry()
{
try
{
TALXIS.CLI.Platform.Dataverse.Application.DependencyInjection.TxcServicesBootstrap.EnsureInitialized();
var configStore = TALXIS.CLI.Core.DependencyInjection.TxcServices.Get<TALXIS.CLI.Core.Abstractions.IGlobalConfigStore>();
#pragma warning disable RS0030 // Synchronous telemetry init in top-level MCP setup — before async host.RunAsync
var config = configStore.LoadAsync(CancellationToken.None).GetAwaiter().GetResult();
#pragma warning restore RS0030
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants