OpenTelemetry
OpenTelemetry is a vendor-neutral set of APIs and SDKs for collecting telemetry data, regardless of language, framework or cloud platform.
Overview
OpenTelemetry collects three kinds of data (signals):
Traces (made up of spans)
Metrics
Logs
The process is:
the application collects data using the SDK for its language
the data is sent to an endpoint (either a collector or a server that accepts OpenTelemetry data, such as Jaeger)
each signal (traces, metrics, logs) is sent to its own service or endpoint
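As an illustration of the last point: with OTLP over HTTP, traces, metrics and logs each go to their own path on the collector (/v1/traces, /v1/metrics and /v1/logs, with port 4318 as the usual default). The OTLP specification also allows OTEL_EXPORTER_OTLP_ENDPOINT to hold just the base URL, with exporters appending these paths. The helper is a hypothetical sketch, not part of the SDK, and the collector hostname in the example is made up.

```php
<?php
/**
 * Hypothetical helper: builds the per-signal OTLP/HTTP endpoints from one collector base URL.
 * The paths are the OTLP/HTTP defaults.
 */
function otlpSignalEndpoints(string $baseUrl): array
{
    // Tolerate a trailing slash in the configured base URL.
    $base = rtrim($baseUrl, '/');

    return [
        'traces'  => $base . '/v1/traces',
        'metrics' => $base . '/v1/metrics',
        'logs'    => $base . '/v1/logs',
    ];
}

// Example with a hypothetical internal collector hostname.
print_r(otlpSignalEndpoints('http://otel-collector.internal:4318/'));
```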
[ TODO expand on OpenTelemetry terminology ]
Developer
PHP
SDK
The Composer package is:
open-telemetry/exporter-otlp
This installs the other packages (API and SDK) as dependencies, but if you prefer to declare every dependency explicitly you can add all of them:
open-telemetry/api
open-telemetry/sdk
open-telemetry/exporter-otlp
If you need gRPC, add:
open-telemetry/transport-grpc
This needs extra libraries (such as the grpc PHP extension), so for a standard PHP setup just add open-telemetry/exporter-otlp and the HTTP transport will be used.
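The choice between the two transports can be made explicit with a small runtime check. This is a hypothetical helper, assuming the gRPC path needs both the grpc PHP extension and the open-telemetry/transport-grpc package (whose factory class is OpenTelemetry\Contrib\Grpc\GrpcTransportFactory); the returned strings are the values the OTLP specification defines for OTEL_EXPORTER_OTLP_PROTOCOL.

```php
<?php
/**
 * Hypothetical helper: picks the OTLP protocol, falling back to HTTP when gRPC is unavailable.
 */
function chooseOtlpProtocol(): string
{
    // gRPC needs the PECL extension and the transport package; class_exists() returns
    // false when the package (or any autoloader) is missing.
    $grpcAvailable = extension_loaded('grpc')
        && class_exists('OpenTelemetry\\Contrib\\Grpc\\GrpcTransportFactory');

    return $grpcAvailable ? 'grpc' : 'http/protobuf';
}

echo chooseOtlpProtocol(), PHP_EOL;
```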
Classes
You need this core class to help with requests. It is written for REST; other contexts such as Minion tasks or microservices will need slight changes.
Start the tracer
It is best to start the tracer at the very beginning of index.php, so that the main trace covers the whole request.
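A minimal sketch of the top of index.php, assuming the Composer autoloader lives in vendor/, the OTLP HTTP exporter from the packages above, and a collector URL in OTEL_EXPORTER_OTLP_TRACES_ENDPOINT (falling back to a local collector). Using PROJECT_ENV as the service.name resource attribute and TRACER_ID as the tracer name (see Naming IDs) is an assumption, and buildTracerProvider / startRequestSpan are hypothetical names; this is not the core class mentioned under Classes.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use OpenTelemetry\API\Trace\SpanInterface;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\Context\ScopeInterface;
use OpenTelemetry\Contrib\Otlp\OtlpHttpTransportFactory;
use OpenTelemetry\Contrib\Otlp\SpanExporter;
use OpenTelemetry\SDK\Common\Attribute\Attributes;
use OpenTelemetry\SDK\Resource\ResourceInfo;
use OpenTelemetry\SDK\Resource\ResourceInfoFactory;
use OpenTelemetry\SDK\Trace\SpanExporterInterface;
use OpenTelemetry\SDK\Trace\SpanProcessor\SimpleSpanProcessor;
use OpenTelemetry\SDK\Trace\TracerProvider;

/** Hypothetical helper: a TracerProvider tagged with the project's service name. */
function buildTracerProvider(SpanExporterInterface $exporter, string $serviceName): TracerProvider
{
    $resource = ResourceInfoFactory::defaultResource()->merge(
        ResourceInfo::create(Attributes::create(['service.name' => $serviceName]))
    );

    // SimpleSpanProcessor exports each span as soon as it ends; a BatchSpanProcessor is usually preferred under load.
    return new TracerProvider([new SimpleSpanProcessor($exporter)], null, $resource);
}

/**
 * Hypothetical helper: starts the main (SERVER) span and activates it,
 * so events and subtraces created later in the request attach to it.
 *
 * @return array{0: SpanInterface, 1: ScopeInterface}
 */
function startRequestSpan(TracerProvider $provider, string $tracerName, array $server): array
{
    $method = $server['REQUEST_METHOD'] ?? 'GET';
    $path = parse_url($server['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/';

    $span = $provider->getTracer($tracerName)
        ->spanBuilder($method . ' ' . $path)
        ->setSpanKind(SpanKind::KIND_SERVER)
        ->setAttribute('http.request.method', $method)
        ->setAttribute('url.path', $path)
        ->startSpan();

    return [$span, $span->activate()];
}

// Real web requests only; skipped on the CLI so the helpers can be reused in scripts and tests.
if (PHP_SAPI !== 'cli') {
    $endpoint = getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') ?: 'http://localhost:4318/v1/traces';
    $transport = (new OtlpHttpTransportFactory())->create($endpoint, 'application/x-protobuf');
    $provider = buildTracerProvider(new SpanExporter($transport), 'PROJECT_ENV');
    [$rootSpan, $rootScope] = startRequestSpan($provider, 'TRACER_ID', $_SERVER);

    // Shutdown functions run after the response (and after fatal errors): close and flush the main trace.
    register_shutdown_function(static function () use ($provider, $rootSpan, $rootScope): void {
        $rootSpan->setAttribute('http.response.status_code', http_response_code());
        $rootScope->detach();
        $rootSpan->end();
        $provider->shutdown();
    });
}
```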
Tracing can be enabled or disabled through configuration. It should be enabled on dev; for stage and production this is still to be confirmed, because the extra resource usage needs to be checked first.
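A minimal sketch of such a switch, assuming the environment name comes from APP_ENV and using a hypothetical APP_TRACING override. OTEL_SDK_DISABLED is the standard OpenTelemetry variable for turning the SDK off, so it is checked first; the function itself is hypothetical.

```php
<?php
/**
 * Hypothetical helper: decides whether tracing should run for the current environment.
 */
function isTracingEnabled(array $env): bool
{
    // OTEL_SDK_DISABLED=true is the standard OpenTelemetry switch and always wins.
    if (strtolower((string) ($env['OTEL_SDK_DISABLED'] ?? '')) === 'true') {
        return false;
    }

    // Explicit opt-in/opt-out (hypothetical APP_TRACING), e.g. to trial tracing on stage.
    if (isset($env['APP_TRACING'])) {
        return filter_var($env['APP_TRACING'], FILTER_VALIDATE_BOOLEAN);
    }

    // Default: only dev is traced until the resource cost on stage/production is known.
    return ($env['APP_ENV'] ?? 'production') === 'dev';
}

var_dump(isTracingEnabled(getenv()));
```

In index.php the tracer setup would then only run when isTracingEnabled(getenv()) returns true.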
Starting the tracer this way produces a main trace that measures the total processing time and carries some extra metadata, for example:
[ TODO add XRay example ]
Adding more information
There are a few extra ways to add information to a trace:
Events
You can add an event, which is a log entry attached to the trace, at any point during the request.
This adds a Logs section to the trace.
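A sketch of adding an event, assuming the main span was started and activated at bootstrap. traceEvent is a hypothetical helper, and the event and attribute names are only examples.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use OpenTelemetry\API\Trace\Span;

/**
 * Hypothetical helper: adds a log-style event to the currently active span.
 * With no active span, Span::getCurrent() returns a non-recording span, so this is a no-op.
 */
function traceEvent(string $name, array $attributes = []): void
{
    Span::getCurrent()->addEvent($name, $attributes);
}

// Example call from application code; the names are illustrative.
traceEvent('order.validated', ['order.items' => 3, 'order.currency' => 'USD']);
```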
Exceptions
Similar to events, but intended for exception data.
This adds a stack trace with the exception details.
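A sketch of recording a caught exception on the active span, with traceException as a hypothetical helper. recordException stores an event named exception (type, message and stack trace), and setting the status to error marks the span as failed.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use OpenTelemetry\API\Trace\Span;
use OpenTelemetry\API\Trace\StatusCode;

/**
 * Hypothetical helper: attaches a caught exception to the active span and flags the span as failed.
 */
function traceException(Throwable $e): void
{
    $span = Span::getCurrent();
    $span->recordException($e);
    $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
}

// Example: record the error, then handle or rethrow it as before.
try {
    throw new RuntimeException('Payment gateway timeout');
} catch (RuntimeException $e) {
    traceException($e);
}
```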
Subtraces
A subtrace provides the same kind of information as the main trace, but records the start and end of a single operation. Good places to add one are:
Resource heavy operations / functions
Network operations (REST calls, database, cache, cURL, etc.)
A subtrace records the start and end time of the operation, which helps with checking performance and, where possible, with debugging.
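A sketch of wrapping an operation in a subtrace, assuming a tracer obtained at bootstrap; withSubtrace is a hypothetical helper. Because the span is active while the operation runs, any events, exceptions or nested subtraces created inside it attach to it.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\API\Trace\TracerInterface;

/**
 * Hypothetical helper: runs $operation inside a child span of the active span.
 * The callable receives the span (to add attributes); its return value is passed through.
 */
function withSubtrace(TracerInterface $tracer, string $name, callable $operation, int $kind = SpanKind::KIND_INTERNAL): mixed
{
    // With no explicit parent, the builder uses the active span (e.g. the main request span).
    $span = $tracer->spanBuilder($name)->setSpanKind($kind)->startSpan();
    $scope = $span->activate();

    try {
        return $operation($span);
    } catch (Throwable $e) {
        $span->recordException($e);
        $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
        throw $e;
    } finally {
        // Always restore the previous context and close the span, even on failure.
        $scope->detach();
        $span->end();
    }
}
```

For an outgoing REST call, passing SpanKind::KIND_CLIENT as the last argument marks the subtrace as a client call, so trace viewers can show it as a request to another service.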
Laravel Tracing
Laravel can also be traced; for now there are two options: DB and Cache.
DB
This goes into AppServiceProvider.php.
This records the time taken by each SQL call as an event. Ideally it should be a subtrace, so the time appears as its own span, but an event is the simplest way to handle it for now.
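The subtrace variant mentioned above can be sketched by back-dating a span, since Laravel's QueryExecuted event only fires after the query finished, with its duration in milliseconds ($query->time). recordQuerySpan is a hypothetical helper, the attribute names are illustrative, and how the tracer reaches the service provider is left as an assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\TracerInterface;

/**
 * Hypothetical helper: records an already-finished SQL query as a subtrace,
 * back-dated by its duration.
 */
function recordQuerySpan(TracerInterface $tracer, string $sql, float $timeMs, ?string $connection = null): void
{
    // Epoch nanoseconds, built from integer microseconds to limit float rounding.
    $endNanos = (int) (microtime(true) * 1_000_000) * 1_000;
    $startNanos = $endNanos - (int) round($timeMs * 1_000_000);

    $tracer->spanBuilder('db.query')
        ->setSpanKind(SpanKind::KIND_CLIENT)
        ->setStartTimestamp($startNanos)
        ->setAttribute('db.statement', $sql)
        ->setAttribute('db.connection', $connection ?? 'default')
        ->startSpan()
        ->end($endNanos);
}

/*
 * Wiring in AppServiceProvider::boot(), assuming $tracer is the tracer created at bootstrap
 * and shared however the app prefers (e.g. through the container):
 *
 * DB::listen(fn (\Illuminate\Database\Events\QueryExecuted $query) =>
 *     recordQuerySpan($tracer, $query->sql, $query->time, $query->connectionName));
 */
```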
Cache
This approach is more hacky, so it is probably best enabled only through a global setting (likely separate from the setting that enables tracing).
This also goes into AppServiceProvider.php.
The CacheContractInvoker file is:
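A different approach from the CacheContractInvoker, sketched here under the assumption that Laravel's built-in cache events (CacheHit, CacheMissed, KeyWritten, KeyForgotten) are dispatched, is to listen to those events and record each one on the active span. recordCacheEvent and the tracing.cache config flag are hypothetical; unlike a wrapper, events do not capture how long the cache call took.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use OpenTelemetry\API\Trace\Span;

/**
 * Hypothetical helper: records a cache operation as an event on the active span.
 */
function recordCacheEvent(string $operation, string $key): void
{
    Span::getCurrent()->addEvent('cache.' . $operation, ['cache.key' => $key]);
}

/*
 * Wiring in AppServiceProvider::boot(), behind a hypothetical "tracing.cache" flag
 * kept separate from the main tracing switch:
 *
 * if (config('tracing.cache')) {
 *     Event::listen(\Illuminate\Cache\Events\CacheHit::class, fn ($e) => recordCacheEvent('hit', $e->key));
 *     Event::listen(\Illuminate\Cache\Events\CacheMissed::class, fn ($e) => recordCacheEvent('miss', $e->key));
 *     Event::listen(\Illuminate\Cache\Events\KeyWritten::class, fn ($e) => recordCacheEvent('write', $e->key));
 *     Event::listen(\Illuminate\Cache\Events\KeyForgotten::class, fn ($e) => recordCacheEvent('forget', $e->key));
 * }
 */
```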
OpenTelemetry More Info
More information can be found on the OpenTelemetry GitHub:
The OpenTelemetry documentation is also a good resource, although it is very extensive: Documentation
Resources
Dev
Resource | Value |
---|---|
| 172.16.11.201 |
Naming IDs
The following IDs should be used so that the traces can be found easily.
Projects
Replace the PROJECT_ENV variable with one of these:
Project | ID |
---|---|
Showtime CMS | |
Showtime Minion | |
Fabric API | |
Fabric Auth | |
Fabric Minion | |
Traces
Replace TRACER_ID with one of these:
Type | ID |
---|---|
Website (dynamic) | |
Rest API | |
Minion Task | |
Devops
AWS
AWS does not support OpenTelemetry natively, so you need a collector server to export the data into the appropriate AWS services (X-Ray and CloudWatch).
AWS provides a collector called AWS Distro for OpenTelemetry Collector (ADOT); setup instructions are here:
An EC2 server seems the best option for now. The steps are:
Create the AWSDistroOpenTelemetryRole
Launch a CloudFormation stack using this template: https://bitbucket.org/touchcastllc/daniel-internal-scripts/src/develop/otel-devops/template/otel-cf-template.yaml
The S3 URL is: https://cf-templates-13xq5vrflgsjx-us-east-1.s3.us-east-1.amazonaws.com/otel-cf-template.yaml
Set the proper values for the template; the defaults are for a dev launch.
[TODO] Update the Route53 subdomain with the internal IP of the EC2 server
Because the collector should be “stateless”, an Auto Scaling group can be created for it.
[ TODO check whether ECS on Fargate with an ALB (or similar) is better than EC2; the examples only show how to add the collector as a sidecar, not as a service ]
Pending Work
Distributed tracing is pending: how to correlate traces from the ALB and CloudFront with these traces, how to correlate a call to another service, and how the UI will work.
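One piece of the ALB question can be sketched already. An ALB adds an X-Amzn-Trace-Id header whose Root field is an X-Ray trace ID ("1-", 8 hex digits of epoch seconds, "-", 24 hex digits); dropping the prefix and the dash gives the 32-hex-digit trace ID format OpenTelemetry uses, which is the mapping the AWS X-Ray propagators rely on. xrayRootToTraceId is a hypothetical helper; actually continuing the trace (making the request span a child of that ID) would still need a propagator.

```php
<?php
/**
 * Hypothetical helper: extracts the Root field of an X-Amzn-Trace-Id header and
 * converts it to a 32-hex-digit W3C/OpenTelemetry trace ID. Returns null if absent or invalid.
 */
function xrayRootToTraceId(string $header): ?string
{
    foreach (explode(';', $header) as $field) {
        [$key, $value] = array_pad(explode('=', trim($field), 2), 2, '');

        // X-Ray root format: "1-" + 8 hex digits + "-" + 24 hex digits.
        if ($key === 'Root' && preg_match('/^1-([0-9a-f]{8})-([0-9a-f]{24})$/', $value, $m) === 1) {
            return $m[1] . $m[2];
        }
    }

    return null;
}

echo xrayRootToTraceId($_SERVER['HTTP_X_AMZN_TRACE_ID'] ?? '') ?? 'no X-Ray header', PHP_EOL;
```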