refactor integration tests and add metrics coverage #2432

scottgerring · 2024-12-15T16:32:38Z

This is an alternate to #2424, with a more opinionated take on how the integration tests could look. Essentially, I am trying to turn them into a more conventional test suite, so that it is easier to extend and easier to reason about the results.

Decomposed the single "mega unit test" in integration_test.rs into discrete unit tests
Pulled the otel-collector container out into test_utils.rs and reused it everywhere
Introduced anyhow to make error handling cleaner and panic output easier to follow when it happens
Removed the #[ignore] from all the integration tests; if we don't want them to run as part of cargo test, we can pass --lib in our CI scripts
Modified the collector image so it outputs a separate log file for each signal
Upgraded testcontainers and added startup waits, so that we don't have to sleep while waiting for the container to start and potentially overrun our startup time budget

This saves a fair bit of duplication too, which is nice!

In metrics, I've added a test per meter and some supporting code to pluck the data out for each meter easily. IMHO this will make the suite easy to extend and easy to follow, rather than having an enormous "total world" diff to pick through.
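A minimal, std-only sketch of the test-per-meter idea: each test pulls out only the data emitted by its own meter before comparing. The types (Metric, ScopeMetrics, ResourceMetrics) and the helper scope_metrics_for are hypothetical simplifications of the OTLP JSON layout (resourceMetrics -> scopeMetrics -> scope name + metrics), not the PR's actual asserter, which works on serde_json values.

```rust
/// Hypothetical, simplified stand-ins for the OTLP metrics JSON layout.
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ScopeMetrics {
    /// The meter (instrumentation scope) name, e.g. "test_u64_counter_meter".
    pub scope_name: String,
    pub metrics: Vec<Metric>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResourceMetrics {
    pub scope_metrics: Vec<ScopeMetrics>,
}

/// Returns every ScopeMetrics emitted by the meter called `meter_name`,
/// across all resources, so a test only sees data from its own meter.
pub fn scope_metrics_for<'a>(
    results: &'a [ResourceMetrics],
    meter_name: &str,
) -> Vec<&'a ScopeMetrics> {
    results
        .iter()
        .flat_map(|rm| rm.scope_metrics.iter())
        .filter(|sm| sm.scope_name == meter_name)
        .collect()
}
```

Scoping each comparison to one meter keeps a failure local to the test that owns that meter, instead of surfacing as one large diff over everything the collector wrote.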

I've adapted traces/logs so each has a single normal unit test in its own file, but have not decomposed them or introduced more tests there, yet.


Merge requirement checklist

CONTRIBUTING guidelines followed
Unit tests added/updated (if applicable)
Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
Changes in public API reviewed (if applicable)

scottgerring · 2024-12-15T16:44:14Z

@cijothomas @lalitb let me know what you think about this; I sunk a bit of time into it this weekend to try to organise the integration tests so they are a bit more "ordinary".

If we're happy with this, I'll fill in the remaining coverage gaps from here (as well as addressing the comments from the other branch that I haven't got to yet)!

opentelemetry-otlp/tests/integration_test/expected/metrics/test_up_down_meter.json

opentelemetry-otlp/tests/integration_test/otel-collector-config-2.yaml

opentelemetry-otlp/tests/integration_test/tests/logs.rs

opentelemetry-otlp/tests/integration_test/tests/metrics.rs

cijothomas

I like this refactor! Thanks for working on this.

(I also tested locally and confirmed that https://github.com/open-telemetry/opentelemetry-rust/pull/2431/files is solving the issue it is intending to fix)

Not marking explicit approval as the PR is marked draft. Feel free to make it review ready.

lalitb · 2024-12-16T04:59:47Z

Removed the #[ignore] from all the integration tests - if we don't want them to run as part of cargo test, we can pass --lib in our CI scripts

The reason for ignoring integration tests by default is that they can take a long time to run, so cargo test should not execute them unless explicitly specified.

lalitb · 2024-12-16T05:03:19Z

opentelemetry-otlp/tests/integration_test/lcov.info

nit: Do you need this file? It adds ~38K lines, which will increase the clone time for the repo.

lalitb · 2024-12-16T05:40:22Z

opentelemetry-otlp/tests/integration_test/src/metrics_asserter.rs

-    results: Vec<ResourceMetrics>,
-    expected: Vec<ResourceMetrics>,
+    results: Value,
+    expected: Value,


The earlier code was also testing the conversion of JSON to proto structs, while it seems the current code no longer does this?

The grpc_build.rs adds the annotations for serialization/deserialization between json and proto structs, and this got tested in current setup.

do we need to test that?

See my comment in the big thread at the bottom: there's an issue with the round-trip serialization of metrics, so I wrote an additional test that shows this, marked it as #[ignore], and raised an issue. I've switched this to Serde types so that the integration test for metrics will actually fail if there's a diff; as it stood, differences in the metric value would not have been detected.

I am totally fine with just getting an ACK from the Collector that it accepted our metrics/logs/traces. That will give instant value (validating a lot of stuff we are manually validating now, like whether shutdown is going to panic or do its job, etc.!)

Validating the actual content is important, but I consider it non-blocking for now; it can be added later too.

Just to be clear: we're getting an ACK from the collector, parsing all the results back out of the copy the collector writes to its file outputs, and comparing them to our expectation, so we are now validating that the data coming out of the collector is what we sent it ✅

The only "thing" is, we're just using Serde models to deserialize for that validation, not our own proto-derived ones, because of the aforementioned fields-going-missing issue that would make a proto-based-deserialization test succeed when data is actually lost.

So I think this is in a pretty good state 🙏!
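To illustrate why comparing through a lossy typed model can hide data loss, here is a small std-only sketch. It assumes a hypothetical LossyMetric model whose decoder has no slot for the value field, standing in for the metrics deserialization issue (#2434); a BTreeMap of strings stands in for a parsed JSON object. Comparing after decoding reports a match even though the values differ, while comparing the untyped trees does not.

```rust
use std::collections::BTreeMap;

/// Stand-in for a parsed JSON object: field name -> raw value text.
pub type RawObject = BTreeMap<String, String>;

/// Builds a RawObject from string pairs.
pub fn raw(pairs: &[(&str, &str)]) -> RawObject {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Hypothetical typed model whose decoder has no slot for `value`, so that
/// field is silently dropped (mimicking the lossy metrics deserialization).
#[derive(Debug, PartialEq)]
pub struct LossyMetric {
    pub name: String,
}

pub fn decode_lossy(obj: &RawObject) -> LossyMetric {
    LossyMetric { name: obj.get("name").cloned().unwrap_or_default() }
}

/// Compare by decoding both sides first: blind to any field the model drops.
pub fn equal_via_model(expected: &RawObject, actual: &RawObject) -> bool {
    decode_lossy(expected) == decode_lossy(actual)
}

/// Compare the untyped trees directly: every field takes part.
pub fn equal_untyped(expected: &RawObject, actual: &RawObject) -> bool {
    expected == actual
}
```

This mirrors the trade-off in the thread: the untyped comparison is what lets the metrics integration test fail on a value diff, at the cost of not exercising the proto-derived models.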

do we need to test that?

Yes, it's important to test, from the opentelemetry-proto perspective, that we can successfully deserialize the metrics data written by the collector. This is a standalone crate, and it is also used by people building consumers for the collector. The only way to test this reliably is with integration tests. This is not a blocker, but we should bring it back once the Serde model is fixed for metrics.

nit - Please add a TODO (with reference to the metrics serde issue), to be fixed eventually.

cijothomas · 2024-12-16T06:18:41Z

Removed the #[ignore] from all the integration tests - if we don't want them to run as part of cargo test, we can pass --lib in our CI scripts

The reason for ignoring integration tests by default is that they can take a long time to run, so cargo test should not execute them unless explicitly specified.

if it is executing in parallel to the main CI, and is taking less time than the longest CI (which I think is the Windows CI), then I think we can let the integration tests run always.

lalitb · 2024-12-16T06:26:33Z

if it is executing in parallel to the main CI, and is taking less time than the longest CI (which I think is the Windows CI), then I think we can let the integration tests run always.

Yes, it should be fine to keep them enabled in CI. The concern was whether they are fast enough to keep enabled for cargo test from the command line.

scottgerring · 2024-12-16T12:04:41Z

Hey both, thanks for the quick turnaround! Pre-emptive sorry for the braindump; I'm rushing this comment out between meetings 😱

`cargo test` and integration tests

I changed this so that it's possible to skip tests in the crate: the round-trip example I've added, which shows an issue with the JSON serialization, caught me out, as I marked it skipped and had to work out why it was still running in the CI build. I expect this suite will grow, and the need for compiled-but-skipped tests will increase.

It's normal behaviour for cargo test to run the integration suite, and per the docs you can skip it from the CLI by adding --lib. I've modified the CI jobs here so that they maintain the same behaviour, i.e. the integration test job runs the integration suite and the main CI does not. I think it is nice to keep them separate, as integration suites often end up a little flakier, and separating them out in GitHub makes it easier for devs to reason about failures.

For reference, the integration suite currently adds 10s on my MBP if I run cargo test from the root without --lib.

Serialization/Deserialization Tests

The earlier code was also testing the conversion of JSON to proto structs, while it seems the current code no longer does this?

I changed to using Serde types because the round trip for metrics is broken (raised #2434) and it's a can of worms. The test I added, marked #[ignore], demonstrates this, validates the serialization/deserialization, and can be un-skipped once it's fixed.

The metric value (at least) disappears on the deserialization side, not the serialization side, which I think is less of a worry for us! I had a bit of a look, but because it is all tied up in the protobuf and Serde serialization magic, I didn't want to pull that into this PR. The test I've added catches this serialization issue.

Remaining Work

~~I will add a test for shutdown() flush to metrics and I think that is it. I should be able to do this tomorrow!~~ ✅

opentelemetry-otlp/tests/integration_test/Cargo.toml

opentelemetry-otlp/tests/integration_test/README.md

opentelemetry-otlp/tests/integration_test/actual/.gitignore

cijothomas · 2024-12-16T18:37:50Z

Yes, it should be fine to keep them enabled in CI. The concern was whether they are fast enough to keep enabled for cargo test from the command line.

Got it. Yes, I think it is best to keep these tests ignored, and only trigger them from the integration test CI.
Even if they only take a few seconds, given that they need the port to be free, I am inclined to keep the existing approach.
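A sketch of how a compiled-but-skipped integration test might look with #[ignore], assuming a hypothetical collector address of 127.0.0.1:4317 (the standard OTLP/gRPC port; the PR's actual port setup may differ). The test body is a placeholder reachability check, not the PR's real assertions.

```rust
use std::net::TcpStream;

/// Hypothetical collector address used by this sketch.
pub const COLLECTOR_ADDR: &str = "127.0.0.1:4317";

/// True if something accepts TCP connections on `addr`.
pub fn collector_reachable(addr: &str) -> bool {
    TcpStream::connect(addr).is_ok()
}

// Plain `cargo test` compiles this test but reports it as ignored.
// `cargo test -- --ignored` runs only the ignored tests;
// `cargo test -- --include-ignored` runs ignored and normal tests together.
#[test]
#[ignore = "needs the collector container and a free port"]
fn exports_metrics_to_collector() {
    // Placeholder body: a real test would export data and compare the
    // collector's file output against the expected JSON.
    assert!(collector_reachable(COLLECTOR_ADDR));
}
```

By contrast, cargo test --lib builds and runs only the library's unit tests, so files under tests/ are not compiled at all; #[ignore] keeps them compiling while still skipping them by default.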

codecov · 2024-12-16T19:01:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.7%. Comparing base (eb8d7c6) to head (dba5ff1).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #2432     +/-   ##
=======================================
- Coverage   79.4%   76.7%   -2.8%     
=======================================
  Files        122     122             
  Lines      21700   21700             
=======================================
- Hits       17247   16657    -590     
- Misses      4453    5043    +590


scottgerring · 2024-12-16T19:06:54Z

@cijothomas, I think this is in a pretty good state, except that I need to work out why the integration suite fails on CI and not locally 😱 I'll look again tomorrow.

cijothomas · 2024-12-16T19:08:16Z

@cijothomas , I think this is in a pretty good state, except that I need to work out why the integration suite fails on CI and not locally 😱 I'll look again tomorrow.

I am curious if we can enable tracing::fmt in the integration tests, and view internal logs? Not required in this PR, just sharing something that'd make our lives easier if things go wrong.

scottgerring · 2024-12-16T19:54:03Z

I am curious if we can enable tracing::fmt in the integration tests, and view internal logs? Not required in this PR, just sharing something that'd make our lives easier if things go wrong.

Good idea! I’ll give it a try tomorrow. Certainly doesn’t hurt leaving it on for tests.

scottgerring · 2024-12-17T10:37:58Z

@cijothomas, this should be good to merge. I've rebased and squashed everything together to make a nicer merge history too. The outstanding issue yesterday with the integration tests was the "magic sleep" waiting for the collector container to start; I've upgraded to a newer version of testcontainers so we can wait for the HTTP collector port to start answering instead, which should make things more robust.

There are two outstanding issues:

Deserialization of metrics from OTLP output broken #2434
HTTP-client exporters have issues that cause test failures. I will raise an issue to resolve this after this PR is merged, and work on it: it seems likely to be happening in the exporter itself, and since I've refactored the integration test code fairly significantly, I'm keen to get this PR merged first and then look at this extra issue. For the moment, the new test simply skips itself for hyper/reqwest until whatever issue lies in there is fixed.
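The switch from a "magic sleep" to waiting until the collector's HTTP port answers can be sketched with only the standard library. This is not the testcontainers API the PR uses; wait_for_http and probe_http are hypothetical helpers that poll with a deadline and treat any "HTTP/" status line as ready. Checking for an HTTP response rather than just a TCP connect matters because a port published by Docker may accept connections before the process inside the container is listening.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Polls `addr` until it answers a plain HTTP request, or returns a
/// TimedOut error once `timeout` has elapsed.
pub fn wait_for_http(addr: SocketAddr, timeout: Duration) -> io::Result<()> {
    let deadline = Instant::now() + timeout;
    loop {
        match probe_http(addr) {
            Ok(()) => return Ok(()),
            Err(e) if Instant::now() >= deadline => {
                return Err(io::Error::new(
                    io::ErrorKind::TimedOut,
                    format!("{addr} not ready: {e}"),
                ));
            }
            // Not ready yet: back off briefly instead of one long fixed sleep.
            Err(_) => thread::sleep(Duration::from_millis(50)),
        }
    }
}

/// One attempt: connect, send a minimal request, check the status-line prefix.
fn probe_http(addr: SocketAddr) -> io::Result<()> {
    let probe_timeout = Duration::from_millis(200);
    let mut stream = TcpStream::connect_timeout(&addr, probe_timeout)?;
    stream.set_read_timeout(Some(probe_timeout))?;
    stream.write_all(b"GET / HTTP/1.0\r\n\r\n")?;
    let mut prefix = [0u8; 5];
    stream.read_exact(&mut prefix)?;
    if &prefix == b"HTTP/" {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::InvalidData, "not an HTTP response"))
    }
}
```

The upside over a fixed sleep is that a fast start costs almost nothing and a slow start still succeeds, up to an explicit budget.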


lalitb

Thanks for the refactor. Nicely done.

lalitb · 2024-12-17T14:48:26Z

opentelemetry-otlp/tests/integration_test/src/metrics_asserter.rs

-    results: Vec<ResourceMetrics>,
-    expected: Vec<ResourceMetrics>,
+    results: Value,
+    expected: Value,


nit - Please add a TODO (with reference to the metrics serde issue), to be fixed eventually.

scottgerring · 2024-12-17T14:54:41Z

Hey @lalitb, I've got a test over here, with a link to the issue, for round-tripping the models:

opentelemetry-rust/opentelemetry-otlp/tests/integration_test/tests/metrics.rs

Lines 199 to 221 in 21174e8

    
    ///
    /// Validate JSON/Protobuf models roundtrip correctly.
    ///
    /// TODO - this test fails currently. Fields disappear, such as the actual value of a given metric.
    /// This appears to be on the _deserialization_ side.
    /// Issue: https://github.com/open-telemetry/opentelemetry-rust/issues/2434
    ///
    #[tokio::test]
    #[ignore]
    async fn test_roundtrip_example_data() -> Result<()> {
        let metrics_in = include_str!("../expected/metrics/test_u64_counter_meter.json");
        let metrics: MetricsData = serde_json::from_str(metrics_in)?;
        let metrics_out = serde_json::to_string(&metrics)?;
        println!("{:}", metrics_out);

        let metrics_in_json: Value = serde_json::from_str(metrics_in)?;
        let metrics_out_json: Value = serde_json::from_str(&metrics_out)?;

        assert_eq!(metrics_in_json, metrics_out_json);
        Ok(())
    }

I reckon, if we keep the MetricsAsserter stuff on Serde, and have separate "roundtrip the models" tests, then we won't accidentally miss cases where an integration test regresses and we don't notice it because of model mapping. What do you think?
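A std-only sketch of a separate round-trip test in the spirit of test_roundtrip_example_data: decode the input into a typed model, encode it back, and report which fields changed or vanished. Model, decode, encode, and roundtrip_diff are hypothetical; the model deliberately lacks a value field to mimic the lost metric values described in #2434, and a BTreeMap of strings stands in for parsed JSON.

```rust
use std::collections::BTreeMap;

/// Stand-in for a parsed JSON object: field name -> raw value text.
pub type RawObject = BTreeMap<String, String>;

/// Hypothetical typed model with no field for the data point value.
#[derive(Debug, Default)]
pub struct Model {
    pub name: Option<String>,
    pub unit: Option<String>,
}

/// "Deserialize": keep only the fields the model knows about.
pub fn decode(obj: &RawObject) -> Model {
    Model { name: obj.get("name").cloned(), unit: obj.get("unit").cloned() }
}

/// "Serialize": write the model's fields back out.
pub fn encode(model: &Model) -> RawObject {
    let mut out = RawObject::new();
    if let Some(name) = &model.name {
        out.insert("name".to_string(), name.clone());
    }
    if let Some(unit) = &model.unit {
        out.insert("unit".to_string(), unit.clone());
    }
    out
}

/// Round-trips `input` through the model and returns the keys whose values
/// changed or vanished. An empty Vec means the round trip was lossless.
pub fn roundtrip_diff(input: &RawObject) -> Vec<String> {
    let output = encode(&decode(input));
    let mut keys: Vec<&String> = input.keys().chain(output.keys()).collect();
    keys.sort();
    keys.dedup();
    keys.into_iter()
        .filter(|k| input.get(*k) != output.get(*k))
        .cloned()
        .collect()
}
```

Keeping this check apart from the asserter means a model bug shows up as a round-trip failure instead of silently weakening the integration tests.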

cijothomas

LGTM. We can address some of the remaining issues in followups as needed.

cijothomas · 2024-12-17T15:08:02Z

opentelemetry-otlp/tests/integration_test/src/test_utils.rs

+fn init_tracing() {
+    INIT_TRACING.call_once(|| {
+        let subscriber = FmtSubscriber::builder()
+            .with_max_level(tracing::Level::DEBUG)


DEBUG is producing a ton of noise from the networking libraries in the CI logs. Probably okay for now; we can revisit if we find it too noisy.
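The init_tracing snippet above guards setup with INIT_TRACING.call_once. A minimal std-only sketch of that pattern follows; the INIT_RUNS counter is a hypothetical addition for illustration, and the real subscriber construction is replaced by a comment. The point of the guard, presumably, is that every test can call init_tracing() while the global subscriber is installed only once, since tracing refuses to set a global default subscriber twice.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT_TRACING: Once = Once::new();
/// Illustration only: counts how many times the guarded setup actually ran.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Safe to call at the start of every test, even when libtest runs tests
/// concurrently: the closure runs at most once per process.
pub fn init_tracing() {
    INIT_TRACING.call_once(|| {
        // In the PR, this is where the FmtSubscriber (max level DEBUG) is
        // built; here we only record that setup happened.
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}

pub fn init_runs() -> usize {
    INIT_RUNS.load(Ordering::SeqCst)
}
```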

lalitb · 2024-12-17T15:10:55Z


I reckon, if we keep the MetricsAsserter stuff on Serde, and have separate "roundtrip the models" tests, then we won't accidentally miss cases where an integration test regresses and we don't notice it because of model mapping. What do you think?

Yes, agreed: this ensures serialization and deserialization are validated independently.

scottgerring · 2024-12-17T15:21:39Z

Hurrah! Thanks @lalitb @cijothomas .
Opening another issue for the rest of the HTTP clients now.

scottgerring force-pushed the chore/integration-test-2 branch from f85c127 to 35034f4 December 15, 2024 16:42

scottgerring commented Dec 15, 2024 on opentelemetry-otlp/tests/integration_test/expected/metrics/test_up_down_meter.json (resolved)

scottgerring commented Dec 15, 2024 on opentelemetry-otlp/tests/integration_test/otel-collector-config-2.yaml (outdated, resolved)

scottgerring commented Dec 15, 2024 on opentelemetry-otlp/tests/integration_test/tests/logs.rs (resolved)

scottgerring commented Dec 15, 2024 on opentelemetry-otlp/tests/integration_test/tests/metrics.rs (resolved)

cijothomas reviewed Dec 15, 2024 on opentelemetry-otlp/tests/integration_test/tests/metrics.rs (resolved)

cijothomas reviewed Dec 15, 2024

lalitb reviewed Dec 16, 2024

lalitb added the integration tests Run integration tests label Dec 16, 2024

scottgerring force-pushed the chore/integration-test-2 branch from 609a158 to d5dc9ef December 16, 2024 10:38

scottgerring mentioned this pull request Dec 16, 2024

Deserialization of metrics from OTLP output broken #2434

Open

scottgerring force-pushed the chore/integration-test-2 branch 2 times, most recently from 684a70c to 5707bfe December 16, 2024 11:37

scottgerring marked this pull request as ready for review December 16, 2024 18:04

scottgerring requested a review from a team as a code owner December 16, 2024 18:04

cijothomas reviewed Dec 16, 2024 on opentelemetry-otlp/tests/integration_test/Cargo.toml (resolved)

cijothomas reviewed Dec 16, 2024 on opentelemetry-otlp/tests/integration_test/README.md (outdated, resolved)

cijothomas reviewed Dec 16, 2024 on opentelemetry-otlp/tests/integration_test/README.md (resolved)

cijothomas reviewed Dec 16, 2024 on opentelemetry-otlp/tests/integration_test/actual/.gitignore (outdated, resolved)

scottgerring force-pushed the chore/integration-test-2 branch from c78170f to 4a59154 December 16, 2024 18:36

scottgerring changed the title ~~chore: alternate refactor integration tests~~ chore: refactor integration tests and add metrics coverage Dec 17, 2024

scottgerring force-pushed the chore/integration-test-2 branch 2 times, most recently from f8044a2 to 90c2449 December 17, 2024 10:33

chore: refactored integration tests for future growth, introduced metrics tests (21174e8)

scottgerring force-pushed the chore/integration-test-2 branch from 90c2449 to 21174e8 December 17, 2024 10:42

lalitb approved these changes Dec 17, 2024

Merge branch 'main' into chore/integration-test-2 (dba5ff1)

cijothomas approved these changes Dec 17, 2024

cijothomas reviewed Dec 17, 2024

cijothomas merged commit 9173ddf into open-telemetry:main Dec 17, 2024
20 of 21 checks passed

scottgerring deleted the chore/integration-test-2 branch December 17, 2024 15:21

scottgerring mentioned this pull request Dec 17, 2024

Enable remaining HTTP clients in metrics integration tests #2441

Open

cijothomas mentioned this pull request Dec 18, 2024

Use integration test to cover key OTLP scenarios #2401

Open

cijothomas changed the title ~~chore: refactor integration tests and add metrics coverage~~ refactor integration tests and add metrics coverage Dec 18, 2024

This was referenced Dec 18, 2024

REQUEST: New membership for scottgerring open-telemetry/community#2496

Closed

chore: Test sync exporters #2455

Merged
