This repository was archived by the owner on Jan 18, 2026. It is now read-only.

fix: fix Avro Object Container File (OCF) encoding header using defau… #590

Closed
TuSKan wants to merge 2 commits into hamba:main from TuSKan:fix_custom_tag

TuSKan (Contributor) commented Dec 18, 2025

OCF Encoder fails when using custom TagKey configuration

Description

When attempting to use ocf.NewEncoder with a custom EncodingConfig that specifies a TagKey (e.g., using "json" tags instead of "avro"), the encoder fails during initialization.

This occurs because the ocf package mistakenly applies the user's custom configuration (which looks for the custom tag key) to the internal serialization of the OCF Header. The Header struct definition relies on standard avro tags, so when the encoder looks for the custom tag (e.g., json:"magic"), it fails to find it and reports a missing field error.
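A simplified model of tag-based field lookup shows why the header fails. The sketch below is stdlib-only Go built on reflect.StructTag.Get. The header type and the findField and checkHeader helpers are hypothetical stand-ins, not hamba/avro's actual code, and the model ignores any fallback the real encoder may apply when a tag is absent.

```go
package main

import (
	"fmt"
	"reflect"
)

// header is a hypothetical stand-in for the OCF Header struct: like the
// real one, its fields carry only `avro` tags.
type header struct {
	Magic [4]byte           `avro:"magic"`
	Meta  map[string][]byte `avro:"meta"`
	Sync  [16]byte          `avro:"sync"`
}

// findField returns the index of the struct field whose tag under tagKey
// equals name, roughly how a TagKey-aware encoder maps schema fields onto
// Go struct fields (simplified model).
func findField(t reflect.Type, tagKey, name string) (int, bool) {
	for i := 0; i < t.NumField(); i++ {
		if t.Field(i).Tag.Get(tagKey) == name {
			return i, true
		}
	}
	return -1, false
}

// checkHeader reports an error for the first header schema field that
// cannot be mapped with the given tag key.
func checkHeader(tagKey string) error {
	t := reflect.TypeOf(header{})
	for _, name := range []string{"magic", "meta", "sync"} {
		if _, ok := findField(t, tagKey, name); !ok {
			return fmt.Errorf("record header is missing required field %q", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkHeader("avro")) // <nil>
	fmt.Println(checkHeader("json")) // record header is missing required field "magic"
}
```

With TagKey set to "json", every header field looks untagged, so the very first schema field, "magic", is reported missing. This matches the shape of the error in the report.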

Reproduction Steps

Here is a minimal reproduction case demonstrating the issue:

package avro_test

import (
	"bytes"
	"fmt"
	"testing"

	"github.com/hamba/avro/v2"
	"github.com/hamba/avro/v2/ocf"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

type TestObject struct {
	StringField string `json:"string_field"`
	IntField    int    `json:"int_field"`
}

func TestCustomTagKeyOCF(t *testing.T) {
	// Define schema matching the json tags
	schemaStr := `{
		"type": "record",
		"name": "TestObject",
		"fields": [
			{"name": "string_field", "type": "string"},
			{"name": "int_field", "type": "int"}
		]
	}`

	// Create a Config with TagKey set to "json"
	config := avro.Config{
		TagKey: "json",
	}.Freeze()

	// Create a buffer to write the OCF file to
	var buf bytes.Buffer

	// Create OCF encoder with custom encoding config
	enc, err := ocf.NewEncoder(schemaStr, &buf, ocf.WithEncodingConfig(config))
	require.NoError(t, err)

	// Data to encode
	data := TestObject{
		StringField: "hello",
		IntField:    42,
	}

	// Encode using the OCF encoder
	err = enc.Encode(data)
	require.NoError(t, err)

	// Close the encoder to flush data
	err = enc.Close()
	require.NoError(t, err)

	// Verify the output
	dec, err := ocf.NewDecoder(&buf, ocf.WithDecoderConfig(config))
	require.NoError(t, err)

	var result TestObject
	require.True(t, dec.HasNext())
	err = dec.Decode(&result)
	require.NoError(t, err)

	assert.Equal(t, data.StringField, result.StringField)
	assert.Equal(t, data.IntField, result.IntField)

	fmt.Printf("Successfully encoded and decoded OCF with json tag: %+v\n", result)
}

Error Output

Running the code above results in the following error:

avro: record org.apache.avro.file.Header is missing required field "magic"

Root Cause

The issue stems from mixing the configuration for the Container (framing/header) and the Payload (user data).

  • Container Level: The OCF Spec defines the Header layout, and the Header struct in Go uses avro tags. This must always be encoded using the standard Avro configuration (DefaultConfig).
  • Payload Level: The user's data (e.g., TestObject) may use custom tags (e.g., json) as configured by the user.

Currently, ocf.NewEncoder applies cfg.EncodingConfig to both.
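The container/payload split can be modeled by an encoder that holds two separate configurations. This is a hypothetical stdlib-only sketch: config, fileHeader, testObject, tagNames, encoder and newEncoder are illustrative names that do not come from hamba/avro. It shows which tag names each configuration would resolve.

```go
package main

import (
	"fmt"
	"reflect"
)

// config is a hypothetical stand-in for an avro encoding configuration.
type config struct{ TagKey string }

// fileHeader carries spec-defined `avro` tags, like the OCF Header.
type fileHeader struct {
	Magic string `avro:"magic"`
}

// testObject is user data tagged with `json`, as in the reproduction test.
type testObject struct {
	StringField string `json:"string_field"`
}

// tagNames lists the field names that a config with the given TagKey sees.
func tagNames(v any, cfg config) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Tag.Get(cfg.TagKey))
	}
	return names
}

// encoder keeps one fixed config for the container framing and one
// user-supplied config for the payload.
type encoder struct {
	headerCfg  config
	payloadCfg config
}

func newEncoder(userCfg config) *encoder {
	return &encoder{headerCfg: config{TagKey: "avro"}, payloadCfg: userCfg}
}

func main() {
	enc := newEncoder(config{TagKey: "json"})
	fmt.Println(tagNames(fileHeader{}, enc.headerCfg))  // [magic]
	fmt.Println(tagNames(testObject{}, enc.payloadCfg)) // [string_field]
	// Applying the payload config to the header (the reported bug) finds no tag.
	fmt.Printf("%q\n", tagNames(fileHeader{}, enc.payloadCfg)) // [""]
}
```

Keeping the header config inside the encoder, rather than deriving it from the user's option, is what keeps the two levels from interfering.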

Proposed Fix

The internal avro.Writer used for the OCF framing should always use avro.DefaultConfig to ensure the Header is written correctly, while the payload encoder should continue to use the user-provided config.

Diff of proposed changes in ocf/ocf.go:

@@ -429,7 +429,7 @@
 				return nil, err
 			}
 
-			writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig))
+			writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig))
 			buf := &bytes.Buffer{}
 			e := &Encoder{
 				writer:      writer,
@@ -470,7 +470,7 @@
 		return nil, err
 	}
 
-	writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig))
+	writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig))
 	writer.WriteVal(HeaderSchema, header)
 	if err = writer.Flush(); err != nil {
 		return nil, err

}

Review comment on ocf/ocf.go:

-	writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig))
+	writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig))

Member

This does not fix anything. It is simple enough for the DefaultConfig to be overridden to change the tag.

Contributor Author

This is clearly a bug, and it is easily verified with the code provided above, which passes an overriding config:
ocf.NewEncoder(schemaStr, &buf, ocf.WithEncodingConfig(config))

Member

I am not arguing that it is not a bug. I am saying that you are making an assumption that DefaultConfig is immutable, and that is not true.

Contributor Author

Maybe it should be immutable. This does not seem right:
avro.DefaultConfig = cfg.EncodingConfig

Member

a) It is a var.
b) There is no reason the user should not be allowed to decide what they want the DefaultConfig to be.
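The maintainer's point about the var is easy to demonstrate. Code that reads an exported package variable at call time picks up any reassignment a user makes. The stdlib-only sketch below uses hypothetical stand-ins: Config models a frozen avro configuration, and headerConfig models the "immutable" alternative the contributor hints at. Neither reflects what hamba/avro actually does.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for a frozen avro configuration.
type Config struct{ TagKey string }

// DefaultConfig mirrors the maintainer's point: it is an exported package
// variable, so any importer can reassign it.
var DefaultConfig = Config{TagKey: "avro"}

// headerConfig models the contributor's idea: an unexported value that
// code outside this package cannot reassign.
var headerConfig = Config{TagKey: "avro"}

// headerTagKeyViaDefault reads DefaultConfig at call time, as the proposed
// diff would.
func headerTagKeyViaDefault() string { return DefaultConfig.TagKey }

// headerTagKeyViaPrivate reads the unexported value instead.
func headerTagKeyViaPrivate() string { return headerConfig.TagKey }

func main() {
	// A user overrides the default, e.g. to switch every payload to json tags.
	DefaultConfig = Config{TagKey: "json"}

	fmt.Println(headerTagKeyViaDefault()) // json: the header lookup would fail again
	fmt.Println(headerTagKeyViaPrivate()) // avro
}
```

Under this model, the proposed diff only moves the problem. A user who sets DefaultConfig to a json-tagged config reintroduces the same header failure.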

TuSKan mentioned this pull request Dec 30, 2025
nrwiersma closed this Jan 18, 2026
Labels

None yet

2 participants