BMDW File Format 2.0

This document describes the BMDW File Format 2.0 to be used for data delivery by all participating registries.

The file and the filename

The file will be ASCII. Each line ends with the ASCII codes [CR]/[LF] (particular attention should be paid to this when the file is migrated from Unix/Linux-like systems to DOS/Windows). Separation of field values should be done with spaces (ASCII code 32) and not with TABs (ASCII code 9).

The first part of the filename is "UPD-" (without the quotes), followed by the short registry code assigned to your registry by BMDW. The extension specifies the format version, at present ".20" (without the quotes).

Here are only a few examples of short codes used for active registries:

ACB Austria CORD
B Belgium
USA1 USA NMDP
AM Armenia

Using this naming convention, the name of the file for the Austrian cord blood registry is: UPD-ACB.20
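
The naming convention can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the pattern assumes that short registry codes consist of uppercase letters and digits, which this document does not state explicitly.

```python
import re

# Assumption: short registry codes contain only uppercase letters and digits.
FILENAME_RE = re.compile(r"UPD-([A-Z0-9]+)\.20")

def make_filename(registry_code: str) -> str:
    """Build the data file name for a short registry code."""
    return f"UPD-{registry_code}.20"

def registry_code_from_filename(filename: str):
    """Return the registry code, or None if the name does not follow the convention."""
    match = FILENAME_RE.fullmatch(filename)
    return match.group(1) if match else None

print(make_filename("ACB"))                        # UPD-ACB.20
print(registry_code_from_filename("UPD-USA1.20"))  # USA1
```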

Pretty Good Privacy (PGP) is software that encrypts and compresses files (and for example email). BMDW uses PGP software when exchanging files with participating registries. When submitting a file to BMDW, the file will need to be PGP-encrypted. More information on the PGP software and downloads can be found on the International PGP Homepage.

Before using the PGP software, public keys need to be exchanged to make sure files from the sender can be decrypted by the recipient. When submitting an encrypted file to BMDW, please make sure that you add BMDW as a recipient.

The first part of the PGP-filename is identical to the ASCII-filename. The PGP software will either add a second extension ".PGP", or replace the ".20" extension of the data file with the ".PGP" extension. As an example, the file name of an encrypted Austrian cord blood file would then be: "UPD-ACB.PGP" (without the quotes).

The file may be sent with FTP, or as e-mail, as an attached file. The subject line of the e-mail should be identical to the name of the file, and contain only that file name.

If you wish to send the file via FTP, please contact the BMDW Office, so we can set up an FTP account. Anonymous FTP is not allowed.

If you wish to e-mail the file, send it to BMDWDATA (at) Europdonor (dot) NL.

The reason for the rejection of files according to very strict rules is this: In the future BMDW will be updated more frequently: every month, later every two weeks, perhaps even on demand. This is only feasible if the compilation of BMDW is a highly automated process. Manual correction of, for example, a filename, would create unacceptable overhead-costs and unnecessary delays.

File Format 2.0: General Description

The format is flexible in various ways. In the header of the file you should define which data fields you are sending. You can also define the order and the length of the fields: you may order the fields as you like; the length of the fields is flexible, although there is a minimal length defined for all fields. The actual data of the cord blood units or stem cell donors is submitted on one or two lines. The second line contains any DNA-strings, such as DRB1*01:01/01:02. This line is optional. The footer of the file contains '###' plus the number of records sent. Please note that the optional second line(s) should not be counted when determining the number of records for the footer!

Here is an example of the format containing data for 5 cord blood units:

A1   A2   B1   B2   D1   D2   ID        NVC TNC
1    2    35   51   13   17   CBID00001 60  121
3    11   45   76   15   17   CBID00002 65  110
1    2    51        3    8    CBID00003 70  115
DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02
3    11   51   8    13        CBID00004 75  113
24   2403 65   71   1    18   CBID00005 60  120
DRB1*01:01/01:04 DRB1*03:02/03:03
###5

This example contains only a small selection of the data fields that can be submitted. Most data fields that are defined are not required. If a field is not required you can leave it out, instead of submitting blanks.

The field values are separated by spaces (or blanks; ASCII code 32, decimal). The use of TAB characters (ASCII code 9, decimal) is not allowed.
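
Before submitting, the raw bytes of a file can be checked against these rules. The following is a minimal sketch, not an official BMDW validator; the function name and the wording of the messages are hypothetical.

```python
def check_raw_file(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a data file."""
    problems = []
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        problems.append("file contains non-ASCII bytes")
    if b"\t" in data:
        problems.append("file contains TAB characters")
    # After removing every CR/LF pair, no lone CR or LF may remain.
    remainder = data.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        problems.append("line endings are not all CR/LF")
    if data and not data.endswith(b"\r\n"):
        problems.append("last line does not end with CR/LF")
    return problems

print(check_raw_file(b"A1 A2\r\n1  2\r\n###1\r\n"))  # []
```

A file saved on a Unix/Linux-like system with bare LF line endings would be reported by the third check.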

The implementation of a new specification (i.e. the addition or deletion of fields) will require a minimum effort on your side: you can continue using an older specification, or add the new fields where you like.

The flexible nature of the file format will be explained in detail by discussing further examples of format 2.0.

File Format 2.0: Further Examples

Let us begin with example 1, a sample cord blood file with data from 2 cord blood units (at the end of this document), and have a look at the header (the first line of the file):

  • The header contains the field identifiers
  • The first character of the first field identifier is in column 1
  • The field identifiers are left aligned
  • Only the field identifiers as defined in the next chapter are allowed

In the example the field identifiers are separated by spaces. The length of the fields is determined by you, although a minimum field length is defined in the field definitions in the next section. All fields should be separated by at least one space.

There is no obligatory field order. For example, if you prefer to have the ID's at the start or end of the line, or somewhere else in the middle, this is up to you. Just make sure that the header of the file (first line in the file) is modified accordingly.
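
Because both the identifiers and the values are left aligned, a reader of the file can take the column positions from the header and slice each data line accordingly. The sketch below illustrates this under one assumption that is an interpretation of the rules above, not a statement from BMDW: each value starts in the same column as its field identifier and ends before the next identifier starts. The function names are hypothetical.

```python
import re

def parse_header(header: str) -> list:
    """Return (identifier, start, end) per field; end is None for the last field."""
    matches = list(re.finditer(r"\S+", header))
    starts = [m.start() for m in matches]
    ends = starts[1:] + [None]
    return [(m.group(), s, e) for m, s, e in zip(matches, starts, ends)]

def parse_record(line: str, fields: list) -> dict:
    """Slice one data line by the header columns; blank fields become ''."""
    return {name: line[start:end].strip() for name, start, end in fields}

header = "A1   A2   B1   B2   D1   D2   NR"
fields = parse_header(header)
record = parse_record("1    2    51        3    8    8", fields)
print(record["B1"], repr(record["B2"]), record["NR"])  # 51 '' 8
```

Slicing by position, rather than splitting on whitespace, is what allows a blank field such as B2 above to be recognised as empty.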

The data for each unit is on one or two lines. The first line is required. The second line, containing only DNA strings, may be absent for one unit and present for another. All fields in the first line are left aligned.

The last record of the file begins with ###, followed by the number of data records in the file. Lines with DNA strings should not be counted, since they are additional information belonging to another line in the file, which itself is counted.

Note: if you do not accumulate records, the number of data records will be identical to the number of cord blood units or stem cell donors; if you do accumulate, the number of data records will be less than the number of cord blood units or stem cell donors.
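
The counting rule for the footer can be illustrated as follows. This is a sketch only: it assumes that a DNA-string line can be recognised by every token on it containing an asterisk, and the function names and sample data are hypothetical.

```python
def is_dna_string_line(line: str) -> bool:
    """Assumption: a DNA-string line consists only of tokens like DRB1*03:01/03:03."""
    tokens = line.split()
    return bool(tokens) and all("*" in token for token in tokens)

def check_footer(lines: list) -> bool:
    """True if the last line is '###' followed by the number of data records."""
    if len(lines) < 2:
        return False
    header, *body, footer = lines
    count = footer[3:].strip()
    if not footer.startswith("###") or not count.isdigit():
        return False
    # DNA-string lines belong to the preceding record and are not counted.
    records = [line for line in body if not is_dna_string_line(line)]
    return int(count) == len(records)

lines = [
    "A1   A2   B1   B2   ID",
    "1    2    35   51   CBID00001",
    "1    2    51        CBID00003",
    "DRB1*03:01/03:03 DRB1*08:01/08:02",
    "###2",
]
print(check_footer(lines))  # True
```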

In example 1 the DQA1 fields (QA11 and QA12) are empty. Of course they might be filled for units further down in the file, if the file were to contain more than the given 2 units. However, if a field is empty throughout the file, you may leave it out. There is one exception to this rule: for each locus, either both fields or neither must be present; in other words, you cannot have Q1 in the header and leave out Q2.

In example 2 many fields have been left out. Here you see the usage of the field 'NR', meaning 'number of cord blood units or stem cell donors for this specific phenotype'. This field is an alternative to 'ID'. You may also have both fields in the file, but obviously, if the ID is filled, NR should always be 1.

Finally, some remarks about the HLA. Fields A1, A2, B1, B2, D1 and D2 are used by the match programs, which work with serologic values (sometimes also called 'search determinants'). Therefore, if a value has been determined by DNA typing methods, it must be converted to a serologic equivalent according to the WHO Nomenclature. You should do this yourself (e.g. by entering 1 in D1 if RB11 is 01:AD). DNA data for loci other than A, B, and DR should not be converted to serologic equivalents.

'01:XX' is equivalent to '01'. Both codes '01:XX' and '01' are allowed.

File Format 2.0: Field Definitions

First line (required; and in recommended order):

Field Identifier  Description                                   Length  Comment
A1                HLA-A, 1st antigen                            4       Serology, or search determinant based on DNA typing methods
A2                HLA-A, 2nd antigen                            4       Serology, or search determinant based on DNA typing methods
B1                HLA-B, 1st antigen                            4       Serology, or search determinant based on DNA typing methods
B2                HLA-B, 2nd antigen                            4       Serology, or search determinant based on DNA typing methods
D1                HLA-DR, 1st antigen                           4       Serology, or search determinant based on DNA typing methods
D2                HLA-DR, 2nd antigen                           4       Serology, or search determinant based on DNA typing methods

51                HLA-DR51/52/53, 1st antigen                   2       Serology
52                HLA-DR51/52/53, 2nd antigen                   2       Serology
C1                HLA-C, 1st antigen                            2       Serology
C2                HLA-C, 2nd antigen                            2       Serology
Q1                HLA-DQ, 1st antigen                           2       Serology
Q2                HLA-DQ, 2nd antigen                           2       Serology
P1                HLA-DP, 1st antigen                           2       Serology
P2                HLA-DP, 2nd antigen                           2       Serology

NR                Count of donors/CBU's                         5       Indicates the number of units for this specific phenotype - ID or NR is required
ID                Identification                                15      Identification of donor or CBU - either ID or NR is required for donors; ID is required for CBU's (as of 1-Jan-2009)

DA1               DNA-A, 1st allele                             20      Determined by DNA typing methods
DA2               DNA-A, 2nd allele                             20      Determined by DNA typing methods
DB1               DNA-B, 1st allele                             20      Determined by DNA typing methods
DB2               DNA-B, 2nd allele                             20      Determined by DNA typing methods
DC1               DNA-C, 1st allele                             20      Determined by DNA typing methods
DC2               DNA-C, 2nd allele                             20      Determined by DNA typing methods

RB11              DNA-DRB1, 1st allele                          20      Determined by DNA typing methods
RB12              DNA-DRB1, 2nd allele                          20      Determined by DNA typing methods
RB31              DNA-DRB3, 1st allele                          20      Determined by DNA typing methods
RB32              DNA-DRB3, 2nd allele                          20      Determined by DNA typing methods
RB41              DNA-DRB4, 1st allele                          20      Determined by DNA typing methods
RB42              DNA-DRB4, 2nd allele                          20      Determined by DNA typing methods
RB51              DNA-DRB5, 1st allele                          20      Determined by DNA typing methods
RB52              DNA-DRB5, 2nd allele                          20      Determined by DNA typing methods
QB11              DNA-DQB1, 1st allele                          20      Determined by DNA typing methods
QB12              DNA-DQB1, 2nd allele                          20      Determined by DNA typing methods
QA11              DNA-DQA1, 1st allele                          20      Determined by DNA typing methods
QA12              DNA-DQA1, 2nd allele                          20      Determined by DNA typing methods
PB11              DNA-DPB1, 1st allele                          20      Determined by DNA typing methods
PB12              DNA-DPB1, 2nd allele                          20      Determined by DNA typing methods
PA11              DNA-DPA1, 1st allele                          20      Determined by DNA typing methods
PA12              DNA-DPA1, 2nd allele                          20      Determined by DNA typing methods

DATESER           Date of serology typing                       8       Date format YYYYMMDD; most recent date
DATEDNA           Date of DNA typing                            8       Date format YYYYMMDD; most recent date

DOB               Date of birth of donor/CBU                    8       Date format YYYYMMDD
GND               Gender of donor/CBU                           1       M (=Male) or F (=Female)
ABO               Blood type of donor/CBU                       3       Either A, B, O, or AB; all suffixed with either P (=positive) or N (=negative)
CMV               CMV status of donor/CBU                       1       Possible values:
                                                                        N = Both IgG and IgM negative
                                                                        Q = Questionable / Unclear
                                                                        G = IgG positive, IgM negative
                                                                        M = IgG negative, IgM positive
                                                                        B = Both IgG and IgM positive
                                                                        P = IgG or IgM positive, test did not differentiate
CMVDATE           Date of CMV test of donor/CBU                 8       Date format YYYYMMDD

NVC               Net Volume Collected of CBU                   3       Volume of the unit in milliliters; required field - see also note below.
TNC               Total Nucleated Cells count of CBU            4       The rounded number of nucleated cells in units of 10^7; required field (as of 1-Jan-2009) - see note below.
CD34P             Collected number of CD34+ cells of CBU        5       Cell count after volume reduction; numeric value with decimal point in units of 10^6.
MONONUC           Collected number of mononuclear cells of CBU  3       The rounded number of mononuclear cells in units of 10^7.

Notes:

  • A complete typing of at least two loci is needed for matching. Currently A and B are required for either serology or DNA.
  • Although the columns A2, B2 are required (next to A1 and B1) in the header of the file, it is allowed to leave the A2 or B2 fields blank to indicate a homozygous phenotype.
  • ID and NR are alternatives. If both are submitted NR must be 1.
  • Data for cord blood units or stem cell donors should be accumulated only if all HLA and DNA field values are identical.
  • The definitions used by BMDW for the volume and the number of nucleated cells are those found in the document 'International Standards for Cord Blood Collection, Banking, and Release for Administration', from NetCord-FACT, fourth edition, January 2010:
      Net Volume: The net volume is the volume of the cord blood unit at the end of the collection.
      Nucleated Cells: The total nucleated cell count is the number of nucleated cells after processing, prior to cryopreservation.
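
The accumulation rule in the notes above (identical HLA and DNA field values only) could be implemented along these lines. This is a hedged sketch: the function name and the sample donors are hypothetical, and each donor is assumed to be represented as a dictionary of field identifier to value.

```python
from collections import Counter

def accumulate(phenotypes: list) -> list:
    """Merge donors whose HLA/DNA field values are all identical, counting them in NR."""
    # A phenotype is turned into a hashable key of (field, value) pairs.
    counts = Counter(tuple(sorted(p.items())) for p in phenotypes)
    return [dict(key, NR=count) for key, count in counts.items()]

donors = [
    {"A1": "1", "A2": "2", "B1": "51", "B2": "", "D1": "3", "D2": "8"},
    {"A1": "1", "A2": "2", "B1": "51", "B2": "", "D1": "3", "D2": "8"},
    {"A1": "3", "A2": "11", "B1": "45", "B2": "76", "D1": "15", "D2": "17"},
]
for row in accumulate(donors):
    print(row["A1"], row["A2"], row["NR"])
```

Donors that differ in any field, including a DNA field that is filled for one and empty for the other, end up in separate records.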

Second line (optional):

The second line may be used for DNA "strings". These strings are an alternative to the allele codes as managed by the National Marrow Donor Program (NMDP), and allow you to provide combinations of DNA alleles (ambiguities). The strings should be used only if you do not use NMDP allele codes (the use of NMDP allele codes is, however, preferred over the use of strings), or for ambiguities for which no allele code is available yet. In other words, do not submit both the string DRB1*01:01/01:04 and the allele code 01:AD in RB11. However, you can use a multiple allele code for one DRB1 allele, and a string for the other allele.

The combinations should be in the format:

name of locus - asterisk - 1st possibility - slash - 2nd possibility etc.

Example of a DNA string combination for DRB1 alleles:

DRB1*03:01/03:02 DRB1*13:02/13:03/13:04

represents the DNA typing DRB1*03:01 or 03:02, and DRB1*13:02 or 13:03 or 13:04.
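
Splitting such a combination into its parts is straightforward. The following sketch (the function name is hypothetical) follows the format described above: locus name, asterisk, then the possibilities separated by slashes.

```python
def parse_dna_string(token: str):
    """Split 'DRB1*03:01/03:02' into ('DRB1', ['03:01', '03:02'])."""
    locus, sep, alleles = token.partition("*")
    if not sep or not locus or not alleles:
        raise ValueError(f"not a DNA string: {token!r}")
    return locus, alleles.split("/")

line = "DRB1*03:01/03:02 DRB1*13:02/13:03/13:04"
for token in line.split():
    print(parse_dna_string(token))
# ('DRB1', ['03:01', '03:02'])
# ('DRB1', ['13:02', '13:03', '13:04'])
```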

Incorrect files or incorrect records

Below are reasons for the rejection of a whole file or a record:

Reasons for the rejection of the whole file:

  • An error in the header with field definitions
  • The footer (last record in the file) is absent, contains an incorrect number of records, or has an incorrect format

Reasons for the rejection of a record:

  • HLA A or B locus is missing
  • Incorrect nomenclature (e.g. DR BR); the nomenclature should be in accordance with the latest update in Tissue Antigens, and with NMDP's multiple allele code lists (both are redistributed by BMDW)
  • Incorrect relation between HLA-DR and DRB1
  • Incorrect relation between HLA-DR51/52/53 and HLA-DR
  • NR and ID blank or missing
  • ID filled, NR larger than 1
  • Incorrect date format
  • One of the typing dates < 1-Jan-1990, or > current date
  • Net Volume Collected (NVC) is less than 10, or more than 300
  • Number of Total Nucleated Cells (TNC) is less than 10, or more than 999
  • Invalid format of DNA-strings
  • Donor age outside range of 18-60 years
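
Some of the record-level checks above can be run locally before submission. The sketch below covers only the ID/NR, date, NVC and TNC rules; it is not the BMDW implementation, the function names and messages are hypothetical, and a record is assumed to be a dictionary of field identifier to string value.

```python
from datetime import date, datetime

def parse_date(value: str):
    """Parse YYYYMMDD; return None if the value is not a valid date."""
    if len(value) != 8 or not value.isdigit():
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        return None

def record_errors(record: dict, today: date) -> list:
    """Return the rejection reasons (a subset of the list above) for one record."""
    errors = []
    record_id, nr = record.get("ID", ""), record.get("NR", "")
    if not record_id and not nr:
        errors.append("NR and ID blank or missing")
    if record_id and nr.isdigit() and int(nr) > 1:
        errors.append("ID filled, NR larger than 1")
    for field in ("DATESER", "DATEDNA"):
        value = record.get(field, "")
        if not value:
            continue
        typed = parse_date(value)
        if typed is None:
            errors.append(f"{field}: incorrect date format")
        elif typed < date(1990, 1, 1) or typed > today:
            errors.append(f"{field}: typing date out of range")
    for field, low, high in (("NVC", 10, 300), ("TNC", 10, 999)):
        value = record.get(field, "")
        if value.isdigit() and not low <= int(value) <= high:
            errors.append(f"{field} outside {low}-{high}")
    return errors

bad = {"ID": "CBID00001", "NR": "2", "NVC": "5", "TNC": "121", "DATESER": "19891231"}
print(record_errors(bad, today=date(2010, 9, 9)))
```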

Examples of File Format 2.0

Example 1: A sample cord blood file for 2 cord blood units

A1   A2   B1   B2   D1   D2   ID         RB11  RB12 QB11  QB12  NVC TNC
1    2    35   51   13   8    CBB3-00197 08    13   02:AB 03:XX 60  121
DRB1*13:01/13:03/13:06/13:10 DRB1*08:01/08:02/08:03/08:05
1    2    51        3    17   CBB7-00201                        65  110
###2

This example describes two cord blood units. The first unit is typed for HLA-A, B, DR, DRB1, and DQB1; the ID is CBB3-00197. At the end of the record, the volume of the unit and the number of nucleated cells are sent. The second line for this unit shows the multiple allele code strings for the DRB1 results.

The second unit is only typed for HLA-A, B and DR; it is homozygous for the HLA-B locus. The ID is provided, as well as the volume of the cord and the nucleated cell count.

Example 2: A sample donor file with 14 donors

A1   A2   B1   B2   D1   D2   NR
1    2    35   51   13   17   1
3    11   45   76   15   17   1
1    2    51        3    8    8
1    2    51        3    8    1
DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02
3    11   51   8    13        2
24   2403 65   71   5    15   1
###6

This example contains 14 donors. No donor identifications are sent; all records have a donor count (field NR), indicating the number of donors for that specific phenotype. The 3rd and 4th records in this example share the same phenotype, which is therefore present for 9 donors: for 8 of these donors no DNA typing is available, for one of them there is (the DNA strings on line 6 in this file).
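
The relation between the footer count (6 records) and the number of donors (14) can be verified with a short script. This sketch takes the column of NR from the header and assumes that any line containing an asterisk is a DNA-string line; the variable names are hypothetical.

```python
example = [
    "A1   A2   B1   B2   D1   D2   NR",
    "1    2    35   51   13   17   1",
    "3    11   45   76   15   17   1",
    "1    2    51        3    8    8",
    "1    2    51        3    8    1",
    "DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02",
    "3    11   51   8    13        2",
    "24   2403 65   71   5    15   1",
    "###6",
]

header, *body, footer = example
nr_start = header.index("NR")                        # column where the NR values begin
records = [line for line in body if "*" not in line]  # skip DNA-string lines
donors = sum(int(line[nr_start:].split()[0]) for line in records)

print(len(records), donors)  # 6 14
```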

Example 3: A sample donor file with DNA in new nomenclature

A1   A2   B1   B2   D1   D2   NR    RB11      RB12
1    2    35   51   13   17   1     13:02     03:01
3    11   45   76   15   17   1     15:AAS    03:01:01
1    2    51        3    8    8
1    2    51        3    8    1
DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02
3    11   51   8    13        2     13:02     13:BNJZ
24   2403 65   71   5    15   1
###6

This is a copy of example 2, but with DRB1 columns added, and the DRB1 has been presented in the new nomenclature.


Revision History

- 2.0.000: Original version of the new file format

- 2.0.001: Added extra fields for additional donor and CBU details (2003-04-12)

- 2.0.002: Correction to one of the examples (2004-06-08)

- 2.0.003: Correction to one of the examples (2007-09-24)

- 2.0.004: Correction to one of the examples (2007-10-23)

- 2.0.005: Volume and nucleated cell counts defined as required fields (2007-12-20)

- 2.0.006: ID for cords defined as a required field (2008-06-03)

- 2.0.007: DNA fields have been changed to anticipate the nomenclature changes scheduled for April 2010 (2010-01-28)

- 2.0.008: Added an example with a number of HLA values represented in the April 2010 nomenclature (2010-01-28)

- 2.0.009: Minor corrections (2010-03-08)

- 2.0.010: The examples have been updated to the new WHO HLA nomenclature (2010-04-21)

- 2.0.011: Updated the reference to the NetCord-FACT standards, and verified the definition for the nucleated cell count (2010-06-28)

- 2.0.012: Upper limit for the range for TNC has been increased to 999 (2010-09-09)

 

BMDW - Bone Marrow Donors Worldwide * Plesmanlaan 1b * 2333 BZ Leiden * The Netherlands