BMDW File Format 2.0
This document describes the BMDW File Format 2.0 to be used for data delivery by all participating registries.
The file and the filename
The file will be ASCII. Each line ends with the ASCII codes [CR][LF] (particular attention should be paid to this when the file is migrated from Unix/Linux-like systems to DOS/Windows). Field values should be separated with spaces (ASCII code 32) and not with TABs (ASCII code 9).
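The character-level rules above can be checked before a file is sent. The following is a minimal sketch, not an official BMDW tool: the function name `check_line_format` and the wording of the messages are illustrative assumptions, and the sketch assumes that the last line of the file must also end with [CR][LF].

```python
def check_line_format(raw: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a data file."""
    problems = []
    # Every byte must be 7-bit ASCII.
    if any(b > 127 for b in raw):
        problems.append("non-ASCII byte found")
    # Fields are separated by spaces; TAB characters are not allowed.
    if b"\t" in raw:
        problems.append("TAB character found")
    # After removing every CR/LF pair, no stray CR or LF may remain.
    remainder = raw.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        problems.append("line ending other than CR/LF found")
    # Assumption: the last line needs its CR/LF as well.
    if raw and not raw.endswith(b"\r\n"):
        problems.append("last line is not terminated by CR/LF")
    return problems
```

An empty list means that none of these checks failed; it does not mean that the file is otherwise valid.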
Registries with data on stem cell donors and cord blood units should separate these two data sets and provide two files: one for stem cell donors, and one for cord blood units. Data of stem cell donors and cord blood units should not be combined in one file.
The first part of the filename is "UPD-" (without the quotes), followed by the short registry code assigned to your registry by BMDW. The extension specifies the format version; at present it is ".20" (without the quotes).
Here are only a few examples of short codes used for active registries:
Using this naming convention the name of the Austrian cord blood registry is: UPD-ACB.20
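The naming convention can be expressed in a few lines. This is only a sketch: the function name `data_filename` is hypothetical, and the short code itself must be the one assigned by BMDW.

```python
def data_filename(registry_code: str, format_extension: str = ".20") -> str:
    """Build the data file name: "UPD-" + short registry code + extension."""
    return f"UPD-{registry_code}{format_extension}"


# The Austrian cord blood registry uses the short code "ACB".
austrian_cord_file = data_filename("ACB")
```

Here `austrian_cord_file` holds the name given above for the Austrian cord blood registry.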
Pretty Good Privacy (PGP) is software that encrypts and compresses files (and for example email). BMDW uses PGP software when exchanging files with participating registries. When submitting a file to BMDW, the file will need to be PGP-encrypted. More information on the PGP software and downloads can be found on the International PGP Homepage.
Before using the PGP software, public keys need to be exchanged to make sure files from the sender can be decrypted by the recipient. When submitting an encrypted file to BMDW, please make sure that you add BMDW as a recipient.
The first part of the PGP-filename is identical to the ASCII-filename. The PGP software will either add a second extension ".PGP", or replace the ".20" extension of the data file with the ".PGP" extension. As an example, the file name of an encrypted Austrian cord blood file would then be: "UPD-ACB.PGP" (without the quotes).
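Both naming variants for the encrypted file can be derived from the name of the data file. The sketch below assumes an exact, upper-case ".PGP" extension as in the example above; the function name `encrypted_filenames` is hypothetical.

```python
def encrypted_filenames(data_name: str) -> set[str]:
    """Return the two possible names of the PGP-encrypted file."""
    # Split off the last extension, e.g. "UPD-ACB.20" -> "UPD-ACB".
    stem, _dot, _extension = data_name.rpartition(".")
    return {
        data_name + ".PGP",  # ".PGP" added as a second extension
        stem + ".PGP",       # ".20" replaced by ".PGP"
    }
```

For the Austrian cord blood file this yields "UPD-ACB.20.PGP" and "UPD-ACB.PGP".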
The file may be sent via FTP or as an e-mail attachment. The subject line of the e-mail should be identical to the name of the file and contain only that file name.
If you wish to send the file via FTP, please contact the BMDW Office, so we can set up an FTP account. Anonymous FTP is not allowed.
If you wish to e-mail the file, send it to BMDWDATA (at) Europdonor (dot) NL.
Files are rejected according to very strict rules for the following reason: in the future BMDW will be updated more frequently, first every month, later every two weeks, and perhaps even on demand. This is only feasible if the compilation of BMDW is a highly automated process. Manual correction of, for example, a filename would create unacceptable overhead costs and unnecessary delays.
File Format 2.0: General Description
The format is flexible in various ways. In the header of the file you should define which data fields you are sending. You can also define the order and the length of the fields: you may order the fields as you like; the length of the fields is flexible. The fields are column based and all entries should be inside the column. The actual data of the cord blood units or stem cell donors is submitted on one or two lines. The second line contains any DNA-strings, such as DRB1*01:01/01:02. This line is optional. The footer of the file contains '###' plus the number of records sent. Please note that the optional second line(s) should not be counted when determining the number of records for the footer!
Here is an example of the format containing data for 5 cord blood units:
This example contains only a small selection of the data fields that can be submitted. Most data fields that are defined are not required. If a field is not required you can leave it out, instead of submitting blanks.
The field values are separated by spaces (or blanks; ASCII code 32, decimal). The use of TAB characters (ASCII code 9, decimal) is not allowed.
The implementation of a new specification (i.e. the addition or deletion of fields) will require a minimum effort on your side: you can continue using an older specification, or add the new fields where you like.
The flexible nature of the file format will be explained in detail by discussing further examples of format 2.0.
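The footer rule from the general description (the count after '###' excludes the optional second lines) can be sketched as follows. The function name `footer_matches` is hypothetical, and the sketch relies on one loud assumption: an optional second line is recognized by the asterisk of a DNA string such as DRB1*01:01/01:02, which a first line is assumed never to contain.

```python
def footer_matches(lines: list[str]) -> bool:
    """Check the '###<count>' footer against the number of records.

    `lines` holds the file content without line terminators:
    header first, footer last.
    """
    if len(lines) < 2:
        return False
    _header, *body, footer = lines
    count = footer[3:].strip()
    if not footer.startswith("###") or not count.isdigit():
        return False
    # Assumption: only optional second lines (DNA strings) contain "*".
    records = [line for line in body if "*" not in line]
    return int(count) == len(records)


sample = [
    "A1 A2 B1 B2 D1 D2 NR",
    "1 2 35 51 13 17 1",
    "1 2 51 3 8 1",
    "DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02",
    "###2",
]
```

The sample has two records and one optional second line, so its footer of '###2' is consistent.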
File Format 2.0: Further Examples
Let us begin with example 1, a sample cord blood file with data from 2 cord blood units (at the end of this document), and have a look at the header (the first line of the file):
In the example the field identifiers are separated by spaces. The length of the fields is determined by you (with the exception of the last field, because the width cannot be determined by the next column, which has a width shown below). All fields should be separated by at least one space.
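One way to read such column-based lines is to derive the column positions from the header. The sketch below assumes that a column starts at the first character of its identifier and runs up to the start of the next identifier, and that the last column runs to the end of the line; the function names `column_spans` and `parse_record` are hypothetical.

```python
import re
from typing import Optional


def column_spans(header: str) -> list[tuple[str, int, Optional[int]]]:
    """Return (field name, start, end) for each identifier in the header."""
    starts = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    spans = []
    for i, (name, start) in enumerate(starts):
        # A column ends where the next one starts; the last one is open-ended.
        end = starts[i + 1][1] if i + 1 < len(starts) else None
        spans.append((name, start, end))
    return spans


def parse_record(header: str, line: str) -> dict[str, str]:
    """Cut one first line into fields; an empty column gives an empty string."""
    return {name: line[start:end].strip()
            for name, start, end in column_spans(header)}


example_header = "A1   A2   B1   B2   D1   D2   NR"
example_line = "1    2    51        3    8    8"
```

With this header, the blank B2 column of `example_line` is returned as an empty string rather than shifting the later values to the left, which is why the positions, and not a plain split on spaces, are used.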
Note: if you do not accumulate records, the number of data records will be identical to the number of cord blood units or stem cell donors; if you do accumulate, the number of data records will be less than the number of cord blood units or stem cell donors.
In example 2 many fields have been left out. Here you see the usage of the field 'NR', meaning 'number of cord blood units or stem cell donors for this specific phenotype'. This field is an alternative to 'ID'. You may also have both fields in the file, but obviously, if the ID is filled, NR should always be 1.
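The relation between 'ID' and 'NR' can be checked while totalling the number of cord blood units or stem cell donors. This is a sketch under stated assumptions: records are already available as dictionaries of field name to text, a record without an NR value counts as 1, and the function name `total_units` is hypothetical.

```python
def total_units(records: list[dict[str, str]]) -> int:
    """Sum the NR values; a record with an ID must have NR equal to 1."""
    total = 0
    for record in records:
        nr_text = record.get("NR", "").strip()
        # Assumption: a missing or blank NR stands for a single unit or donor.
        nr = int(nr_text) if nr_text else 1
        if record.get("ID", "").strip() and nr != 1:
            raise ValueError(
                f"record with ID {record['ID']!r} has NR={nr}, expected 1")
        total += nr
    return total


# NR values of the six records in example 2.
example_2 = [{"NR": n} for n in ["1", "1", "8", "1", "2", "1"]]
```

For `example_2` the total is 14, the number of donors stated for that example, while the footer of the file still reports 6 records.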
File Format 2.0: Field Definitions
First line (required; and in recommended order):
Second line (optional):
The second line may be used for DNA "strings". These strings are an alternative to the allele codes as managed by the National Marrow Donor Program (NMDP), and allow you to provide combinations of DNA alleles (ambiguities). The strings should be used only if you do not use NMDP allele codes (the use of NMDP allele codes is however preferred over the use of strings), or for ambiguities for which no allele code is available yet. In other words, do not submit both DRB1*01:01/01:04 and DRB11 01:AD. However, you can use a multiple allele code for one DRB1 allele, and a string for the other allele.
The combinations should be in the format:
name of locus - asterisk - 1st possibility - slash - 2nd possibility etc.
Example of a DNA string combination for DRB1 alleles:
DRB1*03:01/03:02 DRB1*13:02/13:03/13:04
represents the DNA typing DRB1*03:01 or 03:02, and DRB1*13:02 or 13:03 or 13:04.
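The string format (name of locus, asterisk, possibilities separated by slashes) can be taken apart as sketched below. The function names `parse_dna_string` and `parse_dna_line` are hypothetical, and the sketch assumes that the strings on a second line are separated by spaces.

```python
def parse_dna_string(text: str) -> tuple[str, list[str]]:
    """Split one DNA string into its locus and the list of possibilities."""
    locus, asterisk, alleles = text.partition("*")
    if not (locus and asterisk and alleles):
        raise ValueError(f"not a DNA string: {text!r}")
    return locus, alleles.split("/")


def parse_dna_line(line: str) -> list[tuple[str, list[str]]]:
    """Parse every space-separated DNA string on an optional second line."""
    return [parse_dna_string(part) for part in line.split()]


typing = parse_dna_line("DRB1*03:01/03:02 DRB1*13:02/13:03/13:04")
```

Here `typing` holds two entries for the DRB1 locus, the first with the possibilities 03:01 and 03:02, the second with 13:02, 13:03 and 13:04.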
Incorrect files or incorrect records
Below are reasons for the rejection of a whole file or a record:
Reasons for the rejection of the whole file:
Reasons for the rejection of a record:
Examples of File Format 2.0
Example 1: A sample cord blood file for 2 cord blood units
A1   A2   B1   B2   D1   D2   ID          RB11  RB12  QB11   QB12   NVC  TNC
1    2    35   51   13   8    CBB3-00197  08    13    02:AB  03:XX  60   121
DRB1*13:01/13:03/13:06/13:10 DRB1*08:01/08:02/08:03/08:05
1    2    51        3    17   CBB7-00201                            65   110
###2
This example describes two cord blood units. The first unit is typed for HLA-A, B, DR, DRB1, and DQB1; the ID is CBB3-00197. At the end of the record, the volume of the unit and the number of nucleated cells are sent. The second line for this unit shows the multiple allele code strings for the DRB1 results.
The second unit is typed only for HLA-A, B and DR; it is homozygous for the HLA-B locus. The ID is provided, as well as the date serology typing and again the nucleated cell count and the volume of the cord.
Example 2: A sample donor file with 14 donors
A1   A2   B1   B2   D1   D2   NR
1    2    35   51   13   17   1
3    11   45   76   15   17   1
1    2    51        3    8    8
1    2    51        3    8    1
DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02
3    11   51        8    13   2
24   2403 65   71   5    15   1
###6
This example contains 14 donors. No donor identifications are sent; all records have a donor count (field NR) indicating the number of donors for that specific phenotype. The 3rd and 4th records in this example share the same phenotype, which is therefore present for 9 donors: for 8 of these donors no DNA typing is available, and for one of them there is (the DNA strings on line 6 of the file).
Example 3: A sample donor file with DNA in new nomenclature
A1   A2   B1   B2   D1   D2   NR   RB11      RB12
1    2    35   51   13   17   1    13:02     03:01
3    11   45   76   15   17   1    15:AAS    03:01:01
1    2    51        3    8    8
1    2    51        3    8    1
DRB1*03:01/03:03/03:06/03:10 DRB1*08:01/08:02
3    11   51        8    13   2    13:02     13:BNJZ
24   2403 65   71   5    15   1
###6
This is a copy of example 2, but with DRB1 columns added, and the DRB1 has been presented in the new nomenclature.
Additional fields for NIMA data
To submit maternal HLA information to BMDW, the following optional fields may be added to the file. We have provided two options: option A allows you to submit the complete phenotype of the mother; option B allows you to submit the NIMAs (non-inherited maternal antigens) themselves.
Please note that when submitting maternal HLA information to BMDW either option A or B should be used – these two should not be combined. If fields for option A and option B are submitted, only the NIMA fields of option B will be used and the fields for option A will be ignored.
For the maternal HLA information a minimum of HLA-A, and -B is required.
Option A – field descriptions for submitting the maternal HLA phenotypes
Option B – field descriptions for submitting the NIMAs
- 2.0.000: Original version of the new file format
- 2.0.001: Added extra fields for additional donor and CBU details (2003-04-12)
- 2.0.002: Correction to one of the examples (2004-06-08)
- 2.0.003: Correction to one of the examples (2007-09-24)
- 2.0.004: Correction to one of the examples (2007-10-23)
- 2.0.005: Volume and nucleated cell counts defined as required fields (2007-12-20)
- 2.0.006: ID for cords defined as a required field (2008-06-03)
- 2.0.007: DNA fields have been changed to anticipate the nomenclature changes scheduled for April 2010 (2010-01-28)
- 2.0.008: Added an example with a number of HLA values represented in the April 2010 nomenclature (2010-01-28)
- 2.0.009: Minor corrections (2010-03-08)
- 2.0.010: The examples have been updated to the new WHO HLA nomenclature (2010-04-21)
- 2.0.011: Updated the reference to the NetCord-FACT standards, and verified the definition for the nucleated cell count (2010-06-28)
- 2.0.012: Upper limit for the range for TNC has been increased to 999 (2010-09-09)
- 2.0.012: Fixed the text, concerning the conversion from DNA to serologic equivalents (2010-11-22)
- 2.0.013: Added a sentence to clarify that registries with both donors and cords should provide two files, and not combine cords and donors in one file (2011-03-01)
- 2.0.014: Upper limit for NVC has been increased to 400ml (2011-05-16)
- 2.0.015: Explicitly stated that the NVC value is to be listed without decimals (2012-03-20)
- 2.0.016: Added the field descriptions for NIMA data (2012-09-25)
BMDW - Bone Marrow Donors Worldwide * Plesmanlaan 1b * 2333 BZ Leiden * The Netherlands