Discussion:
Importing Excel files with columns > 255 chars
(too old to reply)
S.Kumar
2007-09-06 20:24:00 UTC
Permalink
I created a SSIS package that imports an Excel file that has columns with >
255 chars into a SQL table.
I am getting error "Text was truncated or one or more characters had no
match in the target code page"
For Excel Source Output I tried changing the type for External and output
columns to Unicode text (DT_NTEXT) but I got an error when I changed Excel
Source Error Output (property value is not valid) to the same.

I hope I made some sense.

Basically, has anyone imported Excel with large columns and if so what
special things did they have to do?
EMartinez
2007-09-08 03:58:04 UTC
Permalink
Post by S.Kumar
I created a SSIS package that imports an Excel file that has columns with >
255 chars into a SQL table.
I am getting error "Text was truncated or one or more characters had no
match in the target code page"
For Excel Source Output I tried changing the type for External and output
columns to Unicode text (DT_NTEXT) but I got an error when I changed Excel
Source Error Output (property value is not valid) to the same.
I hope I made some sense.
Basically, has anyone imported Excel with large columns and if so what
special things did they have to do?
There are a few things you can try, as far as I know. You can either
set the Truncation Errors on the Excel Source to 'Ignore Errors.' You
can change the output column to DT_String with a suitable length set.
Or, you could use a Derived Column transformation and use a Substring
or similar string manipulation to get the required maximum length.
Hope this helps.

Regards,

Enrique Martinez
Sr. Software Consultant
j***@googlemail.com
2007-09-12 12:01:49 UTC
Permalink
Post by EMartinez
Post by S.Kumar
I created a SSIS package that imports an Excel file that has columns with >
255 chars into a SQL table.
I am getting error "Text was truncated or one or more characters had no
match in the target code page"
For Excel Source Output I tried changing the type for External and output
columns to Unicode text (DT_NTEXT) but I got an error when I changed Excel
Source Error Output (property value is not valid) to the same.
I hope I made some sense.
Basically, has anyone imported Excel with large columns and if so what
special things did they have to do?
There are a few things you can try, as far as I know. You can either
set the Truncation Errors on the Excel Source to 'Ignore Errors.' You
can change the output column to DT_String with a suitable length set.
Or, you could use a Derived Column transformation and use a Substring
or similar string manipulation to get the required maximum length.
Hope this helps.
Regards,
Enrique Martinez
Sr. Software Consultant
Hi S.Kumar,

I'm assuming you don't want to truncate your Source data to 255
characters, and it'd be nice to think that this is a simple task ...
simply set the width property of the wide columns (using the "Show
Advanced Editor" option from the right-click menu on the task) to a
higher value. However - for some reason, Microsoft does not allow you
to change the column width of Error output columns, AND it requires
that the regular Output columns match the Error output columns in
width. So basically you can increase the column width no problem, but
the task will fail and automatically set the width back to the default
(255) :( I haven't found a workaround / fix for this yet.

In this case, you need to examine the way that Excel connections
determine the type and width of columns in the first place -
basically, Excel scans the first 8 rows of the sheet (yes, 8) and uses
these rows to determine the type and width of the column.

Try this: Create a spreadsheet and type in > 255 characters in Column
A Cell 1. Create a connection to that spreadsheet and check the data
type in a connection created to the sheet - you will see that it is
set to Unicode Text Stream (or non-unicode depending on your
settings). Now add 8 rows above the one you created, and set a couple
of the values in Column A to strings < 255 characters. Re-create the
connection to the file and check the column property - it should be
set to Unicode String (width 255). This can lead to REAL headaches -
imagine a column that could accept numbers or letters (e.g. postcode),
but it just so happens that the first 8 columns only have numbers ...
fun :)

Luckily, there is a solution! All you have to do is edit the
registry! Here's an snip from the blog I found a few months ago
(http://blog.lab49.com/?p=196) that put me on the right track:
<snip>
Even more entertaining, there is absolutely no way to make this 100%
reliable - although with some pain, you can improve the situation.
Here's what you need to do. First add the "IMEX=1" option to your
connection string like this:

OleDbConnection dbConnection = new OleDbConnection
(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:BAR.XLS;Extended
Properties=""Excel 8.0;HDR=Yes;IMEX=1;""");That tells ADO.NET to honor
the following registry key when reading the spreadsheet:

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/
ImportMixedTypesThis registry key tells ADO.NET what to do when it
encounters mixed types in the first 8 rows. It can either be set to
the string "Majority Type" (for the default behavior) or to
"Text" (which forces the column to be of type string). Note that you
are still screwed if the first 8 postal codes are numeric and the 9th
is Canadian. (Not to mention that the "Text" option invokes handling
that fails on strings over 255 characters, but let's skip that for
now.)

There's also a second relevant registry setting (which is honored
regardless of the IMEX option):

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/
TypeGuessRowsThat says how many rows to scan to guess the datatype.
The default is 8, but you can set it anywhere from 0-16 decimal (0
meaning "scan the first 16384 rows", and all other values meaning what
they say). Putting this all together, the most reliable way to read a
US/Canadian postal code is to use the following registry settings:

TypeGuessRows = 0
ImportMixedTypes = TextThat's pretty close to perfect, although it
will still fail if the first 16384 postal codes are numeric and any of
the subsequent ones aren't.
</snip>

If you don't like that option (or have no control over the registry
settings of the server to which you will be deploying), then there is
another option ... that is to have a dummy row in the spreadsheet
within the first 8 columns that forces the column types to be parsed
correctly. I know that this is less-than ideal, but in the past I
have usually found this is the most practical way to deal with this
issue.

Hope this helped!
Good Luck
J
S.Kumar
2007-09-13 14:48:02 UTC
Permalink
Thanks for the detailed reply. I will give it a shot.
Post by j***@googlemail.com
Post by EMartinez
Post by S.Kumar
I created a SSIS package that imports an Excel file that has columns with >
255 chars into a SQL table.
I am getting error "Text was truncated or one or more characters had no
match in the target code page"
For Excel Source Output I tried changing the type for External and output
columns to Unicode text (DT_NTEXT) but I got an error when I changed Excel
Source Error Output (property value is not valid) to the same.
I hope I made some sense.
Basically, has anyone imported Excel with large columns and if so what
special things did they have to do?
There are a few things you can try, as far as I know. You can either
set the Truncation Errors on the Excel Source to 'Ignore Errors.' You
can change the output column to DT_String with a suitable length set.
Or, you could use a Derived Column transformation and use a Substring
or similar string manipulation to get the required maximum length.
Hope this helps.
Regards,
Enrique Martinez
Sr. Software Consultant
Hi S.Kumar,
I'm assuming you don't want to truncate your Source data to 255
characters, and it'd be nice to think that this is a simple task ...
simply set the width property of the wide columns (using the "Show
Advanced Editor" option from the right-click menu on the task) to a
higher value. However - for some reason, Microsoft does not allow you
to change the column width of Error output columns, AND it requires
that the regular Output columns match the Error output columns in
width. So basically you can increase the column width no problem, but
the task will fail and automatically set the width back to the default
(255) :( I haven't found a workaround / fix for this yet.
In this case, you need to examine the way that Excel connections
determine the type and width of columns in the first place -
basically, Excel scans the first 8 rows of the sheet (yes, 8) and uses
these rows to determine the type and width of the column.
Try this: Create a spreadsheet and type in > 255 characters in Column
A Cell 1. Create a connection to that spreadsheet and check the data
type in a connection created to the sheet - you will see that it is
set to Unicode Text Stream (or non-unicode depending on your
settings). Now add 8 rows above the one you created, and set a couple
of the values in Column A to strings < 255 characters. Re-create the
connection to the file and check the column property - it should be
set to Unicode String (width 255). This can lead to REAL headaches -
imagine a column that could accept numbers or letters (e.g. postcode),
but it just so happens that the first 8 columns only have numbers ...
fun :)
Luckily, there is a solution! All you have to do is edit the
registry! Here's an snip from the blog I found a few months ago
<snip>
Even more entertaining, there is absolutely no way to make this 100%
reliable - although with some pain, you can improve the situation.
Here's what you need to do. First add the "IMEX=1" option to your
OleDbConnection dbConnection = new OleDbConnection
Properties=""Excel 8.0;HDR=Yes;IMEX=1;""");That tells ADO.NET to honor
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/
ImportMixedTypesThis registry key tells ADO.NET what to do when it
encounters mixed types in the first 8 rows. It can either be set to
the string "Majority Type" (for the default behavior) or to
"Text" (which forces the column to be of type string). Note that you
are still screwed if the first 8 postal codes are numeric and the 9th
is Canadian. (Not to mention that the "Text" option invokes handling
that fails on strings over 255 characters, but let's skip that for
now.)
There's also a second relevant registry setting (which is honored
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/
TypeGuessRowsThat says how many rows to scan to guess the datatype.
The default is 8, but you can set it anywhere from 0-16 decimal (0
meaning "scan the first 16384 rows", and all other values meaning what
they say). Putting this all together, the most reliable way to read a
TypeGuessRows = 0
ImportMixedTypes = TextThat's pretty close to perfect, although it
will still fail if the first 16384 postal codes are numeric and any of
the subsequent ones aren't.
</snip>
If you don't like that option (or have no control over the registry
settings of the server to which you will be deploying), then there is
another option ... that is to have a dummy row in the spreadsheet
within the first 8 columns that forces the column types to be parsed
correctly. I know that this is less-than ideal, but in the past I
have usually found this is the most practical way to deal with this
issue.
Hope this helped!
Good Luck
J
tridy
2008-05-26 12:35:00 UTC
Permalink
I have been working on 255 characters truncation problem and have found the
solution. I used ODBC instead of the OLEDB and it doesn't truncate the values
to 255 symbols anymore:

OdbcConnection con = new OdbcConnection(@"Driver={Microsoft Excel Driver
(*.xls)};DBQ=c:\temp\testbook.xls");
OdbcCommand cmd = new OdbcCommand(@"SELECT * FROM [Workbook1$]");
cmd.Connection = con;
OdbcDataAdapter da = new OdbcDataAdapter(cmd);
DataSet ds = new DataSet();
da.Fill(ds);

you can use 2007 format driver to access XLSX files:
... Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)} ...
tridy
2008-05-26 12:45:01 UTC
Permalink
sorry, wrong forum I posted my message to.
j***@googlemail.com
2008-05-27 13:00:57 UTC
Permalink
Post by tridy
sorry, wrong forum I posted my message to.
Useful nonetheless - thanks!
J
j***@gmail.com
2016-06-14 18:12:24 UTC
Permalink
Post by tridy
I have been working on 255 characters truncation problem and have found the
solution. I used ODBC instead of the OLEDB and it doesn't truncate the values
(*.xls)};DBQ=c:\temp\testbook.xls");
cmd.Connection = con;
OdbcDataAdapter da = new OdbcDataAdapter(cmd);
DataSet ds = new DataSet();
da.Fill(ds);
... Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)} ...
This helps. Thanks a lot.

Loading...