String Manipulation Challenge!

Isaac

Lifelong Learner
Local time
Today, 03:28
Joined
Mar 14, 2017
Messages
9,810
This may not be a big challenge for some of you. I've looked at charindex, substring, but cannot quite get it without looping (which I don't want to do).

I have text in a column like this:

asdf asdf # asdf asdf asdf

or

asdf asdf # to # asdf asdf asdf

I want to STRIP AWAY (remove) everything after the final number - whether that be the 1st (and only) number, or the 2nd (and final) number. There will only be 2 numbers max.

So I want those above examples to be converted to, respectively:
asdf asdf #
asdf asdf # to #

So find the second number and only return the text through that point. Or said another way, from the right side, find the first number you come to as you go left and strip that away

Looking for a way to do this in T-SQL

I think I might be able to get this if I tried hard enough, but my brain is extremely tired and so far I have NOT had any luck. Please assist if you can, thank you!
 
Last edited:
If you you mean actual numbers then 1 or 2 would work, if you do mean #'s then 3 or 4.

---EDIT: UNSOLVED.... I need to fix #1

Code:
DECLARE @stringvalue varchar(max) = 'asdf asdf 2 # to 74 # asdf asdf asdf'


SELECT CASE WHEN PATINDEX('%[0-9]%', @stringvalue) = 0 THEN NULL ELSE LEFT(@stringvalue,PATINDEX('%[0-9]%'      , @stringvalue)) END AS '1'
SELECT CASE WHEN PATINDEX('%[0-9]%', @stringvalue) = 0 THEN NULL ELSE SUBSTRING(@stringvalue, 1, LEN(@stringvalue) - PATINDEX('%[0-9]%'      , REVERSE(@stringvalue))) END AS '2'
SELECT LEFT(@stringvalue, CHARINDEX('#', @stringvalue)) as '3'
SELECT LEFT(@stringvalue, CHARINDEX('#',@stringvalue,CHARINDEX('#',@stringvalue)+1)) as '4'
 
I did mean numbers like 2 or 3, sorry for the ambiguity.
 
Patindex! I must get chummy with this function. Checking ..
 
Here is an updated answer that should work for you. And no ambiguity on your part - it is my Friday afternoon brain fog! Reverse was a new one for me. I have never had to use it in T-SQL string manipulation before.


Code:
[CODE]DECLARE @stringvalue varchar(max) = 'asdf asdf 2222 # to 744 # asdf asdf asdf'

SELECT CASE WHEN PATINDEX('%[0-9]%', @stringvalue) = 0 THEN NULL ELSE LEFT(@stringvalue,patindex('%[0-9][^0-9]%', @stringvalue)) END AS '1'

SELECT CASE WHEN PATINDEX('%[0-9]%', @stringvalue) = 0 THEN NULL ELSE SUBSTRING(@stringvalue, 1, LEN(@stringvalue) - PATINDEX('%[0-9]%'      , REVERSE(@stringvalue))+1) END AS '2'


Simplified:
Code:
DECLARE @stringvalue varchar(max) = 'asdf asdf 2222 to 744 asdf asdf asdf'

SELECT LEFT(@stringvalue,patindex('%[0-9][^0-9]%', @stringvalue))

SELECT SUBSTRING(@stringvalue, 1, LEN(@stringvalue) - PATINDEX('%[0-9]%'      , REVERSE(@stringvalue))+1)
[/CODE]
 
Last edited:
I am going to try it out first thing tomorrow. Thank you very much for your time in reviewing my problem and coming up with a solution. I have seen people use Pat index a lot at a particular previous job but never had a use for it so never became familiar with it myself. Thank you again. I will report back more with my findings tomorrow
 
I don't know how well PATINDEX() would perform compared to this SQL that uses the set processing strengths of the engine.
Performance can be quite counter intuitive and isn't always the simplest code that wins.
I would be very interested in the comparative speeds.

This query does a Cartesian join with a tally table, finding the position of the first of each digit in the reversed string, then selects the one closest to the end, before returning the characters to its left. Returning as many as ten rows and locating the digits for each data row might work against its performance but Cartesian joins are very fast.

I don't have SQL server at home so this is air code and may have mistakes but I'm sure you will get the idea.


Firstly create a tally temp table with the digits 0 to 9 as values, then run this.

Code:
SELECT Id, LEFT(datatable.datacolumn, LEN(datatable.datacolumn) - A.LastDigitPos) AS Stripped
FROM (
SELECT datatable.Id, MIN(CHARINDEX(#tally.digitcolumn, REVERSE(datatable.datacolumn))) AS LastDigitPos
FROM datatable
JOIN #tally
GROUP BY datatable.Id
) AS A
INNER JOIN datatable
ON datatable.Id = A.Id
 
Here is an updated answer that should work for you. And no ambiguity on your part - it is my Friday afternoon brain fog! Reverse was a new one for me. I have never had to use it in T-SQL string manipulation before.


Code:
[CODE]DECLARE @stringvalue varchar(max) = 'asdf asdf 2222 # to 744 # asdf asdf asdf'

SELECT CASE WHEN PATINDEX('%[0-9]%', @stringvalue) = 0 THEN NULL ELSE LEFT(@stringvalue,patindex('%[0-9][^0-9]%', @stringvalue)) END AS '1'

SELECT CASE WHEN PATINDEX('%[0-9]%', @stringvalue) = 0 THEN NULL ELSE SUBSTRING(@stringvalue, 1, LEN(@stringvalue) - PATINDEX('%[0-9]%'      , REVERSE(@stringvalue))+1) END AS '2'


Simplified:
Code:
DECLARE @stringvalue varchar(max) = 'asdf asdf 2222 to 744 asdf asdf asdf'

SELECT LEFT(@stringvalue,patindex('%[0-9][^0-9]%', @stringvalue))

SELECT SUBSTRING(@stringvalue, 1, LEN(@stringvalue) - PATINDEX('%[0-9]%'      , REVERSE(@stringvalue))+1)
[/CODE]

Focusing on the one that most easily makes sense to me at the moment, the one with reverse.

It seems to work most of the time, but here is a string it does not work on. I cannot quite figure out why but I suspect it has something to do with Reverse and spaces.

Code:
Declare @str varchar(max)
set @str = 'RASS goal -4   -Sedation '
select
substring(@str,1,len(@str) - patindex('%[0-9]%', reverse(@str)) + 1 )

I note that the Datalength() of @Str is longer than the Len() of @Str. I thought about replacing the spaces with nothing but this would mess up my desired position. I tried replacing them with a pipe symbol and then using Reverse on that Replace expression, but got the same result.
 
@Galaxiom I appreciate your review & assistance. In this case I am probably going with the simpler-looking solution because this effort is only going to be exercised on the final "result" of a reporting request - i.e., a few hundred records at most. (I already have it stripped down to the first 25 characters of a column and will perform this stripping away on the final resultset for the report).

However, your caution as to performance is well noted and I will bookmark this tab for future reference should a similar need arise with a larger dataset. Thanks
 
Hey Isaac, could you specify the below a little more? Like how are these columns populated, are you populating the columns, are you importing the data into columns, etc…. I guess i ask also because sometimes it seems you have one number or two… i’ll see what i can come up with..


asdf asdf # asdf asdf asdf

or

asdf asdf # to # asdf asdf asdf
 
And i ask because it seemed the most difficult part was determining the last number in its entirety versus being the only number… i don’t have my pc to test anything right now, but as much as this has itched my noggin, you have me just as curious as to the details i was asking about in the post above this one..
 
Hey Isaac, could you specify the below a little more? Like how are these columns populated, are you populating the columns, are you importing the data into columns, etc…. I guess i ask also because sometimes it seems you have one number or two… i’ll see what i can come up with..


asdf asdf # asdf asdf asdf

or

asdf asdf # to # asdf asdf asdf

it's from a huge varchar(max) document column where somewhere in the massive amount of text, there is text like

rass goal 2 to 3

but it can vary a lot, like rass goal 2:3 or rass goal 2-3 or rass goal -2 / 4 and probably 10 other permutations I've seen.
I've already grabbed a 25 character string starting with text like either RASS GOAL or RASS TARGET or somethign else I forget.

Now I want to get the numbers-and-to-the-left-of-them chunk.

But that one example I provided really flummoxes me. Warning to any one offended by cross posting (not you just in general anyone), I might post this question on sqlservercentral.com too but probably not yet, I feel we are very close
 
I found out today that some of those spaces after the last number are actually not spaces, they're line breaks / char 13 or 10
 
Hmmm... I did see something along the lines of checking once string character is number, keep reading next string value if numeric, else if last value was numeric, insert pipe | operator, then after reading entire string, replace all non [0-9] characters with " ", then replace " " with "", or parse the array as you would like... IDK... It is the weekend, and this tripped me up later than i was expecting to end last week on! LOL.

but it can vary a lot, like rass goal 2:3 or rass goal 2-3 or rass goal -2 / 4 and probably 10 other permutations
Out of all variations.... is anything consistent - like in your example.. at least there is always a space in between alpha/numeric(other than - for negative....).... is that always the case? A small sample dataset would probably speed this up as well...
 

Users who are viewing this thread

Back
Top Bottom