Parsing PGN files

davesmith202

I want to create a routine that looks at a text file and parses the contents into fields in a database. It is based on the PGN chess file format.

There would be an Event field, Date field etc. and of course the Moves field.

Sample data:

[Event "St George's CC"]
[Site "St George's CC"]
[Date "1856.??.??"]
[Round "?"]
[White "Wywill, Marmaduke"]
[Black "Lowenthal"]
[Result "0-1"]
[ECO "A03"]
[PlyCount "76"]
[EventDate "1856.??.??"]

1. f4 d5 2. Nf3 Nc6 3. e3 Bg4 4. Bb5 e6 5. O-O Bd6 6. b3 Nge7 7. Bb2 O-O 8. c4
dxc4 9. bxc4 e5 10. c5 Bxc5 11. fxe5 Bxf3 12. Rxf3 Nd4 13. Rh3 Nxb5 14. Qc2
Bxe3+ 15. dxe3 h6 16. Nd2 c6 17. Ne4 Ng6 18. e6 Qe8 19. Rxh6 Qxe6 20. Nf6+ gxf6
21. Rxg6+ fxg6 22. Qxg6+ Kh8 23. Qh6+ Kg8 24. Rf1 Rae8 25. Qg6+ Kh8 26. Rxf6
Qxe3+ 27. Rf2+ Nd4 28. Qh5+ Kg7 29. Qg4+ Kh6 30. Qh4+ Kg6 31. Qg4+ Qg5 32.
Qxg5+ Kxg5 33. Rxf8 Rxf8 34. Bxd4 b6 35. Be3+ Kf5 36. Kf2 Ke4+ 37. Ke2 Rh8 38.
Bg1 Rh4 0-1

[Event "Vienna"]
[Site "Vienna"]
[Date "1873.??.??"]
[Round "1"]
[White "Bird, Henry"]
[Black "Heral, Joseph"]
[Result "0-1"]
[ECO "A03"]
[PlyCount "106"]
[EventDate "1873.??.??"]

1. f4 d5 2. Nf3 Nc6 3. e3 Nf6 4. b3 Bg4 5. Be2 Bxf3 6. Bxf3 e6 7. c3 Bd6 8. d4
Ne7 9. Na3 a6 10. O-O c6 11. Nc2 Qc7 12. c4 Qb8 13. Bb2 h6 14. Qd3 g5 15. g3 g4
16. Bg2 h5 17. e4 Bc7 18. e5 Nd7 19. Ne3 Qa7 20. cxd5 cxd5 21. Kh1 O-O-O 22. f5
Bb6 23. a3 Nxe5 24. dxe5 Bxe3 25. fxe6 h4 26. gxh4 fxe6 27. Rae1 Bb6 28. Rf6
Rxh4 29. Bc1 Rdh8 30. Bf4 Bf2 31. Rf1 g3 32. Bxg3 Bxg3 33. Rf8+ Kd7 34. Rxh8
Rxh8 35. Qxg3 Qd4 36. Qg5 Qh4 37. Qxh4 Rxh4 38. Bf3 Nc6 39. Re1 Nd4 40. Bd1 Kc6
41. Kg2 Kc5 42. Re3 Nf5 43. Re1 Kd4 44. Bf3 Kc3 45. h3 b5 46. Bg4 d4 47. Bxf5

It could also look like this:

[Event "A30-c4,c5,Nf3"]
[Site "chessopeningsdatabase.com"]
[Date "1996"]
[Round "1"]
[White "kortschnoi v"]
[Black "brunner lucas"]
[Result "1-0"]
[ECO "A30"]
[PlyCount "75"]

1. c4 c5 2. Nf3 Nf6 3. Nc3 e6 4. g3 d5 5. cxd5 Nxd5 6. Bg2 Nc6 7. O-O Be7 8. d4 O-O 9. e4 Nb6 10. dxc5 Qxd1 11. Rxd1 Bxc5 12. Bf4 f6 13. Rac1 e5 14. Nb5 exf4 15. Rxc5 fxg3 16. hxg3 Bg4 17. Rd2 Rad8 18. Nd6 Bxf3 19. Bxf3 Rf7 20. Bg4 Re7 21. f4 Kf8 22. a3 g6 23. Bh3 Rc7 24. Rcc2 a6 25. Kg2 Ke7 26. e5 fxe5 27. fxe5 Na8 28. Rf2 Nxe5 29. Rxc7+ Nxc7 30. Nc8+ Rxc8 31. Bxc8 b6 32. Rd2 a5 33. b3 h5 34. Kf2 g5 35. Ke3 a4 36. Ke4 Nf7 37. Rd7+ Ke8 38. Rxc7 1-0

[Event "A30-c4,c5,Nf3"]
[Site "chessopeningsdatabase.com"]
[Date "1995"]
[Round "1"]
[White "koolsbergen nico"]
[Black "powles jonathan"]
[Result "0-1"]
[ECO "A30"]
[PlyCount "114"]

1. c4 c5 2. Nf3 Nf6 3. g3 b6 4. Bg2 Bb7 5. O-O e6 6. Nc3 Be7 7. b3 O-O 8. d4 cxd4 9. Qxd4 Nc6 10. Qd1 d5 11. cxd5 exd5 12. Bf4 Qd7 13. a3 d4 14. Nb5 Rfd8 15. Qd3 a6 16. Nc7 Ra7 17. Ng5 Bd6 18. Nd5 Bxf4 19. Nxf4 Ne5 20. Qb1 Bxg2 21. Kxg2 Rc7 22. Nh5 Qd5+ 23. Kg1 Neg4 24. Nxf6+ Nxf6 25. Nf3 Ne4 26. Qd3 Rc3 27. Qxa6 d3 28. Rad1 d2 29. b4 Qc6 30. e3 Rc1 31. Kg2 f6 32. b5 Qc7 33. Qa4 Qc2 34. Qxc2 Rxc2 35. Nd4 Ra2 36. Ne2 Rxa3 37. Kf3 Nc5 38. Rb1 Ra2 39. Rfd1 Na4 40. Rh1 Rb2 41. Ra1 Rxb5 42. Nd4 Nc3 43. Nxb5 Nxb5 44. Rhd1 Nc3 45. Kf4 b5 46. e4 Rd3 47. h4 b4 48. e5 fxe5+ 49. Kxe5 b3 50. Ra8+ Kf7 51. Ra7+ Kg6 52. h5+ Kh6 53. Ra6+ Kxh5 54. Rh1+ Kg4 55. Rb6 d1=Q 56. Rxd1 Nxd1 57. Rb7 g5 0-1

Any suggestions on how to do this? Can't get my head around it!

Sometimes it will have all the headers, sometimes not. It will always have all the moves. But sometimes the moves are as a continuous line, other times they are like in the first example where there are line breaks.

Anyone want to take a stab at this one?

Thanks,

Dave
 
Well, if you don't know about the VBA.Strings.Split() function then check it out. It returns a zero-based array of strings, split wherever the delimiter you provide occurs.
If you want an array, for instance, where each element is a line of text from your original string, you can use code like ...
Code:
'create the array
Dim var As Variant
var = Split(yourstring, vbCrLf)

'show the contents of the array
Dim i As Long
For i = 0 To UBound(var)
  Debug.Print var(i)
Next
Then you can split the contents of array elements using Split() again with a different delimiter. A divide and conquer approach.
Cheers,
 
And it's a good thing you've got "tags", so you can read off the field names from the Split() item string that lagbolt mentioned and save the value into that field.
 
I'm liking the Split() function. I can see how that would help. So I could run through the array using Instr to find an instance of Event etc.

e.g.

If $mystring contains "Event" then put in Event field.
If $mystring contains "Date" then put in Date field.

That kind of thing.

I am not sure how I pick up the moves line. Perhaps I could check the first two characters, and if they match "1." then it's the Moves field?
 
I'm sure lagbolt has other ideas but here's mine.

For the moves, you keep a counter of which array item you've last saved and use InStr() to check whether it contains a dot (.). If it does, move to the next item and concatenate the next couple of items until you get another item with a dot.

When you're checking for field names, use the first item in the array, myFieldsArray(0). Whatever is in there would be your field name.

Edit: Reconsidered the pair suggestion
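
For the header lines, here is a minimal sketch of reading the field name and value off a tag line. ParseTagLine is a hypothetical helper name, not something from the thread, and it assumes each tag sits on its own line in the form [Name "Value"] as in the sample data.

```vba
' Hypothetical helper: splits a PGN tag line such as
'   [Event "St George's CC"]
' into its name (Event) and its value (St George's CC).
' Returns False when the line does not look like a tag line.
Public Function ParseTagLine(ByVal strLine As String, _
                             ByRef strName As String, _
                             ByRef strValue As String) As Boolean
    Dim lngFirstQuote As Long
    Dim lngLastQuote As Long

    strLine = Trim$(strLine)
    If Left$(strLine, 1) <> "[" Or Right$(strLine, 1) <> "]" Then Exit Function

    lngFirstQuote = InStr(strLine, """")
    lngLastQuote = InStrRev(strLine, """")
    If lngFirstQuote = 0 Or lngLastQuote <= lngFirstQuote Then Exit Function

    ' The name sits between "[" and the first quote
    strName = Trim$(Mid$(strLine, 2, lngFirstQuote - 2))
    ' The value sits between the first and the last quote
    strValue = Mid$(strLine, lngFirstQuote + 1, lngLastQuote - lngFirstQuote - 1)
    ParseTagLine = True
End Function

Public Sub DemoParseTagLine()
    Dim strName As String
    Dim strValue As String

    If ParseTagLine("[Event ""St George's CC""]", strName, strValue) Then
        Debug.Print strName, strValue
    End If
End Sub
```

Because the name is returned rather than tested with InStr, a missing header simply never shows up, which suits files that carry only some of the tags.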
 
All the moves go in the same field. But some PGN files have the moves broken across several lines rather than in one continuous line. That is a tricky bit!
 
Did you see my edit on the last post? I'm not sure if you did or is your response based on that?

If it is, then read the next line and check the length of the string with Len(). If it's 0, there aren't any more moves.
 
Things are going in a little slowly today since I am shattered...

I am a little unsure of what you mean by the pairs. Do you mean try to look for 1. f4 d5, then check for a dot in the next few characters, if there then you will add 2. Nf3 Nc6, then check, then add 3. e3 Bg4. Is that what you mean?
 
No, ignore the pairs idea. I changed that before your post #6. Reread my post #5. This is about checking for a dot.
 
So check the current array row for a dot. If it has one then check the next array row for a dot. If there is a dot there, then add them together. Then check the next row. That kind of thing?
 
Consider this code ...
Code:
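Dim vGame As Variant
Dim vMove As Variant
Dim i As Long

'strMoves: the moves text joined into a single line
'element 0 is "1"; element i then holds "<white> <black> <next turn no.>"
vGame = Split(strMoves, ". ")
For i = 1 To UBound(vGame)
  vMove = Split(vGame(i), " ")
  Debug.Print "Turn: " & i,
  Debug.Print "White: " & vMove(0),
  'when white moved last, vMove(1) is the result (e.g. 1-0), not a move
  If UBound(vMove) >= 1 Then Debug.Print "Black: " & vMove(1) Else Debug.Print
Next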
This would be the core of the parsing engine I'd write for the moves.
 
So check the current array row for a dot. If it has one then check the next array row for a dot. If there is a dot there, then add them together. Then check the next row. That kind of thing?
Something like that. It looks like lagbolt has some code for you.
 
I love the sound of a "parsing engine"! Thanks for your help guys. Will take it for a spin and see what I end up with.
 
This is my routine so far but I don't seem to be triggering the parsing engine section!

Code:
Private Sub cmdParse_Click()

Dim strPGNdata As String

Dim vGame
Dim vTurn
Dim vMove


strPGNdata = PGNdata

Dim var
var = Split(strPGNdata, vbCrLf)

'show the contents of the array
Dim i As Integer
For i = 0 To UBound(var)
  Debug.Print var(i)
  'If InStr(1, var, "Event") Then MsgBox "Event"
  
  
If (InStr(var(i), ".") = True) And (InStr(var(i), "[") = False) Then
vGame = Split(var(i), ". ")
For Each vTurn In vGame
  vMove = Split(vTurn, " ")
  'Debug.Print "Turn: " & vMove(2) - 1,
  'Debug.Print "White: " & vMove(0),
  'Debug.Print "Black: " & vMove(1)
Next
End If
  

Next

End Sub
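
A likely reason the parsing section never fires: InStr() returns the position of the match as a number (0 when nothing is found), while True is -1 in VBA, so InStr(...) = True can never be satisfied. The sketch below compares against 0 instead. IsMovesLine is a hypothetical helper name used only for illustration.

```vba
' Hypothetical helper: True when a line looks like moves rather than a tag.
' InStr returns a position (0 = not found), so compare against 0,
' not against True (which is -1 in VBA).
Public Function IsMovesLine(ByVal strLine As String) As Boolean
    IsMovesLine = (InStr(strLine, ".") > 0) And (InStr(strLine, "[") = 0)
End Function

Public Sub DemoIsMovesLine()
    Debug.Print InStr("1. f4 d5", ".") = True          ' False: 2 is not -1
    Debug.Print IsMovesLine("1. f4 d5 2. Nf3 Nc6")     ' True
    Debug.Print IsMovesLine("[Date ""1856.??.??""]")   ' False
End Sub
```

Note that a wrapped continuation line such as "Bg1 Rh4 0-1" contains no dot, so the dot test on its own would still miss it.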
 
You should be reading the file line by line then perform the strip/check on each line instead of saving the whole text file into a variable and reading off that.
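
A minimal sketch of reading the file line by line with the built-in Open and Line Input statements. ReadPgnLines and strPath are hypothetical names, and the loop only prints each line where the real checks would go.

```vba
' Sketch: read a text file one line at a time.
' strPath is a hypothetical parameter holding the full path to the PGN file.
Public Sub ReadPgnLines(ByVal strPath As String)
    Dim intFile As Integer
    Dim strLine As String

    intFile = FreeFile                  ' next free file number
    Open strPath For Input As #intFile
    Do While Not EOF(intFile)
        Line Input #intFile, strLine    ' reads up to the next line break
        Debug.Print strLine             ' per-line checks would go here
    Loop
    Close #intFile
End Sub
```

Line Input treats a carriage return or a carriage return plus linefeed as the end of a line, so a file saved with linefeed-only line endings would probably arrive as one long line and need Split(strLine, vbLf) afterwards.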
 
Is the Move Parsing Engine going to split out each move? I only need all the moves into the same field. i.e. I would want all this...

1. c4 c5 2. Nf3 Nf6 3. g3 b6 4. Bg2 Bb7 5. O-O e6 6. Nc3 Be7 7. b3 O-O 8. d4 cxd4 9. Qxd4 Nc6 10. Qd1 d5 11. cxd5 exd5 12. Bf4 Qd7 13. a3 d4 14. Nb5 Rfd8 15. Qd3 a6 16. Nc7 Ra7 17. Ng5 Bd6 18. Nd5 Bxf4 19. Nxf4 Ne5 20. Qb1 Bxg2 21. Kxg2 Rc7 22. Nh5 Qd5+ 23. Kg1 Neg4 24. Nxf6+ Nxf6 25. Nf3 Ne4 26. Qd3 Rc3 27. Qxa6 d3 28. Rad1 d2 29. b4 Qc6 30. e3 Rc1 31. Kg2 f6 32. b5 Qc7 33. Qa4 Qc2 34. Qxc2 Rxc2 35. Nd4 Ra2 36. Ne2 Rxa3 37. Kf3 Nc5 38. Rb1 Ra2 39. Rfd1 Na4 40. Rh1 Rb2 41. Ra1 Rxb5 42. Nd4 Nc3 43. Nxb5 Nxb5 44. Rhd1 Nc3 45. Kf4 b5 46. e4 Rd3 47. h4 b4 48. e5 fxe5+ 49. Kxe5 b3 50. Ra8+ Kf7 51. Ra7+ Kg6 52. h5+ Kh6 53. Ra6+ Kxh5 54. Rh1+ Kg4 55. Rb6 d1=Q 56. Rxd1 Nxd1 57. Rb7 g5 0-1

...going into the string $mymoves to then populate a memo field. I don't want to have c4 in a field in record 1, then c5 in a field in record 2.

Does the Move Parsing Engine try to split out each individual move rather than have all the moves stored in a string?
 
Yeap, that's what it's doing. But you would still need to read the file line by line and perform the checks.

Since you want it in one line, what you do is read the first line and save into myMoves$, move to the next, save into another variable, check the length and if it's 0 then there are no more moves. If it's not zero, concatenate the other variable into myMoves$.
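
The concatenation described above can be sketched as follows. CollectMoves is a hypothetical helper that assumes the lines are already in an array (for example from Split) and that a blank line ends the moves, as in the sample data.

```vba
' Hypothetical helper: starting at lngStart, joins consecutive non-blank
' lines of varLines into one string, separated by single spaces.
' On return lngStart points at the first line that was not consumed.
Public Function CollectMoves(ByRef varLines As Variant, _
                             ByRef lngStart As Long) As String
    Dim strMoves As String
    Dim strLine As String

    Do While lngStart <= UBound(varLines)
        strLine = Trim$(varLines(lngStart))
        If Len(strLine) = 0 Then Exit Do      ' blank line: no more moves
        If Len(strMoves) > 0 Then strMoves = strMoves & " "
        strMoves = strMoves & strLine
        lngStart = lngStart + 1
    Loop
    CollectMoves = strMoves
End Function

Public Sub DemoCollectMoves()
    Dim varLines As Variant
    Dim lngPos As Long

    varLines = Split("1. f4 d5 2. Nf3 Nc6" & vbCrLf & _
                     "3. e3 Bg4 0-1" & vbCrLf & vbCrLf & _
                     "[Event ""Vienna""]", vbCrLf)
    lngPos = 0
    Debug.Print CollectMoves(varLines, lngPos)
    ' prints: 1. f4 d5 2. Nf3 Nc6 3. e3 Bg4 0-1
End Sub
```

The same routine handles both layouts: a game written as one continuous line is just a run of one non-blank line. The caller would still need to skip any extra blank lines between games before looking for the next tag.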
 
please ignore everything below, unless you are interested in how chess notation works

i reread the thread, and the op just wants a single move sequence in a memo field


======================================

how are you trying to STORE the moves
can you explain your structure here?

are you storing white/black separately, or the whole move sequence as a single string?

if you are trying to separate moves, i can see that being the fiddliest bit.

---------------
for the benefit of other views the moves show

1. white move black move
2. white move black move

note that O-O is a move (castling K-side)
and O-O-O is also a move (castling Q-side)
(PGN writes these with the capital letter O, not the digit zero, as in the sample data)
d8=Q indicates a promotion
+ symbol indicates check

because of these possibilities, it could be that there are additional spurious spaces in the move string (not sure about that) which might make using split to separate the moves inaccurate (not sure about that though)

the end of the game will be denoted by the result
1-0, 0-1 1/2-1/2
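
a rough sketch of picking up that result from the end of a moves string. GameResult is a hypothetical helper name. besides the three results listed above it also checks for "*", which PGN uses when the result is unknown or the game is unfinished.

```vba
' Hypothetical helper: returns the result token that ends a moves string,
' or "" when the string does not end with one.
Public Function GameResult(ByVal strMoves As String) As String
    Dim varToken As Variant

    strMoves = Trim$(strMoves)
    For Each varToken In Array("1/2-1/2", "1-0", "0-1", "*")
        ' The token must be the whole string or follow a space
        If strMoves = varToken _
           Or Right$(strMoves, Len(varToken) + 1) = " " & varToken Then
            GameResult = varToken
            Exit Function
        End If
    Next
End Function

Public Sub DemoGameResult()
    Debug.Print GameResult("37. Ke2 Rh8 38. Bg1 Rh4 0-1")   ' 0-1
    Debug.Print GameResult("46. Bg4 d4 47. Bxf5") = ""      ' True: no result
End Sub
```

an empty return value would flag a game whose move text was cut short, like the second sample game in the first post.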


------------------
perhaps the op could also confirm

could there be other symbols for moves - eg ep for en passant captures, or symbols indicating strong/weak moves?

could there be alternative continuations indicated in brackets?

9. d5 (9. e5 would indicate an alternative move sequence)
in some case there could be further nested alternatives to any depth


if so, how does your move store take these into account.
 
how are you trying to STORE the moves
can you explain your structure here?

are you storing white/black separately, or the whole move sequence as a single string?

i can see that being the fiddliest bit.
Apparently the OP just wants to store the whole moves as a string, no separation needed. Look at his last post.
 
gemma-the-husky, the whole move sequence is to go in a single string. It doesn't need splitting up. I agree that is the fiddliest bit!
 