Using the basic string.split method doesn’t work here because it discards the delimiter, whereas in this case we want to keep the “delimiter”. Notice that if you have multiple delimiters (as you would in the string 1+-2), they will be separate tokens. I put the term “delimiter” in quotes because the real reason string.split doesn’t apply to this problem is that there is no delimiter at all. Rather, there are consistent patterns repeated multiple times, each starting with some kind of header.

Once you realize that this is more about identifying patterns than about splitting on a delimiter, you can shift your focus to the regular expressions module, re, instead. It turns out that the re.findall method is just the thing for this case, as long as you know how to describe the regex pattern in a robust way. This can be a bit tricky, since you need to make sure that the pattern holds regardless of the content in each data set.

It is pretty simple to start off knowing that we want a pattern something like # Person \d+.*, but unfortunately that doesn’t work because the regex doesn’t know how much to slurp. This greedy version ends up taking the entire string, since that is the pattern it finds. I was hoping that turning this into the non-greedy form of # Person \d+ would fix that. On its own, though, a non-greedy quantifier matches as little as it can, so the pattern also needs something that marks where each record ends.

You may need to adapt this for your specific scenario, but this general approach should work if you have the same need.
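To make the idea concrete, here is a minimal sketch of one way to pull out header-led records with re.findall. The sample text, the variable names, the re.DOTALL flag, and the lookahead that stops each match at the next header are all my own assumptions for illustration, not necessarily the exact pattern used in the original solution.

```python
import re

# Hypothetical sample data: each record begins with a "# Person N" header.
text = (
    "# Person 1\n"
    "name: Ann\n"
    "age: 30\n"
    "# Person 2\n"
    "name: Bob\n"
    "# Person 10\n"
    "name: Cy\n"
    "age: 41\n"
)

# str.split drops the separator, so the header text is lost from every piece.
pieces = text.split("# Person ")

# Greedy attempt (assuming re.DOTALL so "." also matches newlines):
# ".*" runs to the end of the input, so everything comes back as one match.
greedy = re.findall(r"# Person \d+.*", text, flags=re.DOTALL)

# One possible fix: match lazily, but stop right before the next header
# or at the very end of the string (\Z), using a lookahead.
record_pattern = re.compile(r"# Person \d+.*?(?=# Person \d+|\Z)", flags=re.DOTALL)
records = record_pattern.findall(text)

for record in records:
    print(repr(record))
```

Because the lookahead does not consume anything, each header stays attached to its own record, and joining the records back together reproduces the original text. If your headers look different, the part of the pattern to adapt is the header expression, which appears once at the start and once inside the lookahead.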