{"id":825,"date":"2024-12-16T18:26:11","date_gmt":"2024-12-16T23:26:11","guid":{"rendered":"https:\/\/brian.digitalmaddox.com\/blog\/?p=825"},"modified":"2024-12-16T18:26:11","modified_gmt":"2024-12-16T23:26:11","slug":"some-background-on-natural-language-processing-part-2-older-methods-and-techniques","status":"publish","type":"post","link":"https:\/\/brian.digitalmaddox.com\/blog\/?p=825","title":{"rendered":"Some Background on Natural Language Processing &#8211; Part 2: Older Methods and Techniques"},"content":{"rendered":"<p>Last time we learned why English is a hard language, both for humans and especially for computers.<span class=\"Apple-converted-space\">\u00a0 <\/span>For this time I think we will look at the past of NLP to understand the present. It actually has been an interesting history to get to where we are now.<\/p>\n<p>Historic NLP methods relied on approaches that focused more on linguistics and statistical analysis.<span class=\"Apple-converted-space\">\u00a0 <\/span>Computers were not as powerful as they are now, and definitely did not have GPUs with the crazy amount of processing capability that they have now.<span class=\"Apple-converted-space\">\u00a0 <\/span>These older methods can be broadly grouped into six categories: rule-based systems, bag of words, term frequency-inverse document frequency, n-grams, Hidden Markov Models, and Support Vector Machines.<span class=\"Apple-converted-space\">\u00a0 <\/span>Let us explore these methods to see how they worked and led to today\u2019s more powerful systems. 
\u00a0This post will have a lot of links to other sources of information as it is intended to be more of a gentle introduction than an exhaustive analysis.<\/p>\n<h2>Rule-Based Systems<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Rule-based_system\">Rule-Based Systems<\/a> are perhaps the oldest method of getting a computer to process a language.<span class=\"Apple-converted-space\">\u00a0 <\/span>These systems were built on predefined linguistic rules and handcrafted grammars.<span class=\"Apple-converted-space\">\u00a0 <\/span><a href=\"https:\/\/www.scaler.com\/topics\/nlp\/introduction-to-grammar-in-nlp\/\">Grammar<\/a> in this case is not about usage, such as matching nouns and verbs and whatnot.<span class=\"Apple-converted-space\">\u00a0 <\/span>Here it means a formal grammar, which is a mathematical description of how a sentence is structured.<span class=\"Apple-converted-space\">\u00a0 <\/span>One specific kind is a <a href=\"https:\/\/medium.com\/@avinashmachinelearninginfo\/cfg-in-nlp-fcf812baa939\">context-free grammar<\/a>, which is a set of rules for how to form sentences from smaller words and phrases.<\/p>\n<p>Rule-based systems explicitly encode language in a series of rules so that text can be parsed.<span class=\"Apple-converted-space\">\u00a0 <\/span>This parsing would then be able to identify parts of speech and sentence structure.<span class=\"Apple-converted-space\">\u00a0 <\/span>These rules were typically created by linguists and software developers working together to try to codify language as a series of mathematical rules.<\/p>\n<p>The rules they would create were simple so that they could be combined together for more complex sentences. 
<span class=\"Apple-converted-space\">\u00a0 <\/span>For example, one rule could be \u201cany word that ends in <strong>ing<\/strong> is likely a verb\u201d, while another could be \u201ca noun follows determiner words such as <strong>a<\/strong>, <strong>the<\/strong>, and <strong>an<\/strong>.\u201d<span class=\"Apple-converted-space\">\u00a0 <\/span>The developer would then take these rules to create a parser to take in sentences and break them down into their syntactic components using the mathematically-defined grammars.<\/p>\n<p>Since I am a cat person, let us look at this very simple example sentence: \u201cMy cat is playing with a toy.\u201d<span class=\"Apple-converted-space\">\u00a0 <\/span>In this case, <strong>My<\/strong> would be tagged as a determiner word (a possessive one), so <strong>cat<\/strong> would be tagged as a noun.<span class=\"Apple-converted-space\">\u00a0 <\/span><strong>Is playing<\/strong> would be identified as a verb.<span class=\"Apple-converted-space\">\u00a0 <\/span><strong>A<\/strong> is another determiner word, so <strong>toy<\/strong> would be tagged as another noun.<\/p>\n<p>While this sounds simple, these systems were complex and labor intensive.<span class=\"Apple-converted-space\">\u00a0 <\/span>They required the work of expert linguists and developers to build the parsing systems.<span class=\"Apple-converted-space\">\u00a0 <\/span>They required a lot of maintenance as problems were found and the rules needed to be modified.<span class=\"Apple-converted-space\">\u00a0 <\/span>As they were based on formal mathematics, they were not very flexible when it came to something sloppy like English.<span class=\"Apple-converted-space\">\u00a0 <\/span>This meant that rule after rule would have to be developed and stacked on top of one another.<\/p>\n<h2>Bag of Words<\/h2>\n<p>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bag-of-words_model\">Bag of Words<\/a> (BoW) method was an early attempt at applying statistical processing to NLP.<span 
class=\"Apple-converted-space\">\u00a0 <\/span>It sought to represent text in a numerical form so that statistical algorithms could be applied.<span class=\"Apple-converted-space\">\u00a0 <\/span>These algorithms could then do things like classify documents or determine the similarity of two different documents.<\/p>\n<p>The BoW method works by treating the input text as an unordered collection of words.<span class=\"Apple-converted-space\">\u00a0 <\/span>Neither the order of the words nor the sentence grammar is used.<span class=\"Apple-converted-space\">\u00a0 <\/span>The document is first <a href=\"https:\/\/nlp.stanford.edu\/IR-book\/html\/htmledition\/tokenization-1.html\">tokenized<\/a>, meaning it is split into individual tokens such as words.<span class=\"Apple-converted-space\">\u00a0 <\/span>A list of each unique token is then created.<span class=\"Apple-converted-space\">\u00a0 <\/span>The input text is analyzed and each word is counted to determine the frequency of occurrence in the document.<span class=\"Apple-converted-space\">\u00a0 <\/span>The document is then converted into a vector of word frequencies that is the same size as the number of unique words in the document.<\/p>\n<p>You may know vectors from geometry, where they represent a magnitude and a direction.<span class=\"Apple-converted-space\">\u00a0 <\/span>The idea is similar here, where a vector is a <a href=\"https:\/\/machinelearningmastery.com\/gentle-introduction-vectors-machine-learning\/\">mathematical array of numbers<\/a>.<span class=\"Apple-converted-space\">\u00a0 <\/span>In the case of a frequency representation, the array could look something like [10, 5, 3, 9] and so on, where each number is how often a particular word appears.<\/p>\n<p>For computing document similarity, both documents would be turned into frequency vectors by enumerating their unique words and then creating a vector that is the same length as the total number of unique words across both documents.<span 
class=\"Apple-converted-space\">\u00a0 <\/span>The frequency of each word is inserted into the array, with a zero placed in locations where a word occurs in one document but not in the other.<span class=\"Apple-converted-space\">\u00a0 <\/span>Both arrays keep the words in the same order so that each array entry corresponds to the same word in the other document.<\/p>\n<p>The comparison is easier to think of back in terms of geometry.<span class=\"Apple-converted-space\">\u00a0 <\/span>Two common methods to compare the vectors are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cosine_similarity\">cosine similarity<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Euclidean_distance\">Euclidean Distance<\/a>.<span class=\"Apple-converted-space\">\u00a0 <\/span>For cosine similarity, the cosine of the angle between the two vectors in <a href=\"https:\/\/aegis4048.github.io\/understanding_multi-dimensionality_in_vector_space_modeling\">multidimensional space<\/a> is calculated, with the result in general being between -1 and 1.<span class=\"Apple-converted-space\">\u00a0 <\/span>A value of 1 means the vectors point in the same direction (the documents use words in the same proportions), 0 means they are orthogonal to each other (no words in common), and -1 means they point in opposite directions.<span class=\"Apple-converted-space\">\u00a0 <\/span>Since word frequencies are never negative, the result for BoW vectors actually falls between 0 and 1.<\/p>\n<p>Euclidean Distance calculates the straight-line distance between the two points that the vectors describe in Euclidean space (told you geometry would come into it).<span class=\"Apple-converted-space\">\u00a0 <\/span>The distance metric is actually based on the Pythagorean Theorem that you may remember from high school.<span class=\"Apple-converted-space\">\u00a0 <\/span>The output of this is a single number: the square root of the sum of the squared differences between corresponding entries of the two vectors.<span class=\"Apple-converted-space\">\u00a0 <\/span>The smaller the distance, the more similar the documents are.<\/p>\n<p>While BoW is good for things like document 
similarity, it has several drawbacks when compared to other NLP techniques.<span class=\"Apple-converted-space\">\u00a0 <\/span>For one, since it just deals with statistical frequencies, it ignores things such as the order of words and their context.<span class=\"Apple-converted-space\">\u00a0 <\/span>This means no real semantic information is carried over.<span class=\"Apple-converted-space\">\u00a0 <\/span>The other drawback is that for large documents that are very different, the generated vectors can be largely empty (mostly zeros) and become cumbersome for distance computation algorithms.<\/p>\n<h2>Term Frequency-Inverse Document Frequency<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Tf%E2%80%93idf\">Term Frequency-Inverse Document Frequency<\/a> (TF-IDF) came about as an improvement on the BoW concept by weighting words based on importance in a document relative to a larger group of documents.<span class=\"Apple-converted-space\">\u00a0 <\/span>It attempts to identify significant words in a document while ignoring common words that frequently occur, such as the, and, but, and so on.<\/p>\n<p>To understand how it works, let us break it down into its parts.<span class=\"Apple-converted-space\">\u00a0 <\/span>The first part is the <a href=\"https:\/\/www.opinosis-analytics.com\/knowledge-base\/term-frequency-explained\/\">Term Frequency<\/a> (TF).<span class=\"Apple-converted-space\">\u00a0 <\/span>As it sounds, this is a measure of how frequently a word appears in a document.<span class=\"Apple-converted-space\">\u00a0 <\/span>Thus words that occur more often will have a higher score. 
<span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/Tf-image.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-829\" src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/Tf-image-300x26.png\" alt=\"\" width=\"300\" height=\"26\" srcset=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/Tf-image-300x26.png 300w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/Tf-image.png 482w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>The next part of the calculation is the <a href=\"https:\/\/kavita-ganesan.com\/what-is-inverse-document-frequency\/\">Inverse Document Frequency<\/a> (IDF).<span class=\"Apple-converted-space\">\u00a0 <\/span>This equation is a measure of how rare a word is across the entire set of documents.<span class=\"Apple-converted-space\">\u00a0 <\/span>Words that appear in only a few documents are assigned a higher score.<\/p>\n<p><a href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/IMG_0013.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-830\" src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/IMG_0013-300x29.png\" alt=\"\" width=\"300\" height=\"29\" srcset=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/IMG_0013-300x29.png 300w, https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/IMG_0013.png 420w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Now we multiply the two values together to come up with the TF-IDF value.<\/p>\n<p><a href=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/IMG_0014.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-831\" 
src=\"https:\/\/brian.digitalmaddox.com\/blog\/wp-content\/uploads\/2024\/12\/IMG_0014.png\" alt=\"\" width=\"191\" height=\"20\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>This value reflects how \u201cimportant\u201d a word is in a document relative to the whole collection of documents.<span class=\"Apple-converted-space\">\u00a0 <\/span>Words that occur often in one document but rarely across the rest of the collection would end up with a high TF-IDF score.<span class=\"Apple-converted-space\">\u00a0 <\/span>This can be used to filter out the common words I mentioned above.<\/p>\n<p>TF-IDF also has several drawbacks when it comes to NLP.<span class=\"Apple-converted-space\">\u00a0 <\/span>For one, it still ignores word order and the context in which words are used.<span class=\"Apple-converted-space\">\u00a0 <\/span>As previously mentioned, word order and context are very important when attempting to determine the actual meaning of a word and how it is used.<\/p>\n<p>This leads into the next issue with TF-IDF.<span class=\"Apple-converted-space\">\u00a0 <\/span>Without context, it fails to account for words that mean the same or nearly the same thing.<span class=\"Apple-converted-space\">\u00a0 <\/span>Consider something like dog and puppy.<span class=\"Apple-converted-space\">\u00a0 <\/span>TF-IDF would treat them as different words instead of recognizing they are similar.<\/p>\n<h2>N-Grams<\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/N-gram\">N-Grams<\/a> were created to address the lack of word context.<span class=\"Apple-converted-space\">\u00a0 <\/span>They consider sequences of words, which preserves some word order, rather than looking solely at single words. 
\u00a0This consideration made N-Grams popular for modeling textual language and actual text generation.<\/p>\n<p>An N-Gram is defined as a <a href=\"https:\/\/www.sciencedirect.com\/topics\/computer-science\/contiguous-sequence#:~:text=A%20'contiguous%20sequence'%20refers%20to,are%20stored%20consecutively%20in%20memory.\">contiguous sequence<\/a> of N items in a text.<span class=\"Apple-converted-space\">\u00a0 <\/span>Consider the example sentence of \u201cThe cat sleeps on the chair\u201d.<span class=\"Apple-converted-space\">\u00a0 <\/span>A unigram (1-Gram) would be a single word such as \u201cThe\u201d and \u201ccat\u201d.<span class=\"Apple-converted-space\">\u00a0 <\/span>A bigram (2-Gram) is a sequence of two words, such as \u201cThe cat\u201d and \u201ccat sleeps\u201d.<span class=\"Apple-converted-space\">\u00a0 <\/span>A trigram (3-Gram) is then a sequence of three words, such as \u201cThe cat sleeps\u201d or \u201ccat sleeps on\u201d.<span class=\"Apple-converted-space\">\u00a0 <\/span>This continues on for larger values of N, and an N-Gram model counts how often each of these sequences occurs in a text.<span class=\"Apple-converted-space\">\u00a0 <\/span>As such, the higher the value of N, the more context is captured from the text.<\/p>\n<p>You might immediately see a problem with N-Grams.<span class=\"Apple-converted-space\">\u00a0 <\/span>As the value of N increases, the number of possible word sequences goes up exponentially.<span class=\"Apple-converted-space\">\u00a0 <\/span>For example, a vocabulary of <em>V<\/em> words would have up to <em>V<sup>N<\/sup><\/em> possible N-Grams.<span class=\"Apple-converted-space\">\u00a0 <\/span>This makes the model much harder to train when there is a limited amount of text.<span class=\"Apple-converted-space\">\u00a0 <\/span>Higher values of N also mean fewer matching sequences across documents.<span class=\"Apple-converted-space\">\u00a0 <\/span>N-Grams still fail to capture dependencies across sentences as 
they only capture localized context.<\/p>\n<h2>Hidden Markov Models<\/h2>\n<p>Hidden Markov Models (HMMs) were used quite a bit (and are still occasionally used today) for NLP tasks like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Part-of-speech_tagging\">part-of-speech tagging<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Named-entity_recognition\">named entity recognition<\/a> (NER).<span class=\"Apple-converted-space\">\u00a0 <\/span>They are probabilistic models of systems whose internal states are hidden and can only be inferred from observable events.<\/p>\n<p>An HMM consists of several parts:<\/p>\n<ul>\n<li><strong>Hidden States<\/strong> represent the phenomena of a system that are unobservable.<span class=\"Apple-converted-space\">\u00a0 <\/span>In relation to NLP, these hidden states could be parts of speech.<\/li>\n<li><strong>Observations<\/strong> are the words (or their tokens) from the text that are observed directly.<\/li>\n<li><strong>Transition probabilities<\/strong> are the probabilities of moving from one hidden state to another.<span class=\"Apple-converted-space\">\u00a0 <\/span>In NLP, nouns are often followed by verbs, so the probability of going from a noun to a verb is relatively high.<\/li>\n<li><strong>Emission probabilities<\/strong> are the probabilities of observing a particular word given a particular hidden state.<span class=\"Apple-converted-space\">\u00a0 <\/span>For example, if the hidden state is a verb, then the probability of the word sleep given the verb state would be high.<\/li>\n<\/ul>\n<p>HMMs work by assuming that the system being modeled (not necessarily just for NLP work) is a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Markov_chain\">Markov Process<\/a>.<span class=\"Apple-converted-space\">\u00a0 <\/span>Put simply, a Markov Process means that the probability of each state depends only on the previous state.<span class=\"Apple-converted-space\">\u00a0 <\/span>With respect to NLP, the hidden states can be 
considered to represent parts of speech, such as nouns and verbs.<span class=\"Apple-converted-space\">\u00a0 <\/span>The observed events of the system would be the words themselves.<span class=\"Apple-converted-space\">\u00a0 <\/span>An HMM then uses the above probabilities to model the sequence of hidden states that most likely produced the observable text.<\/p>\n<p>HMMs require training data to be able to predict the parts of speech in a sentence.<span class=\"Apple-converted-space\">\u00a0 <\/span>This is also their drawback: estimating the probabilities typically requires manually labeled training data. <span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n<h2>Support Vector Machines<\/h2>\n<p>The last \u201cold school\u201d NLP technique we will discuss is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Support_vector_machine\">Support Vector Machines<\/a> (SVM).<span class=\"Apple-converted-space\">\u00a0 <\/span>They were among the earlier machine learning algorithms to be widely used for text classification tasks.<span class=\"Apple-converted-space\">\u00a0 <\/span>SVMs were useful for tasks such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sentiment_analysis\">sentiment analysis<\/a> and spam detection.<span class=\"Apple-converted-space\">\u00a0 <\/span>Much like machine learning tasks today, they worked by finding a <a href=\"https:\/\/deepai.org\/machine-learning-glossary-and-terms\/hyperplane\">hyperplane<\/a> that separates different classes in high-dimensional spaces (yes, this is hard to wrap your mind around; I would suggest reading the links to learn more).<\/p>\n<p>Machine learning is based on vectors, or arrays of values.<span class=\"Apple-converted-space\">\u00a0 <\/span>SVMs are no exception.<span class=\"Apple-converted-space\">\u00a0 <\/span>They work by encoding the text document as a numerical vector, such as a vector of TF-IDF values.<span class=\"Apple-converted-space\">\u00a0 <\/span>They then search for an optimal plane \/ boundary that can 
best separate the classes (for example, text that speaks positively about something versus negatively about it).<span class=\"Apple-converted-space\">\u00a0 <\/span><a href=\"https:\/\/towardsdatascience.com\/transform-data-to-high-dimensional-kernel-space-87d62b670e0f\">Kernel functions<\/a> were used in cases where the boundary that separated the classes was non-linear.<span class=\"Apple-converted-space\">\u00a0 <\/span>They would map the input space into another, usually higher-dimensional, space that would allow the classes to be divided by a hyperplane.<span class=\"Apple-converted-space\">\u00a0 <\/span>This early machine learning method was very useful for classification tasks.<\/p>\n<p>While they were good for classification tasks, they were not good at modeling sequences or structures in texts.<span class=\"Apple-converted-space\">\u00a0 <\/span>Outside of classification they did not really perform as well.<span class=\"Apple-converted-space\">\u00a0 <\/span>As is often the case with machine learning, training on large-scale datasets was computationally expensive.<\/p>\n<h2>Conclusion<\/h2>\n<p>This basically sums up some of the older methods for NLP.<span class=\"Apple-converted-space\">\u00a0 <\/span>You can see how things developed from statistical analysis to the beginnings of machine learning.<span class=\"Apple-converted-space\">\u00a0 <\/span>These methods were good at some things, but failed to capture the complexity and context of languages.<span class=\"Apple-converted-space\">\u00a0 <\/span>They heavily relied on preprocessing steps such as tokenization, removal of common (stop) words, manual labeling, and so on.<span class=\"Apple-converted-space\">\u00a0 <\/span>These steps could be time consuming and still not capture meaning.<\/p>\n<p>The move toward more machine learning-based methods would finally enable better handling of ambiguity and context in various languages.<span class=\"Apple-converted-space\">\u00a0 <\/span>Next time we will close with modern NLP techniques and how they have created a revolution 
in processing human languages. <span class=\"Apple-converted-space\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last time we learned why English is a hard language, both for humans and especially for computers.\u00a0 For this time I think we will look at the past of NLP to understand the present. It actually has been an interesting &hellip; <a href=\"https:\/\/brian.digitalmaddox.com\/blog\/?p=825\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":834,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-825","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/825","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=825"}],"version-history":[{"count":4,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/825\/revisions"}],"predecessor-version":[{"id":833,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/825\/revisions\/833"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=\/wp\/v2\/media\/834"}],"wp:attachment":[{"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=825"}],"wp:term":[{"taxonomy":"ca
tegory","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=825"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brian.digitalmaddox.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=825"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}