Take it one step further: you can skip generating the structured data entirely. With LLMs, you can query unstructured data directly, and they'll only get better at this over time.
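Something like this, as a rough sketch (assuming the OpenAI Python client; the model name and the example notes are just illustrative):

import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

notes = """Apple is good. Let's invest in it.
The supplier missed two delivery deadlines last quarter."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "Answer questions using only the provided notes."},
        {"role": "user", "content": f"Notes:\n{notes}\n\nWhich statements are positive, and about what?"},
    ],
)

# The model answers the query against the raw text, no intermediate schema needed.
print(response.choices[0].message.content)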
The key aspect is "ability to understand". Depending on how the LLM is trained and on the context, the structured data might be misleading. For example, I asked ChatGPT to create a JSON structure based on these two statements: "Apple is good. Let's invest in it." It came up with the following:
{
  "statements": [
    {
      "text": "Apple is good.",
      "sentiment": "positive"
    },
    {
      "text": "Let's invest in it.",
      "sentiment": "positive"
    }
  ]
}
Which is nice. But because my input lacked context, I still need to provide additional metadata.
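In practice I'd push that context into the prompt itself. A rough sketch, again assuming the OpenAI Python client (the context line and the schema fields are made up for illustration):

import json
from openai import OpenAI

client = OpenAI()

prompt = """Context: analyst chat about publicly traded companies.
Statements: "Apple is good. Let's invest in it."

Return a JSON object with one entry per statement containing:
text, sentiment, entity, entity_type (company, fruit, ...), and topic."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    response_format={"type": "json_object"},  # ask for well-formed JSON back
    messages=[{"role": "user", "content": prompt}],
)

# With the context supplied up front, "Apple" can be tagged as a company
# rather than left ambiguous.
print(json.dumps(json.loads(response.choices[0].message.content), indent=2))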
This is what Thomson Reuters' Document Intelligence product does.