Sqlite count words

9/26/2023

I thought of trying that in Tcl and then figured that SQLite could find the intersection much more efficiently. Then a search would look through 13,810 words to find the set of occurrences, and the intersection of the sets of all words in the search could be determined. One could, perhaps, build a table with one row per word and a column that contains some form of a set of all its occurrences (by version book_no chapter_no verse_no index_no? combination). There are only 13,810 case-sensitive unique words among those 800,000 rows in eB_concordance. Thank you for taking the time to read this question. Does this affect the "value" of the index? This can be handled differently and, in fact, wouldn't work completely in my little example code above because, although the query would return all the verses regardless of spelling, only the spelling searched upon would be wrapped in the span tags.

Suppose, in order to return all the verses in which either name occurs when any one of the three spellings is searched upon, the eB_concordance table is modified to include all the combinations for each spelling. For example, Pharaoh-nechoh, Pharaoh-necho, and Necho all refer to the same person but each has a set of unique combinations of book_no, chapter_no, verse_no, index_no within the same version number. Some names occur with slightly different spellings in the same copy of a version and across different copies of the same version. A user may ask for results for words that have only a 1 in the "woc" column which can be 0 or 1 only and will never ask for 0 only.ĭ. Does it matter, when considering an index on index_no, that once a book_no, chapter_no, and verse_no are determined, there are never more than 90 rows 97% of these combinations have 50 or less rows and 70% have 30 or fewer rows?Ĭ. Does this mean that an index on the five in combination will always be used?ī.

The version, book_no, chapter_no, verse_no, and word columns will be in every query but not always as a constraint. Some items of potential importance include:Ī. How is it determined which columns should have indexes and should there be indexes on combinations of columns? Once these tables are built, data will not be added to them again except to add a new version so, write speed is irrelevant. Is the CTE the correct approach and, if so, would it make any difference determining in advance which word has the least occurrences and starting the CTE with it? For example, a search could involve an article or conjunction both of which have many occurrences, such as "the faith" and, if performed in order, the "the" would start off returning over 50k rows when the word "faith" would return 247 rows.Ģ. Select e.book_no as b, e.chapter_no as c, e.verse_no as v,\ # Below is one example query run from Tcl for an exact match Select book_no, chapter_no, verse_no, count(*) as cntĬase when cnt > 80 then '81-90' - max is 90 5 to 10 once the other versions to be studied are added.įor purposes of the question regarding an index on index_no, the following query provides an idea of the number of index_no's per book_no-chapter_no-verse_no combination in eB_concordance. However both of these tables will grow by a factor of The tables and query are below.Ĭreate table if not exists eB_concordance I'm trying to understand how to determine which columns to index for two read-only tables.

0 Comments

Sqlite count words

Leave a Reply.

Author

Archives

Categories