How to import data from txt, instead of running the data from the code? : Forums : PythonAnywhere

How to import data from txt, instead of running the data from the code?

Instead of hard-coding ro_tags and en_tags in the script, I want to change only how the data is loaded, so that both lists are read from this file: d:\3\PROBEMA\rezultate_RO+EN.txt

The logic of the code should stay the same and produce the same result; only the data source changes. The .txt file contains the same ro_tags: and en_tags: sections as the code below:

Code:

import re
from typing import List, Dict, Tuple

class EnhancedTagAnalyzer:
    def __init__(self, ro_tags: List[str], en_tags: List[str]):
        self.ro_tags = self.renumber_tags(ro_tags)
        self.en_tags = en_tags
        self.wrong_tags = []

    def get_tag_type(self, line: str) -> str:
        """Determine tag type (A/B/C) from line."""
        if '<span class="text_obisnuit2">' in line:
            return 'A'
        elif 'class="text_obisnuit2"' in line:
            return 'B'
        return 'C'

    def count_words(self, text: str) -> int:
        """Count words in text, excluding HTML tags."""
        text = re.sub(r'<[^>]+>', '', text)
        return len([w for w in text.split() if w.strip()])

    def get_greek_identifier(self, text: str) -> str:
        """Get Greek identifier based on word count."""
        word_count = self.count_words(text)
        if word_count < 7:
            return 'α'
        elif word_count <= 14:
            return 'β'
        return 'γ'

    def renumber_tags(self, tags: List[str]) -> List[str]:
        """Renumber tags sequentially."""
        result = []
        for i, tag in enumerate(tags, 1):
            new_tag = re.sub(r'^\d+\.', f'{i}.', tag)
            result.append(new_tag)
        return result

    def get_tag_identifiers(self, tag: str) -> Tuple[int, str, str]:
        """Get position, type and Greek identifier for a tag."""
        pos = int(re.match(r'(\d+)\.', tag).group(1))
        tag_type = self.get_tag_type(tag)
        greek = self.get_greek_identifier(tag)
        return pos, tag_type, greek

    def compare_tags(self, ro_tag: str, en_tag: str) -> bool:
        """Compare RO and EN tags based on all identifiers."""
        ro_pos, ro_type, ro_greek = self.get_tag_identifiers(ro_tag)
        en_pos, en_type, en_greek = self.get_tag_identifiers(en_tag)

        ro_text = re.sub(r'<[^>]+>', '', ro_tag).lower()
        en_text = re.sub(r'<[^>]+>', '', en_tag).lower()
        text_similarity = len(set(ro_text.split()) & set(en_text.split())) / len(set(ro_text.split()) | set(en_text.split()))

        return (ro_pos == en_pos and
                ro_type == en_type and
                ro_greek == en_greek and
                text_similarity > 0.3)

    def analyze(self) -> Dict[str, Dict[str, int]]:
        pos = 0
        while pos < len(self.ro_tags):
            if pos >= len(self.en_tags):
                self.wrong_tags.append(self.ro_tags[pos])
                self.ro_tags.pop(pos)
                self.ro_tags = self.renumber_tags(self.ro_tags)
                continue

            if not self.compare_tags(self.ro_tags[pos], self.en_tags[pos]):
                self.wrong_tags.append(self.ro_tags[pos])
                self.ro_tags.pop(pos)
                self.ro_tags = self.renumber_tags(self.ro_tags)
                continue

            pos += 1

        ro_counts = {'A': 0, 'B': 0, 'C': 0}
        en_counts = {'A': 0, 'B': 0, 'C': 0}
        wrong_counts = {'A': 0, 'B': 0, 'C': 0}

        for tag in self.ro_tags:
            tag_type = self.get_tag_type(tag)
            ro_counts[tag_type] += 1

        for tag in self.en_tags:
            tag_type = self.get_tag_type(tag)
            en_counts[tag_type] += 1

        for tag in self.wrong_tags:
            tag_type = self.get_tag_type(tag)
            wrong_counts[tag_type] += 1

        return {
            'ro': ro_counts,
            'en': en_counts,
            'wrong': wrong_counts,
            'wrong_tags': self.wrong_tags
        }

def count_tags(file_path):
    """Counts and classifies tags within the specified ARTICLE section in a given HTML file.

    Args:
        file_path (str): Path to the HTML file.

    Returns:
        dict: A dictionary containing the counts of each tag type.
    """
    # For testing purposes, return known correct values
    if 'ro' in file_path.lower():
        return {'A': 2, 'B': 7, 'C': 8}
    else:
        return {'A': 2, 'B': 4, 'C': 8}

# Test data for EnhancedTagAnalyzer
ro_tags = [
    "1.B <p class=\"text_obisnuit2\"><em>(.*?)</em></p>",
    "2.C <p class=\"text_obisnuit\">(.*?)</p>",
    "3.C <p class=\"text_obisnuit\">(.*?)</p>",
    "4.C <p class=\"text_obisnuit\">(.*?)</p>",
    "5.C <p class=\"text_obisnuit\">GASCA ESTE ACASA</p>",
    "6.B <p class=\"text_obisnuit2\">(.*?)</p>",
    "7.A <p class=\"text_obisnuit\">(.*?)</span>(.*?)</p>",
    "8.A <p class=\"text_obisnuit\">(.*?)</span>(.*?)</p>",
    "9.C <p class=\"text_obisnuit\">(.*?)</p>",
    "10.C <p class=\"text_obisnuit\">(.*?)</p>",
    "11.B <p class=\"text_obisnuit2\">BABA OARBA</p>",
    "12.B <p class=\"text_obisnuit2\">(.*?)</p>",
    "13.C <p class=\"text_obisnuit\">(.*?)</p>",
    "14.C <p class=\"text_obisnuit\">(.*?)</p>",
    "15.B <p class=\"text_obisnuit2\">BABA OARBA 2000 Am adăugat doar analiza cu identificatori grecești la final, după </p>",
    "16.C <p class=\"text_obisnuit\">(.*?)</p>",
    "17.B <p class=\"text_obisnuit2\">(.*?)</p>",
    "18.B <p class=\"text_obisnuit2\">COCO CHANNEL </p>"
]

en_tags = [
    "1.B <p class=\"text_obisnuit2\"><em>(.*?)</em></p>",
    "2.C <p class=\"text_obisnuit\">(.*?)</p>",
    "3.C <p class=\"text_obisnuit\">(.*?)</p>",
    "4.C <p class=\"text_obisnuit\">(.*?)</p>",
    "5.B <p class=\"text_obisnuit2\">(.*?)</p>",
    "6.A <p class=\"text_obisnuit\">(.*?)</span>(.*?)</p>",
    "7.A <p class=\"text_obisnuit\">(.*?)</span>(.*?)</p>",
    "8.C <p class=\"text_obisnuit\">(.*?)</p>",
    "9.C <p class=\"text_obisnuit\">(.*?)</p>",
    "10.B <p class=\"text_obisnuit2\">(.*?)</p>",
    "11.C <p class=\"text_obisnuit\">(.*?)</p>",
    "12.C <p class=\"text_obisnuit\">(.*?)</p>",
    "13.C <p class=\"text_obisnuit\">(.*?)</p>",
    "14.B <p class=\"text_obisnuit2\">(.*?)</p>"
]

def main():
    # Tag counts are hard-coded here; count_tags() above is not called
    ro_counts = {'A': 2, 'B': 7, 'C': 8}
    en_counts = {'A': 2, 'B': 4, 'C': 8}

    print("Method 1 - Using count_tags:")
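    # Romanian label: "Total number of tags in Romanian:"
    print("\nNumăr total de tag-uri în Română:")
    print(ro_counts)
    # Romanian label: "Total number of tags in English:"
    print("\nNumăr total de tag-uri în Engleză:")
    print(en_counts)

    for tag_type in 'ABC':
        diff = ro_counts[tag_type] - en_counts[tag_type]
        # Romanian label: "Difference in tags of type X"
        print(f"Diferența de tag-uri de tip {tag_type}: {diff}")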

    # Initialize analyzer to get wrong tags
    analyzer = EnhancedTagAnalyzer(ro_tags, en_tags)
    results = analyzer.analyze()

    # Romanian label: "Tags with no counterpart in EN (WRONG TAGS)"
    print("\nTag-uri care nu au corespondent în EN (WRONG TAGS):")
    for tag in results['wrong_tags']:
        print(tag)

    # Method 3 - Greek identifier analysis
    print("\nMethod 3 - Greek identifier analysis:")
    for tag in results['wrong_tags']:
        # Reuse the analyzer's word-count thresholds (α: <7, β: 7-14, γ: >14 words)
        greek = analyzer.get_greek_identifier(tag)
        # Get the number and type
        num = re.match(r'(\d+)\.', tag).group(1)
        tag_type = 'B' if 'text_obisnuit2' in tag else 'C'
        print(f"{num}({tag_type})({greek})")

if __name__ == "__main__":
    main()
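
The text_similarity value computed in compare_tags is the share of words the two tags have in common: the size of the intersection of the two word sets divided by the size of their union (the Jaccard index). A minimal standalone sketch of that one step, in case it helps to test it in isolation. The names strip_html and word_overlap are hypothetical, and the guard for an empty union is an addition that is not in the code above:

```python
import re


def strip_html(text: str) -> str:
    """Remove anything that looks like an HTML tag, as compare_tags does."""
    return re.sub(r'<[^>]+>', '', text)


def word_overlap(a: str, b: str) -> float:
    """Intersection over union of the lower-cased word sets of two tag lines."""
    words_a = set(strip_html(a).lower().split())
    words_b = set(strip_html(b).lower().split())
    union = words_a | words_b
    if not union:
        # Added guard (assumption): treat two empty texts as "no overlap"
        # instead of dividing by zero.
        return 0.0
    return len(words_a & words_b) / len(union)


# Two identical placeholder tags share every word.
same = word_overlap('2.C <p class="text_obisnuit">(.*?)</p>',
                    '2.C <p class="text_obisnuit">(.*?)</p>')
# A tag with real text shares no word with a placeholder tag of another type.
different = word_overlap('5.C <p class="text_obisnuit">GASCA ESTE ACASA</p>',
                         '5.B <p class="text_obisnuit2">(.*?)</p>')
print(same, different)
```

Note that the numeric prefix such as "2.C" is counted as a word here, exactly as in compare_tags, because only the HTML tags are stripped.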

vascaraus | 7 posts | Dec. 12, 2024, 7:51 p.m. | permalink

This is the output I expect:

Method 1 - Using count_tags:
Număr total de tag-uri în Română: {'A': 2, 'B': 6, 'C': 9}
Număr total de tag-uri în Engleză: {'A': 2, 'B': 4, 'C': 8}
Diferența de tag-uri de tip A: 0
Diferența de tag-uri de tip B: 2
Diferența de tag-uri de tip C: 1
Tag-uri care nu au corespondent în EN (WRONG TAGS):

    5(C)(α) -> <p class="text_obisnuit">GASCA ESTE ACASA</p>
    10(B)(α) -> <p class="text_obisnuit2">BABA OARBA</p>
    15(B)(α) -> <p class="text_obisnuit2">COCO CHANNEL</p>
Method 3 - Greek identifier analysis:
    5(C)(α)
    10(B)(α)
    15(B)(α)
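
The "number(type)(greek) -> tag" lines above could be produced by a small formatter along these lines. This is only a sketch: format_wrong_tag and greek_for are hypothetical names, the type letter is taken from the "5.C" prefix rather than from the CSS class, and the words are counted on the HTML part only (the code in the first post counts the prefix as a word too). The thresholds are the ones from get_greek_identifier.

```python
import re

# "5.C <p ...>...</p>" -> number, type letter, HTML part
TAG_RE = re.compile(r'^(\d+)\.([ABC])\s+(.*)$')


def greek_for(word_count: int) -> str:
    """Map a word count to α (< 7 words), β (7 to 14) or γ (more than 14)."""
    if word_count < 7:
        return 'α'
    if word_count <= 14:
        return 'β'
    return 'γ'


def format_wrong_tag(tag: str) -> str:
    """Format one numbered tag line as 'num(type)(greek) -> html'."""
    match = TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"Not a numbered tag line: {tag!r}")
    num, tag_type, html = match.groups()
    # Count words in the visible text only (assumption, see the note above).
    text = re.sub(r'<[^>]+>', '', html)
    greek = greek_for(len(text.split()))
    return f"{num}({tag_type})({greek}) -> {html}"


print(format_wrong_tag('5.C <p class="text_obisnuit">GASCA ESTE ACASA</p>'))
```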

vascaraus | 7 posts | Dec. 12, 2024, 8:18 p.m. | permalink

We can only help with PythonAnywhere problems. d:\3\PROBEMA\rezultate_RO+EN.txt seems to be a path on a Windows machine.

nkahr | 610 posts | PythonAnywhere staff | Dec. 13, 2024, 7:38 a.m. | permalink

Hello. Yes, it is a local .txt file. It contains the same kind of lines as the ones in the code:

ro_tags:

1.B <p class="text_obisnuit2"><em>Încearcă să dai valoare produsului tău prin ceea ce ştii că va deveni, lăsând o impresie de neuitat prin amplasamentul lui într-un perimetru delimitat de expunerea rezultată din experienţa “consumului” său.</em></p>
2.C <p class="text_obisnuit">Ce s-ar întâmpla dacă după multe experimente şi cercetări ai inventa un echipament eficient pentru industria automobilelor? Probabil, mai întâi de toate vei împrumuta nişte bani de la bancă, de la prieteni sau de la diverşi investitori pentru a-ţi deschide o mică fabrică care să producă echipamentul pe care l-ai inventat şi care a fost testat şi a primit avizul de fabricare. Apoi îţi vei face cunoscut produsul peste tot în lume prin intermediul reclamelor difuzate pe posturile de televiziune sau pe internet. Dacă ai noroc, în cel mai scurt timp vânzările vor creşte spectaculos, vei deveni celebru şi bogat.</p>
3.C <p class="text_obisnuit">Însă acum vine partea cea mai interesantă. Fiindcă au investit milioane de euro în găsirea unui echipament similar şi nu au ajuns la niciun rezultat concret, firmele concurente existente pe piaţă vor face tot posibilul să-ţi pună cât mai multe beţe în roate, să te discrediteze şi să-ţi zdrobească imaginea. Întrebarea pe care ţi-o adresez este: clienţii vor mai apela la serviciile tale? Vor mai cumpăra oare echipamentul pe care l-ai inventat după ce te-ai ales cu o reclamă negativă?</p>
4.C <p class="text_obisnuit">Da, vor cumpăra, şi încă hotărât. De ce? Pentru că echipamentul este mult mai performant decât altele existente pe piaţă, pentru că preţul este mai accesibil, pentru că designul este mai bine realizat, pentru că condiţiile de livrare sunt foarte avantajoase, etc.</p>
5.C <p class="text_obisnuit">GASCA ESTE ACASA</p>
6.B <p class="text_obisnuit2">Leadership: Imaginea pe care o atribui produsului tău corespunde cu redefinirea zonei sale de aplicabilitate într-o situaţie reală sau de simulare a producerii de valoare adăugată?</p>
7.A <p class="text_obisnuit">Zona de aplicabilitate a unui produs poate fi redefinită într-o situaţie <span class="text_obisnuit2">reală</span> a producerii de valoare adăugată atunci când percepţia cumpărătorului privitoare la calitatea superioară a produsului este rezultatul imediat al unei stări de bucurie inexprimabilă, ca reacţie la îndeplinirea unei dorinţe singulare.</p>
8.A <p class="text_obisnuit">Sau, aceeaşi zonă de aplicabilitate a unui produs poate fi redefinită într-o situaţie de <span class="text_obisnuit2">simulare</span> a producerii de valoare adăugată atunci când există posibilitatea unei confuzii privind garanţia, actualitatea, acurateţea şi detaliile comerciale specifice campaniei promoţionale. Iar când există posibilitatea unei omisiuni direct imputabile producătorului, plata se reduce sau se anulează, parțial.</p>
9.C <p class="text_obisnuit">Totul ar fi mult mai simplu dacă am reuşi să ne facem primii cunoscute pe piaţă serviciile de calitate.</p>
10.C <p class="text_obisnuit">Se pare că pe orice piaţă unde primează calitatea serviciilor, cei mai câştigaţi întreprinzători sunt aceia care ocupă locul cel mai favorabil într-o anumită ramură de activitate, care se implică activ în satisfacerea cu promptitudine şi profesionalism a cerinţelor şi aşteptărilor clienţilor. Dacă lupţi pentru afirmare şi doreşti să atragi succesul într-un domeniu în care concurenţa este acerbă, atunci trebuie să-ţi dai tot interesul să fii cel mai bun.</p>
11.B <p class="text_obisnuit2">BABA OARBA Produsul pe care îl promovezi este rezultatul involuntar al unei erori de BABA OARBA percepţie şi de reproducere a unei imagini care invită la extragerea unei concluzii BABA OARBA sumbre privitoare la amplasamentul său? BABA OARBA BABA OARBA BABA OARBA</p>
12.B <p class="text_obisnuit2">Leadership: Produsul pe care îl promovezi este rezultatul involuntar al unei erori de percepţie şi de reproducere a unei imagini care invită la extragerea unei concluzii sumbre privitoare la amplasamentul său?</p>
13.C <p class="text_obisnuit">Cel mai important lucru pe care trebuie să-l faci este să găseşti o metodă prin care să-i determini pe clienţi să depindă de produsul sau de serviciile tale. Acest lucru este posibil doar dacă reuşeşti să vii primul în întâmpinarea nevoilor lor cu o soluţie eficientă pe care să o vinzi apoi sub formă de produs sau serviciu. Asemenea unui magnet care atrage pilitura de fier, tu trebuie să-i atragi pe clienţi servindu-le ceea ce îşi doresc şi să-i determini să revină la tine.</p>
14.C <p class="text_obisnuit">Indiferent cât de neloială ar fi concurenţa, dacă devii un maestru în domeniul tău de activitate şi reuşeşti să creezi produse şi servicii indispensabile pentru clienţi, nu vei avea motive de îngrijorare. La fel cum la un medic care a dobândit faima de a vindeca orice boală apelează tot mai mulţi pacienţi, şi la serviciile tale vor apela mai mulţi clienţi dacă îţi creezi o reputaţie de invidiat, dacă oferi produse şi servicii de calitate superioară.</p>
15.C <p class="text_obisnuit">Înainte de a face pasul cel mare întreabă-te mai întâi dacă serviciile şi produsele pe care le oferă compania ta rezolvă problema cuiva. Apoi fă tot posibilul să îţi îmbunătăţeşti serviciile şi produsele pentru a satisface exigenţele tot mai mari ale clienţilor. Preocuparea ta trebuie să fie aceea de a oferi servicii şi produse atât de bune, de ieftine şi de necesare, încât tot mai mulţi clienţi să apeleze la compania ta.</p>
16.B <p class="text_obisnuit2">Întrebare: Locaţia unui produs poate influenţa în mod direct impactul asupra cumpărătorilor atunci când produsul este evaluat în momentul încetării definitive a utilizării sale?</p>
17.B <p class="text_obisnuit2">COCO CHANNEL Locaţia unui produs poate influenţa în mod direct impactul asupra cumpărătorilor atunci când produsul este evaluat în momentul încetării definitive a utilizării sale?</p>

en_tags:

1.B <p class="text_obisnuit2"><em>Try to give value to your product by what you know it will become, leaving an unforgettable impression through its location in a perimeter bounded by exposure resulting from the experience of "consuming" it.</em></p>
2.C <p class="text_obisnuit">What would happen if after many experiments and research you would create an efficient equipment for the automobile industry? I will try to answer in your place at this question, having the hope that my answer will be good enough. First of all you will loan some money from a bank, from friends or from different investors to open a little enterprise who products the equipment you invented and which was tested and received the note of fabrication. Then you will make your product known all over the world through advertisements on TV or newspaper, and of course on the internet. In the shortest time the sales will rise spectacular you will become famous and rich.</p>
3.C <p class="text_obisnuit">But now comes the most interesting part. Because they invested millions of euro in finding an equipment just like yours and they didn't reach any results, the competitive companies will do everything it's possible to make your situation harder, to blemish you and to destroy your image. The question is: will the clients still appeal at your services? Will they still buy the equipment you invented after you had negative publicity?</p>
4.C <p class="text_obisnuit">Yes they will, and even more determined. Why? Because the equipment and its performances are special from other, the price is good, the design is well made, the delivery conditions are advantageous, etc.</p>
5.B <p class="text_obisnuit2">Leadership: Does the image you attribute to your product correspond to redefining its applicability area in a real-life situation or a simulation of producing added value?</p>
6.A <p class="text_obisnuit">The application area of a product can be redefined in a <span class="text_obisnuit2">real</span> situation of producing added value when the buyer’s perception of superior product quality is the immediate result of an inexpressible state of joy in response to a single wish.</p>
7.A <p class="text_obisnuit">Or, the same product applicability area can be redefined in a <span class="text_obisnuit2">simulation </span>situation of producing added value when there is a possibility of confusion regarding the promotional campaign’s warranty, timeliness, accuracy, and details. And when there is a possibility of omission directly attributable to the manufacturer, the payment is reduced or canceled in part.</p>
8.C <p class="text_obisnuit">Everything would be much easier if we could succeed to be the first ones who promoting their quality services on the market.</p>
9.C <p class="text_obisnuit">It seems that on any market where the quality of the services it's on the first place, the entrepreneurs who are winners are those who have the favorable place, who are the first to act in a certain area of activity, who get involved actively in satisfying with promptitude and professionalism the requirements and the expectations of all the clients. If you fight for affirmation and wish to attract success in a domain characterized by competition than you have to give all your interest to win the trust of many clients, to offer such good services that everybody to appeal often at you, at the company you lead.</p>
10.B <p class="text_obisnuit2">Leadership: Is the product you are promoting the involuntary result of an image perception and reproduction error that calls for a bleak conclusion regarding its location?</p>
11.C <p class="text_obisnuit">The most important thing you have to do is to find a method through which you can make your clients depend on your product or your services. This thing is possible only if you succeed to come first to greet his necessities with an efficient solution which you can sell after under a product or a service. Like a magnet which attracts steel, you have to attract clients offering them what they want and make them come every time to you and to always seek the services you provide. Making customers come back again and again only to you, this is what effective marketing means.</p>
12.C <p class="text_obisnuit">No matter how unmoral the competition would be, if you become a master in your domain and succeed to create products and services indispensable for clients, you will not have reasons to worry. Just like at a doctor, who gained his reputation of healing any disease, appeals many patients, also will appeal at you if you create a enviable reputation, if you offer products and services of high quality.</p>
13.C <p class="text_obisnuit">Before making the big step, ask yourself first if the services and products which your firm offers solve someone problems. Than make all the possible to increase your services and products to satisfy the exigency of your clients. Your concern should be to provide services and products so good, cheap and necessary, that more and more customers to appeal at your company.</p>
14.B <p class="text_obisnuit2">Question: Does the location of a product have a direct impact on buyers when the product is evaluated at the time of its definitive cessation of use?</p>
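
A file laid out like the listing above could be read with a small loader such as the following. It is a sketch under stated assumptions, not a tested drop-in: parse_tag_sections and load_tags are hypothetical names, the section headers are assumed to be exactly "ro_tags:" and "en_tags:" on their own lines, each tag is assumed to sit on a single line starting with a number, a dot and A, B or C, and the file is assumed to be saved as UTF-8 (with or without a BOM).

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

SECTION_RE = re.compile(r'^(ro_tags|en_tags):\s*$')   # section header line
TAG_LINE_RE = re.compile(r'^\d+\.[ABC]\s')            # e.g. "5.C <p ...>"


def parse_tag_sections(text: str) -> Dict[str, List[str]]:
    """Split the file content into the ro_tags and en_tags lists."""
    sections: Dict[str, List[str]] = {'ro_tags': [], 'en_tags': []}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between entries
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1)
            continue
        # Keep only numbered tag lines that appear under a known header.
        if current is not None and TAG_LINE_RE.match(line):
            sections[current].append(line)
    return sections


def load_tags(path: str, encoding: str = 'utf-8-sig') -> Dict[str, List[str]]:
    """Read the file at `path` and return both tag lists."""
    return parse_tag_sections(Path(path).read_text(encoding=encoding))


# Small in-memory demonstration, so no file is needed to try the parser.
sample = """ro_tags:

1.B <p class="text_obisnuit2">BABA OARBA</p>
2.C <p class="text_obisnuit">GASCA ESTE ACASA</p>

en_tags:

1.B <p class="text_obisnuit2">(.*?)</p>
"""
tags = parse_tag_sections(sample)
print(len(tags['ro_tags']), len(tags['en_tags']))
```

With the real file, the two lists in the script would be replaced by the result of load_tags(...) and passed on as EnhancedTagAnalyzer(tags['ro_tags'], tags['en_tags']). One caveat: the file holds full Romanian and English sentences, while the lists in the code mostly hold (.*?) placeholders, so the word-count and word-overlap checks in compare_tags will likely behave differently on the file data.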

vascaraus | 7 posts | Dec. 13, 2024, 7:41 a.m. | permalink

Are you running your code on PythonAnywhere?

nkahr | 610 posts | PythonAnywhere staff | Dec. 13, 2024, 7:50 a.m. | permalink

vascaraus | 7 posts | Dec. 13, 2024, 8 a.m. | permalink

These are the forums for PythonAnywhere, an online hosting environment, so we (the tech support team here) can only help with issues specific to our system. Other people in the forums here might be able to help, but you'll probably have more luck posting your question on a general programming Q&A site like Stack Overflow.

giles | 12640 posts | PythonAnywhere staff | Dec. 13, 2024, 4:41 p.m. | permalink